Study Plan for Data Science
One of the biggest eye-openers in my journey has been the realization that Data Science fundamentally revolves around Statistics. There’s a well-established college major focused on statistics, and data science isn't entirely new; it’s a combination of traditional statistical methods with modern computing technologies, utilizing software like R and SAS for analysis.
To truly grasp data science, it's essential not only to learn programming languages but also to understand the underlying principles. As such, I've decided to revise my study plan for 2019, shifting from a language/platform-specific focus to a more theory-based approach that emphasizes core concepts and techniques.
Eventual Skillset
I. Domain:
- Statistics
- DW/BI (Data Warehouse / Business Intelligence)
- Math
II. Tools:
- SQL
- Tableau
- R
- Python
I. Domain
A. Statistics
Statistics and Data Analysis (WMU - Statistics 160 Textbook)
A solid introduction to key concepts.
Basics of Statistics [BoS]
A great follow-on to reinforce and clarify concepts with alternative definitions.
Simple Data Analysis for Biologists
Useful for examples in hypothesis building.
These readings have been straightforward, offering different perspectives on statistics and showing how universal concepts such as discrete and continuous data are. My goal is to master these fundamental concepts, deepen my appreciation for data science, and refresh my math knowledge along the way.
B. DW/BI
Kimball - Data Warehouse 3rd Edition
Essential reading for understanding data warehousing.
Guide to Data Modeling - UW 1999
A foundational text to support my study of Kimball.
C. Math
- Advanced Calculus Textbook
- Probability and Mathematical Statistics
II. Tools
I am proficient in SQL and have earned the MCSA - SQL Server certification, which included studying Itzik Ben-Gan's excellent book for 70-461 - Querying SQL Server, which I regard as the best in that series. This year, my study plan converges with my certification goals:
- Tableau - Desktop Specialist: $150 USD
- 70-773 - Analyzing Big Data with Microsoft R [for MCSE - Data Management & Analytics]: $165 USD
- 98-381 - Introduction to Python [for MTA]: $127 USD
III. Goals
My goals for this journey are threefold:
- Read All the Books: Roughly 2,000 pages to cover.
- Practical Hands-On Experience: Apply the concepts learned.
- Get the Certifications: Achieve formal recognition of my skills.
Given the extensive scope of this plan, I anticipate that it will take several years to achieve my goals. It took me about two years to earn my MCSA, and given the complexity of data science, two years feels optimistic for this plan, even though I already dedicated much of 2018 to studying and completed two textbooks (Art of R, Python).
IV. Updates
Update 11/14/18
Finished WMU Stats Book
I completed all 11 chapters of the Statistics and Data Analysis (WMU - Statistics 160 Textbook). It was a manageable read that helped clarify several basic concepts while introducing new material that I will need to revisit for deeper understanding. I highly recommend this resource for anyone interested in learning statistics. Next, I plan to study Basics of Statistics to solidify my foundational knowledge.
Update 11/18/18
Finished Basics of Statistics [BoS]
I completed this book, which served as an excellent counterpoint to the WMU textbook. It's almost like a condensed version of WMU, and while I skimmed both texts, I plan to return to specific sections for a more in-depth understanding. Concepts like the Central Limit Theorem (CLT), confidence intervals, null hypotheses, and p-values are becoming clearer.
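Since the Central Limit Theorem and confidence intervals are the concepts starting to click, a small simulation helps make them concrete. This is my own hedged sketch in Python, not an example from either book: it draws repeated samples from a deliberately skewed, made-up population, checks that the sample means center on the population mean with a spread close to σ/√n, and computes an approximate 95% confidence interval using the normal approximation (z = 1.96). The population, sample sizes, and function names are assumptions chosen for illustration.

```python
import math
import random
import statistics


def sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples (with replacement) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]


def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% CI for the mean using the normal approximation (z = 1.96)."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error of the mean
    return m - z * se, m + z * se


# Hypothetical, deliberately skewed population: exponential values, mostly small.
rng = random.Random(42)
population = [rng.expovariate(1.0) for _ in range(10_000)]

means = sample_means(population, sample_size=50, n_samples=2_000)
print("population mean:", round(statistics.fmean(population), 3))
print("mean of sample means:", round(statistics.fmean(means), 3))
# CLT: the spread of the sample means should be close to sigma / sqrt(n)
print("sd of sample means:", round(statistics.stdev(means), 3))
print("sigma / sqrt(50):", round(statistics.pstdev(population) / math.sqrt(50), 3))

low, high = mean_confidence_interval(rng.choices(population, k=50))
print(f"95% CI from one sample: ({low:.3f}, {high:.3f})")
```

Increasing `sample_size` makes the sample means cluster more tightly and look more bell-shaped, which is the CLT at work. For small samples, a t critical value would be more appropriate than 1.96.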
I also skimmed Simple Data Analysis for Biologists, which helped me understand how to formulate hypotheses. Reading multiple texts on the same subject has been beneficial, revealing different strengths and perspectives.
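Formulating a hypothesis is easier to see with a worked test. As a hedged sketch (a technique I picked, not necessarily the one the biology book uses), here is a permutation test in Python. The null hypothesis is that both groups come from the same distribution, so randomly shuffling the group labels should often produce a difference as large as the observed one; the p-value is the fraction of shuffles that do. The measurements and function name are hypothetical.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    H0: both groups come from the same distribution (labels don't matter).
    H1: the group means differ.
    Returns (observed difference, approximate p-value).
    """
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign labels at random, as H0 allows
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # +1 in numerator and denominator counts the observed labeling itself
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical measurements (e.g. lengths in mm) from two groups
control = [4.2, 4.5, 4.1, 4.7, 4.3, 4.4, 4.6, 4.2]
treated = [4.9, 5.1, 4.8, 5.3, 5.0, 4.7, 5.2, 4.9]

observed, p_value = permutation_test(control, treated)
print(f"observed difference: {observed:.4f}, p-value: {p_value:.4f}")
```

A permutation test makes few distributional assumptions, which is why I chose it over a t-test for this sketch. A small p-value is evidence against H0, not proof of H1.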
Update 11/20/18
Kimball Data Warehouse 3rd Edition
I’ve begun studying the DW/BI concepts using Kimball’s 3rd Edition. It’s been an excellent read, and I wish I had explored it sooner! My prior studies in statistics are helping me appreciate how to design data warehouses to support analytical models.
I quickly finished the Guide to Data Modeling - UW 1999, which provided a solid foundation before diving into Kimball. It discusses ERDs (Entity-Relationship Diagrams) and the essential vocabulary that adds context to my studies.
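To connect Kimball's dimensional modeling to something runnable, here is a minimal star schema sketch that uses Python's built-in sqlite3 module as a stand-in for a real warehouse. It has one fact table at a stated grain (one row per sale line) surrounded by date and product dimensions, followed by the kind of join-and-group-by query an analytical model would run. The table names, columns, and data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a warehouse
cur = conn.cursor()

# Hypothetical schema: dimensions hold descriptive context, the fact holds measures.
cur.executescript("""
CREATE TABLE dim_date (
    date_key   INTEGER PRIMARY KEY,   -- smart key, e.g. 20181120
    full_date  TEXT NOT NULL,
    month_name TEXT NOT NULL,
    year       INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,  -- surrogate key
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);
-- Grain: one row per sale line; foreign keys to dimensions plus additive measures.
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity    INTEGER NOT NULL,
    amount      REAL NOT NULL
);
""")

# Hypothetical rows
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)", [
    (20181119, "2018-11-19", "November", 2018),
    (20181120, "2018-11-20", "November", 2018),
])
cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (1, "Widget", "Hardware"),
    (2, "Gadget", "Hardware"),
    (3, "Manual", "Books"),
])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (20181119, 1, 3, 30.0),
    (20181119, 3, 1, 15.0),
    (20181120, 2, 2, 50.0),
    (20181120, 1, 1, 10.0),
])

# Typical analytical query: join facts to dimensions, then group by attributes.
rows = cur.execute("""
    SELECT d.full_date, p.category, SUM(f.quantity), SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON f.date_key = d.date_key
    JOIN dim_product AS p ON f.product_key = p.product_key
    GROUP BY d.full_date, p.category
    ORDER BY d.full_date, p.category
""").fetchall()

for row in rows:
    print(row)
```

The fact table holds only keys and additive measures, while descriptive attributes such as category and month live in the dimensions; that separation is what makes slicing by any attribute a simple join.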
This retooled plan aligns my learning path with my long-term goals in data science, combining theoretical knowledge with practical skills and certifications.