Data Science Studies // Focus on Statistics First! [2019 Study Plan]

Study Plan for Data Science

One of the biggest eye-openers in my journey has been the realization that Data Science fundamentally revolves around Statistics. Statistics is a well-established college major, and data science isn't entirely new; it combines traditional statistical methods with modern computing, using software like R and SAS for analysis.

To truly grasp data science, it's essential not only to learn programming languages but also to understand the underlying principles. As such, I've decided to revise my study plan for 2019, shifting from a language/platform-specific focus to a more theory-based approach that emphasizes core concepts and techniques.

Eventual Skillset

I. Domain:

  1. Statistics
  2. DW/BI (Data Warehouse / Business Intelligence)
  3. Math

II. Tools:

  1. SQL
  2. Tableau
  3. R
  4. Python


I. Domain

A. Statistics

  • Statistics and Data Analysis (WMU - Statistics 160 Textbook)
    A solid introduction to key concepts.

  • Basics of Statistics [BoS]
    A great follow-on to reinforce and clarify concepts with alternative definitions.

  • Simple Data Analysis for Biologists
    Useful for examples in hypothesis building.

These readings have been straightforward, providing a diverse perspective on statistics and showcasing the universality of concepts like discrete and continuous data. My goal is to master these fundamental concepts and deepen my appreciation for data science, alongside refreshing my math knowledge.

B. DW/BI

  • Kimball - Data Warehouse 3rd Edition
    Essential reading for understanding data warehousing.

  • Guide to Data Modeling - UW 1999
    A foundational text to support my study of Kimball.

C. Math

  • Advanced Calculus Textbook
  • Probability and Mathematical Statistics

II. Tools

I am proficient in SQL and have completed the MCSA - SQL Server certification. Its study material includes Itzik Ben-Gan's excellent 70-461 - Querying SQL Server, regarded as the best in that series. This year, my study plan converges with certification goals:

  • Tableau - Desktop Specialist: $150 USD
  • 70-773 - Analyzing Big Data with Microsoft R [for MCSE - Data Management & Analytics]: $165 USD
  • 98-381 - Introduction to Python [for MTA]: $127 USD

III. Goals

My goals for this journey are threefold:

  1. Read All the Books: Roughly 2,000 pages to cover.
  2. Practical Hands-On Experience: Apply the concepts learned.
  3. Get the Certifications: Achieve formal recognition of my skills.

Given the extensive scope of this plan, I expect it to take several years. It took me about two years to earn my MCSA, and given the complexity of data science, two years feels optimistic for this endeavor, especially since I have already spent much of 2018 studying and completing two textbooks (Art of R, Python).


IV. Updates

Update 11/14/18

Finished WMU Stats Book
I completed all 11 chapters of the Statistics and Data Analysis (WMU - Statistics 160 Textbook). It was a manageable read that helped clarify several basic concepts while introducing new material that I will need to revisit for deeper understanding. I highly recommend this resource for anyone interested in learning statistics. Next, I plan to study Basics of Statistics to solidify my foundational knowledge.


Update 11/18/18

Finished Basics of Statistics [BoS]
I completed this book, which served as an excellent counterpoint to the WMU textbook. It’s almost like a condensed version of WMU, and while I skimmed both texts, I plan to return to specific sections for more in-depth understanding. Concepts like the Central Limit Theorem (CLT), confidence intervals, null hypotheses, and p-values are becoming clearer.

I also skimmed Simple Data Analysis for Biologists, which helped me understand how to formulate hypotheses. Reading multiple texts on the same subject has been beneficial, revealing different strengths and perspectives.
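To make the Central Limit Theorem and confidence intervals mentioned in this update more concrete, here is a minimal Python sketch. It is not taken from any of the textbooks above: the exponential population, the sample size of 50, the seed, and the helper names `sample_means` and `confidence_interval_95` are all illustrative assumptions. It simulates many sample means from a skewed population and then builds an approximate 95% confidence interval from a single sample.

```python
import random
import statistics


def sample_means(draw, sample_size, n_samples, rng):
    """Draw n_samples independent samples and return the mean of each one."""
    return [
        statistics.fmean([draw(rng) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


def confidence_interval_95(sample):
    """Approximate 95% CI for the mean, using the normal critical value 1.96."""
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - 1.96 * std_err, mean + 1.96 * std_err


def draw_exponential(rng):
    """Hypothetical population: exponential with true mean 1.0 (skewed)."""
    return rng.expovariate(1.0)


rng = random.Random(42)  # fixed seed so the run is repeatable

# CLT: means of samples from a skewed population cluster around the true
# mean, with spread close to sigma / sqrt(n) = 1 / sqrt(50).
means = sample_means(draw_exponential, sample_size=50, n_samples=2000, rng=rng)
print("mean of sample means:", statistics.fmean(means))
print("std of sample means: ", statistics.stdev(means))

# Confidence interval built from one sample of 50 observations.
one_sample = [draw_exponential(rng) for _ in range(50)]
low, high = confidence_interval_95(one_sample)
print("approx. 95% CI for the mean:", (low, high))
```

The mean of the sample means should land near 1.0 and their spread near 1/sqrt(50), even though the underlying population is far from normal. The 1.96 multiplier is the normal approximation; for small samples a t-based critical value would be the more careful choice.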

Update 11/20/18

Kimball Data Warehouse 3rd Edition
I’ve begun studying the DW/BI concepts using Kimball’s 3rd Edition. It’s been an excellent read, and I wish I had explored it sooner! My prior studies in statistics are helping me appreciate how to design data warehouses to support analytical models.

I finished the Guide to Data Modeling - UW 1999 quickly. It provided a solid foundation before diving into Kimball, covering ERDs (Entity-Relationship Diagrams) and essential vocabulary that adds context to my studies.


This retooled plan aligns my learning path with my long-term goals in data science, combining theoretical knowledge with practical skills and certifications.
