
Data Science Studies // Focus on Statistics First! [2019 Study Plan]

One of the biggest eye-openers was realizing that 'Data Science' is, at its core, 'Statistics', and that there is a time-honored college major and discipline devoted to statistics.  In many ways data science is not new; it only appears new because it pairs statistical analysis with modern computing technology and statistical software like R or SAS.

To that end, studying programming languages and writing code is one thing, but understanding why the methods work is probably equally important.  I'm spending some time studying statistics textbooks and have found some decent, free material online.

I've retooled my Study Plan for 2019 accordingly (the original 2018 plan was more Language/Platform specific but weak on Theory/Technique).  My Study Plan // Eventual Skillset:

I. Domain:
    • Statistics
    • DW/BI (Data Warehouse / Business Intelligence) 
    • Math
II. Tools:
    • SQL
    • Tableau
    • R
    • Python

I. Domain

A. Statistics:
  1. Statistics and Data Analysis (WMU - Statistics 160 Textbook) // Good Intro
  2. Basics of Statistics [BoS] // Great follow-on to WMU; reinforces the concepts and explains some of them with different definitions/explanations.
  3. Simple Data Analysis for Biologists // Good for Hypothesis building examples
They are all fairly straightforward reads, and it's nice to study a variety of books because it shows how different disciplines approach statistics, while some core concepts, like discrete vs. continuous data, are universal.

I hope to master the basic concepts of statistics and keep studying to get a better grasp of, and appreciation for, data science.  I'll probably need to brush up on some math textbooks as well.

B. DW/BI:
  1. Kimball - Data Warehouse 3rd Edition
  2. Guide to Data Modeling - UW 1999
C. Math:

II. Tools

I'm proficient in SQL and have completed the MCSA - SQL Server, which includes the excellent Itzik Ben-Gan 70-461 - Querying SQL Server textbook, probably the best of the 3 textbooks in that series.  This is where my 2018 Study Plan converges with 2019, and it is mostly Certs-based:
  • Tableau - Desktop Specialist -- $150 USD
  • 70-773 - Analyzing Big Data with Microsoft R [for MCSE - Data Mgmt & Analytics] -- $165 USD
  • 98-381 - Introduction to Python [for MTA] -- $127 USD

III. Goal
My goal has three (3) parts:
  1. Read all the Books (roughly 2,000 pages)
  2. Practical Hands-On Experience
  3. Get the Certs
It'll likely take several years to achieve everything on this 2019 list, so I'm budgeting accordingly and viewing this as a multi-year journey.  If history is any indication, it took me 2 years to get the MCSA: I started that path in 2014 and didn't really complete it until 2016.  This Data Science goal is far more comprehensive and difficult, so 2 years is optimistic, although I've already spent most of 2018 studying and completed 2 textbooks (Art of R, Python).  I have much more to learn and study.

IV. Updates

Update 11/14/18 // Finished WMU Stats Book
Finally completed all 11 chapters of Statistics and Data Analysis (WMU - Statistics 160 Textbook).  It was not too bad and helped me understand some basic concepts, with enough new material that it'll take more time to memorize and learn.  I recommend this resource to anyone who wants to learn Statistics.  I plan to study Basics of Statistics next to make sure I have a solid core Statistics foundation.


Update 11/18/18 // Finished Basics of Statistics [BoS]!
Finished Basics of Statistics, which was quite useful as a counterpoint to the WMU textbook; it reads almost like an abbreviated version of WMU.  I skimmed both fairly quickly and will need to return to some sections to truly understand them, but other concepts are starting to make sense.  Most importantly, I now better understand the Central Limit Theorem (CLT) and how it underpins Confidence Intervals, Null Hypothesis testing, and p-values.
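To make the CLT idea concrete, here is a minimal Python sketch, assuming a made-up, deliberately skewed population of exponential waiting times. It draws many samples, shows that the sample means cluster around the population mean with a spread close to σ/√n, and then builds a normal-approximation 95% confidence interval from a single sample. The helper names `sample_means` and `confidence_interval` are my own illustration, not taken from either textbook.

```python
import random
import statistics
from statistics import NormalDist


def sample_means(population, sample_size, num_samples, seed=42):
    """Draw repeated random samples (with replacement) and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]


def confidence_interval(sample, level=0.95):
    """Normal-approximation (z) confidence interval for the mean, justified by the CLT."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for a 95% interval
    return mean - z * std_err, mean + z * std_err


if __name__ == "__main__":
    # Hypothetical, deliberately non-normal population: exponential waiting times.
    rng = random.Random(0)
    population = [rng.expovariate(1.0) for _ in range(100_000)]
    means = sample_means(population, sample_size=50, num_samples=2_000)
    print("population mean:", round(statistics.fmean(population), 3))
    print("mean of sample means:", round(statistics.fmean(means), 3))
    print("sd of sample means:", round(statistics.stdev(means), 3),
          "vs sigma/sqrt(n):", round(statistics.pstdev(population) / 50 ** 0.5, 3))
    one_sample = random.Random(1).choices(population, k=50)
    print("95% CI for the mean:", confidence_interval(one_sample))
```

Even though the population is skewed, a histogram of `means` would look roughly bell-shaped, which is what lets the z value stand in for the sampling distribution. For small samples (say n < 30), the usual textbook choice is a t-based interval instead.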

I had already skimmed Simple Data Analysis for Biologists, and that book was quite useful because it helped me understand how to craft a hypothesis.  While BoS and WMU focus more on textbook statistical methods, the biologists' book goes deeper into practical hypothesis building.  It's quite cool how reading multiple books on the same subject reveals strengths in different aspects of that subject!
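Hypothesis building in practice means stating H0 and H1 up front, then asking how surprising the observed data would be if H0 were true. The sketch below uses a permutation test, which is my own choice for illustration because it needs only the Python standard library; it is not necessarily the method the biologists' book uses. The plant heights for a control and a fertilized group are hypothetical numbers, not real data, and `permutation_test` is a helper name I made up.

```python
import random
import statistics


def permutation_test(group_a, group_b, num_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    H0: both groups come from the same distribution, so group labels are exchangeable.
    H1: the group means differ.
    Returns (observed difference in means, estimated p-value).
    """
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels, as H0 allows
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed labeling itself, so p is never exactly 0.
    p_value = (extreme + 1) / (num_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical plant heights in cm (made up for illustration).
    control = [20.1, 19.8, 21.0, 20.5, 19.9, 20.3]
    fertilized = [22.4, 21.9, 23.1, 22.0, 22.8, 21.7]
    diff, p = permutation_test(fertilized, control)
    print(f"difference in means: {diff:.2f} cm, p-value: {p:.4f}")
    print("reject H0 at alpha = 0.05" if p < 0.05 else "fail to reject H0")
```

The fixed `seed` keeps results reproducible. With 6 + 6 observations there are only C(12, 6) = 924 distinct ways to split the data, so enumerating every split for an exact test would also be feasible here.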

Update 11/20/18 // Kimball Data Warehouse 3rd Edition
Now studying DW/BI using Kimball's 3rd Edition book.  It's very good, and I wish I had read it years ago!  It'll take some time, but having studied Stats helps me better appreciate how DW/BI should be designed to support Analytical Models.

I completed the Guide to Data Modeling - UW 1999 in a few hours, and it was a good foundation before diving into Kimball: it covers ERDs (Entity-Relationship Diagrams) and the basic vocabulary that gives greater context when studying Kimball.
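Since Kimball's approach centers on dimensional modeling, here is a minimal star schema sketch using Python's built-in `sqlite3` module: one fact table of additive measures (quantity, sales amount) linked to date and product dimension tables through surrogate keys. All table names, columns, and rows are hypothetical, made up for illustration rather than taken from the book.

```python
import sqlite3

# Minimal star schema: one fact table surrounded by dimension tables.
# Table and column names are hypothetical, for illustration only.
SCHEMA = """
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,  -- surrogate key, e.g. 20181120
    full_date     TEXT NOT NULL,
    month_name    TEXT NOT NULL,
    calendar_year INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,   -- surrogate key
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);
CREATE TABLE fact_sales (
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity     INTEGER NOT NULL,      -- additive measure
    sales_amount REAL NOT NULL          -- additive measure
);
"""


def build_demo_warehouse():
    """Create an in-memory database with the schema and a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)", [
        (20181119, "2018-11-19", "November", 2018),
        (20181120, "2018-11-20", "November", 2018),
    ])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
        (1, "Widget", "Hardware"),
        (2, "Gadget", "Hardware"),
        (3, "Manual", "Books"),
    ])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
        (20181119, 1, 3, 30.0),
        (20181119, 3, 1, 15.0),
        (20181120, 2, 2, 50.0),
        (20181120, 1, 1, 10.0),
    ])
    return conn


def sales_by_category(conn):
    """Typical dimensional query: join facts to dimensions, then group by their attributes."""
    query = """
        SELECT p.category, d.calendar_year, SUM(f.sales_amount) AS total_sales
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY p.category, d.calendar_year
        ORDER BY p.category
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for category, year, total in sales_by_category(build_demo_warehouse()):
        print(category, year, total)
```

The ERD vocabulary still applies (entities, keys, relationships), but the design is shaped around how analysts ask questions: join the fact table to its dimensions, then group by dimension attributes.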
