One of the biggest eye-openers was the realization that 'Data Science' really means 'Statistics', and that there is a time-honored college major and discipline devoted to statistics. In many ways data science is not new; it only appears so because it pairs statistical analysis with modern computing technology and software like R or SAS.
To that end, studying programming languages and writing code is one thing, but understanding why the methods work is probably equally important. I'm spending some time studying statistics textbooks and have found some decent, free material online.
I've retooled my Study Plan for 2019 accordingly (original 2018 plan was more Language/Platform specific, but weak on Theory/Technique). My Study Plan // Eventual Skillset:
I. Domain:
- Statistics
- DW/BI (Data Warehouse / Business Intelligence)
- Math
- SQL
- Tableau
- R
- Python
A. Statistics:
- Statistics and Data Analysis (WMU - Statistics 160 Textbook) // Good Intro
- Basics of Statistics [BoS] // Great follow-on to WMU; reinforces the same concepts with different definitions/explanations.
- Simple Data Analysis for Biologists // Good for Hypothesis building examples
They are all fairly straightforward reads, and it's nice to study a variety of books because it shows how different disciplines approach statistics, even though some core concepts, like discrete versus continuous data, are universal.
I hope to master the basic concepts of statistics and then keep studying to get a better grasp of, and appreciation for, data science. I'll probably need to brush up with some math textbooks as well.
B. DW/BI:
C. Math:
I'm proficient in SQL and completed the MCSA - SQL Server that includes the excellent Itzik Ben-Gan 70-461 - Querying SQL Server; probably the best of the 3 textbooks in that series. This is where my 2018 Study Plan converges with 2019 and is mostly Certs based:
- Tableau - Desktop Specialist -- $150 USD
- 70-773 - Analyzing Big Data with Microsoft R [for MCSE - Data Mgmt & Analytics] -- $165 USD
- 98-381 - Introduction to Python [for MTA] -- $127 USD
III. Goal
My goal has three (3) parts:
- Read all the Books (roughly 2,000 pages)
- Practical Hands-On Experience
- Get the Certs
It'll likely take several years to achieve everything on this 2019 List, so I'm budgeting accordingly and viewing this as a multi-year journey. If history is any indication, it took me 2 years to get the MCSA: I started that path in 2014 and didn't really complete it until 2016. This Data Science goal is far more comprehensive and difficult, so 2 years is optimistic, although I've already spent most of 2018 studying and did complete 2 textbooks (Art of R, Python). I have much more to learn and study.
IV. Updates
Update 11/14/18 // Finished WMU Stats Book
Finally completed the 11 chapters of Statistics and Data Analysis (WMU - Statistics 160 Textbook). It was not too bad and helped me understand some basic concepts, though there is enough new material that it will take more time to memorize and learn. I recommend this resource for others who want to learn about Statistics. I plan to study Basics of Statistics next to make sure I've got the core Statistics foundation.
Update 11/18/18 // Finished Basics of Statistics [BoS] !
Finished Basics of Statistics, and it was quite useful as a counterpoint to the WMU textbook; it was almost like an abbreviated version of WMU. I skimmed both fairly quickly and will need to return to some sections to truly understand them, but other concepts are starting to make sense. In particular, I better understand the Central Limit Theorem (CLT) and how it applies to Confidence Intervals, the Null Hypothesis and the p-value.
I already skimmed Simple Data Analysis for Biologists, and that book was quite useful as it helped me understand how to craft a Hypothesis. While BoS and WMU are more focused on textbook statistics methods, the biologist Data Analysis book goes deeper on practical Hypothesis building. Quite cool how reading multiple books on the same subject can yield strengths in different aspects of that subject!
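To make the CLT connection concrete for myself, here is a minimal Python sketch using only the standard library. It is not taken from either textbook. The exponential "population", the sample size of 50, and all function names are my own assumptions for illustration, and it uses the simple normal (z) approximation rather than a t-distribution.

```python
import random
import statistics
from statistics import NormalDist


def sample_means(population, n, trials, rng):
    """Draw `trials` samples of size n (with replacement); return each sample's mean."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(trials)]


def confidence_interval(sample, level=0.95):
    """Normal-approximation interval for the mean (reasonable for large n, per the CLT)."""
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / len(sample) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * std_err, mean + z * std_err


def two_sided_p_value(sample, mu0):
    """z-test p-value for the null hypothesis 'population mean equals mu0'."""
    std_err = statistics.stdev(sample) / len(sample) ** 0.5
    z = (statistics.fmean(sample) - mu0) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(42)

# Hypothetical, strongly skewed population (exponential, mean about 1).
population = [rng.expovariate(1.0) for _ in range(100_000)]

# CLT: means of many samples cluster around the population mean, with
# spread close to (population std dev) / sqrt(n), even though the
# population itself is far from normal.
means = sample_means(population, n=50, trials=2000, rng=rng)
predicted_spread = statistics.pstdev(population) / 50 ** 0.5
observed_spread = statistics.stdev(means)

# One sample: build a 95% interval and test H0 'mean equals the true mean'.
sample = rng.choices(population, k=50)
low, high = confidence_interval(sample)
p_value = two_sided_p_value(sample, mu0=statistics.fmean(population))

print(f"predicted spread of sample means: {predicted_spread:.4f}")
print(f"observed spread of sample means:  {observed_spread:.4f}")
print(f"95% CI from one sample: ({low:.3f}, {high:.3f})")
print(f"p-value against the true mean: {p_value:.3f}")
```

The point of the sketch is that the same standard error drives both the interval and the p-value, which is the link the two books kept coming back to.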
Update 11/20/18 // Kimball Data Warehouse 3rd Edition
Now studying DW/BI using Kimball's 3rd Edition book. It's very good and I wish I had read it years ago! It'll take some time, but having studied Stats helps me better appreciate how DW/BI should be designed to support Analytical Models.
I completed the Guide to Data Modeling - UW 1999 in a few hours, and it was a good foundation before diving into Kimball, as it covers the ERD (Entity-Relationship Diagram) and basic vocabulary that give greater context when studying Kimball.