Intro
- Problem Framing
- Data Understanding
- Data Cleaning
- Data Selection
- Data Preparation
- Model Evaluation
- Model Configuration
- Model Selection
- Model Presentation
- Model Predictions
https://machinelearningmastery.com/statistical-methods-in-an-applied-machine-learning-project

Types of Data Science
Descriptive Analytics (Business Intelligence)
Get useful data in front of the right people in the form of dashboards, reports and emails
- Which customers have churned?
- Which homes have sold in a given location, and do homes of a certain size sell more quickly?
Predictive Analytics (Machine Learning)
Put data science models continuously into production
- Which customers may churn?
- How much will a home sell for, given its location and number of rooms?
Prescriptive Analytics (Decision Science)
Use data to help a company make decisions
- What should we do about the particular types of customers that are prone to churn?
- How should we market a home to sell quickly, given its location and number of rooms?
Prescriptive analytics uses data, algorithms, and machine learning to recommend the best course of action for a specific goal. It goes beyond predicting future outcomes to answer "what should we do?", optimizing processes, resource allocation, and decision-making by simulating candidate strategies and presenting the best one given the predicted outcomes and business rules.
Difference
- Descriptive analytics uses historical data to find patterns and answer "what happened?"
- Predictive analytics uses statistical models and historical data to forecast likely outcomes and answer "what might happen?"
- Prescriptive analytics builds on both to recommend specific actions and answer "what should we do?"
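A minimal sketch of the three questions applied to the churn example, using only the Python standard library. Everything here is an illustrative assumption: the `history` records, the plan names, the customer value, and the action costs and effects are made up, and the "model" is just a historical churn frequency per plan rather than a real predictive model.

```python
from collections import defaultdict

# Hypothetical historical records: (customer_id, plan, churned)
history = [
    ("c1", "basic", True),
    ("c2", "basic", True),
    ("c3", "basic", False),
    ("c4", "premium", False),
    ("c5", "premium", False),
    ("c6", "premium", True),
    ("c7", "basic", True),
    ("c8", "premium", False),
]


def churned_customers(records):
    """Descriptive: which customers have churned?"""
    return [cid for cid, _, churned in records if churned]


def churn_rate_by_plan(records):
    """Predictive (deliberately naive): estimate the churn probability
    of a plan from its historical churn frequency."""
    totals = defaultdict(int)
    churns = defaultdict(int)
    for _, plan, churned in records:
        totals[plan] += 1
        churns[plan] += int(churned)
    return {plan: churns[plan] / totals[plan] for plan in totals}


def best_action(p_churn, customer_value, actions):
    """Prescriptive: pick the action with the highest expected value.

    `actions` maps a name to (cost, relative reduction in churn probability).
    """
    def expected_value(cost, reduction):
        p_after = p_churn * (1 - reduction)
        return (1 - p_after) * customer_value - cost

    return max(actions, key=lambda name: expected_value(*actions[name]))


# Hypothetical retention actions: (cost, relative churn reduction)
ACTIONS = {
    "do_nothing": (0.0, 0.0),
    "discount": (20.0, 0.5),
    "phone_call": (5.0, 0.1),
}

if __name__ == "__main__":
    print("Churned:", churned_customers(history))
    rates = churn_rate_by_plan(history)
    for plan, p in rates.items():
        print(plan, p, best_action(p, customer_value=100.0, actions=ACTIONS))
```

With these made-up numbers the recommended action differs by plan, which is the point of the prescriptive step: the same prediction machinery feeds a decision rule that weighs cost against expected benefit.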
Fields
Data Analysis
Taking raw information and turning it into knowledge that can be acted on or that can drive a decision
- Domain knowledge - translate a business need to a question, make accuracy-cost trade-offs
- Research - gather the data, design and conduct experiments
- Interpretation - summarize and aggregate, visualize, apply statistical tools
Data Modeling
Using the data that we have to estimate the data that we wish we had
- Supervised learning - classification, regression, anomaly detection
- Unsupervised learning - clustering, dimensionality reduction, anomaly detection
- Custom algorithm development - feature engineering, numerical optimization
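As a small illustration of "using the data that we have to estimate the data that we wish we had", the sketch below fits a one-variable least-squares line to home prices and uses it to estimate the price of a home that is not in the data. The room counts and prices are invented and exactly linear to keep the arithmetic easy to follow; `fit_line` and `predict` are hypothetical helper names, not part of any library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Estimate y for an unseen x using the fitted line."""
    return slope * x + intercept


# Data we have (made up): number of rooms and sale price in thousands
rooms = [2, 3, 4, 5]
prices = [150, 200, 250, 300]

slope, intercept = fit_line(rooms, prices)

# Data we wish we had: the price of a 6-room home
estimate = predict(6, slope, intercept)

if __name__ == "__main__":
    print(f"price ~ {slope:.1f} * rooms + {intercept:.1f}")
    print(f"estimated price for 6 rooms: {estimate:.1f}")
```

This is supervised regression in its simplest form; a real project would use more features, hold out data for evaluation, and likely rely on an established library.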
Data Engineering
Taking these analysis and modeling activities and making everything work faster, more robustly, and on larger quantities of data
- Data management - database management, pipeline construction, data collection
- Production - automation, system integration, robustification
- Software engineering - ensure maintainability, scaling, collaborative development
- https://medium.com/@darshilp/roadmap-for-data-engineering-2023-13f62f85d866
- https://medium.com/@darshilp/roadmap-for-data-engineering-2024-af7ea4ead400
- https://github.com/gunnarmorling/awesome-opensource-data-engineering
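To make "pipeline construction" and "robustification" from the Data Engineering list a little more concrete, here is a toy extract-clean-load pipeline using only the standard library. The CSV content, the column names, and the in-memory dict standing in for a database table are all assumptions for illustration; a production pipeline would add logging, schema checks, and a real data store.

```python
import csv
import io

# Hypothetical raw input with two bad rows (a missing and a non-numeric spend)
RAW_CSV = """customer_id,plan,monthly_spend
c1,basic,20
c2,premium,
c3,basic,abc
c4,premium,55.5
"""


def extract(text):
    """Collection: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Robustness: keep rows whose spend parses as a number and
    set the rest aside for inspection instead of crashing."""
    good, rejected = [], []
    for row in rows:
        try:
            good.append({**row, "monthly_spend": float(row["monthly_spend"])})
        except (TypeError, ValueError):
            rejected.append(row)
    return good, rejected


def load(rows, store):
    """Loading: upsert rows into a dict keyed by customer_id
    (a stand-in for a database table)."""
    for row in rows:
        store[row["customer_id"]] = row
    return store


def run_pipeline(text, store):
    """Run extract -> clean -> load and report (loaded, rejected) counts."""
    good, rejected = clean(extract(text))
    load(good, store)
    return len(good), len(rejected)


if __name__ == "__main__":
    table = {}
    loaded, rejected = run_pipeline(RAW_CSV, table)
    print(f"loaded={loaded} rejected={rejected}")
```

Keeping each stage a small function makes the stages individually testable and lets the pipeline be rerun safely, since loading is an upsert keyed by `customer_id`.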