Code Fellows courses Notes
This project is maintained by QamarAlkhatib
This data science primer covers exploratory analysis, data cleaning, feature engineering, algorithm selection, and model training.
Those chunks make up 80% of the pie. They also set the foundation for more advanced techniques.
Machine Learning ≠ Algorithms:
Machine learning is not about algorithms. A common misconception is that machine learning is about mastering dozens of algorithms. In fact, machine learning is a comprehensive approach to solving problems, and individual algorithms are only one piece of the puzzle.
What makes machine learning so special?
Machine learning is the practice of teaching computers how to learn patterns from data, often for making decisions or predictions.
For true machine learning, the computer must be able to learn patterns that it’s not explicitly programmed to identify.
Remove Unwanted Observations: Unwanted observations include duplicate or irrelevant observations.
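A minimal sketch of dropping duplicate and irrelevant observations, assuming each observation is a plain Python dict. The records, the `property_type` field, and the helper name `remove_unwanted` are hypothetical and exist only for illustration; a library such as pandas offers `DataFrame.drop_duplicates` for the first step.

```python
def remove_unwanted(rows, keep):
    """Drop exact duplicates, then drop rows the `keep` predicate rejects."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        if keep(row):
            cleaned.append(row)
    return cleaned


# Hypothetical data: suppose the analysis only concerns single-family homes.
rows = [
    {"id": 1, "property_type": "single_family", "price": 250000},
    {"id": 1, "property_type": "single_family", "price": 250000},  # duplicate
    {"id": 2, "property_type": "apartment", "price": 180000},      # irrelevant
    {"id": 3, "property_type": "single_family", "price": 310000},
]
cleaned = remove_unwanted(
    rows, keep=lambda r: r["property_type"] == "single_family"
)
print([r["id"] for r in cleaned])
```

What counts as "irrelevant" depends on the question being asked, which is why the sketch takes it as a caller-supplied predicate.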
Fix Structural Errors: Structural errors are those that arise during measurement, data transfer, or other types of “poor housekeeping.”
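The notes do not list concrete structural errors, so the following sketch assumes they show up as stray whitespace, inconsistent capitalization, or abbreviations of the same category. The alias table, the sample labels, and the helper name `normalize_label` are hypothetical.

```python
def normalize_label(value, aliases):
    """Trim whitespace, lowercase, and map known variants to one spelling."""
    cleaned = value.strip().lower()
    return aliases.get(cleaned, cleaned)


# Hypothetical alias table mapping variants to a canonical label.
aliases = {"comp": "composition", "n/a": "not applicable"}

raw = ["Composition", "composition ", "comp", "N/A", "Not Applicable"]
labels = [normalize_label(v, aliases) for v in raw]
print(sorted(set(labels)))
```

After normalization the five raw spellings collapse into two categories, so a model no longer treats them as separate classes.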
Filter Unwanted Outliers: Outliers can cause problems with certain types of models. For example, linear regression models are less robust to outliers than decision tree models.
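One way to filter outliers can be sketched with the 1.5 × IQR rule. That rule is an assumption chosen for this example, since the notes do not prescribe a specific method; the sample values and the helper name `filter_outliers_iqr` are also hypothetical.

```python
import statistics


def filter_outliers_iqr(values, k=1.5):
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical measurements with one suspicious reading.
readings = [10, 12, 11, 13, 12, 11, 10, 95]
kept = filter_outliers_iqr(readings)
print(kept)
```

Whether a flagged value should really be dropped is a judgment call; the rule only identifies candidates.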
Handle Missing Data: The two most commonly recommended ways of dealing with missing data actually suck.
They are:
Dropping observations that have missing values
Imputing the missing values based on other observations
Missing categorical data
The best way to handle missing data for categorical features is to simply label them as ‘Missing’!
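A minimal sketch of labeling missing categorical values, assuming missing entries appear as `None` or blank strings. The `exterior` column and the helper name `label_missing` are hypothetical.

```python
def label_missing(values, placeholder="Missing"):
    """Replace None or blank strings with an explicit category label."""
    return [
        placeholder if v is None or not v.strip() else v
        for v in values
    ]


# Hypothetical categorical column with two gaps.
exterior = ["Brick", None, "Wood", "", "Brick"]
labeled = label_missing(exterior)
print(labeled)
```

This effectively adds a new class to the feature, so the fact that a value was missing stays available to the model.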
Missing numeric data
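For missing numeric data, you should flag and fill the values: mark each missing observation with an indicator variable, then fill in the original missing value so the column has no gaps.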
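Flag-and-fill for a numeric column can be sketched as follows, assuming missing entries are represented as `None`. The notes do not say what to fill with, so the fill value is left as a parameter and the `0` used below is only an assumed placeholder; the `lot_size` column and the helper name `flag_and_fill` are hypothetical.

```python
def flag_and_fill(values, fill_value):
    """Return (filled_values, missing_flags) for a numeric column."""
    flags = [1 if v is None else 0 for v in values]
    filled = [fill_value if v is None else v for v in values]
    return filled, flags


# Hypothetical numeric column with two gaps.
lot_size = [5000, None, 7200, None]
filled, was_missing = flag_and_fill(lot_size, fill_value=0)  # 0 is an assumption
print(filled)
print(was_missing)
```

The flag column records which values were originally missing, so that information is not lost once the gaps are filled.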
Feature engineering is about creating new input features from your existing ones.
You can isolate and highlight key information, which helps your algorithms “focus” on what’s important.
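As a small illustration of building new input features from existing ones, the sketch below derives two features from a hypothetical housing record. The field names (`year_built`, `year_sold`, `beds`, `baths`) and the derived features are assumptions made up for this example.

```python
def add_features(row):
    """Return a copy of `row` with two engineered features added."""
    out = dict(row)
    # Combine two existing columns into a more directly useful quantity.
    out["property_age"] = row["year_sold"] - row["year_built"]
    # Indicator variable that isolates one specific combination of values.
    out["two_and_two"] = int(row["beds"] == 2 and row["baths"] == 2)
    return out


# Hypothetical observation.
home = {"year_built": 1995, "year_sold": 2015, "beds": 2, "baths": 2}
engineered = add_features(home)
print(engineered["property_age"], engineered["two_and_two"])
```

Neither feature adds new data; each one restates existing information in a form that is easier for an algorithm to pick up.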