Define the Problem
First determine the number of output variables (univariate, bivariate, or multivariate), then the machine learning paradigm, then the type of problem within that paradigm, and finally the algorithm(s) to use:
Supervised
- Regression - Predict a continuous value
- Classification - Predict a class
Unsupervised
- Clustering - Group unlabeled data by similarity
- Association Rule Mining - Find associations between two or more items or variables
Reinforcement
Data Exploration
Numerical Exploration
Compute measures of central tendency (mean, median, mode)
Check for missing values
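The two numerical checks above can be sketched with pandas. This is a minimal illustration, not a prescribed workflow: the DataFrame `df` and its `age` and `income` columns are made-up toy data, and the choice of pandas is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; replace with your own DataFrame.
df = pd.DataFrame({
    "age": [23, 35, 35, np.nan, 52],
    "income": [32000.0, 54000.0, np.nan, 61000.0, 75000.0],
})

# Measures of central tendency for each numerical column.
# Missing values are skipped by default.
means = df.mean(numeric_only=True)
medians = df.median(numeric_only=True)
# A column can have several modes; keep the first (smallest) one per column.
modes = df.mode(numeric_only=True).iloc[0]

# Number of missing values in each column.
missing_counts = df.isna().sum()

print(pd.DataFrame({"mean": means, "median": medians, "mode": modes}))
print(missing_counts)
```

`df.describe()` gives a broader summary in one call, though it does not report the mode or the missing-value counts directly.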
Data Visualization
Histograms
Scatter Matrices
- These plot every numerical variable against every other numerical variable so you can spot correlations visually
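Both plot types can be produced with pandas and matplotlib. The sketch below uses synthetic data: the columns `x`, `y`, and `z` are hypothetical, with `y` built to correlate with `x` so the scatter matrix has something to show. The output file names are also just placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical synthetic data; replace with your own DataFrame.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "y": 2.0 * x + rng.normal(scale=0.5, size=200),  # correlated with x by construction
    "z": rng.normal(size=200),                       # independent of x and y
})

# One histogram per numerical column.
hist_axes = df.hist(bins=20, figsize=(8, 6))
plt.savefig("histograms.png")

# Pairwise scatter plots, with each variable's histogram on the diagonal.
matrix_axes = scatter_matrix(df, figsize=(8, 8), diagonal="hist")
plt.savefig("scatter_matrix.png")
```

In the resulting scatter matrix, the `x` versus `y` panels should show a clear linear band, while panels involving `z` should look like shapeless clouds.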
Data Preprocessing