Define the Problem

First define the number of output variables (univariate, bivariate, or multivariate), then the machine learning paradigm, then the type of problem, and finally the algorithm(s) to use. The paradigms are:

Supervised

Unsupervised

Reinforcement

Data Exploration

Numerical Exploration

Compute measures of central tendency (mean, median, mode)

Check for missing values
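
The two numerical checks above can be sketched as follows. This is a minimal illustration using only the Python standard library; the `ages` column and its values are made up for the example, and `None` is assumed to be the marker for a missing value. For tabular data, pandas `DataFrame.describe()` and `DataFrame.isna()` are the more usual tools.

```python
import statistics

# Hypothetical sample column; None is assumed to mark a missing value.
ages = [23, 31, 31, None, 45, 52, None, 31]

# Check for missing values before computing any statistics.
missing_count = sum(1 for value in ages if value is None)
observed = [value for value in ages if value is not None]

# Measures of central tendency, computed on the observed values only.
mean_age = statistics.mean(observed)
median_age = statistics.median(observed)
mode_age = statistics.mode(observed)

print(f"missing: {missing_count} of {len(ages)}")
print(f"mean={mean_age:.2f} median={median_age} mode={mode_age}")
```

Dropping the missing entries before computing the statistics is only one option; whether to drop or impute them is a decision for the preprocessing step.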

Data Visualization

Histograms

Scatter Matrices
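
Both plot types can be produced with pandas and matplotlib. The sketch below assumes those libraries (plus NumPy) are installed; the column names and the synthetic data are hypothetical stand-ins for a real dataset, and the output file names are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical synthetic data standing in for a real dataset.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "height_cm": rng.normal(170, 10, size=200),
    "weight_kg": rng.normal(70, 12, size=200),
    "age_years": rng.integers(18, 80, size=200),
})

# One histogram per numeric column, to inspect each distribution.
hist_axes = df.hist(bins=20, figsize=(9, 3), layout=(1, 3))
plt.savefig("histograms.png")

# Pairwise scatter plots, with a histogram of each column on the diagonal.
matrix_axes = scatter_matrix(df, figsize=(8, 8), diagonal="hist")
plt.savefig("scatter_matrix.png")
```

Histograms show the shape of each variable on its own, while the scatter matrix shows how each pair of variables relates, which can hint at correlations or outliers worth handling during preprocessing.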

Data Preprocessing