Here, I will walk through the amazing PyCaret introductory notebook created by Firat Gonen and try to add some details that explain what's going on in the background.
PyCaret Introduction (Classification & Regression)
There are many different types of machine learning problems; two of the biggest are classification and regression. Classification problems try to predict a category, whereas regression problems try to predict a continuous numeric value.
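To make the distinction concrete, here is a small sketch in plain Python. The guess_task function and its max_classes threshold are hypothetical helpers written for this walkthrough, not part of PyCaret, and the price values are made up. In PyCaret you pick the task yourself by importing from either pycaret.classification or pycaret.regression.

```python
def guess_task(target_values, max_classes=10):
    """Hypothetical heuristic: text labels or only a few distinct values
    suggest classification; many distinct numbers suggest regression."""
    distinct = set(target_values)
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in distinct
    )
    if not all_numeric or len(distinct) <= max_classes:
        return "classification"
    return "regression"


# A Titanic-style target: each passenger either survived (1) or did not (0).
survived = [0, 1, 1, 0, 1]

# A house-price-style target: made-up values, every one of them different.
sale_price = [208500 + 1375 * i for i in range(50)]

survived_task = guess_task(survived)
price_task = guess_task(sale_price)
```

A rule of thumb like this breaks down for targets such as star ratings or counts, so treat it as an illustration of the idea rather than a way to decide.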
pip install pycaret
setup is the first function you HAVE to call: it initializes the PyCaret environment and sets up all of the data transformations that will be applied to your data before modeling. The only required parameters are data and target.
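from pycaret.classification import setup  # classification version of setup

# train is assumed to be the Titanic training DataFrame
clf1 = setup(data = train,
             target = 'Survived',
             numeric_imputation = 'mean',
             categorical_features = ['Sex','Embarked'],
             ignore_features = ['Name','Ticket','Cabin'],
             silent = True)  # note: silent was removed in PyCaret 3.0; drop it on newer versions
#quite intuitive isn't it ?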
target: What value are we trying to predict
numeric_imputation: If we're missing numerical values, what do we replace them with
categorical_features: Which features (columns) are categorical
ignore_features: Which features would you like to ignore
silent: When True, you are not asked to confirm the inferred data types; preprocessing is performed automatically
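To get a feel for what the options listed above ask PyCaret to do, here is a hand-rolled sketch in plain Python. It is not PyCaret's implementation: the preprocess function and the toy rows are assumptions made up for illustration, and it only mimics three ideas (dropping ignored columns, filling missing numbers with the column mean, and turning a categorical column into 0/1 indicator columns).

```python
from statistics import mean

# Toy rows shaped like the Titanic data; None marks a missing value.
rows = [
    {"Name": "A", "Sex": "male", "Age": 22.0, "Survived": 0},
    {"Name": "B", "Sex": "female", "Age": None, "Survived": 1},
    {"Name": "C", "Sex": "female", "Age": 26.0, "Survived": 1},
]


def preprocess(rows, numeric, categorical, ignore):
    """Illustrative only: ignore columns, mean-impute, then one-hot encode."""
    # ignore_features: drop these columns entirely.
    kept = [{k: v for k, v in r.items() if k not in ignore} for r in rows]

    # numeric_imputation = 'mean': fill gaps with the mean of observed values.
    for col in numeric:
        col_mean = mean(r[col] for r in kept if r[col] is not None)
        for r in kept:
            if r[col] is None:
                r[col] = col_mean

    # categorical_features: one 0/1 indicator column per category level.
    for col in categorical:
        levels = sorted({r[col] for r in kept})
        for r in kept:
            value = r.pop(col)
            for level in levels:
                r[f"{col}_{level}"] = int(value == level)
    return kept


clean = preprocess(rows, numeric=["Age"], categorical=["Sex"], ignore=["Name"])
```

After this runs, the second row's missing Age has been filled with the mean of the other two ages, Name is gone, and Sex has become the two columns Sex_female and Sex_male. PyCaret does the equivalent work for you inside setup, so none of this has to be written by hand.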
from pycaret.regression import setup  # regression version of setup

# train is assumed to be the house-prices training DataFrame
reg = setup(data = train,
            target = 'SalePrice',
            numeric_imputation = 'mean',
            categorical_features = ['MSZoning','Exterior1st','Exterior2nd','KitchenQual','Functional','SaleType',
                                    'Street','LotShape','LandContour','LotConfig','LandSlope','Neighborhood',
                                    'Condition1','Condition2','BldgType','HouseStyle','RoofStyle','RoofMatl',
                                    'MasVnrType','ExterQual','ExterCond','Foundation','BsmtQual','BsmtCond',
                                    'BsmtExposure','BsmtFinType1','BsmtFinType2','Heating','HeatingQC','CentralAir',
                                    'Electrical','GarageType','GarageFinish','GarageQual','GarageCond','PavedDrive',
                                    'SaleCondition'],
            ignore_features = ['Alley','PoolQC','MiscFeature','Fence','FireplaceQu','Utilities'],
            normalize = True,
            silent = True)  # note: silent was removed in PyCaret 3.0; drop it on newer versions
target: What value are we trying to predict