An option on STACKING …

Consider a binary classification problem. This is the stacking approach used:

[Image: screenshot of the stacking scheme]

Base techniques: for example, KNN, Random Forest, and SVM.

Cross Validation: 5-fold

Level 0:

Input: Train and test set

For every technique j:

For every fold i:

  • Fit the technique on the train set minus fold i to get a model.
  • Train set: use the model to get predictions on fold i.
  • Test set: use the same model to get predictions on the whole test set.

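Train-stacked set: column j holds the technique-j predictions collected over all folds, so every training row gets one out-of-fold prediction.

Test-stacked set: column j holds the average, over the i folds, of the technique-j predictions on the test set.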
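
The Level 0 loop can be sketched roughly as follows. This is only an illustration under several assumptions that the post does not state: scikit-learn as the library, synthetic data from `make_classification`, arbitrary hyperparameters, plain `KFold` for the 5 folds, and the positive-class probability (rather than the class label) as the stored prediction.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic binary data, for illustration only (not from the post).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# The three example techniques; hyperparameters are arbitrary assumptions.
techniques = [
    KNeighborsClassifier(n_neighbors=5),
    RandomForestClassifier(n_estimators=50, random_state=0),
    SVC(probability=True, random_state=0),
]

n_folds = 5
# Materialize the folds once so every technique sees the same splits.
folds = list(KFold(n_splits=n_folds, shuffle=True, random_state=0).split(X_train))

# One column per technique j.
train_stacked = np.zeros((X_train.shape[0], len(techniques)))
test_stacked = np.zeros((X_test.shape[0], len(techniques)))

for j, technique in enumerate(techniques):
    # Test-set predictions of technique j, one column per fold model.
    test_fold_preds = np.zeros((X_test.shape[0], n_folds))
    for i, (fit_idx, fold_idx) in enumerate(folds):
        # Fit on the train set minus fold i.
        model = clone(technique).fit(X_train[fit_idx], y_train[fit_idx])
        # Train set: out-of-fold prediction on fold i.
        train_stacked[fold_idx, j] = model.predict_proba(X_train[fold_idx])[:, 1]
        # Test set: prediction of this fold's model on the whole test set.
        test_fold_preds[:, i] = model.predict_proba(X_test)[:, 1]
    # Average the fold predictions to get column j of the test-stacked set.
    test_stacked[:, j] = test_fold_preds.mean(axis=1)
```

Because each training row is predicted only by a model that never saw it, the Train-stacked columns do not leak the target into Level 1.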

Level 1:

Input: Train-stacked set and Test-stacked set

  • Train-stacked set: one column per technique (j columns in total) holding its cross-validation predictions, plus the target as column j+1.
  • Test-stacked set: one column per technique (j columns in total) holding its fold-averaged predictions on the test set.

Then train, for example, a Logistic Regression on the Train-stacked set and use it to get the final predictions on the Test-stacked set.
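
A minimal sketch of the Level 1 step, assuming scikit-learn and three base techniques. To keep the example self-contained, the stacked sets here are random placeholders shaped like the Level 0 outputs, not real model predictions; the names `fake_level0` and `meta_model` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_train, n_test, n_techniques = 200, 50, 3

# Placeholder targets (illustration only).
y_train = rng.integers(0, 2, size=n_train)
y_test = rng.integers(0, 2, size=n_test)

def fake_level0(y):
    """Stand-in for Level 0: noisy scores in [0, 1], one column per technique."""
    noise = rng.normal(0.0, 0.3, size=(y.shape[0], n_techniques))
    return np.clip(y[:, None] + noise, 0.0, 1.0)

train_stacked = fake_level0(y_train)  # j columns; the target is kept in y_train
test_stacked = fake_level0(y_test)    # j columns, no target

# Level 1: fit the meta-model on the Train-stacked set.
meta_model = LogisticRegression()
meta_model.fit(train_stacked, y_train)

# Final predictions on the Test-stacked set.
final_proba = meta_model.predict_proba(test_stacked)[:, 1]
final_pred = meta_model.predict(test_stacked)
```

scikit-learn's `StackingClassifier` bundles similar steps, though it refits each base model on the full training set for test-time predictions instead of averaging the fold models as described here.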
