class: center, middle, inverse, title-slide

# Tuning
### Applied Machine Learning with R
### The R Bootcamp @ AMLD
### January 2020

---

layout: true

<div class="my-footer"> <span style="text-align:center"> <span> <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/by-sa.png" height=14 style="vertical-align: middle"/> </span> <a href="https://therbootcamp.github.io/"> <span style="padding-left:82px"> <font color="#7E7E7E"> www.therbootcamp.com </font> </span> </a> <a href="https://therbootcamp.github.io/"> <font color="#7E7E7E"> Applied Machine Learning with R @ AMLD | January 2020 </font> </a> </span> </div>

---

.pull-left4[

<br><br><br>

# Fighting overfitting

<ul>
  <li class="m1"><span>When a model <high>fits the training data too well</high> at the expense of its performance in prediction, this is called overfitting.</span></li>
  <li class="m2"><span>Just because model A is better than model B in training does not mean it will be better in testing! Extremely flexible models are <high>'wolves in sheep's clothing'</high>.</span></li>
  <li class="m3"><span>But is there nothing we can do?</span></li>
</ul>

]

.pull-right55[

<br><br>

<p align = "center">
<img src="image/wolf_complex.png"><br>
<font style="font-size:10px">adapted from <a href="">victoriarollison.com</a></font>
</p>

]

---

# Tuning parameters

.pull-left45[

<ul>
  <li class="m1"><span>Machine learning models are equipped with tuning parameters that <high>control model complexity</high>.</span></li><br>
  <li class="m2"><span>These tuning parameters can be identified using a <high>validation set</high> created from the training data.</span></li><br>
  <li class="m3"><span>Algorithm:
  <br><br>
  <ul class="level">
    <li><span>1 - Create a separate validation set.</span></li>
    <li><span>2 - Fit the model using various tuning parameters.</span></li>
    <li><span>3 - Select the tuning parameters leading to the best prediction on the validation set.</span></li>
    <li><span>4 - Refit the model to the entire training set (training + validation).</span></li>
  </ul>
  </span></li>
</ul>

]

.pull-right45[

<p align = "center"
style="padding-top:0px">
<img src="image/validation.png" height=430px>
</p>

]

---

# Resampling methods

.pull-left4[

<ul>
  <li class="m1"><span>Resampling methods automate and generalize model tuning.</span></li>
</ul>

<table style="cellspacing:0; cellpadding:0; border:none;">
<col width="30%">
<col width="70%">
<tr>
  <td bgcolor="white"> <b>Method</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> <i>k-fold cross-validation</i> </td>
  <td bgcolor="white"> Splits the data into <i>k</i> pieces, using <high>each piece once</high> as the validation set and the remaining pieces for training. </td>
</tr>
<tr>
  <td bgcolor="white"> <i>Bootstrap</i> </td>
  <td bgcolor="white"> For <i>B</i> bootstrap rounds, <high>sample</high> from the data <high>with replacement</high> and split the data into a training and a validation set. </td>
</tr>
</table>

]

.pull-right5[

<p align = "center" style="padding-top:0px">
<img src="image/resample1.png">
</p>

]

---

# Resampling methods

.pull-left4[

<ul>
  <li class="m1"><span>Resampling methods automate and generalize model tuning.</span></li>
</ul>

<table style="cellspacing:0; cellpadding:0; border:none;">
<col width="30%">
<col width="70%">
<tr>
  <td bgcolor="white"> <b>Method</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> <i>k-fold cross-validation</i> </td>
  <td bgcolor="white"> Splits the data into <i>k</i> pieces, using <high>each piece once</high> as the validation set and the remaining pieces for training. </td>
</tr>
<tr>
  <td bgcolor="white"> <i>Bootstrap</i> </td>
  <td bgcolor="white"> For <i>B</i> bootstrap rounds, <high>sample</high> from the data <high>with replacement</high> and split the data into a training and a validation set.
</td>
</tr>
</table>

]

.pull-right5[

<p align = "center" style="padding-top:0px">
<img src="image/resample2.png">
</p>

]

---

# Resampling methods

.pull-left4[

<ul>
  <li class="m1"><span>Resampling methods automate and generalize model tuning.</span></li>
</ul>

<table style="cellspacing:0; cellpadding:0; border:none;">
<col width="30%">
<col width="70%">
<tr>
  <td bgcolor="white"> <b>Method</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> <i>k-fold cross-validation</i> </td>
  <td bgcolor="white"> Splits the data into <i>k</i> pieces, using <high>each piece once</high> as the validation set and the remaining pieces for training. </td>
</tr>
<tr>
  <td bgcolor="white"> <i>Bootstrap</i> </td>
  <td bgcolor="white"> For <i>B</i> bootstrap rounds, <high>sample</high> from the data <high>with replacement</high> and split the data into a training and a validation set. </td>
</tr>
</table>

]

.pull-right5[

<p align = "center" style="padding-top:0px">
<img src="image/resample3.png">
</p>

]

---

class: center, middle

<high><h1>Regression</h1></high>
<font color = "gray"><h1>Decision Trees</h1></font>
<font color = "gray"><h1>Random Forests</h1></font>

---

# Regularized regression

.pull-left45[

<ul>
  <li class="m1"><span>Penalizes the regression loss for having large <font style="font-size:22px">&beta;</font> values, using the <high>λ (lambda) tuning parameter</high> and one of several penalty functions.</span></li>
</ul>

$$Regularized \;loss = \sum_i^n (y_i-\hat{y}_i)^2+\lambda \sum_j^p f(\beta_j)$$

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
  <td bgcolor="white"> <b>Name</b> </td>
  <td bgcolor="white"> <b>Function</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> <i>Lasso</i> </td>
  <td bgcolor="white"> |β<sub>j</sub>| </td>
  <td bgcolor="white"> Penalize by the <high>absolute</high> regression weights.
</td>
</tr>
<tr>
  <td bgcolor="white"> <i>Ridge</i> </td>
  <td bgcolor="white"> β<sub>j</sub><sup>2</sup> </td>
  <td bgcolor="white"> Penalize by the <high>squared</high> regression weights. </td>
</tr>
<tr>
  <td bgcolor="white"> <i>Elastic net</i> </td>
  <td bgcolor="white"> |β<sub>j</sub>| + β<sub>j</sub><sup>2</sup> </td>
  <td bgcolor="white"> Penalize by a combination of the Lasso and Ridge penalties. </td>
</tr>
</table>

]

.pull-right45[

<p align = "center">
<img src="image/bonsai.png"><br>
<font style="font-size:10px">from <a href="https://www.mallorcazeitung.es/leben/2018/05/02/bonsai-liebhaber-mallorca-kunst-lebenden/59437.html">mallorcazeitung.es</a></font>
</p>

]

---

.pull-left45[

# Regularized regression

<p style="padding-top:1px"></p>

<ul>
  <li class="m1"><span><b>Ridge</b>
  <br><br>
  <ul class="level">
    <li><span>By penalizing the most extreme βs most strongly, Ridge leads to (relatively) more <high>uniform βs</high>.</span></li>
  </ul>
  </span></li><br><br><br><br>
  <li class="m2"><span><b>Lasso</b>
  <br><br>
  <ul class="level">
    <li><span>By penalizing all βs equally, irrespective of magnitude, Lasso drives some βs to 0, effectively performing <high>automatic feature selection</high>.</span></li>
  </ul>
  </span></li>
</ul>

]

.pull-right45[

<br>

<p align = "center">
<font style="font-size:40"><i>Ridge</i></font><br>
<img src="image/ridge.png" height=210px><br>
<font style="font-size:10px">from <a href="https://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf">James et al. (2013) ISLR</a></font>
</p>

<p align = "center">
<font style="font-size:40"><i>Lasso</i></font><br>
<img src="image/lasso.png" height=210px><br>
<font style="font-size:10px">from <a href="https://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf">James et al.
(2013) ISLR</a></font>
</p>

]

---

# Regularized regression

.pull-left4[

<ul>
  <li class="m1"><span>To fit Lasso or Ridge penalized regression in R, use <mono>method = "glmnet"</mono>.</span></li>
  <li class="m2"><span>Specify the <high>type of penalty</high> and the <high>penalty weight</high> using the <mono>tuneGrid</mono> argument.</span></li>
</ul>

<br>

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
  <td bgcolor="white"> <b>Parameter</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> <mono>alpha = 1</mono> </td>
  <td bgcolor="white"> Regression with Lasso penalty. </td>
</tr>
<tr>
  <td bgcolor="white"> <mono>alpha = 0</mono> </td>
  <td bgcolor="white"> Regression with Ridge penalty. </td>
</tr>
<tr>
  <td bgcolor="white"> <mono>lambda</mono> </td>
  <td bgcolor="white"> Regularization penalty weight. </td>
</tr>
</table>

]

.pull-right45[

```r
# Train ridge regression
train(form = criterion ~ .,
      data = data_train,
      method = "glmnet",
      trControl = ctrl,
      tuneGrid = expand.grid(
        alpha = 0,    # Ridge penalty
        lambda = 1))  # Penalty weight

# Train lasso regression
train(form = criterion ~ .,
      data = data_train,
      method = "glmnet",
      trControl = ctrl,
      tuneGrid = expand.grid(
        alpha = 1,    # Lasso penalty
        lambda = 1))  # Penalty weight
```

]

---

class: center, middle

<font color = "gray"><h1>Regression</h1></font>
<high><h1>Decision Trees</h1></high>
<font color = "gray"><h1>Random Forests</h1></font>

---

# Decision trees

.pull-left4[

<ul>
  <li class="m1"><span>Decision trees have a <high>complexity parameter</high> called <high>cp</high>.</span></li>
</ul>

<p style="padding-top:3px"></p>

$$
\large
`\begin{split}
Loss = & Impurity\,+\\
&cp*(n\:terminal\:nodes)\\
\end{split}`
$$

<p style="padding-top:3px"></p>

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
  <td bgcolor="white"> <b>Parameter</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> Small <mono>cp</mono>, e.g., <mono>cp<.01</mono> </td>
  <td bgcolor="white"> Low
penalty leading to <high>complex trees</high>. </td>
</tr>
<tr>
  <td bgcolor="white"> Large <mono>cp</mono>, e.g., <mono>cp>.20</mono> </td>
  <td bgcolor="white"> Large penalty leading to <high>simple trees</high>. </td>
</tr>
</table>

]

.pull-right5[

<p align = "center">
<img src="image/cp.png">
</p>

]

---

# Decision trees

.pull-left4[

<ul>
  <li class="m1"><span>Decision trees have a <high>complexity parameter</high> called <high>cp</high>.</span></li>
</ul>

<p style="padding-top:3px"></p>

$$
\large
`\begin{split}
Loss = & Impurity\,+\\
&cp*(n\:terminal\:nodes)\\
\end{split}`
$$

<p style="padding-top:3px"></p>

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
  <td bgcolor="white"> <b>Parameter</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> Small <mono>cp</mono>, e.g., <mono>cp<.01</mono> </td>
  <td bgcolor="white"> Low penalty leading to <high>complex trees</high>. </td>
</tr>
<tr>
  <td bgcolor="white"> Large <mono>cp</mono>, e.g., <mono>cp>.20</mono> </td>
  <td bgcolor="white"> Large penalty leading to <high>simple trees</high>.
</td>
</tr>
</table>

]

.pull-right5[

```r
# Decision tree with a defined cp = .01
train(form = income ~ .,
      data = baselers,
      method = "rpart",  # Decision tree
      trControl = ctrl,
      tuneGrid = expand.grid(cp = .01))  # cp

# Decision tree with a defined cp = .2
train(form = income ~ .,
      data = baselers,
      method = "rpart",  # Decision tree
      trControl = ctrl,
      tuneGrid = expand.grid(cp = .2))   # cp
```

]

---

class: center, middle

<font color = "gray"><h1>Regression</h1></font>
<font color = "gray"><h1>Decision Trees</h1></font>
<high><h1>Random Forests</h1></high>

---

# Random Forest

.pull-left4[

<ul>
  <li class="m1"><span>Random Forests have a <high>diversity parameter</high> called <mono>mtry</mono>.</span></li>
  <li class="m2"><span>Technically, this controls how many features are randomly considered at each split of the trees.</span></li>
</ul>

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
  <td bgcolor="white"> <b>Parameter</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> Small <mono>mtry</mono>, e.g., <mono>mtry = 1</mono> </td>
  <td bgcolor="white"> <high>Diverse forest.</high> In a way, less complex. </td>
</tr>
<tr>
  <td bgcolor="white"> Large <mono>mtry</mono>, e.g., <mono>mtry>5</mono> </td>
  <td bgcolor="white"> <high>Similar forest.</high> In a way, more complex.
</td>
</tr>
</table>

]

.pull-right5[

<p align = "center">
<img src="image/mtry_parameter.png">
</p>

]

---

# Random Forest

.pull-left4[

<ul>
  <li class="m1"><span>Random Forests have a <high>diversity parameter</high> called <mono>mtry</mono>.</span></li>
  <li class="m2"><span>Technically, this controls how many features are randomly considered at each split of the trees.</span></li>
</ul>

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
  <td bgcolor="white"> <b>Parameter</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> Small <mono>mtry</mono>, e.g., <mono>mtry = 1</mono> </td>
  <td bgcolor="white"> <high>Diverse forest.</high> In a way, less complex. </td>
</tr>
<tr>
  <td bgcolor="white"> Large <mono>mtry</mono>, e.g., <mono>mtry>5</mono> </td>
  <td bgcolor="white"> <high>Similar forest.</high> In a way, more complex. </td>
</tr>
</table>

]

.pull-right5[

```r
# Random forest with a defined mtry = 2
train(form = income ~ .,
      data = baselers,
      method = "rf",  # Random forest
      trControl = ctrl,
      tuneGrid = expand.grid(mtry = 2))  # mtry

# Random forest with a defined mtry = 5
train(form = income ~ .,
      data = baselers,
      method = "rf",  # Random forest
      trControl = ctrl,
      tuneGrid = expand.grid(mtry = 5))  # mtry
```

]

---

class: center, middle

<br><br>

# Parameter tuning with k-fold cross-validation with `caret`

<img src="https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2014/09/Caret-package-in-R.png" width="60%" style="display: block; margin: auto;" />

---

.pull-left45[

# <i>k</i>-fold cross-validation for Ridge and Lasso

<p style="padding-top:1px"></p>

<ul>
  <li class="m1"><span><b>Goal</b>
  <br><br>
  <ul class="level">
    <li><span>Use 10-fold cross-validation to identify <high>optimal regularization parameters</high> for a regression model.</span></li>
  </ul>
  </span></li><br>
  <li class="m2"><span><b>Using</b>
  <br><br>
  <ul class="level">
    <li><span><font style="font-size:22px"><mono>α ∈ 0, .5,
1</mono></font> and <font style="font-size:22px"><mono>λ ∈ 1, 2, ..., 100</mono></font>.</span></li>
  </ul>
  </span></li>
</ul>

]

.pull-right45[

<br><br><br>

<p align = "center">
<img src="image/lasso_process.png" height=460px>
</p>

]

---

# <mono>trainControl()</mono>

.pull-left4[

<ul>
  <li class="m1"><span>Specify the use of k-fold cross-validation using the <mono>trainControl()</mono> function.</span></li>
</ul>

<br>

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
  <td bgcolor="white"> <b>Argument</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> <mono>method</mono> </td>
  <td bgcolor="white"> The resampling method, use <mono>"cv"</mono> for cross-validation. </td>
</tr>
<tr>
  <td bgcolor="white"> <mono>number</mono> </td>
  <td bgcolor="white"> The number of folds. </td>
</tr>
</table>

]

.pull-right5[

```r
# Specify 10-fold cross-validation
ctrl_cv <- trainControl(method = "cv",
                        number = 10)

# Predict income using glmnet
glmnet_mod <- train(form = income ~ .,
                    data = baselers,
                    method = "glmnet",
                    trControl = ctrl_cv)
```

]

---

# <mono>tuneGrid</mono>

.pull-left4[

<ul>
  <li class="m1"><span>Specify the tuning parameter values to consider using the <high><mono>tuneGrid</mono></high> argument.</span></li>
  <li class="m2"><span><mono>tuneGrid</mono> expects a <high>list or data frame</high> as input.</span></li>
  <li class="m3"><span><high>Parameter combinations</high> can be easily created using <mono>expand.grid()</mono>.</span></li>
</ul>

]

.pull-right5[

```r
# Specify 10-fold cross-validation
ctrl_cv <- trainControl(method = "cv",
                        number = 10)

# Predict income using glmnet
glmnet_mod <- train(form = income ~ .,
                    data = baselers,
                    method = "glmnet",
                    trControl = ctrl_cv,
                    tuneGrid = expand.grid(
                      alpha = c(0, .5, 1),
                      lambda = 1:100))
```

]

---

.pull-left4[

# <i>k</i>-fold cross-validation

<p style="padding-top:1px"></p>

```r
# Print summary information
glmnet_mod
```

<br>

At the end...
`RMSE was used to select the optimal model using the smallest value. The final values used for the model were alpha = 1 and lambda = 27.`

]

.pull-right5[

<br>

```
glmnet 

1000 samples
  19 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 900, 901, 900, 901, 901, 899, ... 
Resampling results across tuning parameters:

  alpha  lambda  RMSE  Rsquared  MAE  
  0.0     1      1047  0.8614    823.3
  0.0     2      1047  0.8614    823.3
  0.0     3      1047  0.8614    823.3
  0.0     4      1047  0.8614    823.3
  0.0     5      1047  0.8614    823.3
  0.0     6      1047  0.8614    823.3
  0.0     7      1047  0.8614    823.3
  0.0     8      1047  0.8614    823.3
  0.0     9      1047  0.8614    823.3
  0.0    10      1047  0.8614    823.3
  0.0    11      1047  0.8614    823.3
  0.0    12      1047  0.8614    823.3
```

]

---

# <i>k</i>-fold cross-validation

.pull-left4[

```r
# Visualise tuning error curve
plot(glmnet_mod)
```

<br>

At the end...

`RMSE was used to select the optimal model using the smallest value. The final values used for the model were alpha = 1 and lambda = 27.`

]

.pull-right5[

<img src="Tuning_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

]

---

.pull-left35[

# Final model

```r
# Model coefficients for best
# alpha and lambda
coef(glmnet_mod$finalModel,
     glmnet_mod$bestTune$lambda)
```

]

.pull-right5[

<br>

```
25 x 1 sparse Matrix of class "dgCMatrix"
                                       1
(Intercept)                     462.1958
id                                .     
sexmale                           .     
age                             116.1387
height                            1.8865
weight                            .     
educationobligatory_school        .     
educationSEK_II                   .     
educationSEK_III                  1.1857
confessionconfessionless          3.9774
confessionevangelical-reformed    .     
confessionmuslim                  .     
confessionother                   .     
children                        -21.0502
happiness                      -128.5448
fitness                           .     
food                              2.3193
alcohol                          22.2351
tattoos                         -24.8320
rhine                             0.4256
```

]

---

# Model comparison

.pull-left35[

<ul>
  <li class="m1"><span>Compare the prediction performance of several models with <mono>resamples()</mono>.</span></li>
  <li class="m2"><span>The <mono>summary()</mono> of this object will print 'prediction' error statistics from cross-validation during training.
This is your <high>estimate of future prediction performance</high>!</span></li>
</ul>

]

.pull-right55[

```r
# Simple competitor model
glm_mod <- train(form = income ~ .,
                 data = baselers,
                 method = "glm",
                 trControl = ctrl_cv)

# Determine prediction statistics
resamples_mod <- resamples(
  list(glmnet = glmnet_mod,
       glm = glm_mod))

# Print result summary
summary(resamples_mod)
```

]

---

.pull-left35[

# Model comparison

<p style="padding-top:1px"></p>

<ul>
  <li class="m1"><span>Compare the prediction performance of several models with <mono>resamples()</mono>.</span></li>
  <li class="m2"><span>The <mono>summary()</mono> of this object will print 'prediction' error statistics from cross-validation during training. This is your <high>estimate of future prediction performance</high>!</span></li>
</ul>

]

.pull-right55[

<br><br>

```
Call:
summary.resamples(object = resamples_mod)

Models: glmnet, glm 
Number of resamples: 10 

MAE 
        Min. 1st Qu. Median  Mean 3rd Qu.  Max. NA's
glmnet 743.1   761.2  818.3 807.8   836.7 891.7    0
glm    734.8   777.7  801.6 812.8   844.5 892.2    0

RMSE 
        Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
glmnet 936.5   990.3   1042 1028    1076 1098    0
glm    950.7  1008.9   1016 1034    1063 1128    0

Rsquared 
         Min. 1st Qu. Median   Mean 3rd Qu.   Max. NA's
glmnet 0.8386  0.8440 0.8582 0.8638  0.8865 0.9021    0
glm    0.8268  0.8549 0.8694 0.8624  0.8740 0.8825    0
```

]

---

class: middle, center

<h1><a href=https://therbootcamp.github.io/AML_2020AMLD/_sessions/Tuning/Tuning_practical.html>Practical</a></h1>
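
---

# Bonus: cross-validation by hand

The base-R sketch below (not part of the practical; the toy data `dat` and the candidate `degrees` are made-up assumptions) walks through the four tuning steps from the "Tuning parameters" slide without `caret`: split the data into folds, fit once per tuning value, select the value with the best validation error, and refit to the entire data set.

```r
# Toy data: a noisy quadratic relationship
set.seed(1)
dat <- data.frame(x = runif(100, -2, 2))
dat$y <- dat$x^2 + rnorm(100, sd = .5)

# 1 - Assign each case to one of k validation folds
k <- 10
folds <- sample(rep(1:k, length.out = nrow(dat)))

# 2 & 3 - Fit one model per tuning value (here: polynomial degree)
#         and record the mean validation RMSE across folds
degrees <- 1:5
cv_rmse <- sapply(degrees, function(d) {
  mean(sapply(1:k, function(i) {
    fit <- lm(y ~ poly(x, d), data = dat[folds != i, ])   # train on k-1 folds
    valid <- dat[folds == i, ]                            # held-out fold
    sqrt(mean((valid$y - predict(fit, newdata = valid))^2))
  }))
})
best <- degrees[which.min(cv_rmse)]

# 4 - Refit the winning model to the entire data set
final <- lm(y ~ poly(x, best), data = dat)
```

This is exactly the loop that `train()` automates when given `trControl = trainControl(method = "cv", number = 10)` and a `tuneGrid`, including the final refit on all of the training data.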