class: center, middle, inverse, title-slide

# Tuning
### Applied Machine Learning with R
### The R Bootcamp @ AMLD
### January 2020

---

layout: true

<div class="my-footer"> <span style="text-align:center"> <span> <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/by-sa.png" height=14 style="vertical-align: middle"/> </span> <a href="https://therbootcamp.github.io/"> <span style="padding-left:82px"> <font color="#7E7E7E"> www.therbootcamp.com </font> </span> </a> <a href="https://therbootcamp.github.io/"> <font color="#7E7E7E"> Applied Machine Learning with R @ AMLD | January 2020 </font> </a> </span> </div>

---

.pull-left4[

<br><br><br>

# Fighting overfitting

<ul>
  <li class="m1"><span>When a model <high>fits the training data too well</high> at the expense of its performance in prediction, this is called overfitting.</span></li>
  <li class="m2"><span>Just because model A is better than model B in training does not mean it will be better in testing! Extremely flexible models are <high>'wolves in sheep's clothing'</high>.</span></li>
  <li class="m3"><span>But is there nothing we can do?</span></li>
</ul>

]

.pull-right55[

<br><br>

<p align = "center">
<img src="image/wolf_complex.png"><br>
<font style="font-size:10px">adapted from <a href="">victoriarollison.com</a></font>
</p>

]

---

# Tuning parameters

.pull-left45[

<ul>
  <li class="m1"><span>Machine learning models are equipped with tuning parameters that <high>control model complexity</high>.</span></li><br>
  <li class="m2"><span>These tuning parameters can be identified using a <high>validation set</high> created from the training data.</span></li><br>
  <li class="m3"><span>Algorithm:
  <br><br>
  <ul class="level">
    <li><span>1 - Create a separate validation set.</span></li>
    <li><span>2 - Fit the model using various tuning parameters.</span></li>
    <li><span>3 - Select the tuning parameters leading to the best prediction on the validation set.</span></li>
    <li><span>4 - Refit the model to the entire training set (training + validation).</span></li>
  </ul>
  </span></li>
</ul>

]

.pull-right45[

<p align = "center"
style="padding-top:0px">
<img src="image/validation.png" height=430px>
</p>

]

---

# Resampling methods

.pull-left4[

<ul>
  <li class="m1"><span>Resampling methods automate and generalize model tuning.</span></li>
</ul>

<table style="cellspacing:0; cellpadding:0; border:none;">
<col width="30%">
<col width="70%">
<tr>
  <td bgcolor="white"> <b>Method</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> <i>k-fold cross-validation</i> </td>
  <td bgcolor="white"> Splits the data into <i>k</i> pieces, using <high>each piece once</high> as the validation set and the remaining pieces for training. </td>
</tr>
<tr>
  <td bgcolor="white"> <i>Bootstrap</i> </td>
  <td bgcolor="white"> For <i>B</i> bootstrap rounds, <high>sample</high> from the data <high>with replacement</high> and split the data into a training and a validation set. </td>
</tr>
</table>

]

.pull-right5[

<p align = "center" style="padding-top:0px">
<img src="image/resample1.png">
</p>

]

---

# Resampling methods

.pull-left4[

<ul>
  <li class="m1"><span>Resampling methods automate and generalize model tuning.</span></li>
</ul>

<table style="cellspacing:0; cellpadding:0; border:none;">
<col width="30%">
<col width="70%">
<tr>
  <td bgcolor="white"> <b>Method</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> <i>k-fold cross-validation</i> </td>
  <td bgcolor="white"> Splits the data into <i>k</i> pieces, using <high>each piece once</high> as the validation set and the remaining pieces for training. </td>
</tr>
<tr>
  <td bgcolor="white"> <i>Bootstrap</i> </td>
  <td bgcolor="white"> For <i>B</i> bootstrap rounds, <high>sample</high> from the data <high>with replacement</high> and split the data into a training and a validation set.
</td>
</tr>
</table>

]

.pull-right5[

<p align = "center" style="padding-top:0px">
<img src="image/resample2.png">
</p>

]

---

# Resampling methods

.pull-left4[

<ul>
  <li class="m1"><span>Resampling methods automate and generalize model tuning.</span></li>
</ul>

<table style="cellspacing:0; cellpadding:0; border:none;">
<col width="30%">
<col width="70%">
<tr>
  <td bgcolor="white"> <b>Method</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> <i>k-fold cross-validation</i> </td>
  <td bgcolor="white"> Splits the data into <i>k</i> pieces, using <high>each piece once</high> as the validation set and the remaining pieces for training. </td>
</tr>
<tr>
  <td bgcolor="white"> <i>Bootstrap</i> </td>
  <td bgcolor="white"> For <i>B</i> bootstrap rounds, <high>sample</high> from the data <high>with replacement</high> and split the data into a training and a validation set. </td>
</tr>
</table>

]

.pull-right5[

<p align = "center" style="padding-top:0px">
<img src="image/resample3.png">
</p>

]

---

class: center, middle

<high><h1>Regression</h1></high>
<font color = "gray"><h1>Decision Trees</h1></font>
<font color = "gray"><h1>Random Forests</h1></font>

---

# Regularized regression

.pull-left45[

<ul>
  <li class="m1"><span>Penalizes the regression loss for having large <font style="font-size:22px">&beta;</font> values, using the <high>λ (lambda) tuning parameter</high> and one of several penalty functions.</span></li>
</ul>

$$Regularized \;loss = \sum_i^n (y_i-\hat{y}_i)^2+\lambda \sum_j^p f(\beta_j)$$

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
  <td bgcolor="white"> <b>Name</b> </td>
  <td bgcolor="white"> <b>Function</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> <i>Lasso</i> </td>
  <td bgcolor="white"> |β<sub>j</sub>| </td>
  <td bgcolor="white"> Penalize by the <high>absolute</high> regression weights.
</td>
</tr>
<tr>
  <td bgcolor="white"> <i>Ridge</i> </td>
  <td bgcolor="white"> β<sub>j</sub><sup>2</sup> </td>
  <td bgcolor="white"> Penalize by the <high>squared</high> regression weights. </td>
</tr>
<tr>
  <td bgcolor="white"> <i>Elastic net</i> </td>
  <td bgcolor="white"> |β<sub>j</sub>| + β<sub>j</sub><sup>2</sup> </td>
  <td bgcolor="white"> Penalize by a combination of the Lasso and Ridge penalties. </td>
</tr>
</table>

]

.pull-right45[

<p align = "center">
<img src="image/bonsai.png"><br>
<font style="font-size:10px">from <a href="https://www.mallorcazeitung.es/leben/2018/05/02/bonsai-liebhaber-mallorca-kunst-lebenden/59437.html">mallorcazeitung.es</a></font>
</p>

]

---

.pull-left45[

# Regularized regression

<p style="padding-top:1px"></p>

<ul>
  <li class="m1"><span><b>Ridge</b>
  <br><br>
  <ul class="level">
    <li><span>By penalizing the most extreme βs most strongly, Ridge leads to (relatively) more <high>uniform βs</high>.</span></li>
  </ul>
  </span></li><br><br><br><br>
  <li class="m2"><span><b>Lasso</b>
  <br><br>
  <ul class="level">
    <li><span>By penalizing all βs equally, irrespective of magnitude, Lasso drives some βs to 0, effectively performing <high>automatic feature selection</high>.</span></li>
  </ul>
  </span></li>
</ul>

]

.pull-right45[

<br>

<p align = "center">
<font style="font-size:40"><i>Ridge</i></font><br>
<img src="image/ridge.png" height=210px><br>
<font style="font-size:10px">from <a href="https://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf">James et al. (2013) ISLR</a></font>
</p>

<p align = "center">
<font style="font-size:40"><i>Lasso</i></font><br>
<img src="image/lasso.png" height=210px><br>
<font style="font-size:10px">from <a href="https://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf">James et al.
(2013) ISLR</a></font>
</p>

]

---

# Regularized regression

.pull-left4[

<ul>
  <li class="m1"><span>To fit Lasso or Ridge penalized regression in R, use <mono>method = "glmnet"</mono>.</span></li>
  <li class="m2"><span>Specify the <high>type of penalty</high> and the <high>penalty weight</high> using the <mono>tuneGrid</mono> argument.</span></li>
</ul>

<br>

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
  <td bgcolor="white"> <b>Parameter</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> <mono>alpha = 1</mono> </td>
  <td bgcolor="white"> Regression with Lasso penalty. </td>
</tr>
<tr>
  <td bgcolor="white"> <mono>alpha = 0</mono> </td>
  <td bgcolor="white"> Regression with Ridge penalty. </td>
</tr>
<tr>
  <td bgcolor="white"> <mono>lambda</mono> </td>
  <td bgcolor="white"> Regularization penalty weight. </td>
</tr>
</table>

]

.pull-right45[

```r
# Train ridge regression
train(form = criterion ~ .,
      data = data_train,
      method = "glmnet",
      trControl = ctrl,
      tuneGrid = expand.grid(
        alpha = 0,    # Ridge penalty
        lambda = 1))  # Penalty weight

# Train lasso regression
train(form = criterion ~ .,
      data = data_train,
      method = "glmnet",
      trControl = ctrl,
      tuneGrid = expand.grid(
        alpha = 1,    # Lasso penalty
        lambda = 1))  # Penalty weight
```

]

---

class: center, middle

<font color = "gray"><h1>Regression</h1></font>
<high><h1>Decision Trees</h1></high>
<font color = "gray"><h1>Random Forests</h1></font>

---

# Decision trees

.pull-left4[

<ul>
  <li class="m1"><span>Decision trees have a <high>complexity parameter</high> called <high>cp</high>.</span></li>
</ul>

<p style="padding-top:3px"></p>

$$
\large
`\begin{split}
Loss = & Impurity\,+\\
&cp*(n\:terminal\:nodes)\\
\end{split}`
$$

<p style="padding-top:3px"></p>

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
  <td bgcolor="white"> <b>Parameter</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> Small <mono>cp</mono>, e.g., <mono>cp<.01</mono> </td>
  <td bgcolor="white"> Low
penalty leading to <high>complex trees</high>. </td>
</tr>
<tr>
  <td bgcolor="white"> Large <mono>cp</mono>, e.g., <mono>cp>.20</mono> </td>
  <td bgcolor="white"> Large penalty leading to <high>simple trees</high>. </td>
</tr>
</table>

]

.pull-right5[

<p align = "center">
<img src="image/cp.png">
</p>

]

---

# Decision trees

.pull-left4[

<ul>
  <li class="m1"><span>Decision trees have a <high>complexity parameter</high> called <high>cp</high>.</span></li>
</ul>

<p style="padding-top:3px"></p>

$$
\large
`\begin{split}
Loss = & Impurity\,+\\
&cp*(n\:terminal\:nodes)\\
\end{split}`
$$

<p style="padding-top:3px"></p>

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
  <td bgcolor="white"> <b>Parameter</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> Small <mono>cp</mono>, e.g., <mono>cp<.01</mono> </td>
  <td bgcolor="white"> Low penalty leading to <high>complex trees</high>. </td>
</tr>
<tr>
  <td bgcolor="white"> Large <mono>cp</mono>, e.g., <mono>cp>.20</mono> </td>
  <td bgcolor="white"> Large penalty leading to <high>simple trees</high>.
</td>
</tr>
</table>

]

.pull-right5[

```r
# Decision tree with a defined cp = .01
train(form = income ~ .,
      data = baselers,
      method = "rpart",  # Decision tree
      trControl = ctrl,
      tuneGrid = expand.grid(cp = .01))  # cp

# Decision tree with a defined cp = .2
train(form = income ~ .,
      data = baselers,
      method = "rpart",  # Decision tree
      trControl = ctrl,
      tuneGrid = expand.grid(cp = .2))   # cp
```

]

---

class: center, middle

<font color = "gray"><h1>Regression</h1></font>
<font color = "gray"><h1>Decision Trees</h1></font>
<high><h1>Random Forests</h1></high>

---

# Random Forest

.pull-left4[

<ul>
  <li class="m1"><span>Random Forests have a <high>diversity parameter</high> called <mono>mtry</mono>.</span></li>
  <li class="m2"><span>Technically, this controls how many features are randomly considered at each split of the trees.</span></li>
</ul>

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
  <td bgcolor="white"> <b>Parameter</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> Small <mono>mtry</mono>, e.g., <mono>mtry = 1</mono> </td>
  <td bgcolor="white"> <high>Diverse forest.</high> In a way, less complex. </td>
</tr>
<tr>
  <td bgcolor="white"> Large <mono>mtry</mono>, e.g., <mono>mtry>5</mono> </td>
  <td bgcolor="white"> <high>Similar forest.</high> In a way, more complex.
</td>
</tr>
</table>

]

.pull-right5[

<p align = "center">
<img src="image/mtry_parameter.png">
</p>

]

---

# Random Forest

.pull-left4[

<ul>
  <li class="m1"><span>Random Forests have a <high>diversity parameter</high> called <mono>mtry</mono>.</span></li>
  <li class="m2"><span>Technically, this controls how many features are randomly considered at each split of the trees.</span></li>
</ul>

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
  <td bgcolor="white"> <b>Parameter</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> Small <mono>mtry</mono>, e.g., <mono>mtry = 1</mono> </td>
  <td bgcolor="white"> <high>Diverse forest.</high> In a way, less complex. </td>
</tr>
<tr>
  <td bgcolor="white"> Large <mono>mtry</mono>, e.g., <mono>mtry>5</mono> </td>
  <td bgcolor="white"> <high>Similar forest.</high> In a way, more complex. </td>
</tr>
</table>

]

.pull-right5[

```r
# Random forest with a defined mtry = 2
train(form = income ~ .,
      data = baselers,
      method = "rf",  # Random forest
      trControl = ctrl,
      tuneGrid = expand.grid(mtry = 2))  # mtry

# Random forest with a defined mtry = 5
train(form = income ~ .,
      data = baselers,
      method = "rf",  # Random forest
      trControl = ctrl,
      tuneGrid = expand.grid(mtry = 5))  # mtry
```

]

---

class: center, middle

<br><br>

# Parameter tuning with k-fold cross-validation with `caret`

<img src="https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2014/09/Caret-package-in-R.png" width="60%" style="display: block; margin: auto;" />

---

.pull-left45[

# <i>k</i>-fold cross-validation for Ridge and Lasso

<p style="padding-top:1px"></p>

<ul>
  <li class="m1"><span><b>Goal</b>
  <br><br>
  <ul class="level">
    <li><span>Use 10-fold cross-validation to identify <high>optimal regularization parameters</high> for a regression model.</span></li>
  </ul>
  </span></li><br>
  <li class="m2"><span><b>Using</b>
  <br><br>
  <ul class="level">
    <li><span><font style="font-size:22px"><mono>α ∈ 0, .5,
1</mono></font> and <font style="font-size:22px"><mono>λ ∈ 1, 2, ..., 100</mono></font>.</span></li>
  </ul>
  </span></li>
</ul>

]

.pull-right45[

<br><br><br>

<p align = "center">
<img src="image/lasso_process.png" height=460px>
</p>

]

---

# <mono>trainControl()</mono>

.pull-left4[

<ul>
  <li class="m1"><span>Specify the use of k-fold cross-validation using the <mono>trainControl()</mono> function.</span></li>
</ul>

<br>

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
  <td bgcolor="white"> <b>Argument</b> </td>
  <td bgcolor="white"> <b>Description</b> </td>
</tr>
<tr>
  <td bgcolor="white"> <mono>method</mono> </td>
  <td bgcolor="white"> The resampling method, use <mono>"cv"</mono> for cross-validation. </td>
</tr>
<tr>
  <td bgcolor="white"> <mono>number</mono> </td>
  <td bgcolor="white"> The number of folds. </td>
</tr>
</table>

]

.pull-right5[

```r
# Specify 10-fold cross-validation
ctrl_cv <- trainControl(method = "cv",
                        number = 10)

# Predict income using glmnet
glmnet_mod <- train(form = income ~ .,
                    data = baselers,
                    method = "glmnet",
                    trControl = ctrl_cv)
```

]

---

# <mono>tuneGrid</mono>

.pull-left4[

<ul>
  <li class="m1"><span>Specify the tuning parameter values to consider using the <high><mono>tuneGrid</mono></high> argument.</span></li>
  <li class="m2"><span><mono>tuneGrid</mono> expects a <high>list or data frame</high> as input.</span></li>
  <li class="m3"><span><high>Parameter combinations</high> can be easily created using <mono>expand.grid()</mono>.</span></li>
</ul>

]

.pull-right5[

```r
# Specify 10-fold cross-validation
ctrl_cv <- trainControl(method = "cv",
                        number = 10)

# Predict income using glmnet
glmnet_mod <- train(form = income ~ .,
                    data = baselers,
                    method = "glmnet",
                    trControl = ctrl_cv,
                    tuneGrid = expand.grid(
                      alpha = c(0, .5, 1),
                      lambda = 1:100))
```

]

---

.pull-left4[

# <i>k</i>-fold cross-validation

<p style="padding-top:1px"></p>

```r
# Print summary information
glmnet_mod
```

<br>

At the end...
`RMSE was used to select the optimal model using the smallest value. The final values used for the model were alpha = 1 and lambda = 27.`

]

.pull-right5[

<br>

```
glmnet 

1000 samples
  19 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 900, 901, 900, 901, 901, 899, ... 
Resampling results across tuning parameters:

  alpha  lambda  RMSE  Rsquared  MAE  
  0.0     1      1047  0.8614    823.3
  0.0     2      1047  0.8614    823.3
  0.0     3      1047  0.8614    823.3
  0.0     4      1047  0.8614    823.3
  0.0     5      1047  0.8614    823.3
  0.0     6      1047  0.8614    823.3
  0.0     7      1047  0.8614    823.3
  0.0     8      1047  0.8614    823.3
  0.0     9      1047  0.8614    823.3
  0.0    10      1047  0.8614    823.3
  0.0    11      1047  0.8614    823.3
  0.0    12      1047  0.8614    823.3
```

]

---

# <i>k</i>-fold cross-validation

.pull-left4[

```r
# Visualise tuning error curve
plot(glmnet_mod)
```

<br>

At the end...

`RMSE was used to select the optimal model using the smallest value. The final values used for the model were alpha = 1 and lambda = 27.`

]

.pull-right5[

<img src="Tuning_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

]

---

.pull-left35[

# Final model

```r
# Model coefficients for best
# alpha and lambda
coef(glmnet_mod$finalModel,
     glmnet_mod$bestTune$lambda)
```

]

.pull-right5[

<br>

```
25 x 1 sparse Matrix of class "dgCMatrix"
                                       1
(Intercept)                     462.1958
id                                .     
sexmale                           .     
age                             116.1387
height                            1.8865
weight                            .     
educationobligatory_school        .     
educationSEK_II                   .     
educationSEK_III                  1.1857
confessionconfessionless          3.9774
confessionevangelical-reformed    .     
confessionmuslim                  .     
confessionother                   .     
children                        -21.0502
happiness                      -128.5448
fitness                           .     
food                              2.3193
alcohol                          22.2351
tattoos                         -24.8320
rhine                             0.4256
```

]

---

# Model comparison

.pull-left35[

<ul>
  <li class="m1"><span>Compare the prediction performance of several models with <mono>resamples()</mono>.</span></li>
  <li class="m2"><span>The <mono>summary()</mono> of this object will print 'prediction' error statistics from cross-validation during training.
This is your <high>estimate of future prediction performance</high>!</span></li>
</ul>

]

.pull-right55[

```r
# Simple competitor model
glm_mod <- train(form = income ~ .,
                 data = baselers,
                 method = "glm",
                 trControl = ctrl_cv)

# Determine prediction statistics
resamples_mod <- resamples(
  list(glmnet = glmnet_mod,
       glm = glm_mod))

# Print result summary
summary(resamples_mod)
```

]

---

.pull-left35[

# Model comparison

<p style="padding-top:1px"></p>

<ul>
  <li class="m1"><span>Compare the prediction performance of several models with <mono>resamples()</mono>.</span></li>
  <li class="m2"><span>The <mono>summary()</mono> of this object will print 'prediction' error statistics from cross-validation during training. This is your <high>estimate of future prediction performance</high>!</span></li>
</ul>

]

.pull-right55[

<br><br>

```
Call:
summary.resamples(object = resamples_mod)

Models: glmnet, glm 
Number of resamples: 10 

MAE 
        Min. 1st Qu. Median  Mean 3rd Qu.  Max. NA's
glmnet 743.1   761.2  818.3 807.8   836.7 891.7    0
glm    734.8   777.7  801.6 812.8   844.5 892.2    0

RMSE 
        Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
glmnet 936.5   990.3   1042 1028    1076 1098    0
glm    950.7  1008.9   1016 1034    1063 1128    0

Rsquared 
         Min. 1st Qu. Median   Mean 3rd Qu.   Max. NA's
glmnet 0.8386  0.8440 0.8582 0.8638  0.8865 0.9021    0
glm    0.8268  0.8549 0.8694 0.8624  0.8740 0.8825    0
```

]

---

class: middle, center

<h1><a href=https://therbootcamp.github.io/AML_2020AMLD/_sessions/Tuning/Tuning_practical.html>Practical</a></h1>
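
---

# Bonus: cross-validation by hand

The base-R sketch below (not part of the practical; the toy data `dat` and the candidate `degrees` are made-up assumptions) walks through the four tuning steps from the "Tuning parameters" slide without `caret`: split the data into folds, fit once per tuning value, select the value with the best validation error, and refit to the entire data set.

```r
# Toy data: a noisy quadratic relationship
set.seed(1)
dat <- data.frame(x = runif(100, -2, 2))
dat$y <- dat$x^2 + rnorm(100, sd = .5)

# 1 - Assign each case to one of k validation folds
k <- 10
folds <- sample(rep(1:k, length.out = nrow(dat)))

# 2 & 3 - Fit one model per tuning value (here: polynomial degree)
#         and record the mean validation RMSE across folds
degrees <- 1:5
cv_rmse <- sapply(degrees, function(d) {
  mean(sapply(1:k, function(i) {
    fit <- lm(y ~ poly(x, d), data = dat[folds != i, ])   # train on k-1 folds
    valid <- dat[folds == i, ]                            # held-out fold
    sqrt(mean((valid$y - predict(fit, newdata = valid))^2))
  }))
})
best <- degrees[which.min(cv_rmse)]

# 4 - Refit the winning model to the entire data set
final <- lm(y ~ poly(x, best), data = dat)
```

This is exactly the loop that `train()` automates when given `trControl = trainControl(method = "cv", number = 10)` and a `tuneGrid`, including the final refit on all of the training data.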