Tuning

# Tuning
### Applied Machine Learning with R <a href='https://therbootcamp.github.io'> The R Bootcamp @ AMLD </a> <a href='https://therbootcamp.github.io/AML_2021AMLD/'> </a>  <a href='https://therbootcamp.github.io'> </a>  <a href='mailto:therbootcamp@gmail.com'> </a>  <a href='https://www.linkedin.com/company/basel-r-bootcamp/'> </a>
### November 2021

---

<div class="my-footer">
 
 
 <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/by-sa.png" height=14 style="vertical-align: middle"/>
 
 <a href="https://therbootcamp.github.io/">
 
 
 www.therbootcamp.com
 
 
 </a>
 <a href="https://therbootcamp.github.io/">
 
 Applied Machine Learning with R @ AMLD | November 2021
 
 </a>
 
 </div>

---

<ul>
 <li class="m1">When a model <high>fits the training data too well</high> at the expense of its performance in prediction, this is called overfitting.</li>
 <li class="m2">Just because model A is better than model B in training does not mean it will be better in testing! Extremely flexible models are <high>'wolves in sheep's clothing'</high>.</li>
 <li class="m3">But is there nothing we can do?</li>
</ul>

]

<img src="image/wolf_complex.png"> 
adapted from <a href="">victoriarollison.com</a>

]

---

# Tuning parameters

<ul>
 <li class="m1">Machine learning models are equipped with tuning parameters that <high>control model complexity</high>.</li> 
 <li class="m2">These tuning parameters can be identified using a <high>validation set</high> created from the training data.</li> 
 <li class="m3">Algorithm:
 
 <ul class="level">
 <li>1 - Create a separate validation set.</li>
 <li>2 - Fit the model using various tuning parameter values.</li>
 <li>3 - Select the values leading to the best prediction on the validation set.</li>
 <li>4 - Refit the model to the entire training set (training + validation).</li>
 </ul>
 </li>
</ul>

]

]
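The four steps above can be sketched in base R. This is a minimal illustration under our own assumptions, not the workflow used later in the course: the data are simulated, and the degree of a polynomial (<mono>poly()</mono>) plays the role of the tuning parameter that controls complexity.

```r
# Sketch of the validation-set algorithm.
# Assumptions: simulated data; polynomial degree is the tuning parameter.
set.seed(1)
n <- 200
x <- runif(n, -2, 2)
y <- sin(2 * x) + rnorm(n, sd = 0.3)
train_data <- data.frame(x = x, y = y)

# 1 - Create a separate validation set (here 25% of the training data)
val_idx  <- sample(n, size = n / 4)
fit_part <- train_data[-val_idx, ]
val_part <- train_data[val_idx, ]

# 2 - Fit the model using various tuning parameter values
degrees <- 1:10
val_rmse <- sapply(degrees, function(d) {
  fit  <- lm(y ~ poly(x, d), data = fit_part)
  pred <- predict(fit, newdata = val_part)
  sqrt(mean((val_part$y - pred)^2))
})

# 3 - Select the value with the best prediction on the validation set
best_degree <- degrees[which.min(val_rmse)]

# 4 - Refit the model to the entire training set (training + validation)
final_fit <- lm(y ~ poly(x, best_degree), data = train_data)
```

Low degrees underfit the sine shape and high degrees start to chase noise, so the validation RMSE typically bottoms out somewhere in between.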

---

# Resampling methods

<ul>
 <li class="m1">Resampling methods automate and generalize model tuning.</li>
</ul>

<table style="cellspacing:0; cellpadding:0; border:none;">
 <col width="30%">
 <col width="70%">
<tr>
 <td bgcolor="white">
 Method
 </td>
 <td bgcolor="white">
 Description
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 k-fold cross-validation
 </td>
 <td bgcolor="white">
 Splits the data into k pieces and uses <high>each piece once</high> as the validation set, while the remaining k-1 pieces are used for training. 
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 Bootstrap
 </td>
 <td bgcolor="white">
 In each of B bootstrap rounds, draws a <high>sample</high> of the data <high>with replacement</high> as the training set; the cases that were not drawn serve as the validation set. 
 </td> 
</tr>
</table>
]

]
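A minimal base-R sketch of how the two methods assign rows to fitting and validation, assuming a toy data set of 20 rows, k = 5 folds, and B = 3 bootstrap rounds. Following common practice, the rows not drawn in a bootstrap round (the out-of-bag rows) serve as the validation set.

```r
# Assumptions: n = 20 rows, k = 5 folds, B = 3 bootstrap rounds
set.seed(42)
n <- 20
k <- 5

# k-fold: give every row a fold label; each fold gets n / k rows
fold_id <- sample(rep(1:k, length.out = n))
cv_splits <- lapply(1:k, function(i) {
  list(analysis   = which(fold_id != i),  # rows used for fitting
       assessment = which(fold_id == i))  # rows used for validation
})

# Bootstrap: draw n rows with replacement; rows never drawn are out-of-bag
B <- 3
boot_splits <- lapply(1:B, function(b) {
  in_bag <- sample(n, size = n, replace = TRUE)
  list(analysis   = in_bag,
       assessment = setdiff(1:n, in_bag))
})
```

In k-fold cross-validation every row is validated exactly once; in the bootstrap some rows appear several times in the training set and, on average, roughly a third end up out-of-bag.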

---

<high><h1>Regression</h1></high>

<h1>Decision Trees</h1>

<h1>Random Forests</h1>

---

# Regularized regression

<ul>
 <li class="m1">Penalizes the regression loss for large &beta; values, using the <high>tuning parameter lambda (&lambda;)</high> and one of several penalty functions f(&beta;<sub>j</sub>).</li>
</ul>

$$Regularized \;loss = \sum_{i=1}^n (y_i-\hat{y}_i)^2+\lambda \sum_{j=1}^p f(\beta_j)$$

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
 <td bgcolor="white">
 Name
 </td>
 <td bgcolor="white">
 Function
 </td> 
 <td bgcolor="white">
 Description
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 Lasso
 </td>
 <td bgcolor="white">
 |&beta;<sub>j</sub>|
 </td> 
 <td bgcolor="white">
 Penalize by the <high>absolute</high> regression weights.
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 Ridge 
 </td>
 <td bgcolor="white">
 &beta;<sub>j</sub><sup>2</sup>
 </td> 
 <td bgcolor="white">
 Penalize by the <high>squared</high> regression weights.
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 Elastic net
 </td>
 <td bgcolor="white">
 |&beta;<sub>j</sub>| + &beta;<sub>j</sub><sup>2</sup>
 </td> 
 <td bgcolor="white">
 Penalize by a weighted combination of the Lasso and Ridge penalties.
 </td> 
</tr>
</table>

]

<img src="image/bonsai.png"> 
from <a href="https://www.mallorcazeitung.es/leben/2018/05/02/bonsai-liebhaber-mallorca-kunst-lebenden/59437.html">mallorcazeitung.es</a>

]
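The regularized loss can be computed directly from the formula. The sketch below uses made-up values for y, &ycirc;, and &beta;, and an elastic net penalty weighted by a mixing parameter <mono>alpha</mono>, modelled on glmnet's parametrization (&alpha;|&beta;| + (1-&alpha;)&beta;<sup>2</sup>/2). The helper functions are hypothetical illustrations, not glmnet internals.

```r
# Hypothetical helpers: one penalty function f per row of the table
penalty_lasso <- function(beta) sum(abs(beta))
penalty_ridge <- function(beta) sum(beta^2)
penalty_enet  <- function(beta, alpha = 0.5) {
  # weighting modelled on glmnet: alpha = 1 is Lasso, alpha = 0 is Ridge (/2)
  alpha * sum(abs(beta)) + (1 - alpha) * sum(beta^2) / 2
}

# Sum of squared errors plus lambda times the chosen penalty
regularized_loss <- function(y, y_hat, beta, lambda, penalty) {
  sum((y - y_hat)^2) + lambda * penalty(beta)
}

# Made-up example values
y     <- c(3, 5, 7)
y_hat <- c(2.5, 5.5, 7)
beta  <- c(2, -1, 0.5)

regularized_loss(y, y_hat, beta, lambda = 1, penalty_lasso)  # 0.5 + 3.5  = 4
regularized_loss(y, y_hat, beta, lambda = 1, penalty_ridge)  # 0.5 + 5.25 = 5.75
regularized_loss(y, y_hat, beta, lambda = 1, penalty_enet)   # 0.5 + 3.0625 = 3.5625
```

Larger values of <mono>lambda</mono> make the penalty term dominate the fit term, pushing the optimal &beta;s towards zero.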

---

# Regularized regression

<ul>
 <li class="m1">Ridge
 
 <ul class="level">
 <li>By penalizing the most extreme &beta;s most strongly, Ridge leads to (relatively) more <high>uniform &beta;s</high>.</li>
 </ul>
 </li> 
 <li class="m2">Lasso
 
 <ul class="level">
 <li>By penalizing all &beta;s at the same rate, irrespective of their magnitude, Lasso drives some &beta;s to exactly 0, effectively resulting in <high>automatic feature selection</high>.</li>
 </ul>
 </li>
</ul>

]

Ridge 
 <img src="image/ridge.png" height=210px> 
 from <a href="https://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf">James et al. (2013) ISLR</a>

Lasso 
 <img src="image/lasso.png" height=210px> 
 from <a href="https://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf">James et al. (2013) ISLR</a>

]
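Why Lasso produces exact zeros while Ridge only shrinks can be seen in a special case discussed in ISLR: if the predictors are orthonormal, both estimates have closed forms in terms of the ordinary least squares (OLS) estimates. A sketch assuming such a design, the loss from the formula above, and made-up OLS coefficients:

```r
# Assumptions: orthonormal predictors, made-up OLS estimates, lambda = 1
ols_beta <- c(3, 1.5, 0.4, -0.2)
lambda   <- 1

# Ridge: every coefficient is divided by the same factor (never exactly 0)
ridge_beta <- ols_beta / (1 + lambda)

# Lasso: soft-thresholding, coefficients with |beta| <= lambda / 2 become 0
lasso_beta <- sign(ols_beta) * pmax(abs(ols_beta) - lambda / 2, 0)

ridge_beta  # 1.5, 0.75, 0.2, -0.1
lasso_beta  # 2.5, 1, 0, 0
```

Ridge keeps all four predictors with proportionally smaller weights, whereas Lasso drops the two small ones entirely, which is the automatic feature selection described above.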

---

# Regularized regression

<ul>
 <li class="m1">To fit Lasso or Ridge penalized regression in R, use the <mono>glmnet</mono> engine.</li>
 <li class="m2">Specify the <high>type of penalty</high> and the <high>penalty weight</high> in the model definition.</li>
</ul>

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
 <td bgcolor="white">
 Parameter
 </td>
 <td bgcolor="white">
 Description
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 <mono>mixture = 1</mono>
 </td>
 <td bgcolor="white">
 Regression with Lasso penalty.
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 <mono>mixture = 0</mono>
 </td>
 <td bgcolor="white">
 Regression with Ridge penalty.
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 <mono>penalty</mono>
 </td>
 <td bgcolor="white">
 Regularization penalty weight.
 </td> 
</tr>
</table>

]

```r
# load tidymodels (parsnip, recipes, rsample, tune, ...)
library(tidymodels)

# Ridge regression model
ridge_model <- 
  linear_reg(mixture = 0,
             penalty = 1) %>% 
  set_engine("glmnet") %>% 
  set_mode("regression")

# Lasso regression model
lasso_model <- 
  linear_reg(mixture = 1,
             penalty = 1) %>% 
  set_engine("glmnet") %>% 
  set_mode("regression")
```

]

---

<h1>Regression</h1>

<high><h1>Decision Trees</h1></high>

<h1>Random Forests</h1>

---

# Decision trees

<ul>
 <li class="m1">Decision trees have a <high>complexity parameter</high> called <high>cp</high>.</li>
</ul>

$$
\large
`\begin{split}
Loss = & Impurity\,+\\
&cp*(n\:terminal\:nodes)\\
\end{split}`
$$

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
 <td bgcolor="white">
 Parameter
 </td>
 <td bgcolor="white">
 Description
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 Small <mono>cp</mono>, e.g., <mono>cp<.01</mono>
 </td>
 <td bgcolor="white">
 Low penalty leading to <high>complex trees</high>.
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 Large <mono>cp</mono>, e.g., <mono>cp>.20</mono>
 </td>
 <td bgcolor="white">
 Large penalty leading to <high>simple trees</high>.
 </td> 
</tr>
</table>

]

]

---

# Decision trees

<ul>
 <li class="m1">Decision trees have a <high>complexity parameter</high> called <high>cost_complexity</high> (or, often, <high>cp</high>).</li>
</ul>

$$
\large
`\begin{split}
Loss = & Impurity\,+\\
&cp*(n\:terminal\:nodes)\\
\end{split}`
$$

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
 <td bgcolor="white">
 Parameter
 </td>
 <td bgcolor="white">
 Description
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 Small <mono>cost_complexity</mono>, e.g., <mono>cp<.01</mono>
 </td>
 <td bgcolor="white">
 Low penalty leading to <high>complex trees</high>.
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 Large <mono>cost_complexity</mono>, e.g., <mono>cp>.20</mono>
 </td>
 <td bgcolor="white">
 Large penalty leading to <high>simple trees</high>.
 </td> 
</tr>
</table>

]

```r
# Decision tree with a defined cp = .01
dt_model <- 
  decision_tree(cost_complexity = .01) %>% 
  set_engine("rpart") %>% 
  set_mode("regression")

# Decision tree with a defined cp = .2
dt_model <- 
  decision_tree(cost_complexity = .2) %>% 
  set_engine("rpart") %>% 
  set_mode("regression")
```

]
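The loss above can be evaluated by hand for a few candidate trees. The sketch below uses hypothetical impurity values for trees of increasing size and shows which size minimizes the loss for a small, a medium, and a large cp. As we understand the rpart documentation, rpart additionally scales cp relative to the error of the root node; this toy version ignores that.

```r
# Hypothetical candidate trees: impurity falls as the number of leaves grows
trees <- data.frame(
  n_terminal = c(1, 3, 8, 20),
  impurity   = c(1.00, 0.65, 0.45, 0.40)
)

# Loss = Impurity + cp * (n terminal nodes)
cc_loss <- function(impurity, n_terminal, cp) impurity + cp * n_terminal

# Number of terminal nodes of the tree with the lowest loss for a given cp
best_size <- function(cp) {
  loss <- cc_loss(trees$impurity, trees$n_terminal, cp)
  trees$n_terminal[which.min(loss)]
}

best_size(0.001)  # 20: low penalty, complex tree
best_size(0.02)   # 8
best_size(0.2)    # 1: large penalty, simple tree (a single leaf)
```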

---

<h1>Regression</h1>

<h1>Decision Trees</h1>

<high><h1>Random Forests</h1></high>

---

# Random Forest

<ul>
 <li class="m1">Random Forests have a <high>diversity parameter</high> called <mono>mtry</mono>.</li>
 <li class="m2">Technically, it controls how many randomly selected features are considered as split candidates at each split of a tree.</li>
</ul>

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
 <td bgcolor="white">
 Parameter
 </td>
 <td bgcolor="white">
 Description
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 Small <mono>mtry</mono>, e.g., <mono>mtry = 1</mono>
 </td>
 <td bgcolor="white">
 <high>Diverse forest.</high> In a way, less complex.
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 Large <mono>mtry</mono>, e.g., <mono>mtry>5</mono>
 </td>
 <td bgcolor="white">
 <high>Similar forest.</high> In a way, more complex.
 </td> 
</tr>
</table>

]

]

---

# Random Forest

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
 <td bgcolor="white">
 Parameter
 </td>
 <td bgcolor="white">
 Description
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 Small <mono>mtry</mono>, e.g., <mono>mtry = 1</mono>
 </td>
 <td bgcolor="white">
 <high>Diverse forest.</high> In a way, less complex.
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 Large <mono>mtry</mono>, e.g., <mono>mtry>5</mono>
 </td>
 <td bgcolor="white">
 <high>Similar forest.</high> In a way, more complex.
 </td> 
</tr>
</table>

]

```r
# Random forest with a defined mtry = 2
rf_model <-
  rand_forest(mtry = 2) %>% 
  set_engine("ranger") %>% 
  set_mode("regression")

# Random forest with a defined mtry = 5
rf_model <-
  rand_forest(mtry = 5) %>% 
  set_engine("ranger") %>% 
  set_mode("regression")
```

]
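What <mono>mtry</mono> controls can be sketched in a few lines of base R. The feature names are hypothetical, and the function only mimics the sampling step at a single split; the actual tree growing in ranger is far more involved. With <mono>mtry</mono> equal to the number of features, every split sees all features, which amounts to bagging.

```r
# Hypothetical feature set (p = 6)
features <- c("age", "height", "weight", "education", "children", "fitness")

# Features a tree may choose from at one split, given mtry
split_candidates <- function(mtry) sample(features, size = mtry)

set.seed(5)
# mtry = 1: each split is forced onto a random feature, so trees differ a lot
small_mtry <- replicate(5, split_candidates(1))

# mtry = 6 (= p): every split sees all features, so trees tend to pick the
# same strongest feature and look alike (plain bagging)
large_mtry <- replicate(5, sort(split_candidates(6)))
```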

---

<img src="https://www.tidymodels.org/images/tidymodels.png" width=240px> 
from <a href="https://www.tidymodels.org/packages/">tidymodels.org</a>

---

# Fitting <mono>tidymodels</mono>

<ul>
 <li class="m1">Specify resampling</li> 
 <li class="m2">Set up model tuning</li> 
 <li class="m3">Define grid</li> 
 <li class="m4">Tune model</li> 
 <li class="m5">Select best model</li> 
 <li class="m6">Retrain and evaluate</li> 
</ul>

]

]

---

# v-fold cross-validation

<ul>
 <li class="m1">Goal
 
 <ul class="level">
 <li>Use 10-fold cross-validation to identify <high>optimal regularization parameters</high> for a regression model.</li>
 </ul>
 </li> 
 <li class="m2">Using
 
 <ul class="level">
 <li><mono>mixture</mono>: <mono>&alpha; &isin; {0, .5, 1}</mono></li>
 <li><mono>penalty</mono>: <mono>&lambda; &isin; {1, 2, ..., 100}</mono></li>
 </ul>
 </li>
</ul>

]

]

---

# Specify resampling

<ul>
 <li class="m1">Set up v-fold cross-validation with the <mono>vfold_cv()</mono> function.</li>
</ul>

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
 <td bgcolor="white">
 Argument
 </td>
 <td bgcolor="white">
 Description
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 <mono>data</mono>
 </td>
 <td bgcolor="white">
 The training data.
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 <mono>v</mono>
 </td>
 <td bgcolor="white">
 The number of folds.
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 <mono>repeats</mono>
 </td>
 <td bgcolor="white">
 The number of times the v-fold partitioning is repeated.
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 <mono>strata</mono>
 </td>
 <td bgcolor="white">
 A stratification variable.
 </td> 
</tr>
</table>

]

```r
# split data
baselers_split <- initial_split(baselers)
baselers_train <- training(baselers_split)
baselers_test <- testing(baselers_split)

# specify 10 fold cross-validation
baselers_folds <- vfold_cv(baselers_train,
                           v = 10)
```

]
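The <mono>strata</mono> argument makes each fold contain a similar distribution of the stratification variable; for a numeric variable, rsample first bins it into quantiles. The base-R sketch below mimics that idea with quartile bins and a simulated, skewed outcome standing in for income. It is an illustration, not rsample's implementation.

```r
# Simulated, skewed outcome (assumption) standing in for income
set.seed(7)
income <- rlnorm(400, meanlog = 8, sdlog = 0.5)
v <- 10

# Bin the outcome into quartiles
bins <- cut(income,
            breaks = quantile(income, probs = seq(0, 1, 0.25)),
            include.lowest = TRUE)

# Deal out fold labels separately within each bin
fold_id <- integer(length(income))
for (b in levels(bins)) {
  rows <- which(bins == b)
  fold_id[rows] <- sample(rep(1:v, length.out = length(rows)))
}

# Each fold now holds the same share of every quartile
table(bins, fold_id)
```

Without stratification, a fold could by chance contain few of the rare high incomes, making its validation error unrepresentative.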

---

# Set up model tuning

<ul>
 <li class="m1">Parameters that should be tuned have to be specified in the model definition.</li>
 <li class="m2">To do so, set the respective parameter argument to <mono>tune()</mono>.</li>
</ul>

]

```r
# define recipe
baselers_recipe <- 
  recipe(income ~ .,
         data = baselers_train) %>%
  step_dummy(all_nominal_predictors()) %>% 
  step_normalize(all_numeric_predictors())

# glmnet model where mixture and penalty are tuned
glmnet_model <- 
  linear_reg(mixture = tune(),
             penalty = tune()) %>% 
  set_engine("glmnet") %>% 
  set_mode("regression")

# define workflow
glmnet_workflow <- 
  workflow() %>% 
  add_recipe(baselers_recipe) %>% 
  add_model(glmnet_model)
```

]

---

# Define grid

<ul>
 <li class="m1">Two ways to specify a grid of parameter values:</li>
<ul class="levels">
 <li>Use <mono>grid_regular()</mono> and similar functions to automatically choose reasonable values.</li>
 <li>Specify a <mono>tibble</mono> to define specific ranges and combinations.</li>
</ul>
</ul>
]

```r
# using tidymodels to generate values
parameter_grid <- grid_regular(mixture(),
                               penalty(),
                               levels = 50)

# determine values yourself
parameter_grid <-
  crossing(mixture = c(0, .5, 1),
           penalty = 1:100)
```

]
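Both ways produce a table with one row per parameter combination (a full factorial grid). A base-R sketch using <mono>expand.grid()</mono> in place of <mono>crossing()</mono> shows how quickly the number of candidate models grows. The log10 spacing of the penalty from 10<sup>-10</sup> to 1 is our assumption, modelled on the default range of <mono>penalty()</mono> in dials.

```r
# Grid from the slides: every mixture value paired with every penalty value
manual_grid <- expand.grid(mixture = c(0, .5, 1),
                           penalty = 1:100)
nrow(manual_grid)   # 3 * 100 = 300 candidate models

# Regular grid with 50 levels per parameter, penalty spaced on a log10 scale
# (assumed range 10^-10 to 10^0, mirroring dials' default)
regular_grid <- expand.grid(mixture = seq(0, 1, length.out = 50),
                            penalty = 10^seq(-10, 0, length.out = 50))
nrow(regular_grid)  # 50 * 50 = 2500 candidate models
```

With 10-fold cross-validation, each candidate is fitted 10 times, so the manual grid alone already means 3000 model fits.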

---

# Tune parameters with <mono>tune_grid()</mono>

<ul>
 <li class="m1">Supply <mono>tune_grid()</mono> with:</li>
<ul class="levels">
 <li>The <mono>workflow</mono></li>
 <li>The resamples (e.g., the cross-validation folds)</li>
 <li>The parameter grid</li>
</ul>
</ul>

]

```r
# tune parameters using 10 fold CV
glmnet_grid <- 
  tune_grid(glmnet_workflow,
            resamples = baselers_folds,
            grid = parameter_grid)

# show output
glmnet_grid
```

```
# Tuning results
# 10-fold cross-validation 
# A tibble: 10 x 4
 splits id .metrics .notes 
 <list> <chr> <list> <list> 
1 <split [675/75]> Fold01 <tibble ~ <tibbl~
2 <split [675/75]> Fold02 <tibble ~ <tibbl~
3 <split [675/75]> Fold03 <tibble ~ <tibbl~
4 <split [675/75]> Fold04 <tibble ~ <tibbl~
5 <split [675/75]> Fold05 <tibble ~ <tibbl~
6 <split [675/75]> Fold06 <tibble ~ <tibbl~
7 <split [675/75]> Fold07 <tibble ~ <tibbl~
# ... with 3 more rows
```

]
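Conceptually, <mono>tune_grid()</mono> fits every candidate on the fitting part of each fold, scores it on the held-out part, and averages over folds. The base-R sketch below does this by hand for a Ridge penalty only, on simulated data without intercept, using the closed-form Ridge solution as a stand-in for glmnet. The grid values and fold count are assumptions.

```r
# Simulated data without intercept (assumption)
set.seed(10)
n <- 150
p <- 5
X <- scale(matrix(rnorm(n * p), n, p))
y <- drop(X %*% c(3, -2, 0, 0, 1) + rnorm(n))

# Closed-form Ridge: solves (X'X + lambda * I) beta = X'y
ridge_fit <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

grid <- c(0.01, 0.1, 1, 10, 100)
v <- 5
fold_id <- sample(rep(1:v, length.out = n))

# For every candidate and every fold: fit on the other v - 1 folds,
# then compute the RMSE on the held-out fold
rmse_table <- sapply(grid, function(lambda) {
  sapply(1:v, function(i) {
    in_fit <- fold_id != i
    beta <- ridge_fit(X[in_fit, ], y[in_fit], lambda)
    pred <- drop(X[!in_fit, ] %*% beta)
    sqrt(mean((y[!in_fit] - pred)^2))
  })
})

# rows = folds, columns = candidates; average over folds per candidate
mean_rmse   <- colMeans(rmse_table)
best_lambda <- grid[which.min(mean_rmse)]
```

The last line is the by-hand counterpart of picking the candidate with the lowest average RMSE.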

---

# Select best tuning parameter

<ul>
 <li class="m1"><mono>tune_grid()</mono> returns the performance metrics of the model for each combination of tuning-parameter values.</li> 
 <li class="m2"><mono>select_best()</mono> selects the best tuning-parameter values.</li> 
 <li class="m3"><mono>finalize_workflow()</mono> plugs the best tuning-parameter values into the workflow.</li> 
</ul>

]

```r
# extract best
best_glmnet <- select_best(glmnet_grid, metric = "rmse")

# show best model
best_glmnet
```

```
# A tibble: 1 x 3
 penalty mixture .config 
 <int> <dbl> <chr> 
1 38 1 Preprocessor1_Model238
```

```r
# set best tuning parameters
final_glmnet <-
  finalize_workflow(glmnet_workflow,
                    best_glmnet)
```

]

---

# Retrain and evaluate

<ul>
 <li class="m1">The finalized model should be <high>retrained</high> on the entire training data.</li>
 <li class="m2">The retrained model can then be <high>evaluated on the test data</high>.</li>
</ul>

]

```r
# retrain model
final_glmnet_res <- fit(final_glmnet,
                        baselers_train)

# evaluate prediction
final_glmnet_pred <- 
  final_glmnet_res %>%
  predict(baselers_test) %>% 
  bind_cols(baselers_test %>% select(income))

# show metrics
metrics(final_glmnet_pred, truth = income,
        estimate = .pred)
```

```
# A tibble: 3 x 3
 .metric .estimator .estimate
 <chr> <chr> <dbl>
1 rmse standard 978. 
2 rsq standard 0.886
3 mae standard 785. 
```

]
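The three metrics reported by <mono>metrics()</mono> for a numeric outcome can be reproduced by hand. The values below are made up; yardstick's <mono>rsq</mono> is the squared correlation between truth and estimate, which is what the sketch computes.

```r
# Made-up observed values and predictions
truth    <- c(4, 6, 8, 10)
estimate <- c(5, 5, 9, 9)

rmse <- sqrt(mean((truth - estimate)^2))  # every error is +1 or -1, so 1
mae  <- mean(abs(truth - estimate))       # 1
rsq  <- cor(truth, estimate)^2            # 16^2 / (20 * 16) = 0.8
```

RMSE and MAE are in the units of the outcome (here: income), whereas R<sup>2</sup> is unitless.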

---
class: middle, center

<h1><a href=https://therbootcamp.github.io/AML_2021AMLD/_sessions/Tuning/Tuning_practical.html>Practical</a></h1>