
# Optimizing
### Machine Learning with R <a href='https://therbootcamp.github.io'> The R Bootcamp </a>
### October 2020

---

<div class="my-footer">
 
 
 <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/by-sa.png" height=14 style="vertical-align: middle"/>
 
 <a href="https://therbootcamp.github.io/">
 
 
 www.therbootcamp.com
 
 
 </a>
 <a href="https://therbootcamp.github.io/">
 
 Machine Learning with R | October 2020
 
 </a>
 
 </div>

---

<ul>
 <li class="m1">Overfitting occurs when a model <high>fits the data too closely</high> and therefore <high>fails to make good predictions</high>.</li> 
 <li class="m2">Good performance in training therefore does not necessarily mean <high>good performance in testing</high>.</li>
</ul>
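
The gap between training and test performance can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated, and the polynomial degree of 10 is an arbitrary choice for a deliberately over-flexible model. None of it comes from the course data.

```r
# Simulated data (an assumption for illustration): y depends linearly on x plus noise
set.seed(1)
x <- runif(40)
y <- 2 * x + rnorm(40, sd = 0.5)
train <- data.frame(x = x[1:20],  y = y[1:20])
test  <- data.frame(x = x[21:40], y = y[21:40])

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# A simple model and a deliberately over-flexible one
simple_mod  <- lm(y ~ x, data = train)
complex_mod <- lm(y ~ poly(x, 10), data = train)

# Error on the data used for fitting
train_rmse <- c(simple  = rmse(train$y, fitted(simple_mod)),
                complex = rmse(train$y, fitted(complex_mod)))

# Error on data the models have not seen
test_rmse <- c(simple  = rmse(test$y, predict(simple_mod, newdata = test)),
               complex = rmse(test$y, predict(complex_mod, newdata = test)))

print(round(rbind(train = train_rmse, test = test_rmse), 2))
```

The flexible model can never have a higher training error than the simple one, because the linear model is a special case of it. Whether its test error is also lower is a separate question, and that is the point of the slide.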


<img src="image/wolf_complex.png"> 
adapted from <a href="">victoriarollison.com</a>


---

# Tuning with validation data

<ul>
 <li class="m1">Most ML models have tuning parameters that control <high>model complexity</high>.</li> 
 <li class="m2">To fit these tuning parameters, a <high>validation set</high> is created.</li> 
 <li class="m3">Procedure</li> 
 <ol>
 <li>Fit the model with <high>different tuning parameter</high> values.</li>
 <li>Choose the best tuning parameters based on the <high>validation set</high>.</li>
 <li>Refit the model on the <high>entire training set</high>.</li>
 </ol>
</ul>
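
The three steps can be sketched in base R. The example is hypothetical: the data are simulated, and the polynomial degree stands in for a tuning parameter that controls model complexity.

```r
# Simulated training data (an assumption for illustration)
set.seed(2)
n <- 200
dat <- data.frame(x = runif(n, -2, 2))
dat$y <- sin(dat$x) + rnorm(n, sd = 0.3)

# Hold out part of the training data as a validation set
val_idx <- sample(n, size = 50)
fit_set <- dat[-val_idx, ]
val_set <- dat[val_idx, ]

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Step 1: fit the model with several values of the tuning parameter
degrees <- 1:8
val_rmse <- sapply(degrees, function(d) {
  mod <- lm(y ~ poly(x, d), data = fit_set)
  rmse(val_set$y, predict(mod, newdata = val_set))
})

# Step 2: choose the value with the lowest validation error
best_degree <- degrees[which.min(val_rmse)]

# Step 3: refit on the entire training set with the chosen value
final_mod <- lm(y ~ poly(x, best_degree), data = dat)
print(best_degree)
```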


---

# Resampling methods

<ul>
 <li class="m1">Resampling methods <high>automate</high> and generalize model tuning.</li> 
</ul>

<table style="cellspacing:0; cellpadding:0; border:none;">
 <col width="30%">
 <col width="70%">
<tr>
 <td bgcolor="white">
 Method
 </td>
 <td bgcolor="white">
 Description
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 k-fold cross-validation
 </td>
 <td bgcolor="white">
 Splits the data into k parts and uses <high>each part once</high> as the validation set, while the remaining k-1 parts serve as the training set. 
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 Bootstrap
 </td>
 <td bgcolor="white">
 Over B bootstrap rounds, draws <high>random samples with replacement</high> from the data for training and validates on the cases that were not drawn.
 </td> 
</tr>
</table>
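
Both schemes come down to creating index sets, as this minimal base R sketch shows. The values of `n`, `k`, and `B` are arbitrary, and the bootstrap part assumes the common variant in which the cases never drawn form the validation set.

```r
set.seed(3)
n <- 20   # number of observations (hypothetical)
k <- 5    # number of folds
B <- 3    # number of bootstrap rounds

# k-fold cross-validation: assign every observation to exactly one fold
fold_id <- sample(rep(1:k, length.out = n))
cv_splits <- lapply(1:k, function(f) {
  list(train = which(fold_id != f), validation = which(fold_id == f))
})

# Bootstrap: draw n cases with replacement; cases never drawn are used for validation
boot_splits <- lapply(1:B, function(b) {
  in_bag <- sample(n, size = n, replace = TRUE)
  list(train = in_bag, validation = setdiff(1:n, in_bag))
})

# Each observation is used for validation exactly once across the k folds
print(table(unlist(lapply(cv_splits, `[[`, "validation"))))
```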

---


<high><h1>Decision Trees</h1></high>

<h1>Random Forests</h1>

<h1>Regression</h1>

---

# Decision trees

<ul>
 <li class="m1">The tuning parameter of decision trees is called <mono>cp</mono> (<high>complexity parameter</high>).</li>
</ul>

$$
\large
`\begin{split}
\text{Loss} = {} & \text{Impurity} \, + \\
& cp \cdot (\text{number of terminal nodes})
\end{split}`
$$

<mono>tuneGrid</mono> settings

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
 <td bgcolor="white">
 Parameter
 </td>
 <td bgcolor="white">
 Description
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 Low <mono>cp</mono>, e.g. <mono>cp &lt; .01</mono>
 </td>
 <td bgcolor="white">
 Low penalty, which leads to <high>complex trees</high>.
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 High <mono>cp</mono>, e.g. <mono>cp &gt; .20</mono>
 </td>
 <td bgcolor="white">
 High penalty, which leads to <high>simple trees</high>.
 </td> 
</tr>
</table>
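
A short calculation shows how `cp` shifts the preference between trees. The function follows the simplified loss on this slide. The two trees and their impurity values are invented for illustration.

```r
# Penalized loss as written on the slide: impurity plus cp times the number of terminal nodes
penalized_loss <- function(impurity, n_terminal_nodes, cp) {
  impurity + cp * n_terminal_nodes
}

# Two hypothetical trees: a complex one that fits closely and a simple one
complex_tree <- list(impurity = 0.10, n_terminal_nodes = 12)
simple_tree  <- list(impurity = 0.30, n_terminal_nodes = 3)

for (cp in c(.01, .2)) {
  loss_complex <- penalized_loss(complex_tree$impurity, complex_tree$n_terminal_nodes, cp)
  loss_simple  <- penalized_loss(simple_tree$impurity, simple_tree$n_terminal_nodes, cp)
  cat("cp =", cp, "| complex tree:", loss_complex, "| simple tree:", loss_simple, "\n")
}
```

With `cp = .01` the complex tree has the lower loss. With `cp = .2` the penalty dominates and the simple tree wins.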


---

# Decision trees


```r
# Decision tree with cp = .01
train(form = einkommen ~ .,
      data = basel,
      method = "rpart", 
      trControl = ctrl,
      tuneGrid = 
        expand.grid(cp = .01))

# Decision tree with cp = .2
train(form = einkommen ~ .,
      data = basel,
      method = "rpart", 
      trControl = ctrl,
      tuneGrid = 
        expand.grid(cp = .2))
```


---

<h1>Decision Trees</h1>

<high><h1>Random Forests</h1></high>

<h1>Regression</h1>

---

# Random Forest

<ul>
 <li class="m1">The tuning parameter of random forests is called <mono>mtry</mono> and controls the <high>diversity</high> of the trees.</li>
 <li class="m2"><mono>mtry</mono> determines <high>how many features</high> are considered for splitting a node.</li>
</ul>

<mono>tuneGrid</mono> settings

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
 <td bgcolor="white">
 Parameter
 </td>
 <td bgcolor="white">
 Description
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 Low <mono>mtry</mono>, e.g. <mono>mtry = 1</mono>
 </td>
 <td bgcolor="white">
 <high>Diverse forest.</high> In a sense, less complex.
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 High <mono>mtry</mono>, e.g. <mono>mtry = 5</mono>
 </td>
 <td bgcolor="white">
 <high>Uniform forest.</high> In a sense, more complex.
 </td> 
</tr>
</table>
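
The effect of `mtry` on diversity can be made concrete with a toy sketch of the feature sampling step alone. This is not the random forest algorithm itself. The helper `candidate_features()` is hypothetical, and the five feature names are borrowed from the course data purely for illustration.

```r
set.seed(4)
features <- c("alter", "groesse", "gewicht", "kinder", "fitness")

# Hypothetical helper: the features a single split is allowed to consider
candidate_features <- function(features, mtry) sample(features, size = mtry)

# mtry = 1: each split sees one random feature, so trees tend to differ
print(replicate(3, candidate_features(features, mtry = 1)))

# mtry = 5: each split sees all five features, so trees tend to look alike
print(sort(candidate_features(features, mtry = 5)))
```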


---

# Random Forest


```r
# Random forest with mtry = 2
train(form = einkommen ~ .,
      data = basel,
      method = "rf",  
      trControl = ctrl,
      tuneGrid = 
        expand.grid(mtry = 2))

# Random forest with mtry = 5
train(form = einkommen ~ .,
      data = basel,
      method = "rf",  
      trControl = ctrl,
      tuneGrid = 
        expand.grid(mtry = 5)) 
```


---

<h1>Decision Trees</h1>

<h1>Random Forests</h1>

<high><h1>Regression</h1></high>

---

# Regularized regression

<ul>
 <li class="m1">Penalizes the loss of a regression for the size of its model parameters, in proportion to the <high>tuning parameter &lambda;</high>.</li>
</ul>

$$\text{Regularized loss} = \sum_i^n (y_i-\hat{y}_i)^2+\lambda \sum_j^p f(\beta_j) $$
<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
 <td bgcolor="white">
 Name
 </td>
 <td bgcolor="white">
 Function
 </td> 
 <td bgcolor="white">
 Penalty
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 Lasso
 </td>
 <td bgcolor="white">
 |&beta;<sub>j</sub>|
 </td> 
 <td bgcolor="white">
 Proportional to the <high>absolute</high> regression weights.
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 Ridge 
 </td>
 <td bgcolor="white">
 &beta;<sub>j</sub><sup>2</sup>
 </td> 
 <td bgcolor="white">
 Proportional to the <high>squared</high> regression weights.
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 Elastic net
 </td>
 <td bgcolor="white">
 |&beta;<sub>j</sub>| + &beta;<sub>j</sub><sup>2</sup>
 </td> 
 <td bgcolor="white">
 Sum of the lasso and ridge penalties.
 </td> 
</tr>
</table>
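
The penalties in the table translate directly into code. The sketch below follows the simplified definitions on this slide. The function names and the example weights are made up, and the intercept is assumed to be excluded from `beta`.

```r
# Penalty functions f() summed over the regression weights
lasso_penalty <- function(beta) sum(abs(beta))
ridge_penalty <- function(beta) sum(beta^2)
enet_penalty  <- function(beta) lasso_penalty(beta) + ridge_penalty(beta)

# Regularized loss: squared error plus lambda times the chosen penalty
regularized_loss <- function(y, y_hat, beta, lambda, penalty) {
  sum((y - y_hat)^2) + lambda * penalty(beta)
}

beta <- c(3, -0.5, 0)   # hypothetical regression weights
print(c(lasso = lasso_penalty(beta),
        ridge = ridge_penalty(beta),
        elastic_net = enet_penalty(beta)))
```

The single large weight contributes 3 to the lasso penalty but 9 to the ridge penalty, which is why the two methods treat extreme weights so differently.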


<img src="image/bonsai.png"> 
from <a href="https://www.mallorcazeitung.es/leben/2018/05/02/bonsai-liebhaber-mallorca-kunst-lebenden/59437.html">mallorcazeitung.es</a>


---

# Regularized regression

<ul>
 <li class="m1">Ridge and lasso behave <high>surprisingly differently</high>.</li> 
 <li class="m2">Ridge</li> 
 <ul>
 <li>Because the weights are squared, it is mainly the extreme &beta;s that are shrunk.</li> 
 </ul>
 <li class="m3">Lasso</li> 
 <ul>
 <li>Because absolute values are used, all &beta;s are shrunk by the same amount, which leads to automatic feature selection when some &beta;s become zero.</li> 
 </ul>
</ul>
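
The difference is easiest to see in the textbook special case of an orthonormal design, where both solutions have a closed form. The sketch assumes that special case and the loss "squared error + &lambda; times penalty". The unregularized weights in `beta_ols` are invented.

```r
# Closed-form shrinkage for an orthonormal design (a simplifying assumption)
ridge_shrink <- function(beta_ols, lambda) beta_ols / (1 + lambda)
lasso_shrink <- function(beta_ols, lambda) {
  sign(beta_ols) * pmax(abs(beta_ols) - lambda / 2, 0)
}

beta_ols <- c(4, 1, -0.3)   # hypothetical unregularized weights
lambda <- 1

print(rbind(ols   = beta_ols,
            ridge = ridge_shrink(beta_ols, lambda),
            lasso = lasso_shrink(beta_ols, lambda)))
```

Ridge scales every weight by the same factor, so the largest weight loses the most in absolute terms. Lasso subtracts the same amount from every weight and sets the smallest one exactly to zero.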


Ridge 
 <img src="image/ridge.png" height=210px> 
 from <a href="https://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf">James et al. (2013) ISLR</a>

Lasso 
 <img src="image/lasso.png" height=210px> 
 from <a href="https://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf">James et al. (2013) ISLR</a>


---

# Regularized regression

<ul>
 <li class="m1">Use <mono>method = "glmnet"</mono> for lasso and ridge.</li> 
 <li class="m2">The type of regularization is specified via the <highm>tuneGrid</highm> argument.</li> 
</ul>

<mono>tuneGrid</mono> settings

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
 <td bgcolor="white">
 Parameter
 </td>
 <td bgcolor="white">
 Description
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 <mono>alpha = 1</mono>
 </td>
 <td bgcolor="white">
 Regression with lasso regularization.
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 <mono>alpha = 0</mono>
 </td>
 <td bgcolor="white">
 Regression with ridge regularization.
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 <mono>lambda</mono>
 </td>
 <td bgcolor="white">
 Weight of the regularization.
 </td> 
</tr>
</table>


```r
# Train a ridge regression
train(form = einkommen ~ .,
      data = basel,
      method = "glmnet",  
      trControl = ctrl,
      tuneGrid = 
        expand.grid(alpha = 0,   # Ridge 
                    lambda = 1)) # Lambda

# Train a lasso regression
train(form = einkommen ~ .,
      data = basel,
      method = "glmnet",  
      trControl = ctrl,
      tuneGrid = 
        expand.grid(alpha = 1,   # Lasso 
                    lambda = 1)) # Lambda
```


---

<h1>Parameter tuning with 10-fold CV in <mono>caret</mono></h1>

---

# k-fold cross-validation for ridge and lasso

<ul>
 <li class="m1">Goal:</li> 
 <ul>
 <li>Use 10-fold cross-validation to identify the <high>best regularization parameters</high> for a regression model.</li> 
 </ul>
 <li class="m2">Consider:</li> 
 <ul>
 <li>&alpha; &isin; {0, .5, 1}</li>
 <li>&lambda; &isin; {1, 2, ..., 100}</li>
 </ul>
</ul>


---

# <mono>trainControl()</mono>

<ul>
 <li class="m1">Use <mono>trainControl()</mono> to specify the type of resampling.</li> 
</ul>

<mono>trainControl()</mono> arguments

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
 <td bgcolor="white">
 Argument
 </td>
 <td bgcolor="white">
 Description
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 <mono>method</mono>
 </td>
 <td bgcolor="white">
 The resampling method; use <mono>"cv"</mono> for cross-validation.
 </td> 
</tr>
<tr>
 <td bgcolor="white">
 <mono>number</mono>
 </td>
 <td bgcolor="white">
 The number of folds.
 </td> 
</tr>
</table>


```r
library(caret)  # provides trainControl() and train()

# Specify 10-fold cross-validation
ctrl_cv <- trainControl(method = "cv",
                        number = 10)

# Predict einkommen (income) with glmnet
glmnet_mod <- train(form = einkommen ~ .,
                    data = basel,
                    method = "glmnet", 
                    trControl = ctrl_cv)
```


---

# <mono>tuneGrid</mono>

<ul>
 <li class="m1">Use <mono>tuneGrid</mono> to specify the <high>candidate sets</high> for the tuning parameters.</li> 
 <li class="m2"><mono>tuneGrid</mono> expects a <mono>list</mono> or a <mono>data.frame</mono>, conveniently created with <mono>expand.grid()</mono>.</li> 
</ul>
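
To see what such a grid contains, it helps to build it on its own. The sketch uses the candidate values from the goal slide. The object name `tune_grid` is an arbitrary choice.

```r
# Every combination of the candidate values for alpha and lambda
tune_grid <- expand.grid(alpha = c(0, .5, 1),
                         lambda = 1:100)

class(tune_grid)       # a data.frame with one row per combination
nrow(tune_grid)        # 3 alpha values x 100 lambda values
head(tune_grid, 4)     # the first argument varies fastest
```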


```r
# Specify 10-fold cross-validation
ctrl_cv <- trainControl(method = "cv",
                        number = 10)

# Predict einkommen (income) with glmnet
glmnet_mod <- train(form = einkommen ~ .,
                    data = basel,
                    method = "glmnet", 
                    trControl = ctrl_cv,
                    tuneGrid = expand.grid(
                      alpha = c(0, .5, 1),
                      lambda = 1:100))
```


---

# k-fold Cross Validation

```r
# Print overview
glmnet_mod
```

At the end of the output:

`RMSE was used to select the optimal model using the smallest value.
The final values used for the model were alpha = 1 and lambda = 27.`


```
glmnet

6120 samples
  19 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 5506, 5507, 5508, 5509, 5507, 5509, ... 
Resampling results across tuning parameters:

alpha  lambda  RMSE  Rsquared  MAE  
  0.0      1     1054  0.8717    844.0
  0.0      2     1054  0.8717    844.0
  0.0      3     1054  0.8717    844.0
  0.0      4     1054  0.8717    844.0
  0.0      5     1054  0.8717    844.0
  0.0      6     1054  0.8717    844.0
  0.0      7     1054  0.8717    844.0
  0.0      8     1054  0.8717    844.0
  0.0      9     1054  0.8717    844.0
  0.0     10     1054  0.8717    844.0
  0.0     11     1054  0.8717    844.0
  0.0     12     1054  0.8717    844.0
```
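
The selection rule "smallest RMSE" amounts to picking one row of the results table. The sketch below mimics that rule on a made-up table. The numbers are hypothetical and unrelated to the output shown here. In a fitted `train` object the corresponding components are `glmnet_mod$results` and `glmnet_mod$bestTune`.

```r
# Hypothetical resampling results: one row per tuning parameter combination
results <- data.frame(alpha  = c(0, 0, 1, 1),
                      lambda = c(1, 2, 1, 2),
                      RMSE   = c(10.2, 10.1, 9.8, 9.5))

# Select the combination with the smallest cross-validated RMSE
best_tune <- results[which.min(results$RMSE), c("alpha", "lambda")]
print(best_tune)
```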


---

# k-fold Cross Validation

```r
# Visualize the error curve across tuning parameters
plot(glmnet_mod)
```


---

# Final model

```r
# Model parameters at the best values
# of alpha and lambda
coef(glmnet_mod$finalModel,
     glmnet_mod$bestTune$lambda)
```

```
25 x 1 sparse Matrix of class "dgCMatrix"
                                         1
(Intercept)                       873.7053
id                                  .     
geschlechtm                         .     
alter                             108.5944
groesse                             1.6993
gewicht                            -1.1578
bildungobligatorisch              -43.2194
bildungsek II                     -77.9362
bildungsek III                    -20.9463
konfessionevangelisch-reformiert   -7.8896
konfessionkatholisch                .     
konfessionkonfessionslos           45.1148
konfessionmuslimisch              103.6467
kinder                              2.3714
glueck                           -209.4458
fitness                            26.3446
essen                               2.7141
alkohol                            25.9812
tattoos                           -16.4525
rhein                               3.6709
datause                             0.1059
arztbesuche                        -1.5933
wandern                            -0.1534
fasnachtnein                        .     
sehhilfenein                       -0.1204
```


<!---

# Model comparison

<ul>
 <li class="m1">Compare the <high>performance on the validation sets</high> with <mono>resamples()</mono>.</li> 
 <li class="m2">The <mono>summary()</mono> of the object gives a detailed <high>overview</high> of the performance.</li> 
</ul>

```r
# Simple model
glm_mod <- train(form = einkommen ~ .,
                 data = basel, 
                 method = "glm",
                 trControl = ctrl_cv)

# Compute performance 
resamples_mod <- resamples(
  list(glmnet = glmnet_mod,
       glm = glm_mod))

# Show overview
summary(resamples_mod)
```


# Model comparison

Compare the predictive performance of several models with `resamples()`.

The `summary()` of the output object prints the prediction error statistics from the cross-validation during training. This is an <high>estimate of future predictive performance</high>!

```

Call:
summary.resamples(object = resamples_mod)

Models: glmnet, glm 
Number of resamples: 10

MAE 
        Min. 1st Qu. Median  Mean 3rd Qu.  Max. NA's
glmnet 769.4   827.1  836.9 832.1   843.9 880.6    0
glm    788.2   813.2  840.3 832.4   849.0 865.0    0

RMSE 
        Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
glmnet 957.1    1025   1050 1040    1054 1103    0
glm    988.2    1017   1044 1040    1064 1085    0

Rsquared 
         Min. 1st Qu. Median   Mean 3rd Qu.   Max. NA's
glmnet 0.8601  0.8661 0.8720 0.8721  0.8774 0.8854    0
glm    0.8558  0.8647 0.8736 0.8719  0.8768 0.8847    0
```


--->

---
class: middle, center

<h1><a href=https://therbootcamp.github.io/ML_2020Oct/_sessions/Optimization/Optimization_practical.html>Practical</a></h1>