Fitting

# Fitting
### Machine Learning with R <a href='https://therbootcamp.github.io'> The R Bootcamp </a> <a href='https://therbootcamp.github.io/ML_2020Apr/'> </a>  <a href='https://therbootcamp.github.io'> </a>  <a href='mailto:therbootcamp@gmail.com'> </a>  <a href='https://www.linkedin.com/company/basel-r-bootcamp/'> </a>
### April 2020

---

<div class="my-footer">
 
 
 <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/by-sa.png" height=14 style="vertical-align: middle"/>
 
 <a href="https://therbootcamp.github.io/">
 
 
 www.therbootcamp.com
 
 
 </a>
 <a href="https://therbootcamp.github.io/">
 
 Machine Learning with R | April 2020
 
 </a>
 
 </div>

---

# Fitting

<ul>
<li class="m1">Models are actually <high>families of models</high>, in which each combination of parameters defines a different model.</li>
<li class="m2">Fitting a model means <high>identifying the model in the family that best captures the data</high>.</li>
</ul>

]

<img src="image/curvefits.png" height=480px> 
adapted from <a href="https://www.explainxkcd.com/wiki/index.php/2048:_Curve-Fitting">explainxkcd.com</a>

]

---

# Which model is better?

---

# Loss function

<ul>
<li class="m1">A <high>central concept</high> in statistics and machine learning.</li>
<li class="m2">The loss function is a <high>summary of the errors made by a model</high>.</li>
</ul>

`$$\Large Loss = f(Error)$$`

Two purposes

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
 <td>
 Purpose
 </td>
 <td>
 Description
 </td>
</tr>
<tr>
 <td bgcolor="white">
 Fitting
 </td>
 <td bgcolor="white">
 Find the parameters that minimize the loss function.
 </td>
</tr>
<tr>
 <td>
 Evaluation
 </td>
 <td>
 Compute the loss of a fitted model.
 </td>
</tr>
</table>
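
The two purposes can be sketched in base R. The toy data, the straight-line model and the names `loss` and `fit` are assumptions made for illustration and do not come from the slides.

```r
# Hypothetical toy data: a noisy straight line
set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20)

# Loss = f(errors): here the mean squared error of the line b[1] + b[2] * x
loss <- function(b) mean((y - (b[1] + b[2] * x))^2)

# Purpose 1, fitting: search for the parameters that minimize the loss
fit <- optim(par = c(0, 0), fn = loss)
fit$par

# Purpose 2, evaluation: compute the loss of the fitted model
loss(fit$par)
```

The same function `loss()` serves both purposes: `optim()` calls it repeatedly while searching, and a single call afterwards reports how well the fitted line describes the data.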

]

]

---

<high><h1>Regression</h1></high>

<h1>Decision Trees</h1>

<h1>Random Forests</h1>

---

# Regression

In [regression](https://de.wikipedia.org/wiki/Regressionsanalyse), a <high>criterion</high> `$y$` is modeled as the <high>sum</high> of the <high>features</high> `$x_1, x_2, ...$` <high>times weights</high> `$b_1, b_2, ...$`, plus `$b_0$`, the so-called intercept.

`$$\large \hat{y} =  b_{0} + b_{1} \times x_1 + b_{2} \times x_2 + ...$$`

A regression coefficient `$b_{i}$` indicates how much `$\hat{y}$` <high>changes</high> when `$x_{i}$` increases by 1 while the other features are held constant.

Ceteris paribus, the <high>more extreme</high> `$b_{i}$`, the <high>more important</high> `$x_{i}$` is for predicting `$y$` (caveat: the scale of `$x_{i}$` affects `$b_i$`!).

If `$b_{i} = 0$`, then `$x_{i}$` <high>adds nothing</high> to the prediction of `$y$`.
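
A minimal sketch of the regression equation with two features. The weights and the helper name `predict_linear` are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Hypothetical weights: intercept b0 and feature weights b1, b2
b0 <- 1
b1 <- 2
b2 <- -0.5

# Prediction = intercept plus the weighted sum of the features
predict_linear <- function(x1, x2) b0 + b1 * x1 + b2 * x2

predict_linear(x1 = 3, x2 = 2)   # 1 + 2 * 3 - 0.5 * 2 = 6

# Raising x1 by 1 while x2 stays fixed changes the prediction by exactly b1
predict_linear(x1 = 4, x2 = 2) - predict_linear(x1 = 3, x2 = 2)
```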

]

]

---

# Loss function in regression

<ul style="margin-bottom:-20px">
 <li class="m1">Mean Squared Error (MSE)
 
 <ul class="level">
 <li><high>Mean of the squared deviations</high> between predicted and actual values.</li>
 </ul>
 </li>
</ul>

$$ MSE = \frac{1}{n}\sum_{i \in 1,...,n}(y_{i} - \hat{y}_{i})^{2}$$

<ul>
 <li class="m2">Mean Absolute Error (MAE)
 
 <ul class="level">
 <li><high>Mean of the absolute deviations</high> between predicted and actual values.</li>
 </ul>
 </li>
</ul>

$$ MAE = \frac{1}{n}\sum_{i \in 1,...,n} \lvert y_{i} - \hat{y}_{i} \rvert$$
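
Both formulas translate almost literally into R. The vectors `y` and `y_hat` below are made-up values for illustration.

```r
# Mean squared error and mean absolute error as plain R functions
mse <- function(y, y_hat) mean((y - y_hat)^2)
mae <- function(y, y_hat) mean(abs(y - y_hat))

y     <- c(3, 5, 2)   # hypothetical actual values
y_hat <- c(2, 5, 4)   # hypothetical predictions

mse(y, y_hat)   # (1 + 0 + 4) / 3
mae(y, y_hat)   # (1 + 0 + 2) / 3
```

Because the deviations are squared, the single error of size 2 contributes four times as much to the MSE as the error of size 1, whereas in the MAE it contributes only twice as much.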

]

]

---

<ul style="margin-bottom:-20px">
 <li class="m1">Analytical
 
 <ul class="level">
 <li>In certain cases, the parameter values can be <high>computed directly</high>, e.g., with the normal equation:</li>
 </ul>
 </li>
</ul>
 
$$ \large \boldsymbol b = (\boldsymbol X^T\boldsymbol X)^{-1}\boldsymbol X^T\boldsymbol y$$

<ul>
 <li class="m2">Numerical
 
 <ul class="level">
 <li>In most cases, however, the parameters must be found by a <high>directed trial-and-error</high> procedure, e.g., gradient descent:</li>
 </ul>
 </li>
</ul>
 
$$ \large \boldsymbol b_{n+1} = \boldsymbol b_{n}-\gamma \nabla F(\boldsymbol b_{n})$$
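
Both routes can be compared on a small example. This is a sketch under stated assumptions: the data are made up, `F` is taken to be the MSE, each update steps against the gradient, and the step size `gamma = 0.1` and the 500 iterations are arbitrary choices that happen to suit these data.

```r
# Hypothetical data with a centered feature
x <- c(-2, -1, 0, 1, 2)
y <- c(-3.1, -0.9, 1.2, 2.8, 5.0)
X <- cbind(1, x)          # design matrix with an intercept column
n <- length(y)

# Analytical: solve the normal equation (X'X) b = X'y
b_analytic <- solve(t(X) %*% X, t(X) %*% y)

# Numerical: gradient descent on F(b) = MSE
grad_F <- function(b) -2 / n * t(X) %*% (y - X %*% b)
b <- matrix(0, nrow = 2)  # starting values
gamma <- 0.1              # step size (assumption)
for (step in 1:500) {
  b <- b - gamma * grad_F(b)
}

# First column: analytical solution, second column: gradient descent
cbind(b_analytic, b)
```

For linear regression both routes end at the same parameters. Gradient descent earns its place for models where no closed-form solution exists.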

]

.pull-right45[
 

<img src="image/gradient.png" height=420px> 
adapted from <a href="https://me.me/i/machine-learning-gradient-descent-machine-learning-machine-learning-behind-the-ea8fe9fc64054eda89232d7ffc9ba60e">me.me</a>

]

---

]

<br2>

<img src="image/gradient1.gif" height=250px> 
adapted from <a href="https://dunglai.github.io/2017/12/21/gradient-descent/">dunglai.github.io</a> 
<img src="image/gradient2.gif" height=250px> 
adapted from <a href="https://dunglai.github.io/2017/12/21/gradient-descent/">dunglai.github.io</a>

]

---

# Two types of problems

<ul>
 <li class="m1">Regression</li>
 
 <ul class="level">
 <li>Prediction of a <high>numeric, continuous criterion</high>.</li> 
 <li>Example: predicting cholesterol level from age.</li>
 </ul> 
 <li class="m2">Classification</li>
 
 <ul class="level">
 <li>Prediction of a <high>categorical, discrete criterion</high>.</li> 
 <li>Example: predicting whether a heart attack occurs, yes or no.</li>
 </ul> 
</ul>

]

]

---

# Logistic regression

<ul style="margin-bottom:-20px">
 <li class="m1">In <a href="https://de.wikipedia.org/wiki/Logistische_Regression">logistic regression</a>, we model a categorical variable <mono>y &isin; {0,1}</mono> as a <high>weighted sum of the features</high>, transforming the prediction with a <high>logistic link function</high>:</li>
</ul>

`$$\large \hat{y} =  logistic(b_{0} + b_{1} \times x_1 + ...)$$`

<ul style="margin-bottom:-20px">
 <li class="m2">The logistic function <high>maps predictions onto the range between 0 and 1</high>, the values that encode the two categories.</li>
</ul>

$$ logistic(x) = \frac{1}{1+exp(-x)}$$
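
The logistic function is a one-liner in R. The weights `b0` and `b1` and the feature value 45 are hypothetical.

```r
# Logistic function as defined on the slide
logistic <- function(x) 1 / (1 + exp(-x))

# Any real number is squeezed into the interval between 0 and 1
logistic(c(-5, 0, 5))

# Prediction for one case with hypothetical weights and feature value
b0 <- -3
b1 <- 0.1
logistic(b0 + b1 * 45)
```

Base R ships the same function as `plogis()`, which can be used instead of a hand-written version.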

]

]

---

# Loss in classification

<ul style="margin-bottom:-20px">
 <li class="m1">Distance</li>
 <ul class="level">
 <li>LogLoss is generally used <high>to fit the parameters</high> and, like MSE and MAE, also for evaluation.</li>
 </ul>
</ul>

`$$\small LogLoss = -\frac{1}{n}\sum_{i=1}^{n}(log(\hat{y}_{i})y_{i}+log(1-\hat{y}_{i})(1-y_{i}))$$`
`$$\small MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^2; \: MAE = \frac{1}{n}\sum_{i=1}^{n} \lvert y_{i}-\hat{y}_{i} \rvert$$`

<ul>
 <li class="m2">Agreement</li>
 <ul class="level">
 <li>0-1 loss evaluates the agreement between <high>predicted class and actual class</high>, which is comparatively <high>easy to interpret</high>.</li>
 </ul>
</ul>

`$$\small Loss_{01}=\frac{1}{n}\sum_{i=1}^{n} I(y_{i} \neq \lfloor \hat{y}_{i} \rceil)$$`
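
A small sketch of both kinds of loss. The classes and predicted probabilities are made up, and turning probabilities into classes with a 0.5 threshold is an assumption standing in for the rounding in the formula.

```r
y     <- c(1, 0, 1, 1)           # hypothetical actual classes
y_hat <- c(0.9, 0.2, 0.4, 0.8)   # hypothetical predicted probabilities

# Distance: LogLoss punishes confident wrong predictions most heavily
log_loss <- -mean(y * log(y_hat) + (1 - y) * log(1 - y_hat))
log_loss

# Agreement: 0-1 loss compares predicted and actual classes
y_class <- as.integer(y_hat >= 0.5)   # assumption: threshold at 0.5
loss_01 <- mean(y != y_class)
loss_01   # one of four cases is misclassified
```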

]

]

---

# Confusion matrix

<ul style="margin-bottom:-20px">
 <li class="m1">The confusion matrix <high>contains the number of cases assigned to each predicted class</high>, broken down by actual class.</li>
 <li class="m2">Various <high>statistical performance measures</high> can be computed from the confusion matrix.</li>
</ul>

Confusion matrix

<table style="cellspacing:0; cellpadding:0; border:none;">
<col width=20%>
<col width=40%>
<col width=40%>
<tr>
 <td>
 </td>
 <td>
 <eq>y = 1</eq>
 </td>
 <td>
 <eq>y = 0</eq>
 </td>
</tr>
<tr>
 <td bgcolor="white">
 <eq>y&#770; = 1</eq>
 </td>
 <td bgcolor="white">
 True positive (TP)
 </td>
 <td bgcolor="white">
 False positive (FP)
 </td>
</tr>
<tr>
 <td>
 <eq>y&#770; = 0</eq>
 </td>
 <td>
 False negative (FN)
 </td>
 <td>
 True negative (TN)
 </td>
</tr>
</table>

]

Accuracy: proportion of correct predictions across all cases.

`$$\small Accuracy = \frac{TP + TN}{ TP + TN + FN + FP} = 1-Loss_{01}$$`

Sensitivity: proportion of correct predictions among the actually positive cases.

`$$\small Sensitivity = \frac{TP}{ TP +FN }$$`

Specificity: proportion of correct predictions among the actually negative cases.

`$$\small Specificity = \frac{TN}{ TN + FP }$$`

]

---

# Confusion matrix

Confusion matrix

<table style="cellspacing:0; cellpadding:0; border:none;">
<col width=20%>
<col width=40%>
<col width=40%>
<tr>
 <td>
 </td>
 <td>
 <eq>Sick</eq>
 </td>
 <td>
 <eq>Healthy</eq>
 </td>
</tr>
<tr>
 <td bgcolor="white">
 <eq>"Sick"</eq>
 </td>
 <td bgcolor="white">
 TP = 3
 </td>
 <td bgcolor="white">
 FP = 1
 </td>
</tr>
<tr>
 <td>
 <eq>"Healthy"</eq>
 </td>
 <td>
 FN = 1
 </td>
 <td>
 TN = 2
 </td>
</tr>
</table>

]

Accuracy: proportion of correct predictions across all cases.

`$$\small Accuracy = \frac{TP + TN}{ TP + TN + FN + FP} = 1-Loss_{01}$$`

Sensitivity: proportion of correct predictions among the actually positive cases.

`$$\small Sensitivity = \frac{TP}{ TP +FN }$$`

Specificity: proportion of correct predictions among the actually negative cases.

`$$\small Specificity = \frac{TN}{ TN + FP }$$`
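
Plugging the four cell counts of the example table into the three formulas gives the following values. The variable names are chosen for illustration only.

```r
# Cell counts from the example table
true_pos  <- 3
false_pos <- 1
false_neg <- 1
true_neg  <- 2

accuracy    <- (true_pos + true_neg) /
  (true_pos + true_neg + false_neg + false_pos)   # 5 / 7
sensitivity <- true_pos / (true_pos + false_neg)  # 3 / 4
specificity <- true_neg / (true_neg + false_pos)  # 2 / 3

round(c(accuracy = accuracy,
        sensitivity = sensitivity,
        specificity = specificity), 3)
```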

]

---

<h1><a>Fitting regression models in <mono>caret</mono></a></h1>

---

# `caret`'s main functions

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
 <td>
 Function
 </td>
 <td>
 Description
 </td>
</tr>
<tr>
 <td bgcolor="white">
 <mono>trainControl()</mono>
 </td>
 <td bgcolor="white">
 Specify how the model should be fitted.
 </td>
</tr>
<tr>
 <td>
 <mono>train()</mono>
 </td>
 <td>
 Specify the model and find the best parameter estimates.
 </td>
</tr>
<tr>
 <td bgcolor="white">
 <mono>postResample()</mono>
 </td>
 <td bgcolor="white">
 Evaluate model performance (fitting or prediction) for regression problems.
 </td>
</tr>
<tr>
 <td>
 <mono>confusionMatrix()</mono>
 </td>
 <td bgcolor="white">
 Evaluate model performance (fitting or prediction) for classification problems.
 </td>
</tr>
</table>

]

```r
# Step 1: Define control parameters
#   trainControl()

ctrl <- trainControl(...)

# Step 2: Fit and explore the model
#   train()

mod <- train(...)
summary(mod)
mod$finalModel # best model

# Step 3: Assess the fit
#   predict(), postResample(),
#   confusionMatrix()
#   (truth: observed values of the criterion)

fit <- predict(mod)
postResample(fit, truth)
confusionMatrix(fit, truth)
```
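
The three steps can be tried end to end on a dataset that ships with R. This is a sketch that assumes the `caret` package is installed; `mtcars` and the formula `mpg ~ wt + hp` are stand-ins chosen for illustration, not the course data.

```r
# Runs only if caret is available
if (requireNamespace("caret", quietly = TRUE)) {
  library(caret)

  # Step 1: control parameters (no resampling)
  ctrl <- trainControl(method = "none")

  # Step 2: fit a regression of fuel consumption on weight and horsepower
  mod <- train(form = mpg ~ wt + hp,
               data = mtcars,
               method = "glm",
               trControl = ctrl)

  # Step 3: compare fitted values with the observed criterion
  fit  <- predict(mod)
  perf <- postResample(pred = fit, obs = mtcars$mpg)
  print(perf)
}
```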

]

---

# `trainControl()`

<ul>
 <li class="m1"><mono>trainControl()</mono> controls <high>how <mono>caret</mono> fits a model</high>.</li>
 <li class="m2">Until the Optimization session, we use <highm>method = "none"</highm>.</li>
</ul>

```r
# Fit the model without advanced
#  parameter tuning methods

ctrl <- trainControl(method = "none")

# show documentation
?trainControl
```

]

]

---

# `train()`

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
 <td>
 Argument
 </td>
 <td>
 Description
 </td>
</tr>
<tr>
 <td bgcolor="white">
 <mono>form</mono>
 </td>
 <td bgcolor="white">
 Model formula specifying the criterion and the features.
 </td>
</tr>
<tr>
 <td>
 <mono>data</mono>
 </td>
 <td>
 Dataset used to estimate the parameters.
 </td>
</tr>
<tr>
 <td bgcolor="white">
 <mono>method</mono>
 </td>
 <td bgcolor="white">
 The model algorithm.
 </td>
</tr>
<tr>
 <td>
 <mono>trControl</mono>
 </td>
 <td bgcolor="white">
 Control parameters for the fitting process.
 </td>
</tr>
<tr>
 <td bgcolor="white">
 <mono>tuneGrid</mono>, <mono>preProcess</mono>
 </td>
 <td bgcolor="white">
 Cool things for later.
 </td>
</tr>
</table>

]

```r
# Fit a regression to predict income

eink_mod <-
  train(form = einkommen ~ .,  # formula
        data = basel,          # data
        method = "glm",        # regression
        trControl = ctrl)      # control parameters
eink_mod
```

```
Generalized Linear Model

6120 samples
  19 predictor

No pre-processing
Resampling: None 
```

]

---

# `train()`

.pull-left4[
<ul>
 <li class="m1"><mono>train()</mono> is <mono>caret</mono>'s <high>workhorse</high> for fitting. <high>More than 200 models</high> can be fitted with only small changes to the <high>method</high> argument.</li>
</ul>

]

```r
# Fit a random forest to predict income

eink_mod <-
  train(form = einkommen ~ .,  # formula
        data = basel,          # data
        method = "rf",         # random forest
        trControl = ctrl)      # control parameters
eink_mod
```

```
Random Forest

6120 samples
  19 predictor

No pre-processing
Resampling: None 
```

]

---

# `train()`

<ul>
 <li class="m1"><mono>train()</mono> is <mono>caret</mono>'s <high>workhorse</high> for fitting. <high>More than 200 models</high> can be fitted with only small changes to the <high>method</high> argument.</li>
 <li class="m2">You can find all 200+ models <a href="http://topepo.github.io/caret/available-models.html">here</a>.</li>
</ul>

]

]

---

# `train()`

<ul style="margin-bottom:-20px">
 <li class="m1">The criterion must have the right class:
 
 <ul class="level">
 <li><high><mono>numeric</mono></high> criterion &rarr; <high>regression</high> </li>
 <li><high><mono>factor</mono></high> criterion &rarr; <high>classification</high> </li>
 </ul>
 </li>
</ul>

```
# A tibble: 5 x 5
 Ausfall Alter Geschlecht Karten Bildung
 <dbl> <dbl> <chr> <dbl> <dbl>
1 0 45 M 3 11
2 1 36 F 2 14
3 0 76 F 5 12
4 1 25 M 2 17
5 1 36 F 3 12
```

]

```r
# Regression problem: the criterion is numeric

loan_mod <- train(form = Ausfall ~ .,
                  data = Loans,
                  method = "glm",
                  trControl = ctrl)

# Classification problem: the criterion is a factor

loan_mod <- train(form = factor(Ausfall) ~ .,
                  data = Loans,
                  method = "glm",
                  trControl = ctrl)
```
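
Whether a criterion counts as numeric or as a factor can be checked before fitting. The sketch below rebuilds two columns of the table shown above as a plain `data.frame`; the helper name `ausfall_f` is hypothetical.

```r
# Two columns of the Loans example, typed in by hand
Loans <- data.frame(Ausfall = c(0, 1, 0, 1, 1),
                    Alter   = c(45, 36, 76, 25, 36))

# Stored as a number: a formula with this criterion is a regression problem
class(Loans$Ausfall)

# Converted to a factor: the same values now define a classification problem
ausfall_f <- factor(Loans$Ausfall)
class(ausfall_f)
levels(ausfall_f)
```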

]

---

# <mono>.$finalModel</mono>

<ul>
 <li class="m1">The <mono>train()</mono> function returns a <mono>list</mono>. It contains a <mono>finalModel</mono> element, which is our <high>best fitted model</high>.</li>
 <li class="m2">Access the model with <mono>.$finalModel</mono> and <high>explore</high> the object with generic functions:</li>
</ul>

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
 <td>
 Function
 </td>
 <td>
 Description
 </td>
</tr>
<tr>
 <td bgcolor="white">
 <mono>summary()</mono>
 </td>
 <td bgcolor="white">
 Overview of the most important results.
 </td>
</tr>
<tr>
 <td bgcolor="white">
 <mono>names()</mono>
 </td>
 <td bgcolor="white">
 Show the names of all named elements (usually accessible with `$`).
 </td>
</tr>
</table>

]

```r
# Fit a regression model
eink_mod <-
  train(form = einkommen ~ alter + groesse,
        data = basel,       # data
        method = "glm",     # regression
        trControl = ctrl)   # control parameters

# Show named elements
names(eink_mod$finalModel)
```

```
[1] "coefficients"  "residuals"     "fitted.values"
[4] "effects"       "R"             "rank"         
 [ reached getOption("max.print") -- omitted 28 entries ]
```

]

---

# <mono>.$finalModel</mono>

]

```r
# Access specific elements
eink_mod$finalModel$coefficients
```

```
(Intercept)       alter     groesse 
   877.2395    149.0087      0.6194 
```

]

---

# <mono>.$finalModel</mono>

]

```r
# Show model output
summary(eink_mod)
```

```

Call:
NULL

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
 -4042    -849      11     842    4748

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  877.239    220.726    3.97  7.1e-05 ***
 [ reached getOption("max.print") -- omitted 2 rows ]
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 1543046)

Null deviance: 5.1695e+10  on 6119  degrees of freedom
Residual deviance: 9.4388e+09  on 6117  degrees of freedom
AIC: 104578

Number of Fisher Scoring iterations: 2
```

]

---

# `predict()`

<ul>
 <li class="m1">The <mono>predict()</mono> function <high>makes model predictions</high>. All it needs is the fitted model object as its first argument.</li>
</ul>

```r
# Extract fitted values
glm_fits <- predict(object = eink_mod)
glm_fits[1:8]
```

```
    1     2     3     4     5     6     7     8 
 9032  6035  4565  8119  8884  8432 10221 11858 
```

]

]

---

# `postResample()`

<ul>
 <li class="m1">The <mono>postResample()</mono> function <high>produces a simple summary</high> of model performance for <high>regression problems</high>. The arguments to specify are the predicted values and the actual values.</li>
</ul>

```r
# Evaluate model performance
postResample(glm_fits,
             basel$einkommen)
```

```
     RMSE  Rsquared       MAE 
1241.8895    0.8174  993.5378 
```
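
The three numbers can be reproduced by hand, which shows what they measure. The sketch uses `lm()` on `mtcars` as a stand-in for the course data and assumes, to my understanding of `caret`, that `Rsquared` is the squared correlation between predicted and observed values.

```r
# Stand-in model: linear regression on a dataset that ships with R
mod  <- lm(mpg ~ wt + hp, data = mtcars)
pred <- fitted(mod)
obs  <- mtcars$mpg

rmse <- sqrt(mean((obs - pred)^2))   # root of the mean squared error
rsq  <- cor(pred, obs)^2             # squared correlation (assumption, see above)
mae  <- mean(abs(obs - pred))        # mean absolute error

c(RMSE = rmse, Rsquared = rsq, MAE = mae)
```

RMSE and MAE are on the scale of the criterion, so they read as a typical size of the prediction error, while `Rsquared` is unitless and lies between 0 and 1.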

]

]

---

# `confusionMatrix()`

<ul>
 <li class="m1"><mono>confusionMatrix()</mono> produces a summary of <high>model performance for classification problems</high>. Again, the inputs are the predicted and the actual values.</li>
</ul>

```r
# Regression model for classification
sehhilfe_mod <-
  train(form = factor(sehhilfe) ~ alter + geschlecht,
        data = basel,
        method = "glm",
        trControl = ctrl)

# Evaluate model performance
# (factor() ensures the reference has the same class as the predictions)
confusionMatrix(predict(sehhilfe_mod),
                factor(basel$sehhilfe))
```

]

```
Confusion Matrix and Statistics

          Reference
Prediction   ja nein
      ja   3984 2136
      nein    0    0

               Accuracy : 0.651
                 95% CI : (0.639, 0.663)
    No Information Rate : 0.651
    P-Value [Acc > NIR] : 0.506

                  Kappa : 0

 Mcnemar's Test P-Value : <2e-16

            Sensitivity : 1.000
            Specificity : 0.000
         Pos Pred Value : 0.651
         Neg Pred Value : NaN
             Prevalence : 0.651
         Detection Rate : 0.651
 [ reached getOption("max.print") -- omitted 5 rows ]
```
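
The headline statistics follow directly from the four counts in the table. The sketch below re-enters the counts by hand; the object name `cm` is chosen for illustration.

```r
# Counts from the output above: rows are predictions, columns the reference
cm <- matrix(c(3984, 0, 2136, 0), nrow = 2,
             dimnames = list(Prediction = c("ja", "nein"),
                             Reference  = c("ja", "nein")))

accuracy    <- sum(diag(cm)) / sum(cm)               # 3984 / 6120
sensitivity <- cm["ja", "ja"] / sum(cm[, "ja"])      # all actual "ja" found
specificity <- cm["nein", "nein"] / sum(cm[, "nein"]) # no actual "nein" found
no_info     <- max(colSums(cm)) / sum(cm)            # share of the larger class

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, no_info_rate = no_info), 3)
```

The model predicts "ja" for every case, so its accuracy equals the no-information rate and Kappa is 0: it does no better than always guessing the more frequent class.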

]

---

<h1><a href=https://therbootcamp.github.io/ML_2020Apr/_sessions/Fitting/Fitting_practical.html>Practical</a></h1>