Models

# Models
### Machine Learning with R <a href='https://therbootcamp.github.io'> The R Bootcamp </a> <a href='https://therbootcamp.github.io/ML_2020Oct/'> </a>  <a href='https://therbootcamp.github.io'> </a>  <a href='mailto:therbootcamp@gmail.com'> </a>  <a href='https://www.linkedin.com/company/basel-r-bootcamp/'> </a>
### October 2020

---

<div class="my-footer">
 
 
 <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/by-sa.png" height=14 style="vertical-align: middle"/>
 
 <a href="https://therbootcamp.github.io/">
 
 
 www.therbootcamp.com
 
 
 </a>
 <a href="https://therbootcamp.github.io/">
 
 Machine Learning with R | October 2020
 
 </a>
 
 </div>

---

# There is no free lunch

"The no-free-lunch theorem of optimization is an impossibility theorem telling us that a general-purpose, universal optimization strategy is impossible. The only way one strategy can outperform another is if it is specialized to the structure of the specific problem under consideration."

<a href="https://link.springer.com/article/10.1023/A:1021251113462">Ho & Pepyne, 2002</a>
<a href="https://ti.arc.nasa.gov/m/profile/dhw/papers/78.pdf">Wolpert & Macready, 1997</a>

]

<img src="image/free_lunch.jpg" height=400px width=650px> 
 from <a href="http://christianfunnypictures.com/2016/02/theres-no-such-thing-as-a-free-lunch-or-is-there.html">christianfunnypictures.com</a>

]

---

# Bias-Variance Dilemma

`$$\large Error = Bias^2 + Variance\;(+ Noise)$$`

<ul style="margin-top:40px">
 <li class="m1">Bias</li>
 <ul class="level">
 <li>Arises when the <high>model assumptions</high> do not match the problem.</li>
 </ul> 
 <li class="m2">Variance</li>
 <ul class="level">
 <li>Arises when <high>model flexibility</high> is too high.</li>
 </ul> 
 <li class="m3">Noise</li>
 <ul class="level">
 <li>Purely random, irreducible noise.</li>
 </ul>
</ul>

]

]
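The decomposition can be checked with a small simulation. The following is a minimal base-R sketch, not part of the original material: the true function `sin(2 * pi * x)`, the noise level, the test point `x0`, and the polynomial degrees are all assumptions chosen for illustration. It refits a rigid (linear) and a flexible (degree-9) model to many noisy data sets and estimates squared bias and variance of the prediction at `x0`.

```r
set.seed(1)

f <- function(x) sin(2 * pi * x)      # assumed "true" function
x <- seq(0, 1, length.out = 30)       # fixed design points
x0 <- 0.25                            # test point, f(x0) = 1
sigma <- 0.3                          # assumed noise standard deviation
n_sim <- 500                          # number of simulated data sets

# Refit a polynomial of the given degree to fresh noisy data n_sim times
# and summarise the predictions at x0.
simulate <- function(degree) {
  preds <- replicate(n_sim, {
    d <- data.frame(x = x, y = f(x) + rnorm(length(x), sd = sigma))
    fit <- lm(y ~ poly(x, degree), data = d)
    unname(predict(fit, newdata = data.frame(x = x0)))
  })
  c(bias2 = (mean(preds) - f(x0))^2,  # systematic error of the average fit
    variance = var(preds))            # spread of the fits across data sets
}

res <- rbind(linear = simulate(1), flexible = simulate(9))
print(round(res, 3))
```

Under these assumptions the linear model should show the larger squared bias and the degree-9 model the larger variance.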

---

<h1><a>Know your problem</a></h1>

---

<ul>
 <li class="m1">Linear models</li>
 <ul class="level">
 <li>Assume that the world is linear. This is rarely true, but it leads to <high>less overfitting</high>.</li>
 </ul> 
 <li class="m2">Non-linear models</li>
 <ul class="level">
 <li>(Usually) contain linear models as a special case and therefore tend toward <high>more overfitting</high>. In general, they pay off only when linear models approximate the problem poorly.</li>
 </ul>
</ul>

]

.pull-right5[
 
 

 <img src="image/linearity.png" height=480px> 
 from <a href="https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html">scikit-learn.org</a>

]

---

# Kernel Trick

<ul>
 <li class="m1">Implicitly transforms the original feature space into a new one in which, for example, classes become separable.</li> 
 <li class="m2">Used, for example, in <high>support vector machines</high> (e.g., <mono>method = "svmRadial"</mono>), usually with a <high>radial basis function</high> (RBF).</li>
</ul>

]

<img src="image/linearity.png" height=480px> 
 from <a href="https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html">scikit-learn.org</a>

]
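A minimal base-R sketch of the idea, using illustrative helper names (`phi`, `poly_kernel`, `rbf_kernel`) that are not part of `caret` or any SVM package: a kernel returns the inner product in a transformed feature space without ever constructing that space. The degree-2 polynomial kernel is used for the check because its feature map can be written out explicitly; the RBF kernel corresponds to an infinite-dimensional map and is shown only as a function.

```r
# Explicit degree-2 feature map for 2-D inputs (illustrative helper).
phi <- function(x) c(x[1]^2, sqrt(2) * x[1] * x[2], x[2]^2)

# Polynomial kernel: the same inner product, computed in the original space.
poly_kernel <- function(x, z) sum(x * z)^2

# RBF kernel; sigma is a free width parameter (assumed default of 1).
rbf_kernel <- function(x, z, sigma = 1) exp(-sum((x - z)^2) / (2 * sigma^2))

x <- c(1, 2)
z <- c(3, -1)
explicit <- sum(phi(x) * phi(z))   # inner product in the transformed space
implicit <- poly_kernel(x, z)      # kernel evaluated on the raw inputs
print(c(explicit = explicit, implicit = implicit))

# Points inside vs. outside the unit circle are not linearly separable in
# (x1, x2), but a linear rule separates them in the transformed space.
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)
inside <- rowSums(X^2) < 1
Phi <- t(apply(X, 1, phi))
score <- Phi[, 1] + Phi[, 3]       # linear function of the new features
separable <- all((score < 1) == inside)
print(separable)
```

Both routes give the same inner product, which is why a model that only needs inner products can work in the new space at the cost of evaluating the kernel.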

---

# Automatic feature engineering

<ul>
 <li class="m1"><high>Deep learning</high> (in particular convolutional neural networks) is especially good at <high>developing features on its own</high>.</li> 
 <li class="m2">The <mono>tensorflow</mono> package provides access to Google's powerful <high>TensorFlow</high> library for deep learning.</li> 
</ul>

]

<img src="image/power_of_deeplearning.png"> 
 from <a href="https://towardsdatascience.com/cnn-application-on-structured-data-automated-feature-extraction-8f2cd28d9a7e">towardsdatascience.com</a>

]

---

<h1><a>Stay robust</a></h1>

---

# Robustness

<ul>
 <li class="m1">There are a number of <high>tricks</high> for making models more robust, i.e., for reducing the variance error.</li> 
</ul>

<img src="image/robustness_sel.png" width=350px> 
 from <a href="https://www.istockphoto.com/ch/grafiken/kraftathlet?sort=mostpopular&mediatype=illustration&assetfiletype=eps&phrase=kraftathlet">istockphoto.com</a>

]

.pull-right55[
<table style="cellspacing:0; cellpadding:0; border:none;">
 <col width="210">
 <col width="210">
 <col width="210">
<tr>
 <th>Approach</th>
 <th>Implementation</th>
 <th>Examples</th>
</tr>
<tr style="background-color:#ffffff">
 <td align="center">Tolerance</td>
 <td align="center">Increases error tolerance</td>
 <td align="center"><mono>svmRadial</mono></td>
</tr>
<tr style="background-color:#ffffff">
 <td align="center">Regularization</td>
 <td align="center">Penalty for complexity</td>
 <td align="center"><mono>lasso</mono>, <mono>ridge</mono>, <mono>elasticnet</mono></td>
</tr>
<tr style="background-color:#ffffff">
 <td align="center">Ensemble</td>
 <td align="center">Bagging</td>
 <td align="center"><mono>treebag</mono>, <mono>randomGLM</mono>, <mono>randomForest</mono></td>
</tr>
<tr style="background-color:#ffffff">
 <td align="center">Ensemble</td>
 <td align="center">Boosting</td>
 <td align="center"><mono>adaboost</mono>, <mono>xgbTree</mono></td>
</tr>
<tr style="background-color:#ffffff">
 <td align="center">Feature selection</td>
 <td align="center">Regularization</td>
 <td align="center"><mono>lasso</mono></td>
</tr>
<tr style="background-color:#ffffff">
 <td align="center">Feature selection</td>
 <td align="center">Importance</td>
 <td align="center"><mono>randomForest</mono></td>
</tr>
</table>

]

---

# Regularization

<ul>
 <li class="m1">Regularization penalizes the fit for the <high>model-specific complexity</high>.</li> 
</ul>

<br2>
<high>Loss</high> = <high>Misfit</high> + <high>Penalty</high>

<table style="cellspacing:0; cellpadding:0; border:none;">
 <col width="160">
 <col width="160">
 <col width="160">
<tr>
 <th>Name</th>
 <th>Penalty</th>
 <th>`caret`</th>
</tr>
<tr style="background-color:#ffffff">
 <td align="center"><high>AIC/BIC</high></td>
 <td align="center"><img src="image/regularization/aicbic.png" height=24px></td>
 <td align="center">-</td>
</tr>
<tr style="background-color:#ffffff">
 <td align="center"><high>Lasso</high></td>
 <td align="center"><img src="image/regularization/lasso.png" height=24px></td>
 <td align="center">`method = "glmnet"`</td>
</tr>
<tr style="background-color:#ffffff">
 <td align="center"><high>Ridge</high></td>
 <td align="center"><img src="image/regularization/ridge.png" height=24px></td>
 <td align="center">`method = "glmnet"`</td>
</tr>
<tr style="background-color:#ffffff">
 <td align="center"><high>Elastic Net</high></td>
 <td align="center"><img src="image/regularization/ridge.png" height=24px></td>
 <td align="center">`method = "glmnet"`</td>
</tr>
</table>

]

]
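To make "Loss = Misfit + Penalty" concrete, here is a minimal base-R sketch of ridge regression via its closed-form solution. It is an illustration only and not what `glmnet` does internally: the simulated data, the helper `ridge()`, and the `lambda` values are assumptions, and `glmnet` scales its penalty differently.

```r
set.seed(1)

# Simulated data (assumption): 5 standardised features, 2 of them irrelevant.
n <- 50
p <- 5
X <- scale(matrix(rnorm(n * p), n, p))
beta_true <- c(2, -1, 0, 0, 0.5)
y <- drop(X %*% beta_true + rnorm(n))
y <- y - mean(y)                       # centre y so no intercept is needed

# Ridge: minimise sum((y - X b)^2) + lambda * sum(b^2).
# Closed form: b = (X'X + lambda * I)^(-1) X'y.
ridge <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

lambdas <- c(0, 1, 10, 100)
coefs <- sapply(lambdas, function(l) ridge(X, y, l))
colnames(coefs) <- paste0("lambda=", lambdas)
print(round(coefs, 2))

# The penalty shrinks the coefficient vector as lambda grows.
norms <- sqrt(colSums(coefs^2))
print(round(norms, 2))
```

With `lambda = 0` the result is the ordinary least-squares fit; larger values trade a worse misfit for smaller coefficients.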

---

# Bagging

<ul>
 <li class="m1"><high>Aggregates</high> the predictions of multiple models fitted to <high>samples</high> drawn from the data.</li> 
 <li class="m2">Especially useful for models with moderately good but variable performance.</li> 
 <li class="m3">Algorithm:</li>
 <ol>
 <li>Draw a <high>sample</high> from the data with replacement.</li>
 <li><high>Fit</high> a model to each sample.</li>
 <li><high>Average</high> the predictions.</li>
 </ol>
 </ol>
</ul>

]

.pull-right45[
 

 <img src="image/münchhausen.jpg" height=450px> 
 from <a href="https://en.wikipedia.org/wiki/M%C3%BCnchhausen_trilemma">wikipedia.org</a>

]
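The three steps of the algorithm can be sketched in base R as follows. This is a minimal illustration under assumptions: the simulated sine data, the degree-6 polynomial as the flexible base model, and the number of models are arbitrary choices, and the code is not the implementation behind `treebag` or `randomForest`.

```r
set.seed(1)

# Simulated training data (assumption) and a grid of test points.
n <- 60
train <- data.frame(x = runif(n))
train$y <- sin(2 * pi * train$x) + rnorm(n, sd = 0.3)
grid <- data.frame(x = seq(0.05, 0.95, length.out = 50))
truth <- sin(2 * pi * grid$x)

n_models <- 100
preds <- sapply(seq_len(n_models), function(b) {
  idx <- sample(n, replace = TRUE)                # 1. sample with replacement
  fit <- lm(y ~ poly(x, 6), data = train[idx, ])  # 2. fit a model to the sample
  predict(fit, newdata = grid)
})
bagged <- rowMeans(preds)                         # 3. average the predictions

mse <- function(p) mean((p - truth)^2)
mse_single <- mean(apply(preds, 2, mse))          # average error of one model
mse_bagged <- mse(bagged)                         # error of the aggregate
print(round(c(single = mse_single, bagged = mse_bagged), 4))
```

Because squared error is convex, the averaged prediction can never have a higher error than the average of the individual models' errors; how much it gains depends on how much the individual fits vary.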

---

# Boosting

<ul>
 <li class="m1">A kind of iterative bagging based on an <high>adaptive weighting</high> of the data.</li> 
 <li class="m2">The Extreme Gradient Boosting variant <mono>xgbTree</mono> often ranks among the <high>best available models</high>.</li> 
 <li class="m3">Algorithm:</li>
 <ol>
 <li><high>Weight</high> all data points equally.</li>
 <li><high>Fit</high> a moderately flexible model.</li>
 <li><high>Increase</high> the weight of poorly predicted points.</li>
 <li><high>Repeat</high> iteratively.</li>
 <li><high>Average</high> the predictions in proportion to each model's fit.</li>
 </ol>
 </ol>
</ul>

]

<img src="image/bagg_boost.png" height=410px> 
 from <a href="https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html">scikit-learn.org</a>

]
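The weighting scheme can be sketched with an AdaBoost-style loop over decision stumps in base R. This is an illustration under assumptions, not the gradient boosting that `xgbTree` performs: the simulated data, the helpers `fit_stump()` and `predict_stump()`, and the number of rounds are all made up for the example.

```r
set.seed(1)

# Simulated 1-D problem (assumption): class +1 in the middle, -1 outside,
# so no single threshold can separate the classes.
n <- 200
x <- runif(n, -1, 1)
y <- ifelse(abs(x) < 0.5, 1, -1)

# Weak learner: threshold rule minimising the weighted error.
fit_stump <- function(x, y, w) {
  best <- list(err = Inf)
  for (thr in sort(unique(x))) {
    for (dir in c(1, -1)) {
      pred <- ifelse(x <= thr, dir, -dir)
      err <- sum(w[pred != y])
      if (err < best$err) best <- list(thr = thr, dir = dir, err = err)
    }
  }
  best
}
predict_stump <- function(s, x) ifelse(x <= s$thr, s$dir, -s$dir)

w <- rep(1 / n, n)                      # 1. weight all points equally
score <- numeric(n)
for (m in seq_len(20)) {                # 4. repeat iteratively
  s <- fit_stump(x, y, w)               # 2. fit a weak model
  pred <- predict_stump(s, x)
  err <- max(s$err, 1e-10)              # guard against division by zero
  alpha <- 0.5 * log((1 - err) / err)   # model weight: better fit, larger say
  w <- w * exp(-alpha * y * pred)       # 3. up-weight misclassified points
  w <- w / sum(w)
  score <- score + alpha * pred         # 5. combine, weighted by model fit
}

single <- fit_stump(x, y, rep(1 / n, n))
acc_single <- mean(predict_stump(single, x) == y)
acc_boost <- mean(sign(score) == y)
print(c(single = acc_single, boosted = acc_boost))
```

On this toy problem the weighted vote of several stumps classifies the training data better than the best single stump, which is the point of combining weak models.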

---

# Automatic feature selection

<ul>
 <li class="m1">LASSO</li> 
 <ul>
 <li>Often estimates <high><mono>beta = 0</mono></high>, which effectively removes the feature from the model.</li>
 </ul> 
 <li class="m2">Decision trees / random forests</li> 
 <ul>
 <li>Features need not be used for prediction at all (when <mono>mtry</mono> is high).</li>
 </ul>
</ul>

]

<img src="image/self_tuning.png" height=420px> 
from <a href="https://medium.com/@dkwok94/machine-learning-for-my-grandma-ca242e97ef62">medium.com</a>

]
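Why the LASSO produces exact zeros can be seen in a minimal coordinate-descent sketch in base R. It is an illustration under assumptions and not the `glmnet` implementation: the simulated data, the helpers `soft_threshold()` and `lasso_cd()`, and the value of `lambda` are chosen for the example.

```r
set.seed(1)

# Simulated data (assumption): only the first two of six features matter.
n <- 100
p <- 6
X <- scale(matrix(rnorm(n * p), n, p))
beta_true <- c(3, -2, 0, 0, 0, 0)
y <- drop(X %*% beta_true + rnorm(n))
y <- y - mean(y)

# Soft-thresholding sets small values exactly to zero.
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Coordinate descent for (1 / (2n)) * sum((y - X b)^2) + lambda * sum(abs(b)).
lasso_cd <- function(X, y, lambda, n_iter = 200) {
  n <- nrow(X)
  beta <- numeric(ncol(X))
  for (it in seq_len(n_iter)) {
    for (j in seq_along(beta)) {
      # Residual with feature j left out of the current fit.
      r_j <- y - drop(X[, -j, drop = FALSE] %*% beta[-j])
      z <- sum(X[, j] * r_j) / n
      beta[j] <- soft_threshold(z, lambda) / (sum(X[, j]^2) / n)
    }
  }
  beta
}

beta_hat <- lasso_cd(X, y, lambda = 0.5)
print(round(beta_hat, 2))
```

Features whose association with the residual stays below `lambda` receive a coefficient of exactly zero and drop out of the model, while the relevant features are kept with shrunken coefficients.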

---

# Remember

"…some machine learning projects succeed and some fail. What makes the difference? <high>Easily the most important factor is the features used</high>."

[Pedro Domingos](https://en.wikipedia.org/wiki/Pedro_Domingos)

"The algorithms we used are very standard for Kagglers. […] <high>We spent most of our efforts in feature engineering.</high> [...] We were also very careful to discard features likely to expose us to the risk of over-fitting our model."

[Xavier Conort]()

]

]

---

<h1><a href="https://therbootcamp.github.io/ML_2020Oct/_sessions/Models/Models_practical.html">Practical</a></h1>