Basel R Bootcamp
### October 2019

---

layout: true

<div class="my-footer">
<span style="text-align:center">
<span> <img src="" height=14 style="vertical-align: middle"/> </span>
<a href=""> <span style="padding-left:82px"> <font color="#7E7E7E"> </font> </span> </a>
<a href=""> <font color="#7E7E7E"> Machine Learning with R | October 2019 </font> </a>
</span>
</div>

---

# There is no free lunch

.pull-left35[

<u>Theorem</u>

Given a finite set `\(V\)` and a finite set `\(S\)` of real numbers, <high>assume that `\(f:V\to S\)` is chosen at random</high> according to uniform distribution on the set `\(S^{V}\)` of all possible functions from `\(V\)` to `\(S\)`. For the problem of optimizing `\(f\)` over the set `\(V\)`, <high>then no algorithm performs better than blind search.</high>

<br><br><br><br>

<a href="">Wolpert & Macready, 1997, No Free Lunch Theorems for Optimization</a>

]

.pull-right55[

<p align = "center">
<img src="image/free_lunch.jpg" height=400px width=650px><br>
<font style="font-size:10px">from <a href=""></a></font>
</p>

]

---

.pull-left4[

# Know your problem

<u>Bias-variance dilemma</u>

<br>

`$$\Large Error = Bias^2 + Variance + Noise$$`

<br>

Simply put...

<b>Bias</b> arises from strong <high>model assumptions</high> not being met by the environment.

<b>Variance</b> arises from high <high>model flexibility</high> fitting the noise in the data (i.e., overfitting).

<b>Noise</b> is the irreducible error that no model can remove.

<br>

→ <high>Make strong assumptions</high> (use simple models), if possible.

]

.pull-right45[

<p align="left">
<br>
<img src="image/bias_variance.png" height=580px>
</p>

]

---

.pull-left4[

# Linear or non-linear

<br>

One important model assumption concerns linearity.

<br>

<b>Linear models</b> (`lm`, `glm`) make strong model assumptions. They are more often wrong, but also, ceteris paribus, <high>less prone to overfitting</high>.

<br>

<b>Non-linear models</b> (everything else) make weaker model assumptions, leaving the exact relationship (more) open.
They are closer to the truth, but also, ceteris paribus, <high>more prone to overfitting</high>.

]

.pull-right5[

<br><br><br>

<p align = "center">
<img src="image/linearity.png" height=480px><br>
<font style="font-size:10px">from <a href=""></a></font>
</p>

]

---

.pull-left45[

# Kernel trick

<high>Transforms the "input space" into a new "feature space"</high> to allow for object separation.

<p align="center">
<img src="image/kernel_bw.png" height=160px>
</p>

Used in <high>Support Vector Machines</high> (e.g., `method = "svmRadial"`), often with a <high>radial basis function</high> (RBF) kernel.

<p align="center">
<img src="image/rdf_kernel.png" width=300px>
</p>

Kernels <high>re-represent objects</high> in terms of other objects!

]

.pull-right5[

<br><br><br>

<p align = "center">
<img src="image/linearity.png" height=480px><br>
<font style="font-size:10px">from <a href=""></a></font>
</p>

]

---

# Automatic feature engineering

<high>Deep learning</high>, i.e., neural networks and especially <high>convolutional neural networks</high>, excels because the models generate their own features.

Neural networks are not the focus of `caret` and this course. Powerful implementations based on <high>Google's TensorFlow</high> library are provided by `tensorflow`.

.pull-left3[

<p align = "center">
<img src="image/tf.png"><br>
<font style="font-size:10px">from <a href=""></a></font>
</p>

]

.pull-right65[

<p align = "center">
<img src="image/power_of_deeplearning.png" height=265px><br>
<font style="font-size:10px">from <a href=""></a></font>
</p>

]

---

# Robustness

.pull-left4[

To produce <high>robust predictions</high> that <high>suffer less from variance</high>, ML models use a variety of <high>tricks</high>.
<p align = "center">
<img src="image/robustness_sel.png" width=350px><br>
<font style="font-size:10px">from <a href=""></a></font>
</p>

]

.pull-right55[

<table style="cellspacing:0; cellpadding:0; border:none;">
<col width="210">
<col width="210">
<col width="210">
<tr> <th>Approach</th> <th>Implementation</th> <th>Examples</th> </tr>
<tr style="background-color:#ffffff"> <td align="center"><i>Tolerance</i></td> <td align="center">Decrease error tolerance</td> <td align="center"><mono>svmRadial</mono></td> </tr>
<tr style="background-color:#ffffff"> <td align="center"><i>Regularization</i></td> <td align="center">Penalize for complexity</td> <td align="center"><mono>lasso</mono>, <mono>ridge</mono>, <mono>elasticnet</mono></td> </tr>
<tr style="background-color:#ffffff"> <td align="center"><i>Ensemble</i></td> <td align="center">Bagging</td> <td align="center"><mono>treebag</mono>, <mono>randomGLM</mono>, <mono>randomForest</mono></td> </tr>
<tr style="background-color:#ffffff"> <td align="center"><i>Ensemble</i></td> <td align="center">Boosting</td> <td align="center"><mono>adaboost</mono>, <mono>xgbTree</mono></td> </tr>
<tr style="background-color:#ffffff"> <td align="center"><i>Feature selection</i></td> <td align="center">Regularization</td> <td align="center"><mono>lasso</mono></td> </tr>
<tr style="background-color:#ffffff"> <td align="center"><i>Feature selection</i></td> <td align="center">Importance</td> <td align="center"><mono>randomForest</mono></td> </tr>
</table>

]

---

# Regularization

.pull-left45[

Regularization is the process of adding model terms, usually <high>penalties for complexity</high>, in order to prevent overfitting (or to make an otherwise ill-posed problem solvable in the first place).
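
As a sketch of what such a penalty can look like, consider a linear model with coefficients `\(\beta_j\)`, a penalty weight `\(\lambda \ge 0\)`, and a mixing parameter `\(\alpha \in [0, 1]\)`. This notation is assumed here for illustration and is not taken from the slides; implementations such as `glmnet` scale the individual terms slightly differently.

`$$\text{Loss}(\beta) = \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2 + \lambda \, P(\beta)$$`

`$$P_{lasso}(\beta) = \sum_j |\beta_j|, \qquad P_{ridge}(\beta) = \sum_j \beta_j^2, \qquad P_{elastic\ net}(\beta) = \alpha \sum_j |\beta_j| + (1 - \alpha) \sum_j \beta_j^2$$`

Setting `\(\alpha = 1\)` recovers the lasso penalty and `\(\alpha = 0\)` the ridge penalty.
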
<br2>

<p align = 'center'><font size=5><high>Loss</high> = <high>Misfit</high> + <high>Penalty</high></font></p>

<br>

<table style="cellspacing:0; cellpadding:0; border:none;">
<col width="160">
<col width="160">
<col width="160">
<tr> <th>Name</th> <th>Penalty</th> <th>`caret`</th> </tr>
<tr style="background-color:#ffffff"> <td align="center"><high>AIC/BIC</high></td> <td align="center"><img src="image/regularization/aicbic.png" height=24px></td> <td align="center">-</td> </tr>
<tr style="background-color:#ffffff"> <td align="center"><high>Lasso</high></td> <td align="center"><img src="image/regularization/lasso.png" height=24px></td> <td align="center">`method = "glmnet"`</td> </tr>
<tr style="background-color:#ffffff"> <td align="center"><high>Ridge</high></td> <td align="center"><img src="image/regularization/ridge.png" height=24px></td> <td align="center">`method = "glmnet"`</td> </tr>
<tr style="background-color:#ffffff"> <td align="center"><high>Elastic Net</high></td> <td align="center"><img src="image/regularization/ridge.png" height=24px></td> <td align="center">`method = "glmnet"`</td> </tr>
</table>

]

.pull-right5[

<img src="Models_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

]

---

.pull-left45[

# Bagging

<high>Aggregate</high> predictions from multiple fits to <high>resampled</high> data.

Especially beneficial for models that produce relatively unstable solutions, e.g., regression trees: `rpart` → `treebag`.

<br>

<u>Algorithm</u>

1 - <high>Resample</high> data (with replacement).

2 - <high>Fit</high> model to resampled data.

3 - <high>Average</high> predictions.

]

.pull-right45[

<br><br><br>

<p align = "center">
<img src="image/münchhausen.jpg" height=450px><br>
<font style="font-size:10px">from <a href=""></a></font>
</p>

]

---

# Boosting

.pull-left4[

Boosting <high>adaptively re-weights</high> samples based on performance.

`adaboost` and the newer `xgbTree` are among the <high>best ML models out there</high>.
<u>Algorithm</u>

1 - Assign <high>equal weight</high> to all cases.

2 - <high>Fit</high> simple model.

3 - <high>Increase weight of misfit cases</high> by model misfit for next iteration.

4 - <high>Repeat</high>.

5 - <high>Average</high> predictions, giving more weight to models with lower misfit.

]

.pull-right5[

<p align = "center">
<img src="image/bagg_boost.png" height=410px><br>
<font style="font-size:10px">from <a href=""></a></font>
</p>

]

---

# Automatic feature selection

.pull-left45[

Many models reduce complexity by automatically relying on a subset of good features.

<u>Two examples</u>

<b>LASSO</b>

Regularization, in particular via `lasso`, frequently <high>estimates <mono>beta = 0</mono></high> and thus effectively deselects that feature.

<b>Random forests</b>

As random forests select at each node the best of `mtry`-many randomly selected features, <high>unpredictive features may never come into play</high>. This is especially true for large `mtry`.

]

.pull-right45[

<p align="center">
<img src="image/self_tuning.png" height=420px><br>
<font style="font-size:10px">from <a href=""></a></font>
</p>

]

---

# Some help in choosing models

<p align = "center">
<img src="image/mlmap.png" height = 450px><br>
<font style="font-size:10px">from <a href=""></a></font>
</p>

---

# Remember

.pull-left45[

<i>"…some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used."</i>

Pedro Domingos

<br><br>

<i>"The algorithms we used are very standard for Kagglers. […] We spent most of our efforts in feature engineering. [...] We were also very careful to discard features likely to expose us to the risk of over-fitting our model."</i>

Xavier Conort

]

.pull-right45[

<p align="center">
<img src="image/albert.jpeg" ><br>
<font style="font-size:10px">from <a href="&amp;section=1"></a></font>
</p>

]

---

class: middle, center

<h1><a href="">Practical</a></h1>
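
---

# Bagging, sketched in R

The three bagging steps from the Bagging slide (resample, fit, average) can be written out by hand as below. This is only a minimal illustration under assumptions that are not from the slides: the simulated data, the number of resamples `n_bags`, and the direct use of `rpart` instead of `caret`'s `method = "treebag"`.

```r
library(rpart)  # recommended package bundled with standard R installations

set.seed(1)

# Simulated data (an assumption for illustration): a noisy sine curve
n <- 200
d <- data.frame(x = runif(n, 0, 10))
d$y <- sin(d$x) + rnorm(n, sd = 0.3)

n_bags <- 25  # hypothetical number of bootstrap resamples

# Steps 1 and 2: resample with replacement, fit a tree to each resample
preds <- sapply(seq_len(n_bags), function(b) {
  idx <- sample(n, replace = TRUE)
  fit <- rpart(y ~ x, data = d[idx, ])
  predict(fit, newdata = d)  # predictions for the original cases
})

# Step 3: average the predictions across resamples
bagged <- rowMeans(preds)

# Compare with a single tree, using the known true curve sin(x)
single <- predict(rpart(y ~ x, data = d), newdata = d)
mse <- c(single = mean((single - sin(d$x))^2),
         bagged = mean((bagged - sin(d$x))^2))
print(mse)
```

Each column of `preds` holds the predictions of one tree, so `rowMeans(preds)` is the averaging step. Within `caret`, `method = "treebag"` takes care of these steps internally.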