class: center, middle, inverse, title-slide

# Fitting

### Applied Machine Learning with R
### The R Bootcamp @ AMLD
### November 2021

---

layout: true

<div class="my-footer">
<span style="text-align:center">
<span>
<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/by-sa.png" height=14 style="vertical-align: middle"/>
</span>
<a href="https://therbootcamp.github.io/">
<span style="padding-left:82px">
<font color="#7E7E7E">
www.therbootcamp.com
</font>
</span>
</a>
<a href="https://therbootcamp.github.io/">
<font color="#7E7E7E">
Applied Machine Learning with R @ AMLD | November 2021
</font>
</a>
</span>
</div>

---

.pull-left45[

# Fitting

<p style="padding-top:1px"></p>

<ul>
<li class="m1"><span>Models are actually <high>families of models</high>, with every parameter combination specifying a different model.</span></li>
<li class="m2"><span>To fit a model means to <high>identify</high>, from the family of models, <high>the specific model that fits the data best</high>.</span></li>
</ul>

]

.pull-right45[

<br><br>
<p align = "center">
<img src="image/curvefits.png" height=480px><br>
<font style="font-size:10px">adapted from <a href="https://www.explainxkcd.com/wiki/index.php/2048:_Curve-Fitting">explainxkcd.com</a></font>
</p>

]

---

# Loss function

.pull-left45[

<ul>
<li class="m1"><span>Possibly <high>the most important concept</high> in statistics and machine learning.</span></li>
<li class="m2"><span>The loss function defines some <high>summary of the errors committed by the model</high>.</span></li>
</ul>

<p style="padding-top:7px">

`$$\Large Loss = f(Error)$$`

<p style="padding-top:7px">

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
<td> <b>Purpose</b> </td>
<td> <b>Description</b> </td>
</tr>
<tr>
<td bgcolor="white"> Fitting </td>
<td bgcolor="white"> Find parameters that minimize the loss function. </td>
</tr>
<tr>
<td> Evaluation </td>
<td> Calculate the loss function for the fitted model.
</td>
</tr>
</table>

]

.pull-right45[

<img src="Fitting_files/figure-html/unnamed-chunk-2-1.png" width="90%" style="display: block; margin: auto;" />

]

---

# Loss function

.pull-left45[

<ul>
<li class="m1"><span>Possibly <high>the most important concept</high> in statistics and machine learning.</span></li>
<li class="m2"><span>The loss function defines some <high>summary of the errors committed by the model</high>.</span></li>
</ul>

<p style="padding-top:7px">

`$$\Large Loss = f(Error)$$`

<p style="padding-top:7px">

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
<td> <b>Purpose</b> </td>
<td> <b>Description</b> </td>
</tr>
<tr>
<td bgcolor="white"> Fitting </td>
<td bgcolor="white"> Find parameters that minimize the loss function. </td>
</tr>
<tr>
<td> Evaluation </td>
<td> Calculate the loss function for the fitted model. </td>
</tr>
</table>

]

.pull-right45[

<img src="Fitting_files/figure-html/unnamed-chunk-3-1.png" width="90%" style="display: block; margin: auto;" />

]

---

class: center, middle

<high><h1>Regression</h1></high>
<font color = "gray"><h1>Decision Trees</h1></font>
<font color = "gray"><h1>Random Forests</h1></font>

---

# Regression

.pull-left45[

In [regression](https://en.wikipedia.org/wiki/Regression_analysis), the criterion `\(Y\)` is modeled as the <high>sum</high> of <high>features</high> `\(X_1, X_2, ...\)` <high>times weights</high> `\(\beta_1, \beta_2, ...\)` plus `\(\beta_0\)`, the so-called intercept.

<p style="padding-top:10px"></p>

`$$\large \hat{Y} = \beta_{0} + \beta_{1} \times X_1 + \beta_{2} \times X_2 + ...$$`

<p style="padding-top:10px"></p>

The weight `\(\beta_{i}\)` indicates the <high>amount of change</high> in `\(\hat{Y}\)` for a change of 1 in `\(X_{i}\)`.
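As a toy illustration of the weighted sum, with hypothetical weights and feature values (not estimated from any data):


```r
# prediction as a weighted sum (hypothetical numbers)
b0 <- 1.5; b1 <- 2.0; b2 <- -0.5  # intercept and weights
x1 <- 3;   x2 <- 4                # feature values
b0 + b1 * x1 + b2 * x2            # 1.5 + 6.0 - 2.0 = 5.5
```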
Ceteris paribus, the <high>more extreme</high> `\(\beta_{i}\)`, the <high>more important</high> `\(X_{i}\)` is for the prediction of `\(Y\)` <font style="font-size:12px">(Note: the scale of `\(X_{i}\)` matters too!).</font>

If `\(\beta_{i} = 0\)`, then `\(X_{i}\)` <high>does not help</high> predict `\(Y\)`.

]

.pull-right45[

<img src="Fitting_files/figure-html/unnamed-chunk-4-1.png" width="90%" style="display: block; margin: auto;" />

]

---

# Regression loss

.pull-left45[

<p>
<ul style="margin-bottom:-20px">
<li class="m1"><span><b>Mean Squared Error (MSE)</b>
<br><br>
<ul class="level">
<li><span>Average <high>squared distance</high> between predictions and true values.</span></li>
</ul>
</span></li>
</ul>

`$$MSE = \frac{1}{n}\sum_{i \in 1,...,n}(Y_{i} - \hat{Y}_{i})^{2}$$`

<ul>
<li class="m2"><span><b>Mean Absolute Error (MAE)</b>
<br><br>
<ul class="level">
<li><span>Average <high>absolute distance</high> between predictions and true values.</span></li>
</ul>
</span></li>
</ul>

$$ MAE = \frac{1}{n}\sum_{i \in 1,...,n} \lvert Y_{i} - \hat{Y}_{i} \rvert$$
</p>

]

.pull-right45[

<img src="Fitting_files/figure-html/unnamed-chunk-5-1.png" width="90%" style="display: block; margin: auto;" />

]

---

# 2 types of supervised problems

.pull-left45[

<ul style="margin-bottom:-20px">
<li class="m1"><span><b>Regression</b>
<br><br>
<ul class="level">
<li><span>Regression problems involve the <high>prediction of a quantitative feature</high>.</span></li>
<li><span>E.g., predicting the cholesterol level as a function of age.</span></li>
</ul>
</span></li><br>
<li class="m2"><span><b>Classification</b>
<br><br>
<ul class="level">
<li><span>Classification problems involve the <high>prediction of a categorical feature</high>.</span></li>
<li><span>E.g., predicting the type of chest pain as a function of age.</span></li>
</ul>
</span></li>
</ul>

]

.pull-right4[

<p align = "center">
<img src="image/twotypes.png" height=440px><br>
</p>

]

---

# Logistic regression
.pull-left45[

<ul style="margin-bottom:-20px">
<li class="m1"><span>In <a href="https://en.wikipedia.org/wiki/Logistic_regression">logistic regression</a>, the class criterion <font style="font-size:22px"><mono>Y ∈ {0,1}</mono></font> is also modeled as the <high>sum of features times weights</high>, but with the prediction being transformed using a <high>logistic link function</high>.</span></li>
</ul>

<p style="padding-top:10px"></p>

`$$\large \hat{Y} = Logistic(\beta_{0} + \beta_{1} \times X_1 + ...)$$`

<p style="padding-top:10px"></p>

<ul style="margin-bottom:-20px">
<li class="m2"><span>The logistic function <high>maps predictions to the range of 0 and 1</high>, the two class values.</span></li>
</ul>

<p style="padding-top:10px"></p>

$$ Logistic(x) = \frac{1}{1+exp(-x)}$$

]

.pull-right45[

<img src="Fitting_files/figure-html/unnamed-chunk-6-1.png" width="90%" style="display: block; margin: auto;" />

]

---

# Logistic regression

.pull-left45[

<ul style="margin-bottom:-20px">
<li class="m1"><span>In <a href="https://en.wikipedia.org/wiki/Logistic_regression">logistic regression</a>, the class criterion <font style="font-size:22px"><mono>Y ∈ {0,1}</mono></font> is also modeled as the <high>sum of features times weights</high>, but with the prediction being transformed using a <high>logistic link function</high>.</span></li>
</ul>

<p style="padding-top:10px"></p>

`$$\large \hat{Y} = Logistic(\beta_{0} + \beta_{1} \times X_1 + ...)$$`

<p style="padding-top:10px"></p>

<ul style="margin-bottom:-20px">
<li class="m2"><span>The logistic function <high>maps predictions to the range of 0 and 1</high>, the two class values.</span></li>
</ul>

<p style="padding-top:10px"></p>

$$ Logistic(x) = \frac{1}{1+exp(-x)}$$

]

.pull-right45[

<img src="Fitting_files/figure-html/unnamed-chunk-7-1.png" width="90%" style="display: block; margin: auto;" />

]

---

# Classification loss

.pull-left45[

<ul style="margin-bottom:-20px">
<li class="m1"><span><b>Distance</b>
<br><br>
<ul class="level">
<li><span>Log loss is <high>used to fit the parameters</high>; alternative distance measures are MSE and MAE.</span></li>
</ul>
</span></li>
</ul>

`$$\small LogLoss = -\frac{1}{n}\sum_{i=1}^{n}(log(\hat{y}_i)y_i+log(1-\hat{y}_i)(1-y_i))$$`

<ul>
<li class="m2"><span><b>Overlap</b>
<br><br>
<ul class="level">
<li><span>Does the <high>predicted class match the actual class</high>? Often preferred for <high>ease of interpretation</high>.</span></li>
</ul>
</span></li>
</ul>

`$$\small Loss_{01} = 1-Accuracy = \frac{1}{n}\sum_{i=1}^n I(y_i \neq \lfloor \hat{y}_i \rceil)$$`

]

.pull-right45[

<img src="Fitting_files/figure-html/unnamed-chunk-8-1.png" width="90%" style="display: block; margin: auto;" />

]

---

class: center, middle

<p align = "center">
<img src="https://www.tidymodels.org/images/tidymodels.png" width=240px><br>
<font style="font-size:10px">from <a href="https://www.tidymodels.org/packages/">tidymodels.org</a></font>
</p>

---

.pull-left4[

# Fitting <mono>tidymodels</mono>

<br>

<ul>
<li class="m1"><span>Define the <mono>recipe</mono>.</span></li><br>
<li class="m2"><span>Define the model.</span></li><br>
<li class="m3"><span>Define the <mono>workflow</mono>.</span></li><br>
<li class="m4"><span>Fit the <mono>workflow</mono>.</span></li><br>
<li class="m5"><span>Assess model performance.</span></li>
</ul>

]

.pull-right5[

<p align = "center">
<br>
<img src="image/tidymodels_fit.png" height=560px><br>
</p>

]

---

# Define the <mono>recipe</mono>

.pull-left45[

<ul>
<li class="m1"><span>The <mono>recipe</mono> specifies two things:</span></li><br>
<ul class="level">
<li><span>The criterion and the features, i.e., the <mono>formula</mono> to use.</span></li><br>
<li><span>How the features should be pre-processed before the model fitting.</span></li><br>
</ul>
<li class="m2"><span>To set up a <mono>recipe</mono>:</span></li><br>
<ul class="level">
<li><span>Initialize it with <mono>recipe()</mono>, wherein the formula and data are specified.</span></li><br>
<li><span>Add pre-processing steps, using <mono>step_*()</mono> functions and <mono>dplyr</mono>-like selectors.</span></li><br>
</ul>
</ul>

]

.pull-right45[


```r
# set up recipe for regression model
lm_recipe <- 
  recipe(income ~ ., data = baselers) %>% 
  step_dummy(all_nominal_predictors())

lm_recipe
```

```
Data Recipe

Inputs:

      role #variables
   outcome          1
 predictor         19

Operations:

Dummy variables from all_nominal_predictors()
```

]

---

# Define the <mono>recipe</mono>

.pull-left45[

<ul>
<li class="m1"><span>The <mono>recipe</mono> specifies two things:</span></li><br>
<ul class="level">
<li><span>The criterion and the features, i.e., the <mono>formula</mono> to use.</span></li><br>
<li><span>How the features should be pre-processed before the model fitting.</span></li><br>
</ul>
<li class="m2"><span>To set up a <mono>recipe</mono>:</span></li><br>
<ul class="level">
<li><span>Initialize it with <mono>recipe()</mono>, wherein the formula and data are specified.</span></li><br>
<li><span>Add pre-processing steps, using <mono>step_*()</mono> functions and <mono>dplyr</mono>-like selectors.</span></li><br>
</ul>
</ul>

]

.pull-right45[


```r
# set up recipe for logistic regression
# model
logistic_recipe <- 
  recipe(eyecor ~ ., data = baselers) %>% 
  step_dummy(all_nominal_predictors())

logistic_recipe
```

```
Data Recipe

Inputs:

      role #variables
   outcome          1
 predictor         19

Operations:

Dummy variables from all_nominal_predictors()
```

]

---

# Define the model

.pull-left45[

<ul>
<li class="m1"><span>The model specifies:</span></li><br>
<ul class="level">
<li><span>Which model (e.g., linear regression) to use.</span></li><br>
<li><span>Which engine (underlying model-fitting algorithm) to use.</span></li><br>
<li><span>The problem mode, i.e., <high>regression vs. classification</high>.</span></li><br>
</ul>
<li class="m2"><span>To set up a model:</span></li><br>
<ul class="level">
<li><span>Specify the model, e.g., using <mono>linear_reg()</mono> or <mono>logistic_reg()</mono>.</span></li><br>
<li><span>Specify the engine using <mono>set_engine()</mono>.</span></li><br>
<li><span>Specify the problem mode using <mono>set_mode()</mono>.</span></li><br>
</ul>
</ul>

]

.pull-right45[


```r
# set up model for regression model
lm_model <- 
  linear_reg() %>% 
  set_engine("lm") %>% 
  set_mode("regression")

lm_model
```

```
Linear Regression Model Specification (regression)

Computational engine: lm 
```

]

---

# Define the model

.pull-left45[

<ul>
<li class="m1"><span>The model specifies:</span></li><br>
<ul class="level">
<li><span>Which model (e.g., linear regression) to use.</span></li><br>
<li><span>Which engine (underlying model-fitting algorithm) to use.</span></li><br>
<li><span>The problem mode, i.e., <high>regression vs. classification</high>.</span></li><br>
</ul>
<li class="m2"><span>To set up a model:</span></li><br>
<ul class="level">
<li><span>Specify the model, e.g., using <mono>linear_reg()</mono> or <mono>logistic_reg()</mono>.</span></li><br>
<li><span>Specify the engine using <mono>set_engine()</mono>.</span></li><br>
<li><span>Specify the problem mode using <mono>set_mode()</mono>.</span></li><br>
</ul>
</ul>

]

.pull-right45[


```r
# set up model for logistic regression
# model
logistic_model <- 
  logistic_reg() %>% 
  set_engine("glm") %>% 
  set_mode("classification")

logistic_model
```

```
Logistic Regression Model Specification (classification)

Computational engine: glm 
```

]

---

# Define the <mono>workflow</mono>

.pull-left45[

<ul>
<li class="m1"><span>A <mono>workflow</mono> combines the recipe and model and facilitates fitting the model.
To set up a <mono>workflow</mono>:</span></li><br>
<ul class="level">
<li><span>Initialize it using the <mono>workflow()</mono> function.</span></li><br>
<li><span>Add a recipe using <mono>add_recipe()</mono>.</span></li><br>
<li><span>Add a model using <mono>add_model()</mono>.</span></li><br>
</ul>
</ul>

]

.pull-right45[


```r
# set up workflow for regression model
lm_workflow <- 
  workflow() %>% 
  add_recipe(lm_recipe) %>% 
  add_model(lm_model)

lm_workflow
```

```
══ Workflow ══════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ──────────────────────────────────────────
1 Recipe Step

• step_dummy()

── Model ─────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: lm 
```

]

---

# Define the <mono>workflow</mono>

.pull-left45[

<ul>
<li class="m1"><span>A <mono>workflow</mono> combines the recipe and model and facilitates fitting the model. To set up a <mono>workflow</mono>:</span></li><br>
<ul class="level">
<li><span>Initialize it using the <mono>workflow()</mono> function.</span></li><br>
<li><span>Add a recipe using <mono>add_recipe()</mono>.</span></li><br>
<li><span>Add a model using <mono>add_model()</mono>.</span></li><br>
</ul>
</ul>

]

.pull-right45[


```r
# set up workflow for logistic regression
# model
logistic_workflow <- 
  workflow() %>% 
  add_recipe(logistic_recipe) %>% 
  add_model(logistic_model)

logistic_workflow
```

```
══ Workflow ══════════════════════════════════════════════
Preprocessor: Recipe
Model: logistic_reg()

── Preprocessor ──────────────────────────────────────────
1 Recipe Step

• step_dummy()

── Model ─────────────────────────────────────────────────
Logistic Regression Model Specification (classification)

Computational engine: glm 
```

]

---

# Fit the <mono>workflow</mono>

.pull-left35[

<ul>
<li class="m1"><span>A <mono>workflow</mono> is fitted using the <mono>fit()</mono> function, which:</span></li><br>
<ul class="level">
<li><span>Applies the recipe with the pre-processing steps.</span></li><br>
<li><span>Runs the specified algorithm (i.e., model).</span></li><br>
</ul>
</ul>

]

.pull-right55[


```r
# fit the workflow
income_lm <- fit(lm_workflow, 
                 data = baselers)

tidy(income_lm)
```

```
# A tibble: 25 × 5
   term        estimate std.error statistic   p.value
   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
 1 (Intercept) -192.     631.      -0.304   7.61e-  1
 2 id             0.000895  0.113    0.00792 9.94e-  1
 3 age          115.        2.88    40.1     2.23e-208
 4 height         4.95      3.02     1.64    1.02e-  1
 5 weight         1.01      3.27     0.307   7.59e-  1
 6 children     -48.9      31.9     -1.54    1.25e-  1
 7 happiness   -156.       31.1     -5.02    6.00e-  7
 8 fitness        6.94     17.9      0.389   6.97e-  1
 9 food           2.50      0.142   17.6     2.33e- 60
10 alcohol       26.1       2.47    10.6     8.05e- 25
# … with 15 more rows
```

]

---

# Fit the <mono>workflow</mono>

.pull-left35[

<ul>
<li class="m1"><span>A <mono>workflow</mono> is fitted using the <mono>fit()</mono> function, which:</span></li><br>
<ul class="level">
<li><span>Applies the recipe with the pre-processing steps.</span></li><br>
<li><span>Runs the specified algorithm (i.e., model).</span></li><br>
</ul>
</ul>

]

.pull-right55[


```r
# fit the logistic regression workflow
eyecor_glm <- fit(logistic_workflow, 
                  data = baselers)

tidy(eyecor_glm)
```

```
# A tibble: 25 × 5
   term          estimate std.error statistic p.value
   <chr>            <dbl>     <dbl>     <dbl>   <dbl>
 1 (Intercept) -3.04      1.32         -2.31   0.0211
 2 id           0.0000834 0.000236      0.354  0.723 
 3 age          0.00734   0.00973       0.755  0.451 
 4 height       0.00572   0.00630       0.907  0.364 
 5 weight       0.00446   0.00678       0.658  0.510 
 6 income      -0.0000395 0.0000666    -0.593  0.553 
 7 children     0.0329    0.0665        0.495  0.621 
 8 happiness    0.0386    0.0653        0.591  0.554 
 9 fitness     -0.0419    0.0372       -1.13   0.261 
10 food        -0.0000755 0.000339     -0.222  0.824 
# … with 15 more rows
```

]

---

# Assess model fit

.pull-left4[

<ul>
<li class="m1"><span>Use <mono>predict()</mono> to obtain model predictions on specified data.</span></li><br>
<li class="m2"><span>Use <mono>metrics()</mono> to obtain performance metrics suited to the current problem mode.</span></li>
</ul>

]

.pull-right5[


```r
# generate predictions
lm_pred <- 
  income_lm %>% 
  predict(baselers) %>% 
  bind_cols(baselers %>% select(income))

metrics(lm_pred, 
        truth = income, 
        estimate = .pred)
```

```
# A tibble: 3 × 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 rmse    standard    1008.   
2 rsq     standard       0.868
3 mae     standard     792.   
```

]

---

# Assess model fit

.pull-left4[

<ul>
<li class="m1"><span>Use <mono>predict()</mono> to obtain model predictions on specified data.</span></li><br>
<li class="m2"><span>Use <mono>metrics()</mono> to obtain performance metrics suited to the current problem mode.</span></li>
</ul>

]

.pull-right5[


```r
# generate predictions logistic regression
logistic_pred <- 
  predict(eyecor_glm, baselers, type = "prob") %>% 
  bind_cols(predict(eyecor_glm, baselers)) %>% 
  bind_cols(baselers %>% select(eyecor))

metrics(logistic_pred, 
        truth = eyecor, 
        estimate = .pred_class, 
        .pred_yes)
```

```
# A tibble: 4 × 3
  .metric     .estimator .estimate
  <chr>       <chr>          <dbl>
1 accuracy    binary        0.647 
2 kap         binary        0.0566
3 mn_log_loss binary        0.634 
4 roc_auc     binary        0.605 
```

]

---

# Assess model fit

.pull-left4[

<ul>
<li class="m1"><span>Use <mono>roc_curve()</mono> to obtain sensitivity and specificity for every unique value of the predicted probabilities.</span></li><br>
<ul class="level">
<li><span><high>Sensitivity</high> = Of the truly positive cases, what proportion is classified as positive.</span></li><br>
<li><span><high>Specificity</high> = Of the truly negative cases, what proportion is classified as negative.</span></li><br>
</ul>
<li class="m2"><span>Use <mono>autoplot()</mono> to plot the ROC curve based on the different combinations of sensitivity and specificity.</span></li>
</ul>

]

.pull-right5[


```r
# ROC curve for logistic model
logistic_pred %>% 
  roc_curve(truth = eyecor, .pred_yes) %>% 
  autoplot()
```

<img src="Fitting_files/figure-html/unnamed-chunk-19-1.png" width="90%" style="display: block; margin: auto;" />

]

---

class: middle, center

<h1><a href=https://therbootcamp.github.io/AML_2021AMLD/_sessions/Fitting/Fitting_practical.html>Practical</a></h1>
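
---

# Appendix: full pipeline sketch

The five steps above, collected in one place. A minimal sketch, assuming <mono>tidymodels</mono> is loaded and the <mono>baselers</mono> data from the practical is available; all calls are the ones shown on the previous slides.


```r
library(tidymodels)

# 1. recipe: formula plus pre-processing
lm_recipe <- 
  recipe(income ~ ., data = baselers) %>% 
  step_dummy(all_nominal_predictors())

# 2. model: linear regression via the lm engine
lm_model <- 
  linear_reg() %>% 
  set_engine("lm") %>% 
  set_mode("regression")

# 3. workflow: combine recipe and model
lm_workflow <- 
  workflow() %>% 
  add_recipe(lm_recipe) %>% 
  add_model(lm_model)

# 4. fit the workflow
income_lm <- fit(lm_workflow, data = baselers)

# 5. assess performance on the training data
income_lm %>% 
  predict(baselers) %>% 
  bind_cols(baselers %>% select(income)) %>% 
  metrics(truth = income, estimate = .pred)
```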