class: center, middle, inverse, title-slide

# Fitting

### Applied Machine Learning with R
### The R Bootcamp @ AMLD
### November 2021

---

layout: true

<div class="my-footer">
<span style="text-align:center">
<span>
<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/by-sa.png" height=14 style="vertical-align: middle"/>
</span>
<a href="https://therbootcamp.github.io/">
<span style="padding-left:82px">
<font color="#7E7E7E">
www.therbootcamp.com
</font>
</span>
</a>
<a href="https://therbootcamp.github.io/">
<font color="#7E7E7E">
Applied Machine Learning with R @ AMLD | November 2021
</font>
</a>
</span>
</div>

---

.pull-left45[

# Fitting

<p style="padding-top:1px"></p>

<ul>
<li class="m1"><span>Models are actually <high>families of models</high>, with every parameter combination specifying a different model.</span></li>
<li class="m2"><span>To fit a model means to <high>identify</high>, from the family of models, <high>the specific model that fits the data best</high>.</span></li>
</ul>

]

.pull-right45[

<br><br>
<p align = "center">
<img src="image/curvefits.png" height=480px><br>
<font style="font-size:10px">adapted from <a href="https://www.explainxkcd.com/wiki/index.php/2048:_Curve-Fitting">explainxkcd.com</a></font>
</p>

]

---

# Loss function

.pull-left45[

<ul>
<li class="m1"><span>Possibly <high>the most important concept</high> in statistics and machine learning.</span></li>
<li class="m2"><span>The loss function defines some <high>summary of the errors committed by the model</high>.</span></li>
</ul>

<p style="padding-top:7px">

`$$\Large Loss = f(Error)$$`

<p style="padding-top:7px">

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
<td> <b>Purpose</b> </td>
<td> <b>Description</b> </td>
</tr>
<tr>
<td bgcolor="white"> Fitting </td>
<td bgcolor="white"> Find parameters that minimize the loss function. </td>
</tr>
<tr>
<td> Evaluation </td>
<td> Calculate the loss function for the fitted model.
</td>
</tr>
</table>

]

.pull-right45[

<img src="Fitting_files/figure-html/unnamed-chunk-2-1.png" width="90%" style="display: block; margin: auto;" />

]

---

# Loss function

.pull-left45[

<ul>
<li class="m1"><span>Possibly <high>the most important concept</high> in statistics and machine learning.</span></li>
<li class="m2"><span>The loss function defines some <high>summary of the errors committed by the model</high>.</span></li>
</ul>

<p style="padding-top:7px">

`$$\Large Loss = f(Error)$$`

<p style="padding-top:7px">

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
<td> <b>Purpose</b> </td>
<td> <b>Description</b> </td>
</tr>
<tr>
<td bgcolor="white"> Fitting </td>
<td bgcolor="white"> Find parameters that minimize the loss function. </td>
</tr>
<tr>
<td> Evaluation </td>
<td> Calculate the loss function for the fitted model. </td>
</tr>
</table>

]

.pull-right45[

<img src="Fitting_files/figure-html/unnamed-chunk-3-1.png" width="90%" style="display: block; margin: auto;" />

]

---

class: center, middle

<high><h1>Regression</h1></high>
<font color = "gray"><h1>Decision Trees</h1></font>
<font color = "gray"><h1>Random Forests</h1></font>

---

# Regression

.pull-left45[

In [regression](https://en.wikipedia.org/wiki/Regression_analysis), the criterion `\(Y\)` is modeled as the <high>sum</high> of <high>features</high> `\(X_1, X_2, ...\)` <high>times weights</high> `\(\beta_1, \beta_2, ...\)` plus `\(\beta_0\)`, the so-called intercept.

<p style="padding-top:10px"></p>

`$$\large \hat{Y} = \beta_{0} + \beta_{1} \times X_1 + \beta_{2} \times X_2 + ...$$`

<p style="padding-top:10px"></p>

The weight `\(\beta_{i}\)` indicates the <high>amount of change</high> in `\(\hat{Y}\)` for a change of 1 in `\(X_{i}\)`.
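As a toy illustration of the weighted sum, with hypothetical weights and feature values (not estimated from any data):


```r
# prediction as a weighted sum (hypothetical numbers)
b0 <- 1.5; b1 <- 2.0; b2 <- -0.5  # intercept and weights
x1 <- 3;   x2 <- 4                # feature values
b0 + b1 * x1 + b2 * x2            # 1.5 + 6.0 - 2.0 = 5.5
```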
Ceteris paribus, the <high>more extreme</high> `\(\beta_{i}\)`, the <high>more important</high> `\(X_{i}\)` is for the prediction of `\(Y\)` <font style="font-size:12px">(Note: the scale of `\(X_{i}\)` matters too!).</font>

If `\(\beta_{i} = 0\)`, then `\(X_{i}\)` <high>does not help</high> predict `\(Y\)`.

]

.pull-right45[

<img src="Fitting_files/figure-html/unnamed-chunk-4-1.png" width="90%" style="display: block; margin: auto;" />

]

---

# Regression loss

.pull-left45[

<p>
<ul style="margin-bottom:-20px">
<li class="m1"><span><b>Mean Squared Error (MSE)</b>
<br><br>
<ul class="level">
<li><span>Average <high>squared distance</high> between predictions and true values.</span></li>
</ul>
</span></li>
</ul>

`$$MSE = \frac{1}{n}\sum_{i \in 1,...,n}(Y_{i} - \hat{Y}_{i})^{2}$$`

<ul>
<li class="m2"><span><b>Mean Absolute Error (MAE)</b>
<br><br>
<ul class="level">
<li><span>Average <high>absolute distance</high> between predictions and true values.</span></li>
</ul>
</span></li>
</ul>

$$ MAE = \frac{1}{n}\sum_{i \in 1,...,n} \lvert Y_{i} - \hat{Y}_{i} \rvert$$
</p>

]

.pull-right45[

<img src="Fitting_files/figure-html/unnamed-chunk-5-1.png" width="90%" style="display: block; margin: auto;" />

]

---

# 2 types of supervised problems

.pull-left45[

<ul style="margin-bottom:-20px">
<li class="m1"><span><b>Regression</b>
<br><br>
<ul class="level">
<li><span>Regression problems involve the <high>prediction of a quantitative feature</high>.</span></li>
<li><span>E.g., predicting the cholesterol level as a function of age.</span></li>
</ul>
</span></li><br>
<li class="m2"><span><b>Classification</b>
<br><br>
<ul class="level">
<li><span>Classification problems involve the <high>prediction of a categorical feature</high>.</span></li>
<li><span>E.g., predicting the type of chest pain as a function of age.</span></li>
</ul>
</span></li>
</ul>

]

.pull-right4[

<p align = "center">
<img src="image/twotypes.png" height=440px><br>
</p>

]

---

# Logistic regression
.pull-left45[

<ul style="margin-bottom:-20px">
<li class="m1"><span>In <a href="https://en.wikipedia.org/wiki/Logistic_regression">logistic regression</a>, the class criterion <font style="font-size:22px"><mono>Y ∈ {0,1}</mono></font> is also modeled as the <high>sum of features times weights</high>, but with the prediction being transformed using a <high>logistic link function</high>.</span></li>
</ul>

<p style="padding-top:10px"></p>

`$$\large \hat{Y} = Logistic(\beta_{0} + \beta_{1} \times X_1 + ...)$$`

<p style="padding-top:10px"></p>

<ul style="margin-bottom:-20px">
<li class="m2"><span>The logistic function <high>maps predictions to the range of 0 and 1</high>, the two class values.</span></li>
</ul>

<p style="padding-top:10px"></p>

$$ Logistic(x) = \frac{1}{1+exp(-x)}$$

]

.pull-right45[

<img src="Fitting_files/figure-html/unnamed-chunk-6-1.png" width="90%" style="display: block; margin: auto;" />

]

---

# Logistic regression

.pull-left45[

<ul style="margin-bottom:-20px">
<li class="m1"><span>In <a href="https://en.wikipedia.org/wiki/Logistic_regression">logistic regression</a>, the class criterion <font style="font-size:22px"><mono>Y ∈ {0,1}</mono></font> is also modeled as the <high>sum of features times weights</high>, but with the prediction being transformed using a <high>logistic link function</high>.</span></li>
</ul>

<p style="padding-top:10px"></p>

`$$\large \hat{Y} = Logistic(\beta_{0} + \beta_{1} \times X_1 + ...)$$`

<p style="padding-top:10px"></p>

<ul style="margin-bottom:-20px">
<li class="m2"><span>The logistic function <high>maps predictions to the range of 0 and 1</high>, the two class values.</span></li>
</ul>

<p style="padding-top:10px"></p>

$$ Logistic(x) = \frac{1}{1+exp(-x)}$$

]

.pull-right45[

<img src="Fitting_files/figure-html/unnamed-chunk-7-1.png" width="90%" style="display: block; margin: auto;" />

]

---

# Classification loss

.pull-left45[

<ul style="margin-bottom:-20px">
<li class="m1"><span><b>Distance</b>
<br><br>
<ul class="level">
<li><span>Log loss is <high>used to fit the parameters</high>; alternative distance measures are MSE and MAE.</span></li>
</ul>
</span></li>
</ul>

`$$\small LogLoss = -\frac{1}{n}\sum_{i=1}^{n}(log(\hat{y}_i)y_i+log(1-\hat{y}_i)(1-y_i))$$`

<ul>
<li class="m2"><span><b>Overlap</b>
<br><br>
<ul class="level">
<li><span>Does the <high>predicted class match the actual class</high>? Often preferred for <high>ease of interpretation</high>.</span></li>
</ul>
</span></li>
</ul>

`$$\small Loss_{01} = 1-Accuracy = \frac{1}{n}\sum_{i=1}^n I(y_i \neq \lfloor \hat{y}_i \rceil)$$`

]

.pull-right45[

<img src="Fitting_files/figure-html/unnamed-chunk-8-1.png" width="90%" style="display: block; margin: auto;" />

]

---

class: center, middle

<p align = "center">
<img src="https://www.tidymodels.org/images/tidymodels.png" width=240px><br>
<font style="font-size:10px">from <a href="https://www.tidymodels.org/packages/">tidymodels.org</a></font>
</p>

---

.pull-left4[

# Fitting <mono>tidymodels</mono>

<br>

<ul>
<li class="m1"><span>Define the <mono>recipe</mono>.</span></li><br>
<li class="m2"><span>Define the model.</span></li><br>
<li class="m3"><span>Define the <mono>workflow</mono>.</span></li><br>
<li class="m4"><span>Fit the <mono>workflow</mono>.</span></li><br>
<li class="m5"><span>Assess model performance.</span></li>
</ul>

]

.pull-right5[

<p align = "center">
<br>
<img src="image/tidymodels_fit.png" height=560px><br>
</p>

]

---

# Define the <mono>recipe</mono>

.pull-left45[

<ul>
<li class="m1"><span>The <mono>recipe</mono> specifies two things:</span></li><br>
<ul class="level">
<li><span>The criterion and the features, i.e., the <mono>formula</mono> to use.</span></li><br>
<li><span>How the features should be pre-processed before the model fitting.</span></li><br>
</ul>
<li class="m2"><span>To set up a <mono>recipe</mono>:</span></li><br>
<ul class="level">
<li><span>Initialize it with <mono>recipe()</mono>, wherein the formula and data are specified.</span></li><br>
<li><span>Add pre-processing steps, using <mono>step_*()</mono> functions and <mono>dplyr</mono>-like selectors.</span></li><br>
</ul>
</ul>

]

.pull-right45[


```r
# set up recipe for regression model
lm_recipe <- 
  recipe(income ~ ., data = baselers) %>% 
  step_dummy(all_nominal_predictors())

lm_recipe
```

```
Data Recipe

Inputs:

      role #variables
   outcome          1
 predictor         19

Operations:

Dummy variables from all_nominal_predictors()
```

]

---

# Define the <mono>recipe</mono>

.pull-left45[

<ul>
<li class="m1"><span>The <mono>recipe</mono> specifies two things:</span></li><br>
<ul class="level">
<li><span>The criterion and the features, i.e., the <mono>formula</mono> to use.</span></li><br>
<li><span>How the features should be pre-processed before the model fitting.</span></li><br>
</ul>
<li class="m2"><span>To set up a <mono>recipe</mono>:</span></li><br>
<ul class="level">
<li><span>Initialize it with <mono>recipe()</mono>, wherein the formula and data are specified.</span></li><br>
<li><span>Add pre-processing steps, using <mono>step_*()</mono> functions and <mono>dplyr</mono>-like selectors.</span></li><br>
</ul>
</ul>

]

.pull-right45[


```r
# set up recipe for logistic regression
# model
logistic_recipe <- 
  recipe(eyecor ~ ., data = baselers) %>% 
  step_dummy(all_nominal_predictors())

logistic_recipe
```

```
Data Recipe

Inputs:

      role #variables
   outcome          1
 predictor         19

Operations:

Dummy variables from all_nominal_predictors()
```

]

---

# Define the model

.pull-left45[

<ul>
<li class="m1"><span>The model specifies:</span></li><br>
<ul class="level">
<li><span>Which model (e.g., linear regression) to use.</span></li><br>
<li><span>Which engine (underlying model-fitting algorithm) to use.</span></li><br>
<li><span>The problem mode, i.e., <high>regression vs. classification</high>.</span></li><br>
</ul>
<li class="m2"><span>To set up a model:</span></li><br>
<ul class="level">
<li><span>Specify the model, e.g., using <mono>linear_reg()</mono> or <mono>logistic_reg()</mono>.</span></li><br>
<li><span>Specify the engine using <mono>set_engine()</mono>.</span></li><br>
<li><span>Specify the problem mode using <mono>set_mode()</mono>.</span></li><br>
</ul>
</ul>

]

.pull-right45[


```r
# set up model for regression model
lm_model <- 
  linear_reg() %>% 
  set_engine("lm") %>% 
  set_mode("regression")

lm_model
```

```
Linear Regression Model Specification (regression)

Computational engine: lm 
```

]

---

# Define the model

.pull-left45[

<ul>
<li class="m1"><span>The model specifies:</span></li><br>
<ul class="level">
<li><span>Which model (e.g., linear regression) to use.</span></li><br>
<li><span>Which engine (underlying model-fitting algorithm) to use.</span></li><br>
<li><span>The problem mode, i.e., <high>regression vs. classification</high>.</span></li><br>
</ul>
<li class="m2"><span>To set up a model:</span></li><br>
<ul class="level">
<li><span>Specify the model, e.g., using <mono>linear_reg()</mono> or <mono>logistic_reg()</mono>.</span></li><br>
<li><span>Specify the engine using <mono>set_engine()</mono>.</span></li><br>
<li><span>Specify the problem mode using <mono>set_mode()</mono>.</span></li><br>
</ul>
</ul>

]

.pull-right45[


```r
# set up model for logistic regression
# model
logistic_model <- 
  logistic_reg() %>% 
  set_engine("glm") %>% 
  set_mode("classification")

logistic_model
```

```
Logistic Regression Model Specification (classification)

Computational engine: glm 
```

]

---

# Define the <mono>workflow</mono>

.pull-left45[

<ul>
<li class="m1"><span>A <mono>workflow</mono> combines the recipe and model and facilitates fitting the model.
To set up a <mono>workflow</mono>:</span></li><br>
<ul class="level">
<li><span>Initialize it using the <mono>workflow()</mono> function.</span></li><br>
<li><span>Add a recipe using <mono>add_recipe()</mono>.</span></li><br>
<li><span>Add a model using <mono>add_model()</mono>.</span></li><br>
</ul>
</ul>

]

.pull-right45[


```r
# set up workflow for regression model
lm_workflow <- 
  workflow() %>% 
  add_recipe(lm_recipe) %>% 
  add_model(lm_model)

lm_workflow
```

```
══ Workflow ══════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ──────────────────────────────────────────
1 Recipe Step

• step_dummy()

── Model ─────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: lm 
```

]

---

# Define the <mono>workflow</mono>

.pull-left45[

<ul>
<li class="m1"><span>A <mono>workflow</mono> combines the recipe and model and facilitates fitting the model. To set up a <mono>workflow</mono>:</span></li><br>
<ul class="level">
<li><span>Initialize it using the <mono>workflow()</mono> function.</span></li><br>
<li><span>Add a recipe using <mono>add_recipe()</mono>.</span></li><br>
<li><span>Add a model using <mono>add_model()</mono>.</span></li><br>
</ul>
</ul>

]

.pull-right45[


```r
# set up workflow for logistic regression
# model
logistic_workflow <- 
  workflow() %>% 
  add_recipe(logistic_recipe) %>% 
  add_model(logistic_model)

logistic_workflow
```

```
══ Workflow ══════════════════════════════════════════════
Preprocessor: Recipe
Model: logistic_reg()

── Preprocessor ──────────────────────────────────────────
1 Recipe Step

• step_dummy()

── Model ─────────────────────────────────────────────────
Logistic Regression Model Specification (classification)

Computational engine: glm 
```

]

---

# Fit the <mono>workflow</mono>

.pull-left35[

<ul>
<li class="m1"><span>A <mono>workflow</mono> is fitted using the <mono>fit()</mono> function, which:</span></li><br>
<ul class="level">
<li><span>Applies the recipe with the pre-processing steps.</span></li><br>
<li><span>Runs the specified algorithm (i.e., model).</span></li><br>
</ul>
</ul>

]

.pull-right55[


```r
# fit the workflow
income_lm <- fit(lm_workflow, 
                 data = baselers)

tidy(income_lm)
```

```
# A tibble: 25 × 5
   term        estimate std.error statistic   p.value
   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
 1 (Intercept) -192.     631.      -0.304   7.61e-  1
 2 id             0.000895  0.113    0.00792 9.94e-  1
 3 age          115.        2.88    40.1     2.23e-208
 4 height         4.95      3.02     1.64    1.02e-  1
 5 weight         1.01      3.27     0.307   7.59e-  1
 6 children     -48.9      31.9     -1.54    1.25e-  1
 7 happiness   -156.       31.1     -5.02    6.00e-  7
 8 fitness        6.94     17.9      0.389   6.97e-  1
 9 food           2.50      0.142   17.6     2.33e- 60
10 alcohol       26.1       2.47    10.6     8.05e- 25
# … with 15 more rows
```

]

---

# Fit the <mono>workflow</mono>

.pull-left35[

<ul>
<li class="m1"><span>A <mono>workflow</mono> is fitted using the <mono>fit()</mono> function, which:</span></li><br>
<ul class="level">
<li><span>Applies the recipe with the pre-processing steps.</span></li><br>
<li><span>Runs the specified algorithm (i.e., model).</span></li><br>
</ul>
</ul>

]

.pull-right55[


```r
# fit the logistic regression workflow
eyecor_glm <- fit(logistic_workflow, 
                  data = baselers)

tidy(eyecor_glm)
```

```
# A tibble: 25 × 5
   term          estimate std.error statistic p.value
   <chr>            <dbl>     <dbl>     <dbl>   <dbl>
 1 (Intercept) -3.04      1.32         -2.31   0.0211
 2 id           0.0000834 0.000236      0.354  0.723 
 3 age          0.00734   0.00973       0.755  0.451 
 4 height       0.00572   0.00630       0.907  0.364 
 5 weight       0.00446   0.00678       0.658  0.510 
 6 income      -0.0000395 0.0000666    -0.593  0.553 
 7 children     0.0329    0.0665        0.495  0.621 
 8 happiness    0.0386    0.0653        0.591  0.554 
 9 fitness     -0.0419    0.0372       -1.13   0.261 
10 food        -0.0000755 0.000339     -0.222  0.824 
# … with 15 more rows
```

]

---

# Assess model fit

.pull-left4[

<ul>
<li class="m1"><span>Use <mono>predict()</mono> to obtain model predictions on specified data.</span></li><br>
<li class="m2"><span>Use <mono>metrics()</mono> to obtain performance metrics suited to the current problem mode.</span></li>
</ul>

]

.pull-right5[


```r
# generate predictions
lm_pred <- 
  income_lm %>% 
  predict(baselers) %>% 
  bind_cols(baselers %>% select(income))

metrics(lm_pred, 
        truth = income, 
        estimate = .pred)
```

```
# A tibble: 3 × 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 rmse    standard    1008.   
2 rsq     standard       0.868
3 mae     standard     792.   
```

]

---

# Assess model fit

.pull-left4[

<ul>
<li class="m1"><span>Use <mono>predict()</mono> to obtain model predictions on specified data.</span></li><br>
<li class="m2"><span>Use <mono>metrics()</mono> to obtain performance metrics suited to the current problem mode.</span></li>
</ul>

]

.pull-right5[


```r
# generate predictions logistic regression
logistic_pred <- 
  predict(eyecor_glm, baselers, type = "prob") %>% 
  bind_cols(predict(eyecor_glm, baselers)) %>% 
  bind_cols(baselers %>% select(eyecor))

metrics(logistic_pred, 
        truth = eyecor, 
        estimate = .pred_class, 
        .pred_yes)
```

```
# A tibble: 4 × 3
  .metric     .estimator .estimate
  <chr>       <chr>          <dbl>
1 accuracy    binary        0.647 
2 kap         binary        0.0566
3 mn_log_loss binary        0.634 
4 roc_auc     binary        0.605 
```

]

---

# Assess model fit

.pull-left4[

<ul>
<li class="m1"><span>Use <mono>roc_curve()</mono> to obtain sensitivity and specificity for every unique value of the predicted probabilities.</span></li><br>
<ul class="level">
<li><span><high>Sensitivity</high> = Of the truly positive cases, what proportion is classified as positive.</span></li><br>
<li><span><high>Specificity</high> = Of the truly negative cases, what proportion is classified as negative.</span></li><br>
</ul>
<li class="m2"><span>Use <mono>autoplot()</mono> to plot the ROC curve based on the different combinations of sensitivity and specificity.</span></li>
</ul>

]

.pull-right5[


```r
# ROC curve for logistic model
logistic_pred %>% 
  roc_curve(truth = eyecor, .pred_yes) %>% 
  autoplot()
```

<img src="Fitting_files/figure-html/unnamed-chunk-19-1.png" width="90%" style="display: block; margin: auto;" />

]

---

class: middle, center

<h1><a href=https://therbootcamp.github.io/AML_2021AMLD/_sessions/Fitting/Fitting_practical.html>Practical</a></h1>
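
---

# Appendix: full pipeline sketch

The five steps above, collected in one place. A minimal sketch, assuming <mono>tidymodels</mono> is loaded and the <mono>baselers</mono> data from the practical is available; all calls are the ones shown on the previous slides.


```r
library(tidymodels)

# 1. recipe: formula plus pre-processing
lm_recipe <- 
  recipe(income ~ ., data = baselers) %>% 
  step_dummy(all_nominal_predictors())

# 2. model: linear regression via the lm engine
lm_model <- 
  linear_reg() %>% 
  set_engine("lm") %>% 
  set_mode("regression")

# 3. workflow: combine recipe and model
lm_workflow <- 
  workflow() %>% 
  add_recipe(lm_recipe) %>% 
  add_model(lm_model)

# 4. fit the workflow
income_lm <- fit(lm_workflow, data = baselers)

# 5. assess performance on the training data
income_lm %>% 
  predict(baselers) %>% 
  bind_cols(baselers %>% select(income)) %>% 
  metrics(truth = income, estimate = .pred)
```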