# Recap
### Machine Learning with R <a href='https://therbootcamp.github.io'> Basel R Bootcamp </a>
### October 2019

---

<div class="my-footer">
 
 
 <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/by-sa.png" height=14 style="vertical-align: middle"/>
 
 <a href="https://therbootcamp.github.io/">
 
 
 www.therbootcamp.com
 
 
 </a>
 <a href="https://therbootcamp.github.io/">
 
 Machine Learning with R | October 2019
 
 </a>
 
 </div>

---

# What is machine learning?

Machine learning is...

...a <high>field of artificial intelligence</high>...

...that uses <high>statistical techniques</high>...

...to allow computer systems to <high>"learn"</high>,...

...i.e., to progressively <high>improve performance</high> on a specific task...

...from small or large amounts of <high>data</high>,...

...<high>without being explicitly programmed</high>...

...with the goal to <high>discover structure</high> or <high>improve decision making and predictions</high>.

<img src="image/ml_robot.jpg" height=380px> 
from <a href="https://medium.com/@dkwok94/machine-learning-for-my-grandma-ca242e97ef62">medium.com</a>

---

# Types of machine learning tasks

There are many types of machine learning tasks, each of which calls for different models.

<high>We will focus on supervised machine learning</high>.

<img src="image/mltypes.png" height=500px> 
from <a href="image/mltypes.png">amazonaws.com</a>

---

# Loss function

Possibly <high>the most important concept</high> in statistics and machine learning.

The loss function defines some <high>summary of the errors committed by the model</high>.

`$$\Large Loss = f(Error)$$`

Two purposes

<table style="cellspacing:0; cellpadding:0; border:none;">
<tr>
 <td>
 Purpose
 </td>
 <td>
 Description
 </td>
</tr>
<tr>
 <td bgcolor="white">
 Fitting
 </td>
 <td bgcolor="white">
 Find parameters that minimize loss function.
 </td>
</tr>
<tr>
 <td>
 Evaluation
 </td>
 <td>
 Calculate loss function for fitted model.
 </td>
</tr>
</table>
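
Both purposes can be sketched in a few lines of base R, here with the mean squared error as the loss. The simulated data, the `mse()` helper, and all variable names are assumptions made for illustration and are not part of the course material.

```r
set.seed(1)

# Simulated data (illustrative): y depends linearly on x plus noise
x <- runif(100)
y <- 2 + 3 * x + rnorm(100, sd = 0.5)

# Loss: mean squared error of a line with par = c(intercept, slope)
mse <- function(par, x, y) mean((y - (par[1] + par[2] * x))^2)

# Fitting: find the parameters that minimize the loss
fit <- optim(par = c(0, 0), fn = mse, x = x, y = y)
fit$par

# Evaluation: calculate the loss for the fitted model
mse(fit$par, x, y)

# lm() minimizes the same squared-error loss analytically
coef(lm(y ~ x))
```

The numerical search with `optim()` and the closed-form solution of `lm()` should agree closely, because both minimize the same loss.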


---

# 2 types of supervised problems

There are two types of supervised learning problems that can often be approached using the same model.

Regression

Regression problems involve the <high>prediction of a quantitative feature</high>.

E.g., predicting the cholesterol level as a function of age.

Classification

Classification problems involve the <high>prediction of a categorical feature</high>.

E.g., predicting the type of chest pain as a function of age.
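
A minimal sketch of the two problem types, using the same family of (generalized) linear models for both. The data are simulated, and the binary `disease` criterion is an assumption chosen to keep the example short; it stands in for a categorical feature such as the type of chest pain.

```r
set.seed(1)

# Simulated data (illustrative only)
age <- round(runif(200, 30, 80))
cholesterol <- 150 + 1.5 * age + rnorm(200, sd = 20)   # quantitative
p_true <- plogis((age - 55) / 10)
disease <- factor(ifelse(runif(200) < p_true, "yes", "no"))  # categorical

# Regression: a linear model predicts a number
reg <- lm(cholesterol ~ age)
predict(reg, newdata = data.frame(age = 60))

# Classification: logistic regression predicts the probability of "yes",
# which is turned into a class label with a 0.5 threshold
cls <- glm(disease ~ age, family = binomial)
p_yes <- predict(cls, newdata = data.frame(age = 60), type = "response")
ifelse(p_yes > 0.5, "yes", "no")
```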


---

# 3 key (supervised) models

---

# Hold-out data

Model performance must be evaluated as true prediction on an <high>unseen data set</high>.

The unseen data set can be <high>naturally</high> occurring, e.g., using 2019 stock prices to evaluate a model fit using 2018 stock prices.

More commonly, unseen data is created by splitting the available data into a training set and a test set.
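
A minimal base R sketch of such a split. The built-in `mtcars` data, the 80/20 ratio, and the chosen predictors are illustrative assumptions.

```r
set.seed(1)

# Randomly assign 80% of the rows to the training set
n <- nrow(mtcars)
train_rows <- sample(n, size = floor(0.8 * n))

data_train <- mtcars[train_rows, ]
data_test  <- mtcars[-train_rows, ]

# Fit on the training set only, then evaluate on the unseen test set
mod <- lm(mpg ~ hp + wt, data = data_train)
pred <- predict(mod, newdata = data_test)
sqrt(mean((data_test$mpg - pred)^2))  # test RMSE
```

The test rows play no role in fitting, so the resulting error estimates how well the model predicts new cases.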


---

# Overfitting

Occurs when a model <high>fits data too closely</high> and therefore <high>fails to reliably predict</high> future observations.

In other words, overfitting occurs when a model <high>'mistakes' random noise for a predictable signal</high>.

More <high>complex models</high> are more <high>prone to overfitting</high>.
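
The following sketch illustrates the idea by comparing a simple and a deliberately over-flexible model on simulated data. The data-generating function, the polynomial degree, and all names are assumptions for illustration.

```r
set.seed(1)

# Simulated data (illustrative): a linear signal plus random noise
sim <- function(n) {
  x <- runif(n, -1, 1)
  data.frame(x = x, y = 2 * x + rnorm(n, sd = 0.5))
}
d_train <- sim(30)
d_test  <- sim(1000)

rmse <- function(mod, data) {
  sqrt(mean((data$y - predict(mod, newdata = data))^2))
}

simple  <- lm(y ~ x, data = d_train)            # matches the true structure
complex <- lm(y ~ poly(x, 10), data = d_train)  # flexible enough to fit noise

# Error on the training data cannot increase with added flexibility ...
c(simple = rmse(simple, d_train), complex = rmse(complex, d_train))

# ... but the error on unseen data is what counts
c(simple = rmse(simple, d_test), complex = rmse(complex, d_test))
```

The complex model will typically show the lower training error but the higher test error, although the exact numbers depend on the random seed.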


---

# 7 steps with <mono>caret</mono>

Step 0: Load data

```r
library(tidyverse); library(caret)  # read_csv(), %>%, and the caret functions
data <- read_csv("1_Data/data.csv")
```

Step 1: split into training and test data

```r
# Create index
ind <- createDataPartition(y = data$criterion,
 p = .8, list = FALSE)

# Create training and test data
data_train <- data %>% slice(ind)
data_test <- data %>% slice(-ind)
```

Step 2: Define control parameters

```r
# Use method = "none" for now
ctrl <- trainControl(method = "none")
```

Step 3: Train model

```r
mod <- train(form = Y ~ ., 
 data = data_train,
 method = "My Favorite Model",
 trControl = ctrl)
```

Step 4: Explore

```r
mod            # Print object
mod$finalModel # Final model
```

Step 5: Predict

```r
# Predict the criterion for the test data
mod_pred <- predict(object = mod, 
 newdata = data_test)
```

Step 6: Evaluate prediction accuracy

```r
# Evaluate prediction performance
postResample(pred = mod_pred, 
             obs = data_test$Y)
```
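
Put together, the steps might look as follows for a concrete case. This is a sketch under assumptions: it uses the built-in `mtcars` data in place of a csv file, `mpg` as the criterion, and `method = "lm"` as the model, none of which come from the slides.

```r
library(caret)
set.seed(1)

# Step 0: built-in data instead of a csv file (illustrative)
data <- mtcars

# Step 1: split into training and test data
ind <- createDataPartition(y = data$mpg, p = .8, list = FALSE)
data_train <- data[ind, ]
data_test  <- data[-ind, ]

# Step 2: define control parameters (no resampling)
ctrl <- trainControl(method = "none")

# Step 3: train a linear regression
mod <- train(form = mpg ~ hp + wt,
             data = data_train,
             method = "lm",
             trControl = ctrl)

# Step 4: explore
mod$finalModel

# Step 5: predict the test data
mod_pred <- predict(object = mod, newdata = data_test)

# Step 6: evaluate prediction accuracy (RMSE, R squared, MAE)
perf <- postResample(pred = mod_pred, obs = data_test$mpg)
perf
```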

---

<h1><a href=https://therbootcamp.github.io/ML_2019Oct/index.html>Schedule</a></h1>