class: center, middle, inverse, title-slide # Programming in R ### Basel R Bootcamp
www.therbootcamp.com
@therbootcamp
### July 2018 --- layout: true <div class="my-footer"><span> <a href="https://therbootcamp.github.io/"><font color="#7E7E7E">BaselRBootcamp, July 2018</font></a>                                           <a href="https://therbootcamp.github.io/"><font color="#7E7E7E">www.therbootcamp.com</font></a> </span></div> --- # ***Haven't we been programming in R for the last three days of the course?*** <p align="center"> <img src="http://gifimage.net/wp-content/uploads/2017/10/bruce-almighty-typing-gif-9.gif" height="350"> </p> --- # Content .pull-left45[ Conditional Statements - `if`, `else` Functions - Program your own functions Iteration - `for` loops - `apply` functions - `map` functions ] .pull-right45[ ```r index <- sample(1:nrow(baselers), 1) # Conditional statement if (baselers$sex[index] == "male"){ print(paste("Person with id", baselers$id[index], "is a man")) } else { print(paste("Person with id", baselers$id[index], "is a woman")) } # Define own function my_mean <- function(x) { sum(x) / length(x) } # A for loop for(i in c(1, 2, 3, 4, 5)){ print(i) } ``` ] --- # Conditional Statements Allow you to conditionally execute code blocks. .pull-left45[ *`if`; `else if`; and `else`* <p align="center"> <img src="image/conditional_statement.png"> </p> ] .pull-right45[ ```r if (this) { # do this } else if (that){ # do that } else { # do something else } ``` ] --- # A Note on Logical Expressions .pull-left45[ Conditional statements usually require you to specify **logical expressions**. If they evaluate to `TRUE` the code in the curly brackets is exected. Note that: - `>`, `<`, `==`, `>=`, and `<=` are vectorized - to prevent errors and warnings due to the vectorization use `any()` and `all()` ```r # since == is vectorized # this yields several logicals head(baselers$sex) == "male" ``` ``` ## [1] TRUE TRUE FALSE TRUE TRUE TRUE ``` ] .pull-right45[ ```r # any and all summarize these logicals any(head(baselers$sex) == "male") ``` ``` ## [1] TRUE ``` ```r all(head(baselers$sex) == "male") ``` ``` ## [1] FALSE ``` ```r if (any(head(baselers$sex) == "male")){ print("Sample contains at least 1 male") } ``` ``` ## [1] "Sample contains at least 1 male" ``` ] --- # Functions - General Structure .pull-left4[ You can define your own functions in R. You only have to specify: - a function name - arguments (and their defaults) - the function body Benefits of writing code in functions: - less typing - fewer errors - can be more general ] .pull-right5[ ```r function_name <- function(arg1, arg2 = "default"){ # function body result <- do_stuff(arg1, arg2) } ``` ] --- # Functions - Return Statement .pull-left45[ Use `return()` to exit a function early, i.e. before the final line of its definition. ] .pull-right45[ ```r my_mean <- function(x){ # check precondition and # if it's not met return NA if (!is.numeric(x) | length(x) == 0){ return(NA) } sum(x) / length(x) } ``` ] --- # Functions - Arguments .pull-left45[ Functions can take any object as input to arguments. This means also functions or expressions can be passed as arguments. ] .pull-right45[ ```r # function definition call_fun <- function(dat, fn, cont_nas){ fn(dat, na.rm = cont_nas) } # input a function and additional # statements call_fun(dat = baselers$tattoos, fn = mean, cont_nas = TRUE) ``` ``` ## [1] 1.812 ``` ```r # input expression as argument call_fun(dat = baselers$weight / (baselers$height / 100) ^2, fn = median, cont_nas = TRUE) ``` ``` ## [1] 25.14 ``` ] --- # Functions - Error Messages .pull-left45[ You can define your own errors, warnings or messages. To do this use `stop()`, `warning()`, `message()`. - With `stop()`, `warning()`, and `message()` you have to use separate `if` statements if you want to only use them in specific conditions. ] .pull-right45[ ```r # with stop you test whether stopping # condition is met my_mean <- function(x){ if (!is.numeric(x)){ stop("Can only compute mean from numeric variable. Your data was of class", class(x)) } else if (any(is.na(x))){ warning("Return NA because data contains NAs.") } sum(x) / length(x) } ``` ] --- # Example - `show_stats()` .pull-left55[ Let's create a function called `show_stats()` which takes a numeric vector as an argument, and returns a sentence with information about that vector ```r show_stats <- function(x, digits = 2) { if(!is.numeric(x)) { stop("x is not numeric!!!")} output <- paste0("x has length ", length(x), ", and a mean of ", round(mean(x), digits)) output } ``` ] .pull-right4[ ```r # Test with a numeric vector show_stats(x = baselers$age) ``` ``` ## [1] "x has length 10000, and a mean of 44.61" ``` ```r # Text with a characteer vector show_stats(x = baselers$sex) ``` <font color = 'red'>Error in show_stats(x = "character!") : x is not numeric!!!</font> ] --- # Iteration - `for` Loops .pull-left35[ A `for` loop lets you **repeat the execution of a block of code** as many times as you wish. It also allows you to loop through positions. ] .pull-right6[ <p align="center"> <img src="image/loop.gif"> </p> ] --- # Iteration - `for` Loops .pull-left35[ A for loop lets you repeat the execution of a block of code as many times as you wish. It also allows you to loop through positions. A `for` loop has three components: - **output** (object to be prepared before entering the loop) - **sequence** (defines the vector to loop through) - **body** (the code that does the actual work) ] .pull-right6[ ```r # output: mean_vec <- vector("double", length = 3) mean_names <- c("age", "weight", "height") for (ind in c(1, 2, 3)) { # sequence mean_vec[ind] <- mean(baselers[[mean_names[ind]]], na.rm = TRUE) # body } # The code to do the same without a loop is mean(baselers[["age"]], na.rm = TRUE) mean(baselers[["weight"]], na.rm = TRUE) mean(baselers[["height"]], na.rm = TRUE) ``` ] --- # Iteration - `while` Loops .pull-left45[ `while` loops let you iterate until a specified condition is met. This can be handy, but also "dangerous" because if the condition is never met, it will just iterate until you manually stop it. ] .pull-right45[ ```r subs_age <- sample(baselers$age, 30) # get only baselers over 40 while(any(subs_age <= 40)){ subs_age <- sample(baselers$age, 30) } # Potentially problematic: # get only baselers over 40 # but with a much larger sample while(any(subs_age <= 40)){ subs_age <- sample(baselers$age, 30000) } ``` ] --- # Iteration - `apply` Family .pull-left45[ Functions of the `apply` family are another tool to iterate through objects and do the same operation on each subset. They are very handy, but not always consistently programmed, i.e., for example the argument structure is not always the same. These are available `apply` functions: - `apply` - `lapply` - `sapply` - `tapply` - `vapply` ] .pull-right5[ ```r # structure: data, function, arguments # lapply: simplest apply version lapply(baselers[, c("age", "weight")], mean, na.rm = TRUE) ``` ``` ## $age ## [1] 44.61 ## ## $weight ## [1] 74.35 ``` ```r # sapply: apply version that # simplifies lapply output sapply(baselers[, c("age", "weight", "height")], mean, na.rm = TRUE) ``` ``` ## age weight height ## 44.61 74.35 171.00 ``` ] --- # Iteration - `apply` Family - Anonymous Functions .pull-left45[ You can specify **anonymous functions** in functions of the `apply` family. An anonymous function exists only temporarily when the `apply` function is called. This means that the anonymous function is not defined in a separate step, but directly within the `apply` function. ] .pull-right5[ ```r # calculate the mean of # age, weight, and height lapply(baselers[, c("age", "weight", "height")], function(x, na_rm){ if (isTRUE(na_rm)){ x <- x[!is.na(x)] } sum(x) / length(x) }, na_rm = TRUE) ``` ``` ## $age ## [1] 44.61 ## ## $weight ## [1] 74.35 ## ## $height ## [1] 171 ``` ] --- # Iteration - `map` Family .pull-left45[ The `map` functions are the **purrr** equivalent of the `apply` functions. <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/purrr_hex.png" width="60%" style="display: block; margin: auto;" /> ] .pull-right5[ ```r map(baselers[, c("age", "weight", "height")], mean) ``` ``` ## $age ## [1] 44.61 ## ## $weight ## [1] NA ## ## $height ## [1] 171 ``` ] --- # Iteration - `map` Family .pull-left45[ The `map` functions are the **purrr** equivalent of the `apply` functions. There are several `map` functions that iterate over one vector. They are named after the type of object they return: - `map` returns a list - `map_lgl` returns a logical vector - `map_int` returns an integer vector - `map_dbl` returns a double vector - `map_chr` returns a character vector <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/purrr_hex.png" width="20%" style="display: block; margin: auto;" /> ] .pull-right45[ ```r # Print the mean of several columns baselers %>% select(age, weight, height) %>% map_dbl(mean, na.rm = TRUE) ``` ``` ## age weight height ## 44.61 74.35 171.00 ``` ] --- # Summary - What We've Learned - How to use conditional statements - How to program our own functions - How to iterate over blocks of code This is very helpful to not having to type the same code many times and to keep it more flexible. Remember the DRY (Don't Repeat Yourself) principle, i.e. if you have to write the same block of code more than twice, create a function to do so and iterate over it. --- # Programming in R Practical <p><font size=6><b><a href="https://therbootcamp.github.io/BaselRBootcamp_2018July/_sessions/Programming/Programming_practical.html">Link to practical</a>