class: center, middle, inverse, title-slide # Programming in R ### The R Bootcamp
Twitter:
@therbootcamp
### April 2018 --- # ***Haven't we been programming in R for the last three days of the course?*** --- # Content .pull-left45[ Conditional Statements - `if`, `else` - `case_when` Functions - Program your own functions Iteration - `for` loops - `apply` functions - `map` functions ] .pull-right45[ ```r index <- sample(1:nrow(baselers), 1) # Conditional statement if (baselers$sex[index] == "male"){ print(paste("Person with id", baselers$id[index], "is a man")) } else { print(paste("Person with id", baselers$id[index], "is a woman")) } # Define own function my_mean <- function(x){ sum(x) / length(x) } # A for loop for(i in c(1, 2, 3, 4, 5)){ print(i) } ``` ] --- # Conditional Statements Allow you to conditionally execute code blocks. .pull-left45[ *`if`; `else if`; and `else`* `if` lets you check a condition. If this condition is `FALSE` you **can** (but don't have to!) add another check with `else if` that is then, if `TRUE`, executed. An `else` in the end means this code block is always executed if the above checks evaluated to `FALSE`, i.e. it doesn't include a check. ] .pull-right45[ ```r if (this) { # do this } else if (that){ # do that } else { # do something else } ``` ] --- # A Note on Logical Expressions .pull-left45[ The conditional statements often require you to specify checks (logical expressions) which have to evaluate to `TRUE` to execute the following block of code. Note that: - `>`, `<`, `==`, `>=`, and `<=` are vectorized - to prevent errors and warnings due to the vectorization use `any` and `all` ] .pull-right45[ ```r # since == is vectorized # this yields several logicals head(baselers$sex) == "male" ``` ``` ## [1] TRUE TRUE FALSE TRUE TRUE TRUE ``` ```r # any and all summarize these logicals any(head(baselers$sex) == "male") ``` ``` ## [1] TRUE ``` ```r all(head(baselers$sex) == "male") ``` ``` ## [1] FALSE ``` ] --- # Functions - General Structure .pull-left4[ You can define your own functions. This can save a lot of time and space, because by writing your own function you only have to write the whole block of code once. Benefits of writing code in functions: - less typing - fewer errors - can be more general ] .pull-right5[ ```r function_name <- function(arg1, arg2 = "default"){ # function body result <- do_stuff(arg1, arg2) } ``` ] --- # Functions - Return Statement .pull-left45[ The last object in a function code is returned to the global environment. If a function call should be terminated in the middle of the code and some value be returned, the `return()` function has to be used. ] .pull-right45[ ```r my_mean <- function(x){ # check precondition and # if it's not met return NA if (any(is.na(NA) | !is.numeric(x))){ return(NA) } sum(x) / length(x) } ``` ] --- # Functions - Arguments .pull-left45[ Functions can take any object as input to arguments. Since functions themselves are also objects, this means that also functions can be passed as arguments. When calling a function you can use objects or expressions as input to an argument. ] .pull-right45[ ```r # function definition call_fun <- function(dat, fn, cont_nas){ fn(dat, na.rm = cont_nas) } # input a function and additional # statements call_fun(dat = baselers$tattoos, fn = mean, cont_nas = TRUE) ``` ``` ## [1] 1.812 ``` ```r # input expression as argument call_fun(dat = baselers$weight / (baselers$height / 100) ^2, fn = median, cont_nas = TRUE) ``` ``` ## [1] 25.14 ``` ] --- # Functions - Error Messages .pull-left45[ When writing functions you can add checks and if they are not met provide your own error messages. Possible functions are `stop()`, `message()`, `warning()`. - With `stop()`, `message()`, and `warning()` you have to use separate `if` statements to test each potential violation. ] .pull-right45[ ```r # with stop you test whether stopping # condition is met my_mean <- function(x){ if (!is.numeric(x)){ stop("Can only compute mean from numeric variable. Your data was of class", class(x)) } else if (any(is.na(x))){ warning("Return NA because data contains NAs.") } sum(x) / length(x) } ``` ] --- # Iteration - `for` Loops .pull-left35[ A for loop lets you repeat the execution of a block of code as many times as you wish. It also allows you to loop through positions. A `for` loop has three components: - output (object to be prepared before entering the loop) - sequence (defines the vector to loop through) - body (the code that does the actual work) ] .pull-right6[ ```r # output: mean_vec <- vector("double", length = 3) mean_names <- c("age", "weight", "height") for (ind in c(1, 2, 3)) { # sequence mean_vec[ind] <- mean(baselers[[mean_names[ind]]], na.rm = TRUE) # body } # The code to do the same without a loop is mean(baselers[["age"]], na.rm = TRUE) mean(baselers[["weight"]], na.rm = TRUE) mean(baselers[["height"]], na.rm = TRUE) ``` ] --- # Iteration - `while` Loops .pull-left45[ `while` loops let you iterate until a specified condition is met. This can be handy, but also "dangerous" because if the condition is never met, it will just iterate until you manually stop it. ] .pull-right45[ ```r subs_age <- sample(baselers$age, 30) # get only baselers over 40 while(any(subs_age <= 40)){ subs_age <- sample(baselers$age, 30) } # Potentially problematic: # get only baselers over 40 # but with a much larger sample while(any(subs_age <= 40)){ subs_age <- sample(baselers$age, 30000) } ``` ] --- # Iteration - `apply` Family .pull-left45[ Functions of the `apply` family are another tool to iterate through objects and do the same operation on each subset. They are very handy, but not always consistently programmed, i.e., for example the argument structure is not always the same. These are available `apply` functions: - `apply` - `lapply` - `sapply` - `tapply` - `vapply` ] .pull-right5[ ```r # structure: data, function, arguments # lapply: simplest apply version lapply(baselers[, c("age", "weight")], mean, na.rm = TRUE) ``` ``` ## $age ## [1] 44.61 ## ## $weight ## [1] 74.35 ``` ```r # sapply: apply version that # simplifies lapply output sapply(baselers[, c("age", "weight", "height")], mean, na.rm = TRUE) ``` ``` ## age weight height ## 44.61 74.35 171.00 ``` ] --- # Iteration - `apply` Family - Anonymous Functions .pull-left45[ You can specify *anonymous functions* in functions of the `apply` family. An anonymous function exists only temporarily when the `apply` function is called. This means that the anonymous function is not defined in a separate step, but directly within the `apply` function. ] .pull-right5[ ```r # calculate the mean of # age, weight, and height lapply(baselers[, c("age", "weight", "height")], function(x, na_rm){ if (isTRUE(na_rm)){ x <- x[-is.na(x)] } sum(x) / length(x) }, na_rm = TRUE) ``` ] --- # Iteration - `map` Family .pull-left45[ The `map` functions are the purrr equivalent of the `apply` functions. <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/purrr_hex.png" width="60%" style="display: block; margin: auto;" /> ] .pull-right5[ ```r map(baselers[, c("age", "weight", "height")], mean) ``` ``` ## $age ## [1] 44.61 ## ## $weight ## [1] NA ## ## $height ## [1] 171 ``` ] --- # Iteration - `map` Family .pull-left45[ The `map` functions are the purrr equivalent of the `apply` functions. There are several `map` functions that iterate over one vector. They are named after the type of object they return: - `map` returns a list - `map_lgl` returns a logical vector - `map_int` returns an integer vector - `map_dbl` returns a double vector - `map_chr` returns a character vector <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/purrr_hex.png" width="20%" style="display: block; margin: auto;" /> ] .pull-right45[ ```r baselers %>% select(age, weight, height) %>% map_dbl(function(x){ sum(x) / length(x) }) ``` ``` ## age weight height ## 44.61 NA 171.00 ``` ] --- # Iteration - `map` Family .pull-left45[ The `map` functions are the purrr equivalent of the `apply` functions. There are also `map` functions that iterate over two or more vectors. They are named `map2`, and `pmap`. If the objects in the list are named, pmap will use the name, otherwise the position in the list, to determine which object belongs to which argument. <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/purrr_hex.png" width="20%" style="display: block; margin: auto;" /> ] .pull-right45[ ```r x <- baselers[, c("age", "weight")] trim <- c(0, 0.2) map2(x, trim, mean) ``` ``` ## $age ## [1] 44.61 ## ## $weight ## [1] NA ``` ```r na.rm <- c(FALSE, TRUE) pmap(list(x = x, trim = trim, na.rm = na.rm), mean) ``` ``` ## $age ## [1] 44.61 ## ## $weight ## [1] 73.17 ``` ] --- # Summary - What We've Learned - How to use conditional statements - How to program our own functions - How to iterate over blocks of code This is very helpful to not having to type the same code many times and to keep it more flexible. Remember the DRY (Don't Repeat Yourself) principle, i.e. if you have to write the same block of code more than twice, create a function to do so and iterate over it. --- # Interactive session Let's go through some examples together --- # Programming in R Practical <p><font size=6><b><a href="https://therbootcamp.github.io/BaselRBootcamp_2018April/_sessions/D4S1_ProgrammingInR/Programming_in_R_practical.html">Link to practical</a>