Slides

Here are the introduction slides for this practical on R programming!

Overview

In this practical you’ll practice “Programming in R” with the base and purrr-package functions. Although we sort of have been programming already, we always used the tidyverse way to do it. While there is nothing against doing it this way, in fact we chose to teach R with the tidyverse because we think that it’s an easy and very elegant way of doing it, when you work with R in the future, you will almost certainly find yourself in situations were you have to understand the base R way of programming. This may, for example, be the case when you get code from someone, or work with someone who is not familiar with the tidyverse. Furthermore, the concepts you learn in this session are the same in most programming languages (of course the exact implementation will vary over languages). So if R is your first programming language, learning about and understanding these concepts will greatly facilitate the understanding and speed of learning of other programming languages. Last but no least, the concepts we touch upon here taken together will allow you to code more efficiently and elegantly, which may save you quite some time. But enough said; let’s get started!

By the end of this practical you will know how to:

Use conditional statements (if, else if, and else)
Program your own functions
Include checks and error messages in your functions to make them robust
Iterate through objects and execute functions on subsets of data (for- and while loops, apply family, map family)
Combine the above concepts

Cheatsheets

Programming in R with the base R cheatsheet: https://www.rstudio.com/wp-content/uploads/2016/05/base-r.pdf.
You can get the purrr cheatsheet from: https://github.com/rstudio/cheatsheets/raw/master/purrr.pdf.

Tasks

Getting setup

A. Open a new R script and save it as a new file called programming_in_R.R. At the top of the script, using comments, write your name and the date. Then, load the tidyverse, yarrr and sendmailR packages. Here’s how the top of your script should look:

## NAME
## DATE
## Programming in R Practical

library(tidyverse)     # For tidyverse
library(yarrr)         # For yarrr
library(sendmailR)     # For sendmailR

B. For this practical, we’ll sometimes use the pirates data from the yarrr package. Load this dataset from your local data folder. We will also quite often use little datasets which you will create on the fly during the tasks. This should help making the examples very clear, and give you some experience in how to create own data to test your code.

# Load data from your local data file of your R project

pirates <- read_csv("data/pirates.csv")

C. The pirates dataset contains details about 1000 pirates, such as their age, sex, number of tattoos and beard length. For more information about the dataset just type the following code:

# Look at documentation for pirates data (contained in the yarrr package)
library(yarrr)
?pirates

D. First look at the first few rows of the data (Tip: use head).

# Check first rows of the pirates data frame
head(pirates)

  id    sex age height weight headband college tattoos tchests parrots
1  1   male  28 173.11   70.5      yes   JSSFP       9       0       0
2  2   male  31 209.25  105.6      yes   JSSFP       9      11       0
3  3   male  26 169.95   77.1      yes    CCCC      10      10       1
4  4 female  31 144.29   58.5       no   JSSFP       2       0       2
5  5 female  41 157.85   58.4      yes   JSSFP       9       6       4
6  6   male  26 190.20   85.4      yes    CCCC       7      19       0
  favorite.pirate sword.type eyepatch sword.time beard.length
1    Jack Sparrow    cutlass        1       0.58           16
2    Jack Sparrow    cutlass        0       1.11           21
3    Jack Sparrow    cutlass        1       1.44           19
4    Jack Sparrow   scimitar        1      36.11            2
5            Hook    cutlass        1       0.11            0
6    Jack Sparrow    cutlass        1       0.59           17
            fav.pixar grogg
1      Monsters, Inc.    11
2              WALL-E     9
3          Inside Out     7
4          Inside Out     9
5          Inside Out    14
6 Monsters University     7

Conditional Statements

You have already used conditional statements, mostly in the form of the case_when() function. Now let’s look at other functions. Note that for now the example may seem a little arbitrary. That is because usually conditional statements are used in loops or functions. We’ll bring everything together towards the end of this practical.

Let’s simulate coin tosses. Use the sample function to draw one sample from {H, T} (i.e. a set, or in R a vector, containing “H” for head, and “T” for tail). If “H” comes up, print “Head”, if “T” comes up, print “Tail”.

coin_toss <- sample(c("H", "T"), 1)

if (coin_toss == "H"){
  print("Head")
} else {
  print("Tail")
}

[1] "Head"

Now you no longer want to toss your coin only once but several (let’s say ten) times (you’ll have to tweak the sample function a bit. Check the help menue if you don’t know how to do this). Do this, and if ALL tosses came up heads, print “Wow, that’s ten heads in a row!”, if there is at least one head in there, print “The coin showed X times heads, and Y times tails”, where X and Y are the respective numbers of heads and tails (use the paste function to print and the sum function to get the correct numbers, check the help menus if you don’t know how to do this), and print “Wow, that’s ten tails in a row!”, if ALL tosses came up tails.

coin_tosses <- sample(c("H", "T"), 10, replace = TRUE)

if (all(coin_tosses == "H")){
  print("Wow, that's ten heads in a row!")
} else if (any(coin_tosses == "H")){ # This works because if-else statements
                                     # are executed in the order they are written,
                                     # so because of the first check it is safe
                                     # to assume, that not all are heads.
  paste("The coin showed", sum(coin_tosses == "H"), "times heads, and",
        sum(coin_tosses == "T"), "times tails.")
} else {
  print("Wow, that's ten tails in a row!")
}

[1] "The coin showed 5 times heads, and 5 times tails."

Functions

Most of the time, conditional statements are either used in combination with functions or loops. We’ll start by looking at functions. Remember that a function has two parts, one in which you specify the arguments, and one in which you then write the code that is executed. The last line that is executed determines what the function returns. You can make this explicit with a return() statement, but it is good practice to only use this if you want to exit the function early, i.e. before all code is executed. Here is an example:

## Create a function

# First arguments are specified
weighted_mean <- function(vls,   # has no default. This means it MUST be
                                 # specified in a function call.
                          wghts = 1 / length(vls)){ # Has a default, i.e.
                                                    # it could be left un-
                                                    # specified and would then
                                                    # take the default.
  
  if (!is.numeric(vls) || !is.numeric(wghts)){ # Check whether conditions are met
                                               # and throw an error if they aren't
    stop("Arguments must be numeric.")
  }
  
  if (sum(wghts) == 0){
    return(0) # if the sum of the weights is 0, in this case we want it to return
              # 0. If this condition is TRUE, all following code is NOT executed,
              # that is the function will be exited early.
  }
  
  # Code block to be executed if the function is called. The result of the 
  # last line is returned
  sum(values * weights) / sum(weights)
}

Write your own function named check_logical that tests whether a given logical object is TRUE or FALSE, and returns “Argument is TRUE”, and “Argument is FALSE”, respectively.

check_logical <- function(x){
  if (isTRUE(x)){
    print("Argument is TRUE")
  }
  
  if (isFALSE(x)){
    print("Argument is FALSE")
  }
}

Use your check_logical function to see whether ALL pirates are larger than 1.6 meters.
One important principle when programming is the Don’t Repeat Yourself (DRY) principle. That is, if you have written a block of code several times, with only slight changes in it, it is time to write a function for this. For example:

# Compute the absolute difference between means of a variable
# separated by another variable

# for tattoos
abs(mean(pirates$tattoos[pirates$sex == "male"]) - 
    mean(pirates$tattoos[pirates$sex == "female"]))

[1] 0.003659395

# for age
abs(mean(pirates$age[pirates$sex == "male"]) - 
    mean(pirates$age[pirates$sex == "female"]))

[1] 4.955067

# for height
abs(mean(pirates$height[pirates$sex == "male"]) - 
    mean(pirates$height[pirates$sex == "female"]))

[1] 13.93597

Each time you copy the code and change a small part, there is a small chance of making a mistake, for example by forgetting to change the name of one variable. Also imagine having coded the above example for 10 variables and then you realize that you should have taken the male and other group together to see the mean difference between this joint group and females. Then you would have to go through each line again to manipulate it.

Now write a function called abs_mean_diff, that computes the absolute mean difference between two groups. It should take a vector mean_vec with data to compute the mean on, a vector cat_vec with data to separate the first, and a list sep_list with two vectors of categories to separate for (note that when you want to check whether a number of values also occurs in another vector, you may want to use %in% instead of == for this check). Then use this function to compute the same mean differences as in the code above.

abs_mean_diff <- function(mean_vec, cat_vec, sep_list){
  abs(mean(mean_vec[cat_vec %in% sep_list[[1]]]) -
    mean(mean_vec[cat_vec %in% sep_list[[2]]]))
}

abs_mean_diff(pirates$tattoos, pirates$sex, list("male", "female"))

[1] 0.003659395

abs_mean_diff(pirates$age, pirates$sex, list("male", "female"))

[1] 4.955067

abs_mean_diff(pirates$height, pirates$sex, list("male", "female"))

[1] 13.93597

One great thing about functions is that they can take other functions as arguments. Let’s say you don’t only want to be able to compute the abs mean difference, but also the median difference. Reprogram the abs_mean_diff function from task 5 by adding a fourth argument met (short for method), where you pass the function name as an argument WITHOUT quotationmarks. Set the default value to this argument to mean. Name the function abs_diff. Then compute the mean and the median absolute difference of tattoos for male and female pirates.

abs_diff <- function(mean_vec, cat_vec, sep_list, met = mean){
  abs(met(mean_vec[cat_vec %in% sep_list[[1]]]) -
    met(mean_vec[cat_vec %in% sep_list[[2]]]))
}

abs_diff(pirates$tattoos, pirates$sex, list("male", "female"))

[1] 0.003659395

abs_diff(pirates$tattoos, pirates$sex, list("male", "female"), median)

[1] 1

Passing functions as arguments can be very handy, especially when you can combine this with apply or map function so you can enter a list of functions.

There is one argument you can use when programming a function, that let’s you then enter an unlimited number of arguments to that function, without having to prespecify them. The argument is called .... Now add the ... as last argument in the abs_diff function definition from the previous task, and also add it to the met() function inside the abs_diff function. Now you can call functions with abs_diff that take more than just the numeric vector as argument, for example you can compute a trimmed mean with mean(x, trim = .2) for the 20% trimmed mean. Compute the differences of the 20% trimmed mean with the modified abs_diff() function (i.e. the outer 20% of data on each side are discarded, which avoids outliers).

abs_diff <- function(mean_vec, cat_vec, sep_list, met = mean, ...){
  abs(met(mean_vec[cat_vec %in% sep_list[[1]]], ...) -
    met(mean_vec[cat_vec %in% sep_list[[2]]], ...))
}

abs_diff(pirates$tattoos, pirates$sex, list("male", "female"), mean, trim = .2)

[1] 0.1333333

Iteration

Iteration is an extremely important feature in programming. It let’s you execute a code block for a large number of objects, with only having to write the code block once. The iteration methods we will look at in the next few tasks are for and while loops. Later we will also look at apply and map functions.

Remember that a for loop executes a block of code as many times as you specify, whereas a while loop does the same thing but does not stop until a prespecified condition is met. Here are a few examples which you can copy and paste to see how they work:

## for loops

# example 1:
# the following three loops do exactly the same. You may encounter all three
# ways of specifying the loops, but the latter two are more likely.
for (i in c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)){
  print(i) # note that within loops print statements have to be used explicitly
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

for (i in 1:10){
  print(i) # note that within loops print statements have to be used explicitly
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

for (i in seq_len(10)){
  print(i) # note that within loops print statements have to be used explicitly
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

# By the way, here's the difference between seq_len, and seq_along:
seq_len(5) # creates a vector from 1 to the number specified

[1] 1 2 3 4 5

seq_along(letters[1:5]) # creates a numeric sequence of the positions of the specified

[1] 1 2 3 4 5

                        # vector

# you could also use 1:5, instead of seq_len(5), but note that while 
x <- vector("double", length = 0)
1:length(x)

[1] 1 0

# yields probably not the behavior you'd want in a loop, while
seq_along(x)

integer(0)

# will lead to the desired behavior

# example 2:
x <- vector("double", length = 10) # Output part

for (i in seq_along(x)){           # Sequence part
  x[i] <- i                        # Body part
}

x

 [1]  1  2  3  4  5  6  7  8  9 10

# example 3:
## while loops
i <- 1
while(i < 5){
  
  print(i)
  
  i <- i + 1
}

[1] 1
[1] 2
[1] 3
[1] 4

In for loops you cannot only loop over sequences of numbers, but over any values of a vector, that are then used in the loop. To do this you simply write for (val in vector){}, instead of using either 1: length(vector), or seq_along(vector) (did you see that we used val instead of i when defining the sequence in the loop. You can enter any name you want there, just make sure to refer to this name in the body of the loop). Loop over the following vector containing variable names of the pirates dataset c("age", "height", "tattoos", "beard.length"), and compute the mean of each variable and store the value in a named vector (i.e. each value should be named to identify it, to do this use names) called pirate_means.

vars <- c("age", "height", "tattoos", "beard.length")

pirate_means <- vector("double", length(vars))
names(pirate_means) <- vars

for (nms in vars){
  
  pirate_means[nms] <- mean(pirates[, nms])
  
}

pirate_means

         age       height      tattoos beard.length 
     27.3600     170.2264       9.4290      10.3840

Use a loop to rewrite the code where you called abs_diff multiple times in task 7. Remember that in a loop you have to use print explicitly.

vars <- c("age", "height", "tattoos")
for (v in vars){
  print(abs_diff(pirates[[v]], pirates$sex,
                 list("male", "female")))
}

[1] 4.955067
[1] 13.93597
[1] 0.003659395

You want to check whether the college a pirate went to, is related to the number of groggs a pirate drinks a day on average. To get rid of the potentially influential factors of age, sex, and weight, you want to match the two subsamples of 100 pirates each, such that you have a maximum difference in the number of females between the two groups of 12, a maximum mean age difference of 10 years, a maximum difference in the standard deviatio of age of 8 years, a maximum mean weight difference of 8 kilogramms, and a maximum difference in the standard deviation of weight of 7 kilogramms. Use a while loop where you take a subsample of 100 pirates from each college, and use the above parameters of this sample for the check. Note: don’t bother about doing the analysis afterwards, the exercise is just about the looping part. Also the difference values for the check are arbitrary. If it takes too long for the loop to run just use larger values, for the exercise the precision doesn’t have to be high.

sex_diff <- Inf
age_diff_mean <- Inf
age_diff_sd <- Inf
weight_diff_mean <- Inf
weight_diff_sd <- Inf

while (sex_diff > 12 || age_diff_mean > 10 || age_diff_sd > 8 ||
       weight_diff_mean > 8 || weight_diff_sd > 7){
  
  CCCC_pirates <- pirates %>%
                    filter(college == "CCCC") %>%
                    sample_n(100, replace = FALSE)
  
  JSSFP_pirates <- pirates %>%
                    filter(college == "JSSFP") %>%
                    sample_n(100, replace = FALSE)
  
  sex_diff <- abs(sum(CCCC_pirates$sex == "female") -
                    sum(JSSFP_pirates$sex == "female"))
  
  age_diff_mean <- abs(mean(CCCC_pirates$age) - mean(JSSFP_pirates$age))
  age_diff_sd <- abs(sd(CCCC_pirates$age) - sd(JSSFP_pirates$age))
  
  weight_diff_mean <- abs(mean(CCCC_pirates$weight) - mean(JSSFP_pirates$weight))
  weight_diff_sd <- abs(sd(CCCC_pirates$weight) - sd(JSSFP_pirates$weight))
}


# Here you would proceed with the analysis

`apply` and `map` functions

The apply and map function families are quite similar. The map functions from the purrr package were introduced as a more consistent and less ambiguous alternative to the apply functions. Both function families let you iterate over objects and call functions on the subsets in each iteration. So basically you can use these functions instead of loops. In general, the basic structure is that you first specify the object over which you want to iterate, then the function to call in each iteration and then additional arguments that function might take. Depending on which of the functions you use, there will be additional arguments to specify. For example:

## apply

# create a vector of variable names from the variables you want to call the same
# function on

vars <- c("age", "height", "weight", "tattoos", "parrots", "grogg")

# compute the mean of these variables. Note that the second argument specifies,
# whether the mean should be taken over columns (2), or rows (1)
apply(pirates[, vars], 2, mean)

     age   height   weight  tattoos  parrots    grogg 
 27.3600 170.2264  69.7446   9.4290   2.8190  10.1450

# you can also use a simpler apply version called sapply, where it automatically
# computes the mean column wise
sapply(pirates[, vars], mean)

     age   height   weight  tattoos  parrots    grogg 
 27.3600 170.2264  69.7446   9.4290   2.8190  10.1450

## map

# the map equivalent is
map_dbl(pirates[, vars], mean)

     age   height   weight  tattoos  parrots    grogg 
 27.3600 170.2264  69.7446   9.4290   2.8190  10.1450

# remember that in the map functions the suffix (in this case _dbl) stands for the
# type of vector that should be returned. If this doesn't match, e.g. if a character
# vector is created in map_dbl, it will throw an error. map without suffix will
# return a list.

Remember the for loop you’ve used in task 9, to run the abs_diff function several times? Rewrite this by replacing the for loop once with sapply and once with a map function (figure out which map function would fit best here).

sapply(pirates[,c("age", "height", "tattoos")], abs_diff, cat_vec =pirates$sex,
       sep_list = list("male", "female"))

         age       height      tattoos 
 4.955066854 13.935971147  0.003659395

map_dbl(pirates[,c("age", "height", "tattoos")], abs_diff, cat_vec =pirates$sex,
       sep_list = list("male", "female"))

         age       height      tattoos 
 4.955066854 13.935971147  0.003659395

Challenge. Now there would be a more elegant way of programming the abs_diff function such that you don’t have to use loops or apply/ map functions outside of it. Redefine the abs_diff function and use map_dbl, such that you can enter multiple columns to the mean_vec argument, instead of only one. For this you can use an anonymous function in map_dbl. This means that the function is only defined inside the map_dbl call and does not exist outside of it. The syntax for a self programmed anonymous mean function would be map_dbl(data, function(x){sum(x) / length(x)}). That is instead of passing a function name to map_dbl, you pass a function definition function(argument){do something}. Remember that you can pass functions as arguments.

abs_diff <- function(mean_vec, cat_vec, sep_list, met = mean, ...){
  map_dbl(mean_vec, function(x, cat_vec, sep_list, ...){
    abs(met(x[cat_vec %in% sep_list[[1]]], ...) - 
          met(x[cat_vec %in% sep_list[[2]]], ...))
  }, cat_vec = cat_vec, sep_list = sep_list, ... = ...)
}


abs_diff(pirates[,c("age", "height", "tattoos")], pirates$sex,
         list("male", "female"))

         age       height      tattoos 
 4.955066854 13.935971147  0.003659395

There is another map function called map2 (again with the suffixes for the different types of vectors returned), that let’s you iterate over two objects simultaneously. Use the appropriate map2 function and to draw 10 samples from three normal distributions with means 5, 9, 12, and sds, 4, 2, 7, and save them as an object (note: the function to draw samples from a normal distribution is rnorm(n, mean, sd)).

m_vec <- c(5, 9, 12)
sd_vec <- c(4, 2, 7)

smpls <- map2(m_vec, sd_vec, rnorm, n = 10)
smpls

[[1]]
 [1] 10.721255  5.089893  5.660529  4.467027  6.739825  3.882055  3.268400
 [8]  3.029077 10.966947  7.483107

[[2]]
 [1] 10.124690 12.661301  8.018206 10.286948  7.475588  6.892886  9.545238
 [8]  7.860845  8.751900 10.458574

[[3]]
 [1] 15.713178 15.559643 10.609248  6.903379 15.141491 23.674684  5.535590
 [8] 11.987268 11.249254 18.373000

Now use map to plot each of the samples drawn. Note: use par(mfrow = c(1, 3)) to draw three plots in a line in the same canvas (i.e., the 1 determines the number of rows of plots and the 3 determines the number of columns of plots. This way you can create multiple-plot panels).

par(mfrow = c(1, 3))
map(smpls, plot, type = "b", ylab = "Random Sample")

[[1]]
NULL

[[2]]
NULL

[[3]]
NULL

As you can see, besides the plots you’ve got NULL printed in your console each time you called plot. This is because map returns whatever the function called returns and in case of plot NULL is created as output (besides the actual plot that is drawn). To take care of this problem, there is a special family of purrr functions called walk, that are used for functions like plot that create something on the fly and only have a console output as undesired sideproduct. Create the plots again using walk() instead of map().

par(mfrow = c(1, 3))
walk(smpls, plot, type = "b", ylab = "Random Sample")

Redo task 13 but this time don’t draw 10 samples per distribution but draw 10, 15, and 20 samples, respectively. To iterate over an unlimited number of objects simultaneously, use the pmap function. Check the help menue if you’re not sure how this works.

arg <-list(
  "mean" = c(5, 9, 12),
  "sd" = c(4, 2, 7),
  "n" = c(10, 15, 20)
)

smpls <- pmap(arg, rnorm)
smpls

[[1]]
 [1]  7.0837714  0.5065329  2.9231527  5.9301707  3.6554127 10.6073204
 [7]  7.4820223  7.6139097  4.6968049  7.1811328

[[2]]
 [1]  9.907191 13.970487  7.918319 10.453861  8.922641  8.969958  8.624143
 [8]  5.869758 10.356857 13.059174  8.597373  5.192553 10.998464  6.612971
[15]  7.902673

[[3]]
 [1] 12.5890198  4.1316471  1.7613505 25.4486423  5.8470031 14.3689393
 [7]  3.8196101 15.2551614  0.6634309 11.0219258  2.2166318 20.1769668
[13] 24.9583361 20.2261988 13.4490028 20.8327034 18.5705800  9.1042740
[19] 18.5305566 16.6213628

Bringing it all together - Challenge

The next two tasks contain elements of all parts we’ve covered in this lesson. You will write a function to do a short simulation. Briefly some background: In statistics, we often accept a hypothesis if the p-value, i.e. the probability of the found-, or more extreme data under the H0 hypothesis that assumes that there is no difference between two distributions, is smaller or equal to 5%. This is tested with statistical tests, of which we will use the t-test (r function t.test(group1, group2)). We will draw two random samples from the same distribution and run a t.test to check whether they are significantly different. If they are not, we will resample from the distribution and run the check once more. We will then see how strong the inflation of the alpha error is, with one resample of size x.

We will also encounter a new cool thing: you can send mails from R. For example, if you are running a simulation that takes a lot of time and you don’t want to check again and again whether it’s completed. So you can tell R to send you a mail when it’s done, and if you like, even attach the data or some core output. The package to do this is sendmailR, with the function sendmail.

Program a function called sim_false_positive, that takes the following arguments: n (the number of samples initially drawn per distribution), m (the mean of the distribution from which samples are drawn), sd (the standard deviation of the distribution from which samples are drawn), resample (the sample size of the resample), iter (number of iterations, i.e. number of times the whole procedure is repeated. The higher the number, the more precise the estimate), mail (a mail address from which to which you want to send your result), mail_subject (the subject of the mail). The start of your function should look like this (we’ve also included the code for the mail function to make it a little easier):

## sim_false_positive functoin

sim_false_positive <- function(n = 30,
                               m = 5,
                               sd = 2,
                               resample = 10,
                               iter = 1000,
                               mail = NULL,
                               mail_subject = "Your results"){
  
  # Here goes your code for the simulation. It should return an object called
  # false_positive, containing the proportion of significant results (i.e. with
  # a p-value smaller or equal to .05)
  
  
  # Here is the part that sends the mail
  if (!is.null(mail)){
    
    # mail addresses and subject
    from <- mail
    to <- mail
    subject <- mail_subject
    
    # The body of the mail. Costumize it to your liking
    body <- list(paste("Here are your results:", paste0("Resample was ",
                                                        resample, "."),
                       paste0("The initial samplesize was ", n,
                              ", and you've drawn ", iter, " samples."),
                       paste0("The proportion of false positives was",
                              false_positive, ".")))
    
    # Function that acutally sends the mail. Note that not all mail addresses
    # will work.
    sendmail(from, to, subject, body,
                        control=list(smtpServer="ASPMX.L.GOOGLE.COM"))
  }
  
  # return proportion of false positives
  false_positive
  
}

Now program the middle part to do the sampling and iteration.

sim_false_positive <- function(n = 30,
                               m = 5,
                               sd = 2,
                               resample = 10,
                               iter = 1000,
                               mail = NULL,
                               mail_subject = "Your results"){
  
  p_vals <- vector("double", length = iter)
  
  for (i in seq_len(iter)){
    samples <- matrix(rnorm(n * 2, m, sd), ncol = 2)
    
    p_vals[i] <- t.test(samples[,1], samples[,2])$p.value
    
    if (p_vals[i] > .05 && resample > 0){
      
      samples <- rbind(samples, matrix(rnorm(resample * 2, m, sd), ncol = 2))
      
      p_vals[i] <- t.test(samples[,1], samples[,2])$p.value
      
    }
    
  }
  
  false_positive <- mean(p_vals <= .05)
  
  if (!is.null(mail)){
    
    from <- mail
    to <- mail
    subject <- mail_subject

    body <- list(paste("Here are your results:", paste0("Resample was ",
                                                        resample, "."),
                       paste0("The initial samplesize was ", n,
                              ", and you've drawn ", iter, " samples."),
                       paste0("The proportion of false positives was",
                              false_positive, ".")))

    sendmail(from, to, subject, body,
                        control=list(smtpServer="ASPMX.L.GOOGLE.COM"))
  }
  
  false_positive
  
}

Now that you’ve created your function, use one of the map functions to execute the function several times with different sample sizes and resample sizes. Note that if you only change resample, you will have to create an anonymous function, because otherwise the resample size will be passed to the first argument of sim_false_positive, which is n.

resample <- list(10, 0)

map_dbl(resample, ~sim_false_positive(resample = ., mail = NULL))

[1] 0.074 0.052

Additional reading

For more details on programming in R, check out Nathaniel’s YaRrr! The Pirate’s Guide to R YaRrr!
Hadley Wickham, covers programming in R and purrr in his book R for data science: R for Data Science link

Programming in R - Practical Answers

BaselRBootcamp April 2018

Slides

Overview

Cheatsheets

Tasks

Getting setup

Conditional Statements

Functions

Iteration

`apply` and `map` functions

Bringing it all together - Challenge

Additional reading

Programming in R - Practical Answers

BaselRBootcamp April 2018

Slides

Overview

Cheatsheets

Tasks

Getting setup

Conditional Statements

Functions

Iteration

apply and map functions

Bringing it all together - Challenge

Additional reading

`apply` and `map` functions