Slides

Here is a link to the lecture slides for this session: LINK

Overview

In this practical you’ll get started with R. By the end of this practical you will:

  1. Know your way around R Studio.
  2. Know how to run code.
  3. Have an impression of R’s basic functionality.

Tasks

For this practical you will go through an existing analysis script chunk by chunk to experience how programming and analyzing in R works. The idea is that you go through the code, copy the code chunks to the script editor, send the code to the console, and evaluate what happens.

While at it, try to practice two very useful shortcuts: (1) cmd + enter (MAC) or cntrl + enter (Windows) for running the current line in the console. (2) cmd + shift + p (MAC) or cntrl + shift + p (Windows) for running the same block of code again in the console. The latter is really helpful because you can rerun the a chunk of code after you have made changes to it.

Also, try to take a look at the help-files of a functions used in the code below using ?function, e.g., ?mean to inspect the help-file of the mean()-function and try to understand why they work. You may also already try to evaluate the type of the various objects using typeof(object), e.g., typeof(pirates) or typeof(mean), to learn about the different types (of objects) in R.

All of the code used in this tutorial is based on base-R functions. While they are already powerful, we will later in the course introduce you to more modern, better options for most of the steps.

Install and load the yarrr package

  1. First we’ll install and load the yarrr package. The yarrr package contains many data sets and functions.
# Install and load the yarrr package
# 1install.packages('yarrr')
library(yarrr)
Loading required package: jpeg
Loading required package: BayesFactor
Loading required package: coda
Loading required package: Matrix
************
Welcome to BayesFactor 0.9.12-2. If you have questions, please contact Richard Morey (richarddmorey@gmail.com).

Type BFManual() to open the manual.
************
Loading required package: circlize
yarrr v0.1.5. Citation info at citation('yarrr'). Package guide at yarrr.guide()
Email me at Nathaniel.D.Phillips.is@gmail.com

Explore the pirates dataset

  1. The pirates data set contains data from a survey of 1,000 pirates. Inspect it one-by-one using the following functions.
# Get help for pirates data - included in package yarrr
?pirates

# Print the first few rows of the dataset
head(pirates)
  id    sex age height weight headband college tattoos tchests parrots
1  1   male  28 173.11   70.5      yes   JSSFP       9       0       0
2  2   male  31 209.25  105.6      yes   JSSFP       9      11       0
3  3   male  26 169.95   77.1      yes    CCCC      10      10       1
4  4 female  31 144.29   58.5       no   JSSFP       2       0       2
5  5 female  41 157.85   58.4      yes   JSSFP       9       6       4
6  6   male  26 190.20   85.4      yes    CCCC       7      19       0
  favorite.pirate sword.type eyepatch sword.time beard.length
1    Jack Sparrow    cutlass        1       0.58           16
2    Jack Sparrow    cutlass        0       1.11           21
3    Jack Sparrow    cutlass        1       1.44           19
4    Jack Sparrow   scimitar        1      36.11            2
5            Hook    cutlass        1       0.11            0
6    Jack Sparrow    cutlass        1       0.59           17
            fav.pixar grogg
1      Monsters, Inc.    11
2              WALL-E     9
3          Inside Out     7
4          Inside Out     9
5          Inside Out    14
6 Monsters University     7
# Show the structure of the dataset
str(pirates)
'data.frame':   1000 obs. of  17 variables:
 $ id             : int  1 2 3 4 5 6 7 8 9 10 ...
 $ sex            : chr  "male" "male" "male" "female" ...
 $ age            : num  28 31 26 31 41 26 31 31 28 30 ...
 $ height         : num  173 209 170 144 158 ...
 $ weight         : num  70.5 105.6 77.1 58.5 58.4 ...
 $ headband       : chr  "yes" "yes" "yes" "no" ...
 $ college        : chr  "JSSFP" "JSSFP" "CCCC" "JSSFP" ...
 $ tattoos        : num  9 9 10 2 9 7 9 5 12 12 ...
 $ tchests        : num  0 11 10 0 6 19 1 13 37 69 ...
 $ parrots        : num  0 0 1 2 4 0 7 7 2 4 ...
 $ favorite.pirate: chr  "Jack Sparrow" "Jack Sparrow" "Jack Sparrow" "Jack Sparrow" ...
 $ sword.type     : chr  "cutlass" "cutlass" "cutlass" "scimitar" ...
 $ eyepatch       : num  1 0 1 1 1 1 0 1 0 1 ...
 $ sword.time     : num  0.58 1.11 1.44 36.11 0.11 ...
 $ beard.length   : num  16 21 19 2 0 17 1 1 1 25 ...
 $ fav.pixar      : chr  "Monsters, Inc." "WALL-E" "Inside Out" "Inside Out" ...
 $ grogg          : num  11 9 7 9 14 7 9 12 16 9 ...
# Show the entire dataset in a new window
View(pirates)
  1. Descriptive statistics for numeric data and categorical data.
# What is the mean age?
mean(pirates$age)
[1] 27.36
# What was the height of the tallest pirate?
max(pirates$height)
[1] 209.25
# How many pirates are there of each sex?
table(pirates$sex)

female   male  other 
   464    490     46 
  1. Descriptive statistics as a function of another categorical variable.
# What was the mean age for each sex?
aggregate(formula = age ~ sex, 
          data = pirates,
          FUN = mean)
     sex      age
1 female 29.92241
2   male 24.96735
3  other 27.00000
# What is the median number of tattoos of pirates for each combination of sex and headband?
aggregate(formula = tattoos ~ sex + headband, 
          data = pirates,
          FUN = median)
     sex headband tattoos
1 female       no       5
2   male       no       4
3  other       no       5
4 female      yes      10
5   male      yes      10
6  other      yes      10

Base plotting

  1. Creating a histograms of numeric variables.
# --- A default histogram of pirate ages
hist(x = pirates$age)

# --- A customized histogram of pirate ages
hist(x = pirates$age,
     main = "Distribution of pirate ages",
     col = "skyblue",
     border = "white",
     xlab = "Age",
     ylim = c(0, 400))

# Add mean label
text(x = mean(pirates$age), y = 375, 
     labels = paste("Mean = ", round(mean(pirates$age), 2)))

# Add dashed line at mean
segments(x0 = mean(pirates$age), y0 = 0, 
         x1 = mean(pirates$age), y1 = 360, 
         col = gray(.2, .2), 
         lty = 2)

# ---- Overlapping histograms of pirate ages for females and males

# Start with the female data
hist(x = pirates$age[pirates$sex == "female"],
     main = "Distribution of pirate ages by sex",
     col = transparent("red", .2),
     border = "white",
     xlab = "Age", 
     breaks = seq(0, 50, 2),
     probability = T,
     ylab = "", 
     yaxt = "n")

# Add male data
hist(x = pirates$age[pirates$sex == "male"],
     add = T, 
     probability = T, 
     border = "white",
     breaks = seq(0, 50, 2),
     col = transparent("skyblue", .5))

# Add the legend
legend(x = 40, 
       y = .05,
       col = c("red", "skyblue"),
       legend = c("Female", "Male"),
       pch = 16,
       bty = "n")

  1. Creating scatterplots of two numerical variables.
# --- A simple scatterplot of pirate height and weight
plot(x = pirates$height,
     y = pirates$weight,
     xlab = "Height (cm)",
     ylab = "Weight (kg)")