Overview

In this practical you’ll learn how to work with simple data objects and functions. By the end of this practical you will know how to:

  1. Create vectors of different types using c()
  2. Understand the three main vector classes (numeric, character, and logical) using class()
  3. Create data frames and tibbles using data.frame() and tibble()
  4. Access vectors from data frames using $
  5. Calculate basic descriptive statistics using mean(), median(), table() (and more!)

Datasets

None!

Packages

Package      Installation
tidyverse    install.packages("tidyverse")

Glossary

Creating vectors

Function                Description
c("a", "b", "c")        Create a character vector
c(1, 2, 3)              Create a numeric vector
c(TRUE, FALSE, TRUE)    Create a logical vector

Vector functions

Function                             Description
mean(x), median(x), sd(x), sum(x)    Mean, median, standard deviation, sum
max(x), min(x)                       Maximum, minimum
table(x)                             Table of frequency counts
round(x, digits)                     Round a numeric vector x to digits decimal places

Creating data frames from vectors

Function               Description
data.frame(a, b, c)    Create a data frame from vectors a, b, c
tibble(a, b, c)        Create a tibble from vectors a, b, c

Accessing vectors from data frames

Function    Description
df$name     Access vector name from a data frame df

Examples

library(tidyverse)

# Create vectors of (fake) stock data
name      <- c("apple", "microsoft", "dell", "google", "twitter")
yesterday <- c(100, 89, 65, 54, 89)
today     <- c(102, 85, 72, 60, 95)

# Summary statistics
mean(today)
mean(yesterday)

# Show classes
class(name)
class(yesterday)

# Operations on vectors
change <- today - yesterday
change # Print result

# Create a logical vector from two numerics
increase <- today > yesterday
increase # Print result

# Create a tibble combining multiple vectors
stocks <- tibble(name, yesterday, today, change, increase)

# Get column names
names(stocks)

# Access columns by name
stocks$name
stocks$today

# Calculate descriptives on columns
mean(stocks$yesterday)
median(stocks$today)
table(stocks$increase)
max(stocks$increase)
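
The glossary also lists round(), which the examples above do not use. Here is a minimal sketch, assuming the same fake stock prices, of how it could be combined with vector arithmetic. The object name pct_change is made up for illustration.

```r
# Same (fake) stock prices as in the examples above
yesterday <- c(100, 89, 65, 54, 89)
today     <- c(102, 85, 72, 60, 95)

# Vector arithmetic works element by element:
# percentage change of each stock relative to yesterday
pct_change <- (today - yesterday) / yesterday * 100

# Round to one decimal place to make the result easier to read
round(pct_change, 1)

# Spread of today's prices
sd(today)
```

Because the arithmetic is applied to every element at once, no loop is needed.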

Tasks

A - Getting setup

  1. Open your baselrbootcamp R project. It should already have the folders 1_Data and 2_Code.

  2. Open a new R script and save it as a new file called objects_practical.R in the 2_Code folder. At the top of the script, using comments, write your name and the date. Then, load all package(s) listed in the Packages section above with library().

B - Creating vectors

The table below shows results from a (fictional) survey of 10 Baselers. In this practical, you will convert this table to R objects and then analyse them!

id   sex      age   height   weight
1    male     44    174.3    113.4
2    male     65    180.3    75.2
3    female   31    168.3    55.5
4    male     27    209      93.8
5    male     24    176.7
6    male     63    186.6    67.4
7    male     71    151.6    83.3
8    female   41    155.7    67.8
9    male     43    176.1    69.3
10   female   31    166.1    66.3
  1. Create a numeric vector called id that shows the id values. When you finish, print the vector object to see it!
# Create a vector id
XX <- c(XX, XX, ...)

# Print the vector id
XX
# Create an id vector 
id <- 1:10 # shortcut to creating the sequence from 1 to 10

# Print the vector
id
##  [1]  1  2  3  4  5  6  7  8  9 10
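  1. Using the class() function, check the class of your id vector. Is it "numeric"? (If you created it with the 1:10 shortcut, R reports "integer", which is a kind of numeric vector for whole numbers.)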
# Show the class of an object XX
class(XX)
# Show the class of the id vector
class(id)
## [1] "integer"
  1. Using the length() function, find out the length of your id vector. Does it have length 10? If not, make sure you defined it correctly!
# Show the length of the id vector
length(XX)
# Show the length of the id vector
length(id)
## [1] 10
  1. Create a character vector called sex that shows the sex values. Make sure to use quotation marks “” to enclose each element to tell R that the data are of type "character"! When you finish, print the object to see it!
# Create a character vector sex
XX <- c("XX", "XX", "...")
# Create a sex vector 
sex <- c("male", "male", "female", "male", "male", "male", "male", "female", "male", "female")

# Print the vector
sex
##  [1] "male"   "male"   "female" "male"   "male"   "male"   "male"  
##  [8] "female" "male"   "female"
  1. Using the class() function, check the class of your sex vector. Is it "character"?
# Show the class of the sex vector
class(sex)
## [1] "character"
  1. Using the length() function, find out the length of your sex object. Does it have length 10? If not, make sure you defined it correctly!
# Show the length of the sex vector
length(sex)
## [1] 10
  1. Using the same steps as before, create age and height vectors.
# Create an age vector 
age <- c(44, 65, 31, 27, 24, 63, 71, 41, 43, 31)

# Print the age vector
age
##  [1] 44 65 31 27 24 63 71 41 43 31
# Show the class of the age vector
class(age)
## [1] "numeric"
# Show the length of the age vector
length(age)
## [1] 10
# Create a height vector 
height <- c(174.3, 180.3, 168.3, 209, 176.7, 186.6, 151.6, 155.7, 176.1, 166.1)

# Print the height vector
height
##  [1] 174 180 168 209 177 187 152 156 176 166
# Show the class of the height vector
class(height)
## [1] "numeric"
# Show the length of the height vector
length(height)
## [1] 10
  1. Look at the weight data. You’ll notice it contains a missing value. Create a vector called weight containing these data, following the same steps as before, making sure to specify the missing value as NA (no quotation marks).
# Create a weight vector 
weight <- c(113.4, 75.2, 55.5, 93.8, NA, 67.4, 83.3, 67.8, 69.3, 66.3)

# Print the weight vector
weight
##  [1] 113.4  75.2  55.5  93.8    NA  67.4  83.3  67.8  69.3  66.3
# Show the class of the weight vector
class(weight)
## [1] "numeric"
# Show the length of the weight vector
length(weight)
## [1] 10
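
A side note on the class() results in this section. The following is a minimal sketch, using only base R, of why the 1:10 shortcut reports "integer" while c() with plain numbers reports "numeric", and of how NA fits into a numeric vector. The object names id_seq, id_c and w are made up for illustration.

```r
# Two ways to build the same sequence of ids (hypothetical names)
id_seq <- 1:10                              # the colon shortcut creates integers
id_c   <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)  # c() with plain numbers creates doubles

class(id_seq)        # "integer"
class(id_c)          # "numeric"

# Both count as numeric for calculations and hold the same values
is.numeric(id_seq)   # TRUE
is.numeric(id_c)     # TRUE
all(id_seq == id_c)  # TRUE

# NA marks a missing value; it does not change the class of the vector
w <- c(113.4, NA, 55.5)
class(w)             # "numeric"
is.na(w)             # FALSE TRUE FALSE
```

For the tasks in this practical the difference between "integer" and "numeric" does not matter: functions such as mean() and max() treat both the same way.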

C - Functions

  1. Using the table() function, find out how many males and females are in the data. You should find 7 males and 3 females!
# Count types in sex
table(sex)
## sex
## female   male 
##      3      7
  1. Using the mean() function, calculate the mean age. It should be 44!
# Compute mean of age
mean(age)
## [1] 44
  1. Try calculating the mean value of sex. What happens? Why?
# Compute mean of sex
mean(sex)
## [1] NA
  1. Try calculating the mean weight. You should get an NA value. Why?
# Compute mean of weight
mean(weight)
## [1] NA
  1. Look at the help menu for the mean() function (using ?mean) to look for an argument that will help you with your problem.
# Inspect help for mean
?mean
  1. Using the correct argument for the mean function, calculate the mean weight ignoring NA values. It should be 76.89!
# Compute mean weight, ignoring NAs 
mean(weight, na.rm = TRUE)
## [1] 76.9
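
The na.rm argument is not specific to mean(). As a rough sketch, assuming the weight vector from section B, the same pattern seems to work for the other summary functions in the glossary. The name mean_weight is made up for illustration.

```r
# The weight vector from section B, including its missing value
weight <- c(113.4, 75.2, 55.5, 93.8, NA, 67.4, 83.3, 67.8, 69.3, 66.3)

# Count the missing values first: is.na() gives TRUE for each NA
sum(is.na(weight))             # 1

# Without na.rm = TRUE, a single NA makes the whole result NA
sum(weight)                    # NA

# With na.rm = TRUE, the NA is dropped before calculating
mean_weight <- mean(weight, na.rm = TRUE)
round(mean_weight, 2)          # 76.89
median(weight, na.rm = TRUE)   # 69.3
max(weight, na.rm = TRUE)      # 113.4
```

Returning NA by default is deliberate: R does not silently ignore missing data unless you ask it to.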

D - Logical Vectors

  1. Logical vectors contain only TRUE and FALSE values (and missing values). Create a new vector called tall_180 indicating which Baselers are taller than 180cm. To do this, use the > (greater than) operator à la vector > value.
# Create a logical vector tall_180 indicating
#  which baselers are taller than 180cm

XX <- XX > XX
# Create a logical vector tall_180
tall_180 <- height > 180
  1. Print your tall_180 vector to the console. Do you see only TRUE and FALSE values? If so, do the values that are TRUE match the Baselers who are actually over 180cm?
# print tall
tall_180
##  [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE
  1. Using the table() function, create a table showing how many of the ten Baselers are taller than 180cm and how many are not
# count baselers taller than 180
table(tall_180)
## tall_180
## FALSE  TRUE 
##     7     3
  1. Using the mean() function, determine the proportion of the ten Baselers that are taller than 180cm, i.e., mean(tall_180). Multiply the result by 100 to get a percentage. Should this have worked?
# percentage of baselers taller than 180
mean(tall_180)
## [1] 0.3
  1. What percent of the ten Baselers were older than 30?
# percentage of baselers older than 30
mean(age > 30)
## [1] 0.8
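
Why does mean() work on a logical vector at all? A minimal sketch, assuming the height vector from section B: in calculations R treats TRUE as 1 and FALSE as 0, so sum() counts the TRUE values and mean() gives their proportion.

```r
# The height vector from section B
height <- c(174.3, 180.3, 168.3, 209, 176.7, 186.6, 151.6, 155.7, 176.1, 166.1)

# Comparison gives one TRUE/FALSE per element
tall_180 <- height > 180

# In arithmetic, TRUE counts as 1 and FALSE as 0
as.numeric(tall_180)   # 0 1 0 1 0 1 0 0 0 0

sum(tall_180)          # 3: number of Baselers taller than 180cm
mean(tall_180)         # 0.3: proportion taller than 180cm
mean(tall_180) * 100   # 30: the same value as a percentage
```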

E - Creating data frames

  1. Using the data.frame() function, create a data frame called ten_df that contains each of the vectors you just created: id, age, sex, height, weight, tall_180
# Create data frame ten_df containing vectors id, age, sex, height, weight, and tall_180
XX <- data.frame(XX, XX, XX, XX, XX, XX) 
# Create ten_df data frame from vectors
ten_df <- data.frame(id, age, sex, height, weight, tall_180) 
  1. Print your ten_df object to see how it looks! Does it contain all of the vectors?
# Print ten_df
ten_df
##    id age    sex height weight tall_180
## 1   1  44   male    174  113.4    FALSE
## 2   2  65   male    180   75.2     TRUE
## 3   3  31 female    168   55.5    FALSE
## 4   4  27   male    209   93.8     TRUE
## 5   5  24   male    177     NA    FALSE
## 6   6  63   male    187   67.4     TRUE
## 7   7  71   male    152   83.3    FALSE
## 8   8  41 female    156   67.8    FALSE
## 9   9  43   male    176   69.3    FALSE
## 10 10  31 female    166   66.3    FALSE
  1. Using the dim() function, print the number of rows and columns in your data frame. Do you get 10 rows and 6 columns?
# Inspect dimensions
dim(ten_df)
## [1] 10  6
  1. What is the class of your ten_df object? Use the class() function to find out!
# Inspect class
class(ten_df)
## [1] "data.frame"
  1. Use the summary() function to print descriptive statistics from each column of ten_df
# Print descriptive statistics
summary(ten_df)
##        id             age           sex        height        weight     
##  Min.   : 1.00   Min.   :24.0   female:3   Min.   :152   Min.   : 55.5  
##  1st Qu.: 3.25   1st Qu.:31.0   male  :7   1st Qu.:167   1st Qu.: 67.4  
##  Median : 5.50   Median :42.0              Median :175   Median : 69.3  
##  Mean   : 5.50   Mean   :44.0              Mean   :174   Mean   : 76.9  
##  3rd Qu.: 7.75   3rd Qu.:58.2              3rd Qu.:179   3rd Qu.: 83.3  
##  Max.   :10.00   Max.   :71.0              Max.   :209   Max.   :113.4  
##                                                          NA's   :1      
##   tall_180      
##  Mode :logical  
##  FALSE:7        
##  TRUE :3        
##                 
##                 
##                 
## 
  1. Using the $ operator, print the age column from the ten_df data frame.
# Inspect age
ten_df$age
##  [1] 44 65 31 27 24 63 71 41 43 31
  1. Calculate the maximum age value from the ten_df data frame using max(). Do you get the same result as when you calculate it from the original vector age?
# Get max
max(ten_df$age)
## [1] 71
  1. Instead of creating a data frame of the data using the data.frame() function, try creating a tibble called ten_tibble using the tibble() function. Tibbles are a more modern, leaner variant of data frames that we prefer over classic data.frames. You can use the exact same arguments you used before.
# Create tibble (same vectors as for ten_df, including age)
ten_tibble <- tibble(id, age, sex, height, weight, tall_180)
  1. Print your new ten_tibble object. How does it look different from ten_df? Try calculating the maximum age from this object. Is it different from what you got before?
# Print tibble
ten_tibble
## # A tibble: 10 x 6
##       id   age sex    height weight tall_180
##    <int> <dbl> <chr>   <dbl>  <dbl> <lgl>   
##  1     1    44 male     174.  113.  FALSE   
##  2     2    65 male     180.   75.2 TRUE    
##  3     3    31 female   168.   55.5 FALSE   
##  4     4    27 male     209    93.8 TRUE    
##  5     5    24 male     177.   NA   FALSE   
##  6     6    63 male     187.   67.4 TRUE    
##  7     7    71 male     152.   83.3 FALSE   
##  8     8    41 female   156.   67.8 FALSE   
##  9     9    43 male     176.   69.3 FALSE   
## 10    10    31 female   166.   66.3 FALSE
# Compare the maximum age from both objects
max(ten_tibble$age) == max(ten_df$age)
## [1] TRUE
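
A habit that helps when working with $: check that the column you ask for actually exists. The following is a minimal sketch with a small made-up data frame (demo_df and its values are for illustration only). It shows that asking a data frame for a missing column returns NULL instead of stopping with an error, which can lead to confusing results further down.

```r
# A small made-up data frame without an age column
demo_df <- data.frame(id = 1:3, height = c(174.3, 180.3, 168.3))

# Check which columns are available before using $
names(demo_df)              # "id" "height"
"age" %in% names(demo_df)   # FALSE: there is no age column
is.null(demo_df$age)        # TRUE: $ returns NULL for a missing column

# Two equivalent ways to pull out an existing column
demo_df$height
demo_df[["height"]]
identical(demo_df$height, demo_df[["height"]])   # TRUE

# Row and column counts, as an alternative to dim()
nrow(demo_df)   # 3
ncol(demo_df)   # 2
```

If a calculation on a column gives an unexpected answer, names() is a quick first check.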

X - Challenges

  1. If you take the sum() of a logical vector, R will return the number of cases that are TRUE. Using this, find out how many of the ten Baselers are male while using the is-equal-to operator ==.
# Determine the frequency of a case in a vector
sum(XX == XX)
# Count the males by comparing the sex column to "male"
sum(ten_tibble$sex == "male")
## [1] 7
  1. You can use logical vectors to select rows from a data frame based on certain criteria. Using the following template, get the id values of Baselers who are younger than 30:
# Create a logical vector indicating which baselers are younger than 30
young_30 <- XX$XX < 30

# Print the ids of baselers younger than 30
XX$XX[young_30]
# Create a logical vector indicating which baselers are younger than 30
young_30 <- ten_df$age < 30

# Print the ids of baselers younger than 30
ten_df$id[young_30]
## [1] 4 5
  1. Use a combination of logical vectors and the mean() function to answer the question: “What is the mean age of Baselers who are heavier than 80kg?”
# Mean age of baselers heavier than 80kg
# (weight contains an NA, so na.rm = TRUE is needed to get a number)
mean(ten_df$age[ten_df$weight > 80], na.rm = TRUE)
## [1] 47.3
  1. What are the id values of Baselers who are male and are shorter than 165cm? (Hint: You will need to use the logical AND operator & to combine multiple logical vectors)
# Ids of male baselers shorter than 165cm
ten_tibble$id[ten_tibble$sex == "male" & ten_tibble$height < 165]
## [1] 7
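
One detail matters when selecting with logical vectors: a missing value in the comparison produces an NA in the selection as well. Here is a minimal sketch, assuming the age and weight vectors from section B, of two ways to handle this. The name heavy is made up for illustration.

```r
# The age and weight vectors from section B
age    <- c(44, 65, 31, 27, 24, 63, 71, 41, 43, 31)
weight <- c(113.4, 75.2, 55.5, 93.8, NA, 67.4, 83.3, 67.8, 69.3, 66.3)

# Comparing with NA gives NA, not TRUE or FALSE
heavy <- weight > 80
heavy[5]                                   # NA

# Selecting with an NA keeps an NA in the result
age[heavy]                                 # 44 27 NA 71

# Option 1: which() keeps only the positions that are TRUE
age[which(heavy)]                          # 44 27 71

# Option 2: select as usual and drop the NA inside mean()
round(mean(age[heavy], na.rm = TRUE), 2)   # 47.33
round(mean(age[which(heavy)]), 2)          # 47.33
```

Both options give the same mean here. Which one reads better is largely a matter of taste.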

Additional Resources

  • For more information on the fundamentals of objects and functions in R, see the R Core Team’s introduction to R. For more advanced object- and function-related topics, see Hadley Wickham’s Advanced R.