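In this practical you’ll learn how to work with simple data objects and functions. By the end of this practical you will know how to:
- Create vectors using c()
- Understand vector classes using class()
- Create data.frames and tibbles using data.frame() and tibble()
- Access vectors from data frames using $
- Calculate basic descriptive statistics using mean(), median(), table() (and more!)

Datasets: None!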
| Package | Installation |
|---|---|
| tidyverse | install.packages("tidyverse") |
Creating vectors
| Function | Description |
|---|---|
| c("a", "b", "c") | Create a character vector |
| c(1, 2, 3) | Create a numeric vector |
| c(TRUE, FALSE, TRUE) | Create a logical vector |
Vector functions
| Function | Description |
|---|---|
| mean(x), median(x), sd(x), sum(x) | Mean, median, standard deviation, sum |
| max(x), min(x) | Maximum, minimum |
| table(x) | Table of frequency counts |
| round(x, digits) | Round a numeric vector x to digits decimal places |
Creating data frames from vectors
| Function | Description |
|---|---|
| data.frame(a, b, c) | Create a data frame from vectors a, b, c |
| tibble(a, b, c) | Create a tibble from vectors a, b, c |
Accessing vectors from data frames
| Function | Description |
|---|---|
| df$name | Access vector name from a data frame df |
library(tidyverse)
# Create vectors of (fake) stock data
name <- c("apple", "microsoft", "dell", "google", "twitter")
yesterday <- c(100, 89, 65, 54, 89)
today <- c(102, 85, 72, 60, 95)
# Summary statistics
mean(today)
mean(yesterday)
# Show classes
class(name)
class(yesterday)
# Operations on vectors
change <- today - yesterday
change # Print result
# Create a logical vector from two numerics
increase <- today > yesterday
increase # Print result
# Create a tibble combining multiple vectors
stocks <- tibble(name, yesterday, today, change, increase)
# Get column names
names(stocks)
# Access columns by name
stocks$name
stocks$today
# Calculate descriptives on columns
mean(stocks$yesterday)
median(stocks$today)
table(stocks$increase)
max(stocks$increase)
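The function table lists round(x, digits), which the example above does not use. The following is a minimal sketch of how it could be applied to the same (fake) stock data; the names pct_change and pct_rounded are made up for this illustration.

```r
# Same (fake) stock data as in the example above
yesterday <- c(100, 89, 65, 54, 89)
today <- c(102, 85, 72, 60, 95)

# Percentage change from yesterday to today (vectorised arithmetic)
pct_change <- (today - yesterday) / yesterday * 100

# Round every element to one decimal place
pct_rounded <- round(pct_change, digits = 1)
pct_rounded
```

Because arithmetic and round() both work element by element, no loop is needed.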
Open your baselrbootcamp R project. It should already have the folders 1_Data and 2_Code.
Open a new R script and save it as a new file called objects_practical.R in the 2_Code folder. At the top of the script, using comments, write your name and the date. Then, load all package(s) listed in the Packages section above with library().
The table below shows results from a (fictional) survey of 10 Baselers. In this practical, you will convert this table to R objects and then analyse them!
| id | sex | age | height | weight |
|---|---|---|---|---|
| 1 | male | 44 | 174.3 | 113.4 |
| 2 | male | 65 | 180.3 | 75.2 |
| 3 | female | 31 | 168.3 | 55.5 |
| 4 | male | 27 | 209 | 93.8 |
| 5 | male | 24 | 176.7 | |
| 6 | male | 63 | 186.6 | 67.4 |
| 7 | male | 71 | 151.6 | 83.3 |
| 8 | female | 41 | 155.7 | 67.8 |
| 9 | male | 43 | 176.1 | 69.3 |
| 10 | female | 31 | 166.1 | 66.3 |
Create a vector called id that shows the id values. When you finish, print the vector object to see it!
# Create a vector id
XX <- c(XX, XX, ...)
# Print the vector id
XX
# Create an id vector
id <- 1:10 # shortcut to creating the sequence from 1 to 10
# Print the vector
id
## [1] 1 2 3 4 5 6 7 8 9 10
Using the class() function, check the class of your id vector. Is it "numeric"?
# Show the class of an object XX
class(XX)
# Show the class of the id vector
class(id)
## [1] "integer"
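The class shown above is "integer" rather than "numeric" because the vector was built with the colon shortcut. The sketch below contrasts the two ways of creating the same values; the names id_seq and id_c are hypothetical and used only for this comparison.

```r
# The colon operator creates a vector of integers
id_seq <- 1:10

# c() with plain numbers creates a vector of doubles
id_c <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

class(id_seq)  # "integer"
class(id_c)    # "numeric"

# Both count as numeric data and hold the same values
is.numeric(id_seq)
all(id_seq == id_c)
```

For the tasks in this practical the difference does not matter: functions such as mean() and max() accept both.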
Using the length() function, find out the length of your id vector. Does it have length 10? If not, make sure you defined it correctly!
# Show the length of a vector XX
length(XX)
# Show the length of the id vector
length(id)
## [1] 10
Create a vector called sex that shows the sex values. Make sure to use quotation marks "" to enclose each element to tell R that the data are of type "character"! When you finish, print the object to see it!
# Create a character vector sex
XX <- c("XX", "XX", "...")
# Create a sex vector
sex <- c("male", "male", "female", "male", "male", "male", "male", "female", "male", "female")
# Print the vector
sex
## [1] "male" "male" "female" "male" "male" "male" "male"
## [8] "female" "male" "female"
Using the class() function, check the class of your sex vector. Is it "character"?
# Show the class of the sex vector
class(sex)
## [1] "character"
Using the length() function, find out the length of your sex object. Does it have length 10? If not, make sure you defined it correctly!
# Show the length of the sex vector
length(sex)
## [1] 10
Repeat the previous steps for the age and height vectors.
# Create an age vector
age <- c(44, 65, 31, 27, 24, 63, 71, 41, 43, 31)
# Print the age vector
age
## [1] 44 65 31 27 24 63 71 41 43 31
# Show the class of the age vector
class(age)
## [1] "numeric"
# Show the length of the age vector
length(age)
## [1] 10
# Create a height vector
height <- c(174.3, 180.3, 168.3, 209, 176.7, 186.6, 151.6, 155.7, 176.1, 166.1)
# Print the height vector
height
## [1] 174 180 168 209 177 187 152 156 176 166
# Show the class of the height vector
class(height)
## [1] "numeric"
# Show the length of the height vector
length(height)
## [1] 10
Create a vector called weight containing the weight values, following the same steps as before. One value is missing from the table: make sure to specify it as NA (no quotation marks).
# Create a weight vector
weight <- c(113.4, 75.2, 55.5, 93.8, NA, 67.4, 83.3, 67.8, 69.3, 66.3)
# Print the weight vector
weight
## [1] 113.4 75.2 55.5 93.8 NA 67.4 83.3 67.8 69.3 66.3
# Show the class of the weight vector
class(weight)
## [1] "numeric"
# Show the length of the weight vector
length(weight)
## [1] 10
Using the table() function, find out how many males and females are in the data. You should find 7 males and 3 females!
# Count types in sex
table(sex)
## sex
## female male
## 3 7
Using the mean() function, calculate the mean age. It should be 44!
# Compute mean of age
mean(age)
## [1] 44
Try to calculate the mean of sex. What happens? Why?
# Compute mean of sex
mean(sex)
## [1] NA
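The NA above comes with a warning, because an average is not defined for text. As a rough sketch of what can be done with a character vector instead, the block below uses base R's table() and prop.table(); the name result is made up for this illustration.

```r
sex <- c("male", "male", "female", "male", "male",
         "male", "male", "female", "male", "female")

# mean() on a character vector warns and returns NA;
# suppressWarnings() is used here only to keep the output quiet
result <- suppressWarnings(mean(sex))
is.na(result)

# Counts and proportions are the meaningful summaries for text data
counts <- table(sex)
counts
prop.table(counts)
```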
Now calculate the mean of weight. You should get an NA value. Why?
# Compute mean of weight
mean(weight)
## [1] NA
Look at the help page for the mean() function (using ?mean) to find an argument that will help you with your problem.
# Inspect help for mean
?mean
Using that argument, calculate the mean weight while ignoring NA values. It should be 76.89!
# Compute mean weight, ignoring NAs
mean(weight, na.rm = TRUE)
## [1] 76.9
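The na.rm argument is not specific to mean(). The sketch below, using only base R functions, shows the same argument with other summary functions and one way to count the missing values first.

```r
weight <- c(113.4, 75.2, 55.5, 93.8, NA, 67.4, 83.3, 67.8, 69.3, 66.3)

# How many values are missing?
n_missing <- sum(is.na(weight))
n_missing

# Without na.rm the result is NA, just as with mean()
median(weight)

# With na.rm = TRUE the missing value is dropped before computing
median(weight, na.rm = TRUE)
max(weight, na.rm = TRUE)
sd(weight, na.rm = TRUE)
```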
Logical vectors contain only TRUE and FALSE values (and missing values). Create a new vector called tall_180 indicating which Baselers are taller than 180cm. To do this, use the > (greater than) operator in the form vector > value.
# Create a logical vector tall_180 indicating
# which baselers are taller than 180cm
XX <- XX > XX
# Create a logical vector tall_180
tall_180 <- height > 180
Print your tall_180 vector to the console. Do you see only TRUE and FALSE values? If so, do the values that are TRUE match the Baselers that are actually over 180cm?
# Print tall_180
tall_180
## [1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
Using the table() function, create a table showing how many of the ten Baselers are taller than 180cm and how many are not.
# Count baselers taller than 180cm
table(tall_180)
## tall_180
## FALSE TRUE
## 7 3
Using the mean() function, determine the proportion of the ten Baselers that are taller than 180cm, i.e., mean(tall_180). Multiply by 100 to get the percentage. Should this have worked?
# Proportion of baselers taller than 180cm
mean(tall_180)
## [1] 0.3
# Proportion of baselers older than 30
mean(age > 30)
## [1] 0.8
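Taking the mean of a logical vector works because R treats TRUE as 1 and FALSE as 0 in arithmetic. The following sketch makes that conversion visible; the name tall_numeric is hypothetical.

```r
height <- c(174.3, 180.3, 168.3, 209, 176.7, 186.6, 151.6, 155.7, 176.1, 166.1)
tall_180 <- height > 180

# Explicit conversion: TRUE becomes 1, FALSE becomes 0
tall_numeric <- as.numeric(tall_180)
tall_numeric

# sum() counts the TRUE values, mean() gives their proportion
sum(tall_180)
mean(tall_180)

# Proportion expressed as a percentage
mean(tall_180) * 100
```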
Using the data.frame() function, create a data frame called ten_df that contains each of the vectors you just created: id, age, sex, height, weight, tall_180.
# Create data frame ten_df containing vectors id, age, sex, height, weight, and tall_180
XX <- data.frame(XX, XX, XX, XX, XX, XX)
# Create ten_df data frame from vectors
ten_df <- data.frame(id, age, sex, height, weight, tall_180)
Print your ten_df object to see how it looks! Does it contain all of the vectors?
# Print ten_df
ten_df
## id age sex height weight tall_180
## 1 1 44 male 174 113.4 FALSE
## 2 2 65 male 180 75.2 TRUE
## 3 3 31 female 168 55.5 FALSE
## 4 4 27 male 209 93.8 TRUE
## 5 5 24 male 177 NA FALSE
## 6 6 63 male 187 67.4 TRUE
## 7 7 71 male 152 83.3 FALSE
## 8 8 41 female 156 67.8 FALSE
## 9 9 43 male 176 69.3 FALSE
## 10 10 31 female 166 66.3 FALSE
Using the dim() function, print the number of rows and columns in your data frame. Do you get 10 rows and 6 columns?
# Inspect dimensions
dim(ten_df)
## [1] 10 6
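dim() returns the number of rows first and the number of columns second. As a small sketch, base R's nrow(), ncol(), and names() return the same information piece by piece, which can be easier to read.

```r
# Rebuild the vectors so this block runs on its own
id <- 1:10
age <- c(44, 65, 31, 27, 24, 63, 71, 41, 43, 31)
sex <- c("male", "male", "female", "male", "male",
         "male", "male", "female", "male", "female")
height <- c(174.3, 180.3, 168.3, 209, 176.7, 186.6, 151.6, 155.7, 176.1, 166.1)
weight <- c(113.4, 75.2, 55.5, 93.8, NA, 67.4, 83.3, 67.8, 69.3, 66.3)
tall_180 <- height > 180

ten_df <- data.frame(id, age, sex, height, weight, tall_180)

nrow(ten_df)   # number of rows (one per Baseler)
ncol(ten_df)   # number of columns (one per vector)
names(ten_df)  # column names, taken from the vector names
```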
What is the class of your ten_df object? Use the class() function to find out!
# Inspect class
class(ten_df)
## [1] "data.frame"
Use the summary() function to print descriptive statistics for each column of ten_df.
# Print summary statistics
summary(ten_df)
## id age sex height weight
## Min. : 1.00 Min. :24.0 female:3 Min. :152 Min. : 55.5
## 1st Qu.: 3.25 1st Qu.:31.0 male :7 1st Qu.:167 1st Qu.: 67.4
## Median : 5.50 Median :42.0 Median :175 Median : 69.3
## Mean : 5.50 Mean :44.0 Mean :174 Mean : 76.9
## 3rd Qu.: 7.75 3rd Qu.:58.2 3rd Qu.:179 3rd Qu.: 83.3
## Max. :10.00 Max. :71.0 Max. :209 Max. :113.4
## NA's :1
## tall_180
## Mode :logical
## FALSE:7
## TRUE :3
##
##
##
##
Using the $ operator, print the age column from the ten_df data frame.
# Inspect age
ten_df$age
## [1] 44 65 31 27 24 63 71 41 43 31
Calculate the maximum age value from the ten_df data frame using max(). Do you get the same result as when you calculated it from the original vector age?
# Get max
max(ten_df$age)
## [1] 71
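The $ operator is one of several ways to pull a column out of a data frame. The sketch below shows the double-bracket form from base R, which returns the same vector and also works when the column name is stored in a variable; the variable col is made up for this illustration.

```r
# A reduced version of ten_df with two columns, enough for the illustration
ten_df <- data.frame(
  id  = 1:10,
  age = c(44, 65, 31, 27, 24, 63, 71, 41, 43, 31)
)

# $ and [[ ]] return the same vector
identical(ten_df$age, ten_df[["age"]])

# [[ ]] also accepts a column name held in a variable
col <- "age"
max(ten_df[[col]])
```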
Instead of using the data.frame() function, try creating a tibble called ten_tibble using the tibble() function. Tibbles are a more modern, leaner variant of data frames that we prefer over classic data.frames. You can use the exact same arguments you used before.
# Create ten_tibble from the same vectors used for ten_df
ten_tibble <- tibble(id, age, sex, height, weight, tall_180)
Print your ten_tibble object. How does it look different from ten_df? Try calculating the maximum age from this object. Is it different from what you got before?
# Print tibble
ten_tibble
## # A tibble: 10 x 6
## id age sex height weight tall_180
## <int> <dbl> <chr> <dbl> <dbl> <lgl>
## 1 1 44 male 174. 113. FALSE
## 2 2 65 male 180. 75.2 TRUE
## 3 3 31 female 168. 55.5 FALSE
## 4 4 27 male 209 93.8 TRUE
## 5 5 24 male 177. NA FALSE
## 6 6 63 male 187. 67.4 TRUE
## 7 7 71 male 152. 83.3 FALSE
## 8 8 41 female 156. 67.8 FALSE
## 9 9 43 male 176. 69.3 FALSE
## 10 10 31 female 166. 66.3 FALSE
# Compare the maximum age from both objects
max(ten_tibble$age) == max(ten_df$age)
## [1] TRUE
When you calculate the sum() of a logical vector, R will return the number of cases that are TRUE. Using this, find out how many of the ten Baselers are male, using the is-equal-to operator ==.
# Determine the frequency of a case in a vector
sum(XX == XX)
# Count the males in the sex column
sum(ten_tibble$sex == "male")
## [1] 7
# Create a logical vector indicating which baselers are younger than 30
young_30 <- XX$XX < 30
# Print the ids of baselers younger than 30
XX$XX[young_30]
# Create a logical vector indicating which baselers are younger than 30
young_30 <- ten_tibble$age < 30
# Print the ids of baselers younger than 30
ten_tibble$id[young_30]
## [1] 4 5
Use the mean() function to answer the question: "What is the mean age of Baselers who are heavier than 80kg?"
# Mean age of baselers heavier than 80kg
mean(ten_tibble$age[ten_tibble$weight > 80])
## [1] NA
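One reason a subset mean like this can come out as NA is a missing value in the vector used for the comparison: the comparison itself yields NA, and indexing with NA yields NA. The sketch below works on the plain vectors and shows two possible ways around it; the name heavy is hypothetical.

```r
age <- c(44, 65, 31, 27, 24, 63, 71, 41, 43, 31)
weight <- c(113.4, 75.2, 55.5, 93.8, NA, 67.4, 83.3, 67.8, 69.3, 66.3)

# The comparison is NA wherever weight is NA
heavy <- weight > 80
heavy

# Indexing with an NA produces an NA element in the result
age[heavy]

# Option 1: drop the NA when computing the mean
mean(age[heavy], na.rm = TRUE)

# Option 2: which() keeps only the positions that are TRUE
mean(age[which(heavy)])
```

Both options average the ages of the three Baselers whose weight is known to be above 80kg.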
Which male Baselers are shorter than 165cm? Print their id values. (Hint: use & to combine multiple logical vectors)
# Ids of male baselers shorter than 165cm
ten_tibble$id[ten_tibble$sex == "male" & ten_tibble$height < 165]
## [1] 7
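The & operator requires both conditions to be TRUE for an element. As a brief sketch on the plain vectors, the block below contrasts it with the | (or) operator, which requires at least one condition to be TRUE; the result names are made up for this illustration.

```r
id <- 1:10
sex <- c("male", "male", "female", "male", "male",
         "male", "male", "female", "male", "female")
height <- c(174.3, 180.3, 168.3, 209, 176.7, 186.6, 151.6, 155.7, 176.1, 166.1)

# AND: male and shorter than 165cm
short_males <- id[sex == "male" & height < 165]
short_males

# OR: female or taller than 200cm
female_or_very_tall <- id[sex == "female" | height > 200]
female_or_very_tall

# Combined conditions can also be counted with sum()
n_tall_males <- sum(sex == "male" & height > 180)
n_tall_males
```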