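In this practical you’ll learn how to work with simple data objects and functions. By the end of this practical you will know how to:
1. Create vectors with c()
2. Check the class of an object with class()
3. Create data.frames and tibbles using data.frame() and tibble()
4. Access columns of a data frame with $
5. Calculate descriptive statistics with mean(), median(), table() (and more!)
None!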
Package | Installation |
---|---|
tidyverse | install.packages("tidyverse") |
Creating vectors
Function | Description |
---|---|
c("a", "b", "c") | Create a character vector |
c(1, 2, 3) | Create a numeric vector |
c(TRUE, FALSE, TRUE) | Create a logical vector |
Vector functions
Function | Description |
---|---|
mean(x), median(x), sd(x), sum(x) | Mean, median, standard deviation, sum |
max(x), min(x) | Maximum, minimum |
table(x) | Table of frequency counts |
round(x, digits) | Round a numeric vector x to digits decimal places |
Creating data frames from vectors
Function | Description |
---|---|
data.frame(a, b, c) | Create a data frame from vectors a, b, c |
tibble(a, b, c) | Create a tibble from vectors a, b, c |
Accessing vectors from data frames
Function | Description |
---|---|
df$name | Access vector name from a data frame df |
library(tidyverse)
# Create vectors of (fake) stock data
name <- c("apple", "microsoft", "dell", "google", "twitter")
yesterday <- c(100, 89, 65, 54, 89)
today <- c(102, 85, 72, 60, 95)
# Summary statistics
mean(today)
mean(yesterday)
# Show classes
class(name)
class(yesterday)
# Operations on vectors
change <- today - yesterday
change # Print result
# Create a logical vector from two numerics
increase <- today > yesterday
increase # Print result
# Create a tibble combining multiple vectors
stocks <- tibble(name, yesterday, today, change, increase)
# Get column names
names(stocks)
# Access columns by name
stocks$name
stocks$today
# Calculate descriptives on columns
mean(stocks$yesterday)
median(stocks$today)
table(stocks$increase)
max(stocks$increase)
Open your baselrbootcamp R project. It should already have the folders 1_Data and 2_Code.
Open a new R script and save it as a new file called objects_practical.R in the 2_Code folder. At the top of the script, using comments, write your name and the date. Then, load all package(s) listed in the Packages section above with library().
The table below shows results from a (fictional) survey of 10 Baselers. In this practical, you will convert this table to R objects and then analyse them!
id | sex | age | height | weight |
---|---|---|---|---|
1 | male | 44 | 174.3 | 113.4 |
2 | male | 65 | 180.3 | 75.2 |
3 | female | 31 | 168.3 | 55.5 |
4 | male | 27 | 209 | 93.8 |
5 | male | 24 | 176.7 | |
6 | male | 63 | 186.6 | 67.4 |
7 | male | 71 | 151.6 | 83.3 |
8 | female | 41 | 155.7 | 67.8 |
9 | male | 43 | 176.1 | 69.3 |
10 | female | 31 | 166.1 | 66.3 |
Create a vector called id that shows the id values. When you finish, print the vector object to see it!
# Create a vector id
XX <- c(XX, XX, ...)
# Print the vector id
XX

# Create an id vector
id <- 1:10 # shortcut to creating the sequence from 1 to 10
# Print the vector
id
##  [1]  1  2  3  4  5  6  7  8  9 10
Using the class() function, check the class of your id vector. Is it "numeric"?
# Show the class of an object XX
class(XX)

# Show the class of the id vector
class(id)
## [1] "integer"
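The result "integer" rather than "numeric" comes from the : shortcut, which produces whole-number (integer) values, whereas c() with plain numbers produces doubles. A minimal sketch of the difference, using throwaway names (id_seq, id_c) that are not part of the practical:

```r
# Two ways of building the same id values (names are illustrative only)
id_seq <- 1:10                              # the : operator returns integers
id_c   <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)  # c() with plain numbers returns doubles

class(id_seq)        # "integer"
class(id_c)          # "numeric"

# Both count as numeric for calculations, and their values are equal
is.numeric(id_seq)   # TRUE
all(id_seq == id_c)  # TRUE
```

Either form works for the rest of the practical, since functions such as mean() and max() treat both as numbers.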
Using the length() function, find out the length of your id vector. Does it have length 10? If not, make sure you defined it correctly!
# Show the length of the id vector
length(XX)

# Show the length of the id vector
length(id)
## [1] 10
Create a vector called sex that shows the sex values. Make sure to use quotation marks "" to enclose each element to tell R that the data are of type "character"! When you finish, print the object to see it!
# Create a character vector sex
XX <- c("XX", "XX", "...")

# Create a sex vector
sex <- c("male", "male", "female", "male", "male", "male", "male", "female", "male", "female")
# Print the vector
sex
##  [1] "male"   "male"   "female" "male"   "male"   "male"   "male"
##  [8] "female" "male"   "female"
Using the class() function, check the class of your sex vector. Is it "character"?
# Show the class of the sex vector
class(sex)
## [1] "character"
Using the length() function, find out the length of your sex object. Does it have length 10? If not, make sure you defined it correctly!
# Show the length of the sex vector
length(sex)
## [1] 10
Repeat the previous steps for the age and height vectors.
# Create an age vector
age <- c(44, 65, 31, 27, 24, 63, 71, 41, 43, 31)
# Print the age vector
age
##  [1] 44 65 31 27 24 63 71 41 43 31
# Show the class of the age vector
class(age)
## [1] "numeric"
# Show the length of the age vector
length(age)
## [1] 10
# Create a height vector
height <- c(174.3, 180.3, 168.3, 209, 176.7, 186.6, 151.6, 155.7, 176.1, 166.1)
# Print the height vector
height
##  [1] 174 180 168 209 177 187 152 156 176 166
# Show the class of the height vector
class(height)
## [1] "numeric"
# Show the length of the height vector
length(height)
## [1] 10
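The printed heights above appear without decimals even though the vector was created with them. The stored values are unchanged; this looks like a display setting that limits the number of digits shown (an assumption, since the page does not say how its output was rendered). A minimal sketch of the difference between what is stored and what is printed:

```r
height <- c(174.3, 180.3, 168.3, 209, 176.7, 186.6, 151.6, 155.7, 176.1, 166.1)

# Printing with fewer significant digits changes the display only
print(height, digits = 3)

# The stored value still carries its decimal
height[1] == 174.3     # TRUE

# round(), by contrast, creates a new vector of rounded values
height_rounded <- round(height)
height_rounded[1]      # 174
```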
Create a vector called weight containing the weight data, following the same steps as before, making sure to specify the missing value as NA (no quotation marks).
# Create a weight vector
weight <- c(113.4, 75.2, 55.5, 93.8, NA, 67.4, 83.3, 67.8, 69.3, 66.3)
# Print the weight vector
weight
##  [1] 113.4  75.2  55.5  93.8    NA  67.4  83.3  67.8  69.3  66.3
# Show the class of the weight vector
class(weight)
## [1] "numeric"
# Show the length of the weight vector
length(weight)
## [1] 10
Using the table() function, find out how many males and females are in the data. You should find 7 males and 3 females!
# Count types in sex
table(sex)
## sex
## female   male
##      3      7
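If proportions are more useful than raw counts, one option (an addition, not part of the original task) is to pass the table to the base R function prop.table(). A small sketch using the sex values from the survey table:

```r
sex <- c("male", "male", "female", "male", "male",
         "male", "male", "female", "male", "female")

# Frequency counts per category (categories are sorted alphabetically)
counts <- table(sex)
counts

# Counts divided by their total: female 0.3, male 0.7
shares <- prop.table(counts)
shares
```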
Using the mean() function, calculate the mean age. It should be 44!
# Compute mean of age
mean(age)
## [1] 44
Try to calculate the mean of sex. What happens? Why?
# Compute mean of sex
mean(sex)
## [1] NA
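One way to see why this returns NA: mean() only averages numeric (or logical) values, and sex is text. The sketch below checks the type and captures the warning R issues; the name warning_text is illustrative, and tryCatch() is used only to store the message so it can be inspected.

```r
sex <- c("male", "male", "female", "male", "male",
         "male", "male", "female", "male", "female")

# mean() needs numbers (or TRUE/FALSE values); text cannot be averaged
is.numeric(sex)     # FALSE
is.character(sex)   # TRUE

# R signals the problem with a warning and returns NA.
# tryCatch() intercepts the warning and returns its message instead.
warning_text <- tryCatch(mean(sex), warning = function(w) conditionMessage(w))
warning_text
```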
Now try to calculate the mean of weight. You should get an NA value. Why?
# Compute mean of weight
mean(weight)
## [1] NA
Look at the help file for the mean() function (using ?mean) to look for an argument that will help you with your problem.
# Inspect help for mean
?mean
Calculate the mean weight again, this time ignoring NA values. It should be 76.89!
# Compute mean weight, ignoring NAs
mean(weight, na.rm = TRUE)
## [1] 76.9
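A short sketch of what is going on with the missing value: any arithmetic involving NA yields NA, so mean() has nothing sensible to return unless the missing entries are dropped first. is.na() is a base R function; the manual comparison below is an illustration rather than part of the task.

```r
weight <- c(113.4, 75.2, 55.5, 93.8, NA, 67.4, 83.3, 67.8, 69.3, 66.3)

# How many values are missing?
n_missing <- sum(is.na(weight))
n_missing                          # 1

mean(weight)                       # NA, because one element is unknown

# na.rm = TRUE removes the NAs before averaging
mean_weight <- mean(weight, na.rm = TRUE)
round(mean_weight, 2)              # 76.89

# This is equivalent to dropping the NAs by hand
mean_manual <- mean(weight[!is.na(weight)])
```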
Logical vectors contain only TRUE and FALSE values (and missing values). Create a new vector called tall_180 indicating which Baselers are taller than 180cm. To do this, use the > (greater than) operator à la vector > value.
# Create a logical vector tall_180 indicating
# which baselers are taller than 180cm
XX <- XX > XX

# Create a logical vector tall_180
tall_180 <- height > 180
Print your tall_180 vector to the console. Do you see only TRUE and FALSE values? If so, do the values that are TRUE match the Baselers that are actually over 180cm?
# Print tall_180
tall_180
##  [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE
Using the table() function, create a table showing how many of the ten Baselers are taller than 180cm and how many are not.
# Count baselers taller than 180
table(tall_180)
## tall_180
## FALSE  TRUE
##     7     3
Using the mean() function, determine the percentage of the ten Baselers that are taller than 180cm, i.e., mean(tall_180). Should this have worked?
# Proportion of baselers taller than 180 (0.3 corresponds to 30%)
mean(tall_180)
## [1] 0.3
# Proportion of baselers older than 30
mean(age > 30)
## [1] 0.8
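The reason mean() works on a logical vector is that R treats TRUE as 1 and FALSE as 0 in arithmetic. sum() therefore counts the TRUE values and mean() gives their proportion; multiplying by 100 turns that into a percentage. A sketch with the height values from the table:

```r
height <- c(174.3, 180.3, 168.3, 209, 176.7, 186.6, 151.6, 155.7, 176.1, 166.1)
tall_180 <- height > 180

as.numeric(tall_180)    # 0 1 0 1 0 1 0 0 0 0
sum(tall_180)           # 3 Baselers are taller than 180cm
mean(tall_180)          # 0.3, i.e. 3 out of 10
mean(tall_180) * 100    # 30 (percent)
```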
Using the data.frame() function, create a data frame called ten_df that contains each of the vectors you just created: id, age, sex, height, weight, tall_180.
# Create data frame ten_df containing vectors id, age, sex, height, weight, and tall_180
XX <- data.frame(XX, XX, XX, XX, XX, XX)

# Create ten_df data frame from vectors
ten_df <- data.frame(id, age, sex, height, weight, tall_180)
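data.frame() expects its vectors to have the same length (or lengths that can be recycled evenly), which is why the earlier length() checks matter. As a rough sketch, combining a vector of length 10 with one of length 9 stops with an error; the shortened vector age_short is made up for illustration.

```r
id  <- 1:10
age <- c(44, 65, 31, 27, 24, 63, 71, 41, 43, 31)
age_short <- age[1:9]   # hypothetical vector with one value missing

# Capture the error instead of stopping the script
outcome <- tryCatch(
  data.frame(id, age_short),
  error = function(e) "lengths differ"
)
outcome                  # "lengths differ"

# With equal lengths the data frame is created as expected
ok_df <- data.frame(id, age)
nrow(ok_df)              # 10
```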
Print your ten_df object to see how it looks! Does it contain all of the vectors?
# Print ten_df
ten_df
##    id age    sex height weight tall_180
## 1   1  44   male    174  113.4    FALSE
## 2   2  65   male    180   75.2     TRUE
## 3   3  31 female    168   55.5    FALSE
## 4   4  27   male    209   93.8     TRUE
## 5   5  24   male    177     NA    FALSE
## 6   6  63   male    187   67.4     TRUE
## 7   7  71   male    152   83.3    FALSE
## 8   8  41 female    156   67.8    FALSE
## 9   9  43   male    176   69.3    FALSE
## 10 10  31 female    166   66.3    FALSE
Using the dim() function, print the number of rows and columns in your data frame. Do you get 10 rows and 6 columns?
# Inspect dimensions
dim(ten_df)
## [1] 10  6
What is the class of your ten_df object? Use the class() function to find out!
# Inspect class
class(ten_df)
## [1] "data.frame"
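Besides dim() and class(), base R offers a few other ways to check that a data frame was assembled as intended. The sketch below rebuilds a smaller version of the data frame (the name small_df and the choice of three columns are only for illustration) and inspects it:

```r
id  <- 1:10
age <- c(44, 65, 31, 27, 24, 63, 71, 41, 43, 31)
sex <- c("male", "male", "female", "male", "male",
         "male", "male", "female", "male", "female")

small_df <- data.frame(id, age, sex)   # illustrative name

nrow(small_df)    # number of rows: 10
ncol(small_df)    # number of columns: 3
names(small_df)   # column names: "id" "age" "sex"
str(small_df)     # compact overview: one line per column with its type
```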
Use the summary() function to print descriptive statistics from each column of ten_df.
# Print descriptive statistics for each column
summary(ten_df)
## id age sex height weight
## Min. : 1.00 Min. :24.0 female:3 Min. :152 Min. : 55.5
## 1st Qu.: 3.25 1st Qu.:31.0 male :7 1st Qu.:167 1st Qu.: 67.4
## Median : 5.50 Median :42.0 Median :175 Median : 69.3
## Mean : 5.50 Mean :44.0 Mean :174 Mean : 76.9
## 3rd Qu.: 7.75 3rd Qu.:58.2 3rd Qu.:179 3rd Qu.: 83.3
## Max. :10.00 Max. :71.0 Max. :209 Max. :113.4
## NA's :1
## tall_180
## Mode :logical
## FALSE:7
## TRUE :3
##
##
##
##
Using the $ operator, print the age column from the ten_df data frame.
# Inspect age
ten_df$age
##  [1] 44 65 31 27 24 63 71 41 43 31
Calculate the maximum age value from the ten_df data frame using max(). Do you get the same result as when you calculated it from the original vector age?
# Get max
max(ten_df$age)
## [1] 71
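The $ operator is one of several equivalent ways to pull a column out of a data frame as a vector. As a sketch (using a cut-down data frame named demo_df, which is not part of the practical), double square brackets with the column name in quotes return the same vector:

```r
age <- c(44, 65, 31, 27, 24, 63, 71, 41, 43, 31)
demo_df <- data.frame(id = 1:10, age = age)   # illustrative data frame

demo_df$age          # column as a vector
demo_df[["age"]]     # same vector, with the name given as a string

identical(demo_df$age, demo_df[["age"]])   # TRUE
max(demo_df$age) == max(age)               # TRUE: same values as the original vector
```

The bracket form can be handy when the column name is stored in a variable, while $ is shorter to type interactively.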
Just as you did with the data.frame() function, try creating a tibble called ten_tibble using the tibble() function. tibbles are a more modern, leaner variant of data frames that we prefer over classic data.frames. You can use the exact same arguments you used before.
# Create tibble (with the same six vectors as ten_df, including age)
ten_tibble <- tibble(id, age, sex, height, weight, tall_180)
Print your ten_tibble object. How does it look different from ten_df? Try calculating the maximum age from this object. Is it different from what you got before?
# Print tibble
ten_tibble
## # A tibble: 10 x 6
##       id   age sex    height weight tall_180
##    <int> <dbl> <chr>   <dbl>  <dbl> <lgl>
##  1     1    44 male     174.  113.  FALSE
##  2     2    65 male     180.   75.2 TRUE
##  3     3    31 female   168.   55.5 FALSE
##  4     4    27 male     209    93.8 TRUE
##  5     5    24 male     177.   NA   FALSE
##  6     6    63 male     187.   67.4 TRUE
##  7     7    71 male     152.   83.3 FALSE
##  8     8    41 female   156.   67.8 FALSE
##  9     9    43 male     176.   69.3 FALSE
## 10    10    31 female   166.   66.3 FALSE
# Compare the maximum age in the tibble and the data frame
max(ten_tibble$age) == max(ten_df$age)
## [1] TRUE
When you take the sum() of a logical vector, R will return the number of cases that are TRUE. Using this, find out how many of the ten Baselers are male while using the is-equal-to operator ==.
# Determine the frequency of a case in a vector
sum(XX == XX)

# Count the cases in the sex column that equal "male"
sum(ten_tibble$sex == "male")
## [1] 7
# Create a logical vector indicating which baselers are younger than 30
young_30 <- XX$XX < 30
# Print the ids of baselers younger than 30
XX$XX[young_30]

# Create a logical vector indicating which baselers are younger than 30
young_30 <- ten_tibble$age < 30
# Print the ids of baselers younger than 30
ten_tibble$id[young_30]
## [1] 4 5
Use the mean() function to answer the question: "What is the mean age of Baselers who are heavier than 80kg?"
# Mean age of baselers heavier than 80kg
# (na.rm = TRUE is needed because the missing weight puts an NA into the selection)
mean(ten_tibble$age[ten_tibble$weight > 80], na.rm = TRUE)
## [1] 47.3
Find the ids of the male Baselers who are shorter than 165cm (hint: use & to combine multiple logical vectors).
# Ids of male baselers shorter than 165cm
ten_tibble$id[ten_tibble$sex == "male" & ten_tibble$height < 165]
## [1] 7
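Logical indexing needs some care when the condition itself contains NA: indexing with NA returns NA, so a mean taken over such a selection can come out as NA. Below is a sketch of two common workarounds, written against plain vectors for brevity. which() and is.na() are base R functions; the name heavy is illustrative.

```r
age    <- c(44, 65, 31, 27, 24, 63, 71, 41, 43, 31)
weight <- c(113.4, 75.2, 55.5, 93.8, NA, 67.4, 83.3, 67.8, 69.3, 66.3)

# The comparison is NA wherever weight is missing
heavy <- weight > 80
age[heavy]                        # 44 27 NA 71

# Option 1: which() keeps only the positions that are TRUE and drops NA
age[which(heavy)]                 # 44 27 71
mean_which <- mean(age[which(heavy)])

# Option 2: rule out missing values explicitly and combine conditions with &
mean_and <- mean(age[!is.na(weight) & weight > 80])

mean_which == mean_and            # TRUE
```

Both options select the same three Baselers, so the choice between them is mostly a matter of readability.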