www.therbootcamp.com


In this practical you’ll learn how to work with simple data objects and functions. By the end of this practical you will know how to:

- Create vectors of different types using `c()`
- Understand the three main vector classes (numeric, character, and logical) using `class()`
- Create `data.frames` and `tibbles` using `data.frame()` and `tibble()`
- Access vectors from data frames using `$`
- Calculate basic descriptive statistics using `mean()`, `median()`, `table()` (and more!)

*None!*

Package | Installation |
---|---|
`tidyverse` | `install.packages("tidyverse")` |

*Creating vectors*

Function | Description |
---|---|
`c("a", "b", "c")` | Create a character vector |
`c(1, 2, 3)` | Create a numeric vector |
`c(TRUE, FALSE, TRUE)` | Create a logical vector |

*Vector functions*

Function | Description |
---|---|
`mean(x), median(x), sd(x), sum(x)` | Mean, median, standard deviation, sum |
`max(x), min(x)` | Maximum, minimum |
`table(x)` | Table of frequency counts |
`round(x, digits)` | Round a numeric vector `x` to `digits` decimal places |

*Creating data frames from vectors*

Function | Description |
---|---|
`data.frame(a, b, c)` | Create a data frame from vectors a, b, c |
`tibble(a, b, c)` | Create a tibble from vectors a, b, c |

*Accessing vectors from data frames*

Function | Description |
---|---|
`df$name` | Access vector `name` from a data frame `df` |

```
library(tidyverse)
# Create vectors of (fake) stock data
name <- c("apple", "microsoft", "dell", "google", "twitter")
yesterday <- c(100, 89, 65, 54, 89)
today <- c(102, 85, 72, 60, 95)
# Summary statistics
mean(today)
mean(yesterday)
# Show classes
class(name)
class(yesterday)
# Operations of vectors
change <- today - yesterday
change # Print result
# Create a logical vector from two numerics
increase <- today > yesterday
increase # Print result
# Create a tibble combining multiple vectors
stocks <- tibble(name, yesterday, today, change, increase)
# Get column names
names(stocks)
# Access columns by name
stocks$name
stocks$today
# Calculate descriptives on columns
mean(stocks$yesterday)
median(stocks$today)
table(stocks$increase)
max(stocks$increase)
```

Open your `baselrbootcamp` R project. It should already have the folders `1_Data` and `2_Code`.

Open a new R script and save it as a new file called `objects_practical.R` in the `2_Code` folder. At the top of the script, using comments, write your name and the date. Then, load all package(s) listed in the Packages section above with `library()`.

The table below shows results from a (fictional) survey of 10 Baselers. In this practical, you will convert this table to R objects and then analyse them!

id | sex | age | height | weight |
---|---|---|---|---|
1 | male | 44 | 174.3 | 113.4 |
2 | male | 65 | 180.3 | 75.2 |
3 | female | 31 | 168.3 | 55.5 |
4 | male | 27 | 209 | 93.8 |
5 | male | 24 | 176.7 | |
6 | male | 63 | 186.6 | 67.4 |
7 | male | 71 | 151.6 | 83.3 |
8 | female | 41 | 155.7 | 67.8 |
9 | male | 43 | 176.1 | 69.3 |
10 | female | 31 | 166.1 | 66.3 |

- Create a numeric vector called `id` that shows the id values. When you finish, print the vector object to see it!

```
# Create a vector id
XX <- c(XX, XX, ...)
# Print the vector id
XX
```

```
# Create an id vector
id <- 1:10 # shortcut to creating the sequence from 1 to 10
# Print the vector
id
```

`## [1] 1 2 3 4 5 6 7 8 9 10`

- Using the `class()` function, check the class of your `id` vector. Is it `"numeric"`?

```
# Show the class of an object XX
class(XX)
```

```
# Show the class of the id vector
class(id)
```

`## [1] "integer"`
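The `"integer"` result comes from the `1:10` shortcut used in the solution. As a minimal sketch (the names `id_colon` and `id_c` are illustrative and not part of the practical), the following compares the shortcut with typing the values into `c()`:

```r
# Two ways to build the same id values
id_colon <- 1:10                            # colon shortcut: stored as integers
id_c <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)    # c() with plain numbers: stored as doubles

class(id_colon)        # "integer"
class(id_c)            # "numeric"

# Both count as numeric for calculations
is.numeric(id_colon)   # TRUE
is.numeric(id_c)       # TRUE

# The values themselves are the same
all(id_colon == id_c)  # TRUE
```

Either version works for the rest of the practical, since integers behave like any other numeric vector in functions such as `mean()` or `max()`.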

- Using the `length()` function, find out the length of your `id` vector. Does it have length 10? If not, make sure you defined it correctly!

```
# Show the length of the id vector
length(XX)
```

```
# Show the length of the id vector
length(id)
```

`## [1] 10`

- Create a character vector called `sex` that shows the sex values. Make sure to use quotation marks `""` to enclose each element to tell R that the data are of type `"character"`! When you finish, print the object to see it!

```
# Create a character vector sex
XX <- c("XX", "XX", "...")
```

```
# Create a sex vector
sex <- c("male", "male", "female", "male", "male", "male", "male", "female", "male", "female")
# Print the vector
sex
```

```
## [1] "male" "male" "female" "male" "male" "male" "male"
## [8] "female" "male" "female"
```

- Using the `class()` function, check the class of your `sex` vector. Is it `"character"`?

```
# Show the class of the sex vector
class(sex)
```

`## [1] "character"`

- Using the `length()` function, find out the length of your `sex` object. Does it have length 10? If not, make sure you defined it correctly!

```
# Show the length of the sex vector
length(sex)
```

`## [1] 10`

- Using the same steps as before, create an `age` and a `height` vector.

```
# Create an age vector
age <- c(44, 65, 31, 27, 24, 63, 71, 41, 43, 31)
# Print the age vector
age
```

`## [1] 44 65 31 27 24 63 71 41 43 31`

```
# Show the class of the age vector
class(age)
```

`## [1] "numeric"`

```
# Show the length of the age vector
length(age)
```

`## [1] 10`

```
# Create a height vector
height <- c(174.3, 180.3, 168.3, 209, 176.7, 186.6, 151.6, 155.7, 176.1, 166.1)
# Print the height vector
height
```

`## [1] 174 180 168 209 177 187 152 156 176 166`

```
# Show the class of the height vector
class(height)
```

`## [1] "numeric"`

```
# Show the length of the height vector
length(height)
```

`## [1] 10`

- Look at the weight data: you’ll notice it contains a missing value. Create a vector called `weight` containing these data, following the same steps as before, making sure to specify the missing value as `NA` (no quotation marks).

```
# Create a weight vector
weight <- c(113.4, 75.2, 55.5, 93.8, NA, 67.4, 83.3, 67.8, 69.3, 66.3)
# Print the weight vector
weight
```

`## [1] 113.4 75.2 55.5 93.8 NA 67.4 83.3 67.8 69.3 66.3`

```
# Show the class of the weight vector
class(weight)
```

`## [1] "numeric"`

```
# Show the length of the weight vector
length(weight)
```

`## [1] 10`

- Using the `table()` function, find out how many males and females are in the data. You should find 7 males and 3 females!

```
# Count types in sex
table(sex)
```

```
## sex
## female male
## 3 7
```
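Counts can also be turned into proportions. The sketch below is an optional extra, not a step of the practical: it assumes base R's `prop.table()`, and the names `sex_counts` and `sex_props` are illustrative.

```r
# Survey values for sex, as entered in the task above
sex <- c("male", "male", "female", "male", "male",
         "male", "male", "female", "male", "female")

# Frequency counts: 3 female, 7 male
sex_counts <- table(sex)
sex_counts

# Convert the counts to proportions of the whole sample
sex_props <- prop.table(sex_counts)
sex_props

# Express the proportions as rounded percentages
round(sex_props * 100)
```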

- Using the `mean()` function, calculate the mean `age`. It should be 44!

```
# Compute mean of age
mean(age)
```

`## [1] 44`

- Try calculating the mean value of `sex`. What happens? Why?

```
# Compute mean of sex
mean(sex)
```

`## [1] NA`

- Try calculating the mean `weight`. You should get an `NA` value. Why?

```
# Compute mean of weight
mean(weight)
```

`## [1] NA`

- Look at the help menu for the `mean()` function (using `?mean`) to look for an argument that will help you with your problem.

```
# Inspect help for mean
?mean
```

- Using the correct argument for the mean function, calculate the mean weight ignoring `NA` values. It should be 76.89!

```
# Compute mean weight, ignoring NAs
mean(weight, na.rm = TRUE)
```

`## [1] 76.9`
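The `na.rm` argument is not specific to `mean()`. As a hedged sketch, the code below first counts the missing values with `is.na()` and then applies `na.rm = TRUE` to a few other summary functions; the names `n_missing`, `mean_weight`, `median_weight`, and `max_weight` are illustrative.

```r
# Survey weights, with the missing value coded as NA
weight <- c(113.4, 75.2, 55.5, 93.8, NA, 67.4, 83.3, 67.8, 69.3, 66.3)

# is.na() returns TRUE for each missing element
is.na(weight)

# Summing that logical vector counts the missing values
n_missing <- sum(is.na(weight))
n_missing

# Several summary functions accept na.rm = TRUE
mean_weight <- mean(weight, na.rm = TRUE)
median_weight <- median(weight, na.rm = TRUE)
max_weight <- max(weight, na.rm = TRUE)

# Round the mean to two decimals, as quoted in the task
round(mean_weight, 2)
median_weight
max_weight
```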

- Logical vectors contain only `TRUE` and `FALSE` values (and missing values). Create a new vector called `tall_180` indicating which Baselers are taller than 180cm. To do this, use the `>` (greater than) operator à la `vector > value`.

```
# Create a logical vector tall_180 indicating
# which baselers are taller than 180cm
XX <- XX > XX
```

```
# Create a logical vector tall_180
tall_180 <- height > 180
```

- Print your `tall_180` vector to the console. Do you see only `TRUE` and `FALSE` values? If so, do the values that are `TRUE` match the Baselers that are actually over 180cm?

```
# print tall
tall_180
```

`## [1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE`

- Using the `table()` function, create a table showing how many of the ten Baselers are taller than 180cm and how many are not.

```
# count baselers taller than 180
table(tall_180)
```

```
## tall_180
## FALSE TRUE
## 7 3
```

- Using the `mean()` function, determine the percentage of the ten Baselers that are taller than 180cm, i.e., `mean(tall_180)`. Should this have worked?

```
# percentage of baselers taller than 180
mean(tall_180)
```

`## [1] 0.3`
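This works because R treats `TRUE` as 1 and `FALSE` as 0 in arithmetic, so the mean of a logical vector is the proportion of `TRUE` values. A minimal sketch of that conversion, using the heights from the survey table:

```r
# Survey heights in cm
height <- c(174.3, 180.3, 168.3, 209, 176.7, 186.6, 151.6, 155.7, 176.1, 166.1)

# Logical vector: TRUE where a Baseler is taller than 180cm
tall_180 <- height > 180

# Arithmetic treats TRUE as 1 and FALSE as 0
as.numeric(tall_180)

# sum() therefore counts the TRUE values
sum(tall_180)

# mean() gives the proportion of TRUE values
mean(tall_180)

# Multiply by 100 to express the proportion as a percentage
mean(tall_180) * 100
```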

- What percent of the ten Baselers were older than 30?

```
# percentage of baselers older than 30
mean(age > 30)
```

`## [1] 0.8`

- Using the `data.frame()` function, create a data frame called `ten_df` that contains each of the vectors you just created: `id`, `age`, `sex`, `height`, `weight`, `tall_180`.

```
# Create data frame ten_df containing vectors id, age, sex, height, weight, and tall_180
XX <- data.frame(XX, XX, XX, XX, XX, XX)
```

```
# Create ten_df data frame from vectors
ten_df <- data.frame(id, age, sex, height, weight, tall_180)
```

- Print your `ten_df` object to see how it looks! Does it contain all of the vectors?

```
# Print ten_df
ten_df
```

```
## id age sex height weight tall_180
## 1 1 44 male 174 113.4 FALSE
## 2 2 65 male 180 75.2 TRUE
## 3 3 31 female 168 55.5 FALSE
## 4 4 27 male 209 93.8 TRUE
## 5 5 24 male 177 NA FALSE
## 6 6 63 male 187 67.4 TRUE
## 7 7 71 male 152 83.3 FALSE
## 8 8 41 female 156 67.8 FALSE
## 9 9 43 male 176 69.3 FALSE
## 10 10 31 female 166 66.3 FALSE
```

- Using the `dim()` function, print the number of rows and columns in your data frame. Do you get 10 rows and 6 columns?

```
# Inspect dimensions
dim(ten_df)
```

`## [1] 10 6`
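`dim()` returns the number of rows first and the number of columns second. As a small sketch, the code below shows the related base R helpers `nrow()`, `ncol()`, and `names()` on a hypothetical three-row data frame, `small_df`, built from the first three survey rows (this object is an illustration and not part of the practical).

```r
# Hypothetical mini data frame with the first three survey rows
small_df <- data.frame(id = 1:3,
                       sex = c("male", "male", "female"),
                       age = c(44, 65, 31))

# Rows and columns together: rows first, then columns
dim(small_df)

# Rows and columns separately
nrow(small_df)
ncol(small_df)

# Column names
names(small_df)
```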

- What is the class of your `ten_df` object? Use the `class()` function to find out!

```
# Inspect class
class(ten_df)
```

`## [1] "data.frame"`

- Use the `summary()` function to print descriptive statistics from each column of `ten_df`.

```
# Print descriptive statistics for each column
summary(ten_df)
```

```
## id age sex height weight
## Min. : 1.00 Min. :24.0 female:3 Min. :152 Min. : 55.5
## 1st Qu.: 3.25 1st Qu.:31.0 male :7 1st Qu.:167 1st Qu.: 67.4
## Median : 5.50 Median :42.0 Median :175 Median : 69.3
## Mean : 5.50 Mean :44.0 Mean :174 Mean : 76.9
## 3rd Qu.: 7.75 3rd Qu.:58.2 3rd Qu.:179 3rd Qu.: 83.3
## Max. :10.00 Max. :71.0 Max. :209 Max. :113.4
## NA's :1
## tall_180
## Mode :logical
## FALSE:7
## TRUE :3
##
##
##
##
```

- Using the `$` operator, print the `age` column from the `ten_df` data frame.

```
# Inspect age
ten_df$age
```

`## [1] 44 65 31 27 24 63 71 41 43 31`

- Calculate the maximum `age` value from the `ten_df` data frame using `max()`. Do you get the same result as when you calculate it from the original vector `age`?

```
# Get max
max(ten_df$age)
```

`## [1] 71`

- Instead of creating a data frame using the `data.frame()` function, try creating a tibble called `ten_tibble` using the `tibble()` function. `tibble`s are a more modern, leaner variant of data frames that we prefer over classic `data.frame`s. You can use the exact same arguments you used before.

```
# Create ten_tibble from the same vectors as ten_df
ten_tibble <- tibble(id, age, sex, height, weight, tall_180)
```

- Print your new `ten_tibble` object. How does it look different from `ten_df`? Try calculating the maximum `age` from this object. Is it different from what you got before?

```
# print tibble
ten_tibble
```

```
## # A tibble: 10 x 6
##       id   age sex    height weight tall_180
##    <int> <dbl> <chr>   <dbl>  <dbl> <lgl>
##  1     1    44 male     174.  113.  FALSE
##  2     2    65 male     180.   75.2 TRUE
##  3     3    31 female   168.   55.5 FALSE
##  4     4    27 male     209    93.8 TRUE
##  5     5    24 male     177.   NA   FALSE
##  6     6    63 male     187.   67.4 TRUE
##  7     7    71 male     152.   83.3 FALSE
##  8     8    41 female   156.   67.8 FALSE
##  9     9    43 male     176.   69.3 FALSE
## 10    10    31 female   166.   66.3 FALSE
```

`max(ten_tibble$age) == max(ten_df$age)`

`## [1] TRUE`

- If you take the `sum()` of a logical vector, R will return the number of cases that are `TRUE`. Using this, find out how many of the ten Baselers are male while using the is-equal-to operator `==`.

```
# Determine the frequency of a case in a vector
sum(XX == XX)
```

```
# Count how many baselers are male
sum(ten_tibble$sex == "male")
```

`## [1] 7`

- You can use logical vectors to select rows from a data frame based on certain criteria. Using the following template, get the id values of Baselers who are younger than 30:

```
# Create a logical vector indicating which baselers are younger than 30
young_30 <- XX$XX < 30
# Print the ids of baselers younger than 30
XX$XX[young_30]
```

```
# Create a logical vector indicating which baselers are younger than 30
young_30 <- ten_tibble$age < 30
# Print the ids of baselers younger than 30
ten_tibble$id[young_30]
```

`## [1] 4 5`

- Use a combination of logical vectors and the `mean()` function to answer the question: “What is the mean age of Baselers who are heavier than 80kg?”

```
# Mean age of baselers heavier than 80kg
# (the missing weight puts an NA into the selection, so na.rm = TRUE is needed)
mean(ten_tibble$age[ten_tibble$weight > 80], na.rm = TRUE)
```

`## [1] 47.3`

- What are the id values of Baselers who are male *and* are shorter than 165cm? (Hint: You will need to use the logical AND operator `&` to combine multiple logical vectors.)

```
# Ids of male baselers shorter than 165cm
ten_tibble$id[ten_tibble$sex == "male" & ten_tibble$height < 165]
```

`## [1] 7`
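The selection tasks above all rely on logical indexing, which needs some care when the condition involves a column with missing values. The sketch below works on plain vectors rather than the tibble to stay self-contained. It shows how an `NA` in the condition carries over into the selection, and how base R's `which()` can be used to drop it; the name `heavy_80` is illustrative.

```r
# Survey ages and weights (weight has one missing value)
age <- c(44, 65, 31, 27, 24, 63, 71, 41, 43, 31)
weight <- c(113.4, 75.2, 55.5, 93.8, NA, 67.4, 83.3, 67.8, 69.3, 66.3)

# Comparing against NA yields NA, so the condition has one NA
heavy_80 <- weight > 80
heavy_80

# Indexing with an NA returns an NA element in the result
age[heavy_80]

# which() keeps only the positions where the condition is TRUE
which(heavy_80)
age[which(heavy_80)]

# The mean of the selected ages is now defined
mean(age[which(heavy_80)])
```

Using `which()` is one option; passing `na.rm = TRUE` to `mean()` gives the same mean here, because the only `NA` in the selection comes from the missing weight.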

- For more information on the fundamentals of objects and functions in R, see the R Core Team’s introduction to R. For even more advanced object- and function-related topics, see Hadley Wickham’s Advanced R.