
Data objects and functions

The R Bootcamp
Twitter: @therbootcamp

April 2018

1 / 15

Objects

"Everything in R is an object"

John Chambers



  • R's objects have content and attributes.
  • The content can be anything from numbers or strings to functions or complex data structures.
  • Attributes can be names, dimensions, and the class or type of the object, but other attributes are possible.
  • Practically all data objects are equipped with those three essential attributes.
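A minimal sketch of how these attributes can be inspected in base R. The object names and the custom attribute "unit" are made up for illustration.

```r
# a named vector carries a names attribute
my_vec <- c(a = 1.5, b = 2.5)
names(my_vec)
## [1] "a" "b"

# other attributes can be attached with attr(); "unit" is a hypothetical name
attr(my_vec, "unit") <- "cm"
attr(my_vec, "unit")
## [1] "cm"

# a matrix is a vector with a dim attribute
my_mat <- matrix(1:6, nrow = 2)
dim(my_mat)
## [1] 2 3
typeof(my_mat)
## [1] "integer"
```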



2 / 15

Vectors

R's most basic (and simplest) data format - even single values (aka scalars) are implemented as vectors.

# creating a vector (incl. names)
my_vec <- c(t_1 = 1.343, t_2 = 5.232)
# naming vectors
my_vec <- c(t_1 = 1.343, t_2 = 5.232)
names(my_vec) <- c("new_1","new_2")
# evaluating inherent attributes
names(my_vec)
length(my_vec)
typeof(my_vec)


3 / 15

Types

Vectors contain elements of only one type, most often one of the four basic types: integer, double, logical, and character. Integer and double are both numeric. You can test the type using typeof() or the type-specific is.*(), e.g., is.integer().

# numeric vectors
my_vec <- c(1.343, 5.232)
typeof(my_vec)
## [1] "double"
# integer vectors (L avoids coercion)
my_vec <- c(1L, 7L, 2L)
typeof(my_vec)
## [1] "integer"
# logical vectors
my_vec <- c(TRUE, FALSE)
typeof(my_vec)
## [1] "logical"
# character vectors
my_vec <- c('a', 'hello', 'world')
typeof(my_vec)
## [1] "character"
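The is.*() tests mentioned above are not shown in the examples, so here is a short sketch. The names x_int and x_dbl are illustrative only.

```r
x_int <- c(1L, 7L, 2L)
x_dbl <- c(1.343, 5.232)

# type-specific tests return a single TRUE or FALSE
is.integer(x_int)
## [1] TRUE
is.integer(x_dbl)
## [1] FALSE
is.double(x_dbl)
## [1] TRUE

# is.numeric() is TRUE for both integer and double vectors
is.numeric(x_int)
## [1] TRUE
is.numeric(x_dbl)
## [1] TRUE

# a number typed without the L suffix is stored as double
typeof(1)
## [1] "double"
```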
4 / 15

Coercion

R allows you to flexibly change one type into another using as.*(), e.g., as.numeric() or as.logical(), and often R does this for you. For instance, mathematical operations and functions coerce logical to double or integer, and logical operations (&, |, any(), etc.) coerce to logical. Importantly, coercion may cause information loss!

# everything becomes character
my_vec <- c(1L, 1.23, 'a', TRUE)
my_vec
## [1] "1" "1.23" "a" "TRUE"
# logicals become 0s and 1s
TRUE + FALSE + TRUE
## [1] 2


# logical operation -> logical type
c(1, 7, 2) > 3
## [1] FALSE TRUE FALSE
# R can parse numbers stored as character; anything else becomes NA
as.numeric(c("1", "2", "TRUE"))
## Warning: NAs introduced by coercion
## [1] 1 2 NA
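To make the information loss concrete, the following sketch shows two coercions that cannot be undone. The values are chosen for illustration.

```r
# double -> integer drops the fractional part (truncation toward zero)
as.integer(c(1.9, -1.9))
## [1]  1 -1

# numeric -> logical keeps only "zero or not zero"
as.logical(c(0, 1, 5))
## [1] FALSE  TRUE  TRUE

# converting back does not restore the original values
x <- c(0.5, 2, 0)
as.numeric(as.logical(x))
## [1] 1 1 0
```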
5 / 15

Lists

  • Lists are R's Swiss Army knife. They are often used for the output of statistical functions, e.g., lm().

  • Lists have non-flat structures that take any object type, including lists, rendering lists recursive.

  • Lists can be understood as a meta-vector that includes an organizational layer.

  • To create a list, use list() or as.list().
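A small sketch of the points above, assuming made-up element names (id, name, scores, nested): a list can mix types and can contain other lists.

```r
# elements of different types and lengths, including a nested list
my_list <- list(id = 1L,
                name = "Jack",
                scores = c(2.5, 4.0),
                nested = list(a = TRUE))
length(my_list)
## [1] 4
names(my_list)
## [1] "id"     "name"   "scores" "nested"

# each element keeps its own type
sapply(my_list, typeof)

# nested elements are reached by chaining
my_list$nested$a
## [1] TRUE

# as.list() turns each element of a vector into a list element
length(as.list(1:3))
## [1] 3
```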

6 / 15

Data frames

  • Data frames (and their variants, e.g., tibbles) are R's main data format.

  • Data frames are lists with specific requirements:

    • Every element must be a vector.

    • The lengths of the vectors must be equal (or multiples of one another).

  • Use data_frame() and as_data_frame() to create a data frame or to coerce an object to one (a tibble, to be exact).
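The list nature of data frames can be checked directly. This sketch uses base R's data.frame() instead of data_frame() so that it runs without extra packages; that substitution is an assumption made here, not part of the slides.

```r
my_df <- data.frame(v_1 = c("A", "B"), v_2 = c(1, 2),
                    stringsAsFactors = FALSE)

# a data frame is a list of equal-length vectors (its columns)
is.list(my_df)
## [1] TRUE
typeof(my_df)
## [1] "list"
length(my_df)            # number of columns
## [1] 2
sapply(my_df, length)    # every column has the same length

# data.frame() recycles a shorter vector if its length divides the longest
nrow(data.frame(id = 1:4, grp = c("a", "b")))
## [1] 4
```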

7 / 15

Inspecting data frames

Data frames (or tibbles) can be inspected in various ways.

  • print() - shows the default print (good with tibbles, bad with everything else)
  • head(), tail() - print the first/last six rows
  • str() - gives an overview of the variables
  • View() - opens Excel-like window

## # A tibble: 1,000 x 17
## id sex age height weight headband college tattoos tchests parrots
## * <int> <chr> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 1 male 28. 173. 70.5 yes JSSFP 9. 0. 0.
## 2 2 male 31. 209. 106. yes JSSFP 9. 11. 0.
## 3 3 male 26. 170. 77.1 yes CCCC 10. 10. 1.
## 4 4 female 31. 144. 58.5 no JSSFP 2. 0. 2.
## 5 5 female 41. 158. 58.4 yes JSSFP 9. 6. 4.
## 6 6 male 26. 190. 85.4 yes CCCC 7. 19. 0.
## 7 7 female 31. 158. 59.6 yes JSSFP 9. 1. 7.
## 8 8 female 31. 173. 74.5 yes JSSFP 5. 13. 7.
## # ... with 992 more rows, and 7 more variables: favorite.pirate <chr>,
## # sword.type <chr>, eyepatch <dbl>, sword.time <dbl>, beard.length <dbl>,
## # fav.pixar <chr>, grogg <dbl>
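The data set printed above is not available here, so this sketch applies the same inspection functions to mtcars, a data frame that ships with R. The choice of mtcars is an assumption made for illustration.

```r
df <- mtcars              # built-in data frame, 32 rows and 11 columns
dim(df)
## [1] 32 11

first_rows <- head(df)          # first six rows by default
last_rows  <- tail(df, n = 3)   # n changes the number of rows
nrow(first_rows)
## [1] 6
nrow(last_rows)
## [1] 3

str(df)                   # one line per variable: name, type, first values

# View(df)                # opens a spreadsheet-like viewer in interactive sessions
```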
8 / 15

Accessing & changing vectors

To access (aka subset or slice) and change atomic data objects, use brackets [] and provide either integers, logicals, or names to indicate the relevant vector content. To change content, assign new content of matching size to the subset using <-.

# retrieve second element from vector
my_vec <- c('A', 'B', 'C')
my_vec[2]
## [1] "B"
# change the second element
my_vec[2] <- 'D'
my_vec
## [1] "A" "D" "C"
# Use logical comparison to access vector
my_vec[my_vec != 'A']
## [1] "D" "C"
# Change vector using logical comparison
my_vec[my_vec != 'A'] <- c('E', 'F')
my_vec
## [1] "A" "E" "F"
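The examples above use integers and logical comparisons. A short sketch of the remaining options (names, several positions at once, and negative integers), using an illustrative named vector:

```r
my_vec <- c(t_1 = 1.343, t_2 = 5.232, t_3 = 0.5)

# access by name; [[ ]] returns the bare value without its name
my_vec[["t_2"]]
## [1] 5.232

# several positions at once
names(my_vec[c(1, 3)])
## [1] "t_1" "t_3"

# negative integers drop elements
names(my_vec[-1])
## [1] "t_2" "t_3"

# a logical vector of the same length keeps the TRUE positions
names(my_vec[c(TRUE, FALSE, TRUE)])
## [1] "t_1" "t_3"

# change by name
my_vec["t_3"] <- 9
my_vec[["t_3"]]
## [1] 9
```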
9 / 15

Accessing & changing data frames (and lists)

Data frames (and lists) are best accessed using names and the $-operator. This, of course, implies that you followed good practice and named the individual elements in the data object.

# define data frame
my_df <- data_frame('v_1' = c('A', 'B'), 'v_2' = c(1, 2))
# One bad, two correct ways to subset
my_df[1] ; my_df[[1]] ; my_df[['v_1']]
## # A tibble: 2 x 1
## v_1
## <chr>
## 1 A
## 2 B
## [1] "A" "B"
## [1] "A" "B"
# Best use $-operator to access
my_df$v_1
## [1] "A" "B"
# and change
my_df$v_1 <- c('Y', 'Z')
my_df$v_1
## [1] "Y" "Z"
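The same operators work on plain lists. The sketch below, with made-up element names, shows why single brackets are usually not what you want: they return a list, whereas [[ ]] and $ return the element itself.

```r
my_list <- list(v_1 = c("A", "B"), v_2 = c(1, 2))

# [ ] keeps the list wrapper, [[ ]] and $ extract the content
class(my_list["v_1"])
## [1] "list"
class(my_list[["v_1"]])
## [1] "character"
my_list$v_2
## [1] 1 2

# assigning to a new name adds an element
my_list$v_3 <- c(TRUE, FALSE)
names(my_list)
## [1] "v_1" "v_2" "v_3"

# assigning NULL removes an element
my_list$v_1 <- NULL
names(my_list)
## [1] "v_2" "v_3"
```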
10 / 15

Functions

Functions are objects that conduct operations on objects using objects. Functions have 3 elements:

  • Name: Used to call (execute) the function.
  • Argument(s): Objects that the function needs to do its job. May have default arguments.
  • Expression: R code that does the job.
# Defining a function that computes
# the mean or median
my_stat <- function(x, method = 'mean'){
  # detect and run method
  if(method == "mean") return(mean(x))
  if(method == "median") return(median(x))
}
# Define object
my_vec <- c(1, 4, 6, 3, 7, 5, 12, 9)
# Running our functions
mean(x = my_vec)
## [1] 5.875
my_stat(x = my_vec, method = 'mean')
## [1] 5.875
my_stat(x = my_vec, method = 'median')
## [1] 5.5
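Note that my_stat() silently returns NULL when method is neither 'mean' nor 'median'. One possible variant, sketched here under the hypothetical name my_stat2, lists the allowed methods as the default argument and lets match.arg() reject anything else.

```r
my_stat2 <- function(x, method = c("mean", "median")) {
  # picks the first choice if method is not supplied,
  # and stops with an error for values not in the list
  method <- match.arg(method)
  switch(method,
         mean   = mean(x),
         median = median(x))
}

my_vec <- c(1, 4, 6, 3, 7, 5, 12, 9)
my_stat2(my_vec)                      # default: mean
## [1] 5.875
my_stat2(my_vec, method = "median")
## [1] 5.5

# the arguments of a function can be inspected with formals()
names(formals(my_stat2))
## [1] "x"      "method"

# my_stat2(my_vec, method = "mode") would stop with an error
```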
11 / 15

Help

Help files (and vignettes) are very useful.

Pay attention to...

  • Usage - shows function's use, its arguments and their defaults.
  • Arguments - explains arguments, and their type/class
  • Value - explains what the function returns
  • Examples - shows example code illustrating the function's use
# To access help files
?name_of_function
# search help files
??name_of_function

12 / 15

Factors

Factors are a special case of vector that can contain only predefined values, so-called levels. Factors are rarely useful and sometimes dangerous, yet R will often coerce character to factor (in data.frame() and read.csv(), this was the default before R 4.0.0). Modern packages, including those in the tidyverse, tend to avoid factors. Otherwise, R can be told explicitly to avoid factors using options(stringsAsFactors = FALSE).

# create a factor
my_fact <- factor(c('A','B','C'))
my_fact
## [1] A B C
## Levels: A B C
# test type
typeof(my_fact)
## [1] "integer"
# dangerous behavior of factors pt. 1
my_fact <- factor(c('A','B','C'))
mean(as.integer(my_fact))
## [1] 2
# dangerous behavior of factors pt. 2
my_fact <- factor(c(1.32,4.52,.23))
as.numeric(my_fact) # ranks
## [1] 2 3 1
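If a factor does hold numbers, the original values can still be recovered by going through the level labels instead of the internal integer codes. A sketch of two common ways to do this:

```r
my_fact <- factor(c(1.32, 4.52, .23))

# the levels are the sorted unique values, stored as character
levels(my_fact)
## [1] "0.23" "1.32" "4.52"

# convert to character first, then to numeric
as.numeric(as.character(my_fact))
## [1] 1.32 4.52 0.23

# or convert the levels and index them with the factor codes
as.numeric(levels(my_fact))[my_fact]
## [1] 1.32 4.52 0.23
```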
13 / 15

Object algebra

R has implementations of most operations of vector and matrix algebra and it is often desirable to make use of them to improve speed.

# create objects
my_mat <- matrix(1:9, ncol=3)
my_vec <- c(1:3)
# object times scalar (also a vector)
my_mat * 5 ; my_vec * 5
## [,1] [,2] [,3]
## [1,] 5 20 35
## [2,] 10 25 40
## [3,] 15 30 45
## [1] 5 10 15
# create objects
my_mat <- matrix(1:9, ncol=3)
my_vec <- c(1:3)
# matrix multiplication
my_vec %*% my_mat
## [,1] [,2] [,3]
## [1,] 14 32 50
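As a sketch of what "making use of" vector algebra means in practice, the following compares an explicit loop with the equivalent vectorized expression, and contrasts element-wise multiplication (*) with the matrix product (%*%). Object names such as out_loop are illustrative.

```r
my_vec <- c(1:3)
my_mat <- matrix(1:9, ncol = 3)

# explicit loop: square each element one at a time
out_loop <- numeric(length(my_vec))
for (i in seq_along(my_vec)) out_loop[i] <- my_vec[i]^2

# vectorized: same result in one expression
out_vec <- my_vec^2
identical(out_loop, out_vec)
## [1] TRUE

# * works element by element (my_vec is recycled down the columns)
my_mat * my_vec

# %*% is the matrix product; here my_vec is treated as a column vector
my_mat %*% my_vec
##      [,1]
## [1,]   30
## [2,]   36
## [3,]   42
```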
14 / 15

Practical

Link to practical

15 / 15
