Slides

Here a link to the lecture slides for this session: LINK

Overview

In this practical you’ll learn how to use R’s basic data objects and how to use and write functions. By the end of this practical you will know how to:

Create data objects of different kinds.
How to inspect objects.
Change object types.
Access elements from data objects.
Apply functions
Write simple functions.

Functions

Here are the main functions for object creation.

Function	Description
`c(), rep(), seq(), numeric(), etc.`	Create a vector
`matrix(), cbind()`	Create a matrix
`data.frame()`	Create a data.frame
`list()`	Create a list

Here are the main functions for object inspection.

Function	Description
`head(), tail()`	Inspect first or last elements of object
`str()`	Inspect the structure of the data object
`View()`	To access an Excel like data interface

Here are the main functions for object selection.

Function	Description
`[ ]`	Single brackets: Select individual elements from `vector`
`[[ ]]`	Double brackets: Select element/variable in list or `data_frame`
`$`	Dollar: Select named element/variable from list or `data_frame` (preferred)

Here is the function to create a function.

Function	Description
`function()`	Creates a function

Tasks

Vectors

Create a numeric (double), a character, and a logical vector each with 10 elements using c() and assign them as dbl_vec, log_vec, and chr_vec (using the assignment operator, i.e., name <- c()). For example to save the three numbers 1, 2, and 3 in a vector called three_numbers execute three_numbers <- c(1, 2, 3).
Test the type and length of the vectors using the functions typeof() and length(). To run the functions simply place the name of the individual vectors in the parenthesis and then send to the console to execute. Remember you can send any line in your editor to the console using cmd + enter (MAC) or cntrl + enter (Windows).
Expand each of the vectors using rep() to two times their length. rep() is a general purpose repeat function for vectors. It takes (at least) two arguments: x the vector and times, which can be a single integer, e.g., 2. Take a look at the functions help file using ?rep. To expand the vectors, you need to apply rep() and assign the result back to your vector object, i.e., object <- rep(object, argument).

Data frames

Use your three vectors to create a 3xN data_frame, where N is the length of each of the vectors. To do this, first load the tibble package, which is part of the tidyverse. To create a data frame use the data_frame() function. Look at the help file (?data_frame & ?data_frame). The first argument to data_frame() is ... which is a placeholder for as many objects as you want to include in the data frame. To create a data_._frame from your three vectors pass on the three vectors separated by comma, e.g., data_frame(vector_1, vector_2, vector_3), and assign the result to, e.g., my_df. Now that you created your data.frame, look at the data frame using head(). Observe that the variables are included in the order that you provided them in. Next, test its type (typeof) and dimension (dim()). Do they make sense?
Now, practice accessing elements from a data.frame. Try to select the first element using (1) double brackets and index, e.g., [[1]], (2) using double brackets and name [['dbl_vec']], and our preferred way using the dollar operator and name, e.g., $dbl_vec. To inspect the names of the elements use names(my_df) or str(my_df), which provides a richer overview over your data object.
Now, try what happens when you use single brackets instead of double brackets, when accessing elements from a data.frame. Try, e.g., my_df["dbl_vec"], and name, e.g., my_df[["dbl_vec"]] and test their types using typeof(). What kind of object do you get in the first and second case? Understand why we prefer the $-operator?

Coercion

As you have heard in the lecture, vectors (or objects in general) can easily be coerced into different data types. Try to coerce each of the vectors into every other format using as.numeric(), as.character(), as.logical(). Can every type be coerced into every other? Which coercion does always work?

Logical comparisons

An important tool of working with data are logical comparisons. Logical comparisons can be used to conveniently select parts of the data. They can also be used to conduct checks throughout script. For instance, you can use logical comparisons to compare whether the names of, e.g., my_df are equal c("dbl_vec", "chr_vec", "log_vec"). This can be done using the is-equal-to operator ==, e.g., vec_1 == vec_2. In the case of vectors, logical comparisons iterate through the vectors one by one and evaluate whether the elements at the present location are equal. The result is a logical vector of the same length as the two objects involved in the logical comparison. To evaluate whether all elements are equal one can conveniently use the all() function. Try compare the names of my_df to c(dbl_vec", "chr_vec", "log_vec") and test whether all comparisons resulted in TRUE.
A second important use for logicals is inside of brackets to access objects. Brackets ([] or [[]]) can take logical vectors as an argument (provided that the dimensions match to the to be accessed data object). E.g., c(1, 2, 3)[c(TRUE, FALSE, TRUE)] returns the first and the third element of the vector, that is it returns those elements for which the logical vector is TRUE. Try to use this now on your three vectors. Use log_vec to subset the elements in dbl_vec and chr_vec.
Logical vectors become especially useful when they are used to subset created from comparisons. For instance, you may be interested in retrieving all values in chr_vec for which dbl_vec is larger than some value. This can be easily accomplished by coercing dbl_vec to a logical vector using >. Consider c(1, 4, 7, 2) > 3. Try this for chr_vec and dbl_vec with an appropriate cut-off value.
Another convenient aspect of working with logical vectors is that they can be conveniently combined using the logical AND operator & and the logical OR operator |. Run c(1, 4, 7, 2) > 3 & c(1, 4, 7, 2) < 6 and c(1, 4, 7, 2) > 3 | c(1, 4, 7, 2) < 6 and observe the results. Try now combining the logical vector that results from coercing dbl_vec to logical and log_vec to subset chr_vec.

Lists

Use your three vectors to create a list using list(). list() works exactly as c() or data_frame(). Call the object my_list and inspect it using str(). Compare the output of str() to the output for str(my_df). What is different, and why?
Coerce my_list into a data frame using as_data_frame() and assign it to my_df_from_list. Inspect again using str(). Rename the columns to the original names of the vectors. To do this assign names(my_df) to names(my_df_from_list). Confirm that it worked using either names() or str().

Functions

So far you have used several functions like c(), typeof(), etc. Now, you are going to practice writing your own little functions. Let’s start with programming a function that calculates the arithmetic mean using the functions sum(), which calculates the sum of all values in a vector and length(), which determines the length of vectors (and other objects). The basic form of a function is: my_fun <- function(argument) {expression}. The challenge in writing a function is to determine the argument(s), which are the the function’s input, and the expression, which defines the function’s job. In this case, both are relatively straightforward to define. The functions argument is any numeric vector and the functions expression should be the vectors sum divided by its length, i.e., sum(vector)/length(vector). Together: my_mean <- function(vector) {sum(vector) / length(vector)}. You may ask now how does R know that vector is supposed to be of type numeric? It does not! R will attempt to execute the function with anything you through at it. Try it out! Define your my_mean function and try it out for your double and character vectors from above.
The character vector produced an error, right? The reason is that R does not interpret the names you use for the arguments. That is, R does not understand that an argument named vector is supposed to be a vector. For R these names are merely placeholders. In the background, R will use these names to pass on the object provided at the respective location. I.e., in the above case, R will take your vector, may it be dbl_vec or chr_vec, assign it to the name vector, and pass this new object on to the expression. This means that you can choose anything as the argument names. The only thing you need to ensure is that you work with the names of the arguments in the expression, rather than the object’s real names. Try redefining the your my_mean function using a different argument name, e.g., x.

Note: R literally creates new objects by assigning the object you provide to a function to its argument names. This means that the objects used inside the function are duplicates of the objects that you provide and not the objects themselves. This is why functions never change an object unless you assign the function’s output back to the object or you explicitly tell R to change the object inside the function using the double arrow assignment operator <<-.

Non-flat objects

In working with R, you often encounter objects that have a non-flat, nested structure. This means that there is more than one level of elements. Such objects are always lists. Lists are the only object type that can store any other object, including themselves. That is, you can have a list of lists or a list of data.frames. Working with non-flat, nested objects is easier than it may seem. To access elements from such an object simply combine several dollar $ operators (or double bracket [[]]) to descend into the objects structure. Let’s take a look.

Create a list containing my_df and your three vectors (for a total of four elements). Try to verify (using the list structure) that each of the columns of my_df is equal the respective vector. To access the columns in my_df you have to combine selectors e.g., my_list$name$name (the preferred way).

Additional reading

For more details on all steps of data analysis check out Hadley Wickham’s R for Data Science.
For more advanced content on objects check out Hadley Wickham’s Advanced R.
For more on pirates and data analysis check out the respective chapters in YaRrr! The Pirate’s Guide to R YaRrr! Chapter Link

Practical: Data objects and functions

BaselRBootcamp April 2018

Slides

Overview

Functions

Tasks

Vectors

Data frames

Coercion

Logical comparisons

Lists

Functions

Non-flat objects

Additional reading