from cloudtweaks.com
R has 3 main data objects: list, data_frame, and vector.
list
1 - Can contain lists, data_frames, vectors, etc.
2 - Are often used for function outputs, such as model results.
3 - Have named elements.
4 - Elements can be inspected via names() or str().
5 - Elements are (typically) selected via $.
list: Select element using $
# regression
reg_model <- lm(height ~ sex + age, data = baselers)
reg_results <- summary(reg_model)
# get element names
names(reg_results)
## [1] "call"         "terms"
## [3] "residuals"    "coefficients"
## [5] "aliased"      "sigma"
## [7] "df"           "r.squared"
# select element using $
reg_results$coefficients
##               Estimate  t value
## (Intercept) 164.171266 499.5339
## sexmale      13.993699  66.4724
## age          -0.003753  -0.5819
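As a minimal sketch of the list properties above, the following builds a small list by hand and selects elements from it. The object `person` and its contents are hypothetical and only serve as an illustration.

```r
# A hypothetical list mixing a string, a numeric vector, and a data frame
person <- list(
  name    = "Anna",
  scores  = c(12, 15, 9),
  address = data.frame(city = "Basel", zip = 4051)
)

# Inspect the element names and the structure
names(person)
str(person)

# Select elements by name, either with $ or with [[ ]]
person$scores
person[["name"]]

# Each element keeps its own type
class(person$scores)
class(person$address)
```

Both `$` and `[[ ]]` return the element itself, whereas single brackets (`person["scores"]`) would return a list of length one.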
data_frame
1 - Are lists containing vectors of equal length.
2 - Contain vectors of different types: numeric, character, etc.
3 - Have named elements.
4 - Elements can be inspected via names(), str(), print(), View(), or skimr::skim().
5 - Elements are (typically) selected via $.
6 - Come in different flavors: data.frame(), data.table(), tibble().
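The points above can be checked on a small example. This is a sketch using a hypothetical three-row data frame `df` whose values mimic the first rows of baselers; it is not the real data set.

```r
# Hypothetical mini data set (not the real baselers data)
df <- data.frame(
  id  = c(1, 2, 3),
  sex = c("male", "male", "female"),
  age = c(44, 65, 31),
  stringsAsFactors = FALSE
)

is.list(df)        # a data frame is a list ...
lengths(df)        # ... whose elements all have the same length
sapply(df, class)  # the columns can differ in type
names(df)          # and every column has a name
```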
# inspect baselers via print
baselers
## # A tibble: 10,000 x 20
##      id sex     age height weight income
##   <dbl> <chr> <dbl>  <dbl>  <dbl>  <dbl>
## 1     1 male     44   174.  113.    6300
## 2     2 male     65   180.   75.2  10900
## 3     3 fema…    31   168.   55.5   5100
## 4     4 male     27   209    93.8   4200
## 5     5 male     24   177.   NA     4000
##   education confession children
##   <chr>     <chr>         <dbl>
## 1 SEK_III   catholic          2
## 2 obligato… confessio…        2
## 3 SEK_III   <NA>              2
## 4 SEK_III   catholic          2
## 5 SEK_III   catholic          1
## # … with 9,995 more rows, and 11 more
## #   variables
# View dataframe in a new window
View(baselers)
$
# Access age column from baselers
baselers$age
##  [1] 44 65 31 27 24 63 71 41 43 31 42 31
## [13] 38 49 39 54 78 62 88 74
# Access education column from baselers
baselers$education
##  [1] "SEK_III"
##  [2] "obligatory_school"
##  [3] "SEK_III"
##  [4] "SEK_III"
##  [5] "SEK_III"
##  [6] "SEK_III"
##  [7] "SEK_III"
##  [8] "SEK_III"
##  [9] "apprenticeship"
## [10] "SEK_II"
$
# Divide income by 1000
baselers$income <- baselers$income / 1000
# inspect baselers
baselers
## # A tibble: 10,000 x 20
##      id sex     age height weight income
##   <dbl> <chr> <dbl>  <dbl>  <dbl>  <dbl>
## 1     1 male     44   174.  113.     6.3
## 2     2 male     65   180.   75.2   10.9
## 3     3 fema…    31   168.   55.5    5.1
## 4     4 male     27   209    93.8    4.2
## 5     5 male     24   177.   NA      4
##   education confession children
##   <chr>     <chr>         <dbl>
## 1 SEK_III   catholic          2
## 2 obligato… confessio…        2
## 3 SEK_III   <NA>              2
## 4 SEK_III   catholic          2
## 5 SEK_III   catholic          1
## # … with 9,995 more rows, and 11 more
## #   variables
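Assigning with $ can also create a new column instead of overwriting an existing one. A minimal sketch, assuming a hypothetical data frame `df` and a made-up column name `income_k`:

```r
# Hypothetical data frame with two columns
df <- data.frame(id = c(1, 2, 3), income = c(6300, 10900, 5100))

# Assigning to a name that does not exist yet adds a new column
df$income_k <- df$income / 1000

df$income_k
names(df)
```

Keeping the original column untouched makes it easier to re-run the code without dividing twice by accident.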
vector
1 - R's
2 - Can contain only a single data type.
3 - Data types: numeric, character, logical (TRUE or FALSE), ...; missing values are NA.
[ ]
# extract vector containing age
age <- baselers$age
age
## [1] 44 65 31 27 24 63 71 41 43
# select value
age[2]
## [1] 65
# change value
age[2] <- 100
age
## [1]  44 100  31  27  24  63  71  41  43
Find more info on indexing here.
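Beyond a single position, [ ] also accepts several positions, negative positions, and logical vectors. The sketch below uses the age values printed above, typed in by hand; the variable names are illustrative.

```r
# Age values typed in by hand
age <- c(44, 65, 31, 27, 24, 63, 71, 41, 43)

first_third   <- age[c(1, 3)]    # several positions at once
middle        <- age[2:4]        # a range of positions
without_first <- age[-1]         # negative index drops a position
under_30      <- age[age < 30]   # logical vector keeps the TRUE positions

# Indexing also works on the left-hand side of an assignment
age[age > 60] <- NA
age
```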
numeric
numeric vectors are used to store numbers and only numbers.
baselers$age
## [1] 44 65 31 27 24 63 71 41 43
# evaluate class
class(baselers$age)
## [1] "numeric"
# is age numeric?
is.numeric(baselers$age)
## [1] TRUE
character
character vectors are used to store data represented by text.
You can always recognise character vectors by the quotation marks around their values.
baselers$sex
## [1] "male"   "male"   "female" "male"
## [5] "male"   "male"   "male"   "female"
baselers$education
## [1] "SEK_III"
## [2] "obligatory_school"
## [3] "SEK_III"
## [4] "SEK_III"
baselers$age
## [1] 44 65 31 27 24 63 71 41
# convert age to character
as.character(baselers$age)
## [1] "44" "65" "31" "27" "24" "63" "71"
## [8] "41" "43"
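The conversion also works in the other direction. A short sketch with hand-typed values; the string "unknown" is a made-up example of text that cannot be read as a number.

```r
# Numbers converted to text are quoted and can no longer be used in arithmetic
age_chr <- as.character(c(44, 65, 31))
class(age_chr)

# Converting back restores a numeric vector
age_num <- as.numeric(age_chr)
age_num + 1

# Text that is not a number becomes NA (R also issues a warning)
mixed <- suppressWarnings(as.numeric(c("44", "unknown")))
mixed
```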
logical
logical vectors are used to store TRUE and FALSE values.
They are typically created from other vectors via logical operators.
# which sex values are male?
baselers$sex == "male"
## [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
## [7]  TRUE FALSE
# which ages are less than 30?
baselers$age < 30
## [1] FALSE FALSE FALSE  TRUE  TRUE FALSE
## [7] FALSE FALSE FALSE
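Logical operators
== : equal to
< : less than
> : greater than
<= : less than or equal to
>= : greater than or equal to
& : element-wise AND
&& : AND for single (length-one) values
| : element-wise OR
|| : OR for single (length-one) values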
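The operators can be combined and the result used for counting or selecting. A sketch using the first eight sex and age values shown earlier, typed in by hand:

```r
# First eight values of sex and age, typed in by hand
sex <- c("male", "male", "female", "male", "male", "male", "male", "female")
age <- c(44, 65, 31, 27, 24, 63, 71, 41)

# Element-wise AND: male and younger than 30
young_male <- sex == "male" & age < 30

# Element-wise OR: female or older than 60
female_or_old <- sex == "female" | age > 60

# TRUE counts as 1, so sum() counts cases and mean() gives a proportion
n_young_male <- sum(young_male)
prop_male    <- mean(sex == "male")

# Logical vectors select the matching values
age[young_male]

# && works on single values, e.g. inside if ()
has_data <- length(age) > 0 && age[1] > 40
```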
1 - Most typical file format.
2 - Requires
readr
readr is a tidyverse package that provides convenient functions to read in flat (non-nested) data files into data frames (tibbles to be precise):
# Importing data from a file
data <- read_csv(file, ...)    # comma-delimited
data <- read_csv2(file, ...)   # semicolon-delimited
data <- read_delim(file, ...)  # arbitrary-delimited
# Writing a data frame to a file
write_csv(data_object, path, ...)    # comma-delimited
write_delim(data_object, path, ...)  # arbitrary-delimited
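A minimal round trip with these functions might look as follows. The data frame `df` and the temporary files are hypothetical, and the sketch assumes the readr package is installed; `col_types = cols()` is used here only to let readr guess the types without printing a message.

```r
library(readr)

# Hypothetical data frame to write out
df <- data.frame(id = c(1, 2), sex = c("male", "female"),
                 stringsAsFactors = FALSE)

# Write to and read from a temporary comma-delimited file
path <- tempfile(fileext = ".csv")
write_csv(df, path)
back <- read_csv(path, col_types = cols())

# Same round trip with a semicolon as delimiter
path2 <- tempfile(fileext = ".txt")
write_delim(df, path2, delim = ";")
back2 <- read_delim(path2, delim = ";", col_types = cols())

back
back2
```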
1 - Identify the file path using auto-complete.
2 - Initiate auto-complete and browse through the folder structure by placing the cursor between two quotation marks and pressing the Tab key.
3 - Auto-complete begins with the project folder.
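File paths can also be assembled and checked in code. A small sketch, assuming a project with a `1_Data` folder as used in the read_csv() example later in these slides:

```r
# Build a path relative to the project folder
path <- file.path("1_Data", "baselers.csv")
path

# Check where R is currently looking and whether the file is there
getwd()
file.exists(path)
```

`file.exists()` returns FALSE rather than an error when the file is missing, which makes it a cheap check before reading.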
baselers.csv
# Read with an explicit delimiter
baselers <- read_delim(file = ".../baselers.csv", delim = ",")
1 - readr functions typically expect the first row to contain the column names.
2 - If no column names are available, use the col_names argument.
# Read with explicit column names
baselers <- read_csv(file = ".../baselers.csv", col_names = c("id", "age", ...))
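The effect of col_names is easiest to see on a file without a header row. This sketch writes a tiny hypothetical file to a temporary location and assumes readr is installed:

```r
library(readr)

# Hypothetical file with two data rows and no header
path <- tempfile(fileext = ".csv")
writeLines(c("1,44", "2,65"), path)

# Default: the first data row is consumed as column names
guessed <- read_csv(path, col_types = cols())
nrow(guessed)

# With col_names, all rows are kept as data
named <- read_csv(path, col_names = c("id", "age"), col_types = cols())
named
```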
When reading in data, readr infers the type of data in each column.
# Read baselers
read_csv(file = "1_Data/baselers.csv")
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   sex = col_character(),
##   education = col_character(),
##   confession = col_character(),
##   fasnacht = col_character(),
##   eyecor = col_character()
## )
## See spec(...) for full column specifications.
## # A tibble: 10,000 x 20
##      id sex     age height weight income
##   <dbl> <chr> <dbl>  <dbl>  <dbl>  <dbl>
## 1     1 male     44   174.  113.    6300
## 2     2 male     65   180.   75.2  10900
## 3     3 fema…    31   168.   55.5   5100
## 4     4 male     27   209    93.8   4200
## 5     5 male     24   177.   NA     4000
##   education confession children
##   <chr>     <chr>         <dbl>
## 1 SEK_III   catholic          2
## 2 obligato… confessio…        2
## 3 SEK_III   <NA>              2
## 4 SEK_III   catholic          2
## 5 SEK_III   catholic          1
## # … with 9,995 more rows, and 11 more
## #   variables
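Instead of relying on the inferred types, the column types can be stated explicitly via col_types. A sketch on a small hypothetical file, assuming readr is installed:

```r
library(readr)

# Hypothetical file with a header row
path <- tempfile(fileext = ".csv")
writeLines(c("id,sex,age", "1,male,44", "2,female,65"), path)

# Let readr guess and inspect the resulting specification
guessed <- read_csv(path, col_types = cols())
spec(guessed)

# Override the guess: keep id as text and read age as integer
typed <- read_csv(
  path,
  col_types = cols(
    id  = col_character(),
    sex = col_character(),
    age = col_integer()
  )
)
sapply(typed, class)
```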
Incorrect data types can be fixed. Typically this involves:
1 - Specifying the NA strings via the na argument.
2 - Re-inferring the data types using type_convert().
# Read baselers
baselers <- read_csv(file = ".../baselers.csv", na = c('NA'))
# Try to fix incorrect data types
baselers <- type_convert(baselers)
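The two steps can be tried on a small file in which a placeholder stands for missing data. In this sketch the file, the column names, and the missing-value code -99 are all hypothetical; readr is assumed to be installed.

```r
library(readr)

# Hypothetical file where -99 codes a missing weight
path <- tempfile(fileext = ".csv")
writeLines(c("id,weight", "1,113", "2,-99", "3,75.2"), path)

# Read everything as text first, so nothing is interpreted yet
raw <- read_csv(path, col_types = cols(.default = col_character()))

# Declare the NA strings and let readr re-infer the column types
fixed <- suppressMessages(type_convert(raw, na = c("NA", "-99")))
fixed

# The na argument can also be passed directly when reading
direct <- read_csv(path, na = c("NA", "-99"), col_types = cols())
```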
R provides packages for reading many other file formats.
readr
# read fixed width files (can be fast)
data <- read_fwf(file, ...)
# read Apache style log files
data <- read_log(file, ...)
haven
# read SAS's .sas7bdat and .sas7bcat files
data <- read_sas(file, ...)
# read SPSS's .sav files
data <- read_sav(file, ...)
# etc
readxl
# read Excel's .xls and .xlsx files
data <- read_excel(file, ...)
# Read Matlab .mat files
data <- R.matlab::readMat(file, ...)
# Read and wrangle .xml and .html
data <- XML::xmlParse(file, ...)
# from package jsonlite: read .json files
data <- jsonlite::read_json(file, ...)
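As one example of a non-tabular format, JSON records can be turned into a data frame with jsonlite. The JSON text below is made up for illustration, and the sketch assumes the jsonlite package is installed.

```r
library(jsonlite)

# Hypothetical JSON text: an array of two records
json_text <- '[{"id": 1, "sex": "male"}, {"id": 2, "sex": "female"}]'

# fromJSON() simplifies an array of records into a data frame
df <- fromJSON(json_text)
df

# Round trip through a temporary .json file
path <- tempfile(fileext = ".json")
write_json(df, path)
back <- read_json(path, simplifyVector = TRUE)
```

Without `simplifyVector = TRUE`, `read_json()` returns a nested list rather than a data frame.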