Statistics with R | Basel R Bootcamp

Steven Tash as a psi-test subject in Ghostbusters, from imdb.com

“The term psi denotes anomalous processes of information or energy transfer that are currently unexplained in terms of known physical or biological mechanisms. Two variants of psi are precognition (conscious cognitive awareness) and premonition (affective apprehension) of a future event that could not otherwise be anticipated through any known inferential process.”

*Daryl J. Bem, professor emeritus, Cornell University*

In this practical, you will analyze Daryl Bem’s infamous psychological study on humans’ *psi* abilities and, along the way, practice the essentials of the “new statistics”.

By the end of this practical you will know how to:

- P-hack
- Determine appropriate sample sizes
- Compute confidence intervals
- Run simple Bayesian analyses

- Open your `BaselRBootcamp` R project. It should already have the folders `1_Data` and `2_Code`. Make sure that the data files listed in the `Datasets` section above are in your `1_Data` folder.

`# Done!`

- Open a new R script. At the top of the script, using comments, write your name and the date. Save it as a new file called `newstats_practical.R` in the `2_Code` folder.

`# Done!`

- Using `library()`, load the `tidyverse` package (if you don’t have it, you’ll need to install it with `install.packages()`)!

```
# Load packages necessary for this script
library(tidyverse)
library(pwr)
library(rstanarm)
library(BayesFactor)
```

- Using the following template, load the `psi_exp1.csv` and `psi_exp2.csv` data into R and store them as new objects called `psi1` and `psi2`, respectively (Hint: Don’t type the paths directly! Use “tab” completion!).

```
# Load psi_exp1.csv and psi_exp2.csv from the 1_Data folder
psi1 <- read_csv(file = "1_Data/psi_exp1.csv")
psi2 <- read_csv(file = "1_Data/psi_exp2.csv")
```

- Take a look at the first few rows of the data sets by printing them to the console.

```
# Print psi1 and psi2 object
psi1
```

```
# A tibble: 200 x 8
gender age stimulus_seeking hour_of_day condition depicted_gender
<chr> <int> <dbl> <int> <chr> <chr>
1 Female 20 3 18 erotic Female
2 Male 20 4.5 12 erotic Female
3 Female 19 2 12 erotic Female
4 Male 22 3 16 erotic Female
5 Female 22 3 16 erotic Female
6 Male 20 3 14 erotic Female
7 Male 19 3 13 erotic Female
8 Female 21 2 16 erotic Female
9 Male 19 2.5 12 erotic Female
10 Female 21 4 12 erotic Female
# … with 190 more rows, and 2 more variables: n_trials <int>,
# hit_rate <dbl>
```

`psi2`

```
# A tibble: 200 x 8
gender age stimulus_seeking hour_of_day condition depicted_gender
<chr> <int> <dbl> <int> <chr> <chr>
1 Female 20 3 18 erotic Female
2 Male 20 4.5 12 erotic Female
3 Female 19 2 12 erotic Female
4 Male 22 3 16 erotic Female
5 Female 22 3 16 erotic Female
6 Male 20 3 14 erotic Female
7 Male 19 3 13 erotic Female
8 Female 21 2 16 erotic Female
9 Male 19 2.5 12 erotic Female
10 Female 21 4 12 erotic Female
# … with 190 more rows, and 2 more variables: n_trials <int>,
# hit_rate <dbl>
```

- Use the `summary()` function to print more details on the columns of the data sets.

```
# Show summaries for psi1 and psi2
summary(psi1)
```

```
gender age stimulus_seeking hour_of_day
Length:200 Min. :17.0 Min. :1.00 Min. :10.0
Class :character 1st Qu.:18.8 1st Qu.:2.00 1st Qu.:11.0
Mode :character Median :19.0 Median :2.50 Median :12.0
Mean :19.8 Mean :2.65 Mean :12.6
3rd Qu.:21.0 3rd Qu.:3.00 3rd Qu.:14.0
Max. :27.0 Max. :4.50 Max. :18.0
condition depicted_gender n_trials hit_rate
Length:200 Length:200 Min. :12 Min. :25.0
Class :character Class :character 1st Qu.:12 1st Qu.:44.4
Mode :character Mode :character Median :18 Median :50.0
Mean :18 Mean :51.5
3rd Qu.:24 3rd Qu.:58.3
Max. :24 Max. :83.3
```

`summary(psi2)`

```
gender age stimulus_seeking hour_of_day
Length:200 Min. :17.0 Min. :1.00 Min. :10.0
Class :character 1st Qu.:18.8 1st Qu.:2.00 1st Qu.:11.0
Mode :character Median :19.0 Median :2.50 Median :12.0
Mean :19.8 Mean :2.65 Mean :12.6
3rd Qu.:21.0 3rd Qu.:3.00 3rd Qu.:14.0
Max. :27.0 Max. :4.50 Max. :18.0
condition depicted_gender n_trials hit_rate
Length:200 Length:200 Min. :12 Min. :25.0
Class :character Class :character 1st Qu.:12 1st Qu.:44.4
Mode :character Mode :character Median :18 Median :50.0
Mean :18 Mean :51.5
3rd Qu.:24 3rd Qu.:58.3
Max. :24 Max. :83.3
```

- Use the `View()` function to view the entire data frames in new windows.

```
# Show the full data for psi1 and psi2
View(psi1)
View(psi2)
```

- The first part of this practical consists of two p-hacking competitions.

1.1 The goal of both competitions is to achieve the smallest possible p-value by hacking the sh__ out of Bem’s experimental data. To do this, find up to three analyses that show either that people actually have *psi* abilities (i.e., that `hit_rate` is higher than chance) or that other variables can predict *psi* ability (i.e., `hit_rate`, implying the existence of *psi*). There are no rules. You can use any test. You are even permitted to use only parts of the data (e.g., using `data %>% filter(condition == "erotic")`).

1.2 For each of the two competitions you are permitted to submit up to 3 models. You can work alone or join forces with other participants. For information on the data and variables see the *Datasets* tab. The key criterion is `hit_rate`. The person or group submitting the lowest p-value for a competition wins 🍫🍫🍫.

1.3 Submit your results using these links:

- In his paper, Bem’s first analysis asks whether the hit rate for erotic pictures is on average larger than `50%`. For this condition he observed an average hit rate of `53.13%`. Determine, for experiment 1, what this means in terms of Cohen’s effect size *d*, which is calculated by dividing the average deviation from \(H_0\) (i.e., 50) by the standard deviation of the values the average is calculated from.

```
# extract erotic hit rate
hit_rate_erotic <- psi1$hit_rate[psi1$condition == "erotic"]
# calculate the deviation from H0
hit_rate_erotic_delta <- XX - 50
# calculate d
d <- mean(YY) / sd(YY)
```

```
# extract erotic hit rate
hit_rate_erotic <- psi1$hit_rate[psi1$condition == "erotic"]
# calculate the deviation from H0
hit_rate_erotic_delta <- hit_rate_erotic - 50
# calculate d
d <- mean(hit_rate_erotic_delta) / sd(hit_rate_erotic_delta)
```

- An effect size of `d = .25` is typically considered a small but meaningful effect. Let us now take the perspective of rival researchers who would like to replicate Bem’s effect. Specifically, we would like to conduct an experiment that has a small chance of false positives, e.g., \(\alpha = 0.05\) (`sig.level`), and an equally low chance of missing the effect, i.e., \(\beta = .05\), implying \(1-\beta = power = .95\). How many observations would we have to make to be able to conduct such a test? Use the template below.

```
# N to test d=.25, alpha = .05, power = .95
pa <- pwr.t.test(d = XX,
                 sig.level = XX,
                 power = XX,
                 alternative = "greater")
pa
```

```
# N to test d=.25, alpha = .05, power = .95
pa <- pwr.t.test(d = .25,
                 sig.level = .05,
                 power = .95,
                 alternative = "greater")
pa
```

```
Two-sample t test power calculation
n = 347
d = 0.25
sig.level = 0.05
power = 0.95
alternative = greater
NOTE: n is number in *each* group
```

- The analysis shows that we would have to run a study with 347 individuals per group. You can also illustrate this using `plot.power.htest()`. Apply the function to `pa`.

```
# sample size plot
plot.power.htest(pa)
```

- Using the plot, you can already get an idea of how large the power, i.e., the probability of detecting an effect given that it is truly there to detect, was for Bem’s study. You can also again use the `pwr.t.test()` function. To do this, set `n = length(hit_rate_erotic)` and `power = NULL`. Call the result `pa_bem`.

```
# Bem's post-hoc power
pa_bem <- pwr.t.test(n = length(hit_rate_erotic),
                     d = .25,
                     sig.level = .05,
                     alternative = "greater")
pa_bem
```

```
Two-sample t test power calculation
n = 100
d = 0.25
sig.level = 0.05
power = 0.547
alternative = greater
NOTE: n is number in *each* group
```

- The previous analysis is called a post-hoc (after the fact) power analysis, which should not be confused with an actual power analysis carried out before data collection. In Bem’s case we find that the power was only `.547`, meaning that there was roughly only a 50-50 chance of detecting an effect of the observed size. Let’s now consider what would happen if the effect was actually very small, say `d = .1`. How large would Bem’s post-hoc power have been, and how large a sample would have been necessary for `power = .95`?

```
# Bem's post-hoc power for d = .1
pa_bem.1 <- pwr.t.test(n = length(hit_rate_erotic),
                       d = .1,
                       sig.level = .05,
                       alternative = "greater")
pa_bem.1
```

```
Two-sample t test power calculation
n = 100
d = 0.1
sig.level = 0.05
power = 0.174
alternative = greater
NOTE: n is number in *each* group
```

```
# Sample size needed for d = .1
pa.1 <- pwr.t.test(d = .1,
                   sig.level = .05,
                   power = .95,
                   alternative = "greater")
pa.1
```

```
Two-sample t test power calculation
n = 2165
d = 0.1
sig.level = 0.05
power = 0.95
alternative = greater
NOTE: n is number in *each* group
```

- Bem’s post-hoc power would have been a meager 17.4%, while the N necessary for a conclusive study lies upwards of 2,000 individuals per group. This means that Bem either knew he would be finding a substantive effect, or he engaged in pretty poor study planning. Go explore: can you determine the post-hoc power and the necessary n for your analyses? Use `pwr.r.test()` for correlation tests and `pwr.chisq.test()` for chi-square tests. Unfortunately, there are no such functions for regression models.
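For instance, a power analysis for a correlation test follows the same logic as `pwr.t.test()`; the effect size `r = .1` below is just an illustrative assumption, not a value from Bem’s data.

```r
# N needed to detect a correlation of r = .1 with
# alpha = .05 and power = .95 (one-sided test)
pwr.r.test(r = .1,
           sig.level = .05,
           power = .95,
           alternative = "greater")
```

As with `pwr.t.test()`, leave exactly one of `n`, `r`, and `power` unspecified and the function solves for it.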

- Confidence intervals are another way of using a significance test that places the focus on the uncertainty inherent in the result. The basic idea is that rather than dividing an estimate by its standard error, we assess the range spanned by the standard error to the left and right of the estimate. Let’s begin by computing a standard paired t-test for the hit rate using the template below and storing the result in an object called `test`.

```
# extract hit rates
hit_rate_erotic <- psi1$hit_rate[psi1$condition == "erotic"]
hit_rate_neutral <- psi1$hit_rate[psi1$condition != "erotic"]

# compute t-test
test <- t.test(x = XX,
               y = YY,
               paired = TRUE)
```

```
# extract hit rates
hit_rate_erotic <- psi1$hit_rate[psi1$condition == "erotic"]
hit_rate_neutral <- psi1$hit_rate[psi1$condition != "erotic"]

# compute t-test
test <- t.test(x = hit_rate_erotic,
               y = hit_rate_neutral,
               paired = TRUE)
```

- Next, inspect the names of the `test` object using `names()`.

`names(test)`

```
[1] "statistic" "parameter" "p.value" "conf.int" "estimate"
[6] "null.value" "alternative" "method" "data.name"
```

- You will have observed that the t-test object actually already contains a `conf.int` element. Access it using `$` and print it to the console.

`test$conf.int`

```
[1] -0.228 6.839
attr(,"conf.level")
[1] 0.95
```

- Let us now try to recreate the confidence interval. The first element we need for this is the average difference, which is simply the average of the differences between `hit_rate_erotic` and `hit_rate_neutral`. First, calculate the differences and store them as `hit_rate_diff`. Then calculate the mean of `hit_rate_diff` and call it `hit_rate_diff_mean`.

```
# differences
hit_rate_diff <- hit_rate_erotic - hit_rate_neutral
# mean differences
hit_rate_diff_mean <- mean(hit_rate_diff)
```

- The second element is the standard error. In the case of paired samples this is very easy to calculate: the standard error is the standard deviation (`sd()`) of the differences divided by the square root of the number of differences (`sqrt(length())`). Store the standard error as `hit_rate_diff_se`.

```
# standard error of the differences
hit_rate_diff_se <- sd(hit_rate_diff) / sqrt(length(hit_rate_diff))
```

- The final element is the value of the t-distribution that corresponds to a certain confidence interval width, e.g., 95% or 99%. If the goal is to determine a 95% confidence interval, then we need to identify the t-values that encapsulate this proportion of the t-distribution: the t-value below which 2.5% of t-values fall, i.e., \(t_{2.5\%}\), and the t-value below which 97.5% fall, i.e., \(t_{97.5\%}\). Locate these values in the figure below.

- We can calculate the precise t-values corresponding to certain probabilities using the quantile function (inverse cumulative distribution function) of the t-distribution, `qt(p = XX, df = YY)`. To use it, you must provide the proportion `p` of t-values that should be smaller than the target value and the degrees of freedom `df`, which in this case must be set to `n - 1`. Determine \(t_{2.5\%}\) and \(t_{97.5\%}\) and store them as `t.25` and `t97.5`.

```
# t-values for p = 2.5% and p = 97.5%
t.25 <- qt(p = .025, df = length(hit_rate_diff) - 1)
t97.5 <- qt(p = .975, df = length(hit_rate_diff) - 1)
```

- Compare the two t-values. Anything surprising? Given the symmetric nature of the t-distribution, the two values are each other’s mirror image. OK, now you have all the ingredients you need to construct a confidence interval. To calculate the lower bound, move exactly `t.25` times the standard error to the left of the mean hit-rate difference (note that `t.25` is negative, so adding `t.25` times the standard error moves you left). To calculate the upper bound, move exactly `t97.5` times the standard error to the right of the mean hit-rate difference. Do this and name the lower and upper bounds `hit_rate_lowerb` and `hit_rate_upperb`.

```
# confidence limits
hit_rate_lowerb <- hit_rate_diff_mean + t.25 * hit_rate_diff_se
hit_rate_upperb <- hit_rate_diff_mean + t97.5 * hit_rate_diff_se
```
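To check your work, you can print your manually calculated bounds next to the interval stored in the t-test object computed earlier:

```r
# compare the manual bounds with the t-test's confidence interval
c(hit_rate_lowerb, hit_rate_upperb)
test$conf.int
```

Up to rounding, the two pairs of values should match.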

Now compare your calculated values against the confidence interval determined by the t-test. Are they the same? The confidence interval is in many ways the better alternative to the classic inferential test because it provides more information. A test is just significant or not. The confidence interval reveals this information too, by including 0 or not, but it also reveals the oftentimes considerable uncertainty inherent in our estimate, which is otherwise easily forgotten.

Play around with the confidence intervals. How do the bounds change when you choose higher (e.g., 99%) or lower (e.g., 90%) percentages?
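For a 90% interval, for example, you only need to swap in the 5% and 95% t-quantiles (a sketch reusing the objects created above):

```r
# 90% confidence interval: use the 5% and 95% t-quantiles
t5  <- qt(p = .05, df = length(hit_rate_diff) - 1)
t95 <- qt(p = .95, df = length(hit_rate_diff) - 1)
hit_rate_diff_mean + c(t5, t95) * hit_rate_diff_se
```

Notice that the lower the percentage, the narrower the interval becomes.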

- Bayesian statistics have become readily available in R, mainly through two packages: `rstanarm` and `BayesFactor`. Both packages are great and allow you to do almost anything, but they follow slightly different philosophies. Let’s begin with `rstanarm`. Use the template below to run a regression predicting `hit_rate` by `gender` and `condition`.

```
# Bayesian regression predicting hit_rate by gender and condition
bm1 <- stan_glm(formula = hit_rate ~ gender + condition,
                data = psi1)
```

```
SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0.000107 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.07 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.121502 seconds (Warm-up)
Chain 1: 0.114164 seconds (Sampling)
Chain 1: 0.235666 seconds (Total)
Chain 1:
(analogous sampling output for chains 2-4 omitted)
```

- Print the object and inspect the output. You’ll see it looks different from the output of a regular regression. Focusing only on the part after the separating hyphens, there are three main parts:

The first part shows the coefficients and their standard deviation. Different from a regular regression output, we do not receive any test statistics or significance values. Moreover, the estimates are labeled Median and MAD_SD. This is because in Bayesian statistics we can more or less choose which centrality measure to compute from the posterior distribution; `rstanarm` chooses the median because the median is typically more robust.

The second part, labeled *Auxiliary parameter(s)*, lists another set of estimates (in this case only one) that had to be computed in order to fit the model. Here this is the single standard deviation of the residual distribution (this shows the importance of the homoscedasticity assumption: without it, many more sigmas would have to be estimated).

The third part, labeled *Sample avg. posterior predictive distribution of y*, shows the mean value predicted by the model, which is, more or less, the average of the fitted values.

- In order to make statistical decisions, Bayesian statistics commonly relies on intervals, which in this context are called *credible intervals* rather than *confidence intervals*. Credible intervals truly provide the range that encompasses the true value with 95% probability, which cannot strictly be said about confidence intervals. You can compute these intervals by applying the `posterior_interval()` function to the `bm1` object.

```
# credible intervals for the coefficients of bm1
posterior_interval(bm1)
```

```
5% 95%
(Intercept) 46.710 51.47
genderMale -1.393 4.21
conditionerotic 0.477 6.07
sigma 11.083 13.04
```
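By default, `posterior_interval()` returns a 90% interval (the 5% and 95% posterior quantiles), as shown above. Other widths can be requested via the `prob` argument:

```r
# 95% credible intervals instead of the default 90%
posterior_interval(bm1, prob = .95)
```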

- Now let’s run the same analysis using the `BayesFactor` package and its `regressionBF()` function.

```
# Bayesian regression predicting hit_rate by gender and condition
psi1$gender_dummy <- as.numeric(psi1$gender == "Male")
psi1$condition_dummy <- as.numeric(psi1$condition == "erotic")
bm2 <- regressionBF(formula = hit_rate ~ gender_dummy + condition_dummy,
                    data = psi1)
```

Print `bm2` and inspect the output. You’ll see it’s very different. The `BayesFactor` package focuses more on model comparison than on parameter estimation. Accordingly, the output informs you about a model comparison, in this case between the regression model provided and a model containing only an intercept, the `Intercept only` model. The results are what is known as *Bayes factors*. A Bayes factor quantifies the amount of evidence that the data provide in favor of the model in question, relative to the comparison model. Values larger than 1 mean that the data favor the model in question, although only values above 10 are typically considered meaningful. Values lower than 1 indicate that the comparison model is actually more likely. The output presents results separately for each predictor and for their combination. So what is it in this case? Are the predictors any good?

The previous result illustrates one of several major benefits of Bayesian statistics: the ability to quantify evidence in favor of the null hypothesis. In general, everything can be done with either package. Go explore.

```
# power analysis -----------

# N needed for a one-sided test with a mean difference of
# .3 standard deviations, a false positive rate (sig.level)
# of .05, and power of .95
pwr.t.test(d = .3,
           sig.level = .05,
           power = .95,
           alternative = "greater")

# Power obtained for a one-sided test with a mean difference of
# .3 standard deviations, a false positive rate (sig.level)
# of .05, and a sample size of 100 per group
pwr.t.test(n = 100,
           d = .3,
           sig.level = .05,
           alternative = "greater")

# confidence intervals -----------

# split data (not strictly necessary)
mpg_suv <- mpg %>% filter(class == "suv")
mpg_compact <- mpg %>% filter(class == "compact")

# group sizes and degrees of freedom
n_suv <- length(mpg_suv$hwy)
n_compact <- length(mpg_compact$hwy)
df <- n_suv + n_compact - 2

# mean difference, pooled standard deviation, and standard error
mpg_diff <- mean(mpg_suv$hwy) - mean(mpg_compact$hwy)
mpg_diff_sd <- sqrt(((n_suv - 1) * var(mpg_suv$hwy) +
                     (n_compact - 1) * var(mpg_compact$hwy)) / df)
mpg_diff_se <- mpg_diff_sd * sqrt(1 / n_suv + 1 / n_compact)

# lower and upper bounds
mpg_diff + qt(.025, df) * mpg_diff_se
mpg_diff + qt(.975, df) * mpg_diff_se

# both at the same time
mpg_diff + qt(.975, df) * mpg_diff_se * c(-1, 1)

# Bayesian regression -----------

# using rstanarm
bm1 <- stan_glm(hwy ~ displ, data = mpg)

# rstanarm credible intervals
posterior_interval(bm1)

# using BayesFactor
bm2 <- regressionBF(hwy ~ displ, data = mpg)

# sample from the posterior
posterior(bm2, iterations = 1000)
```

File | Rows | Columns |
---|---|---|
psi_exp1.csv | 200 | 7 |
psi_exp2.csv | 150 | 5 |

The data sets stem from an actual (para-)psychological investigation of people’s ability to “feel the future”. The infamous study, published in a respected social psychology journal, was one of the main triggers of the replication crisis and the associated efforts for better scientific practice in psychology and beyond. The data sets contain the data of the first two experiments. Experiment 1 investigated whether people can successfully (better than chance) predict behind which of two occluding curtains a picture is hiding as a function of, among other things, whether the picture has erotic content. Experiment 2 is a replication attempt. Both studies also measured how stimulus seeking (a variant of extraversion) individuals were, to see whether this influences individuals’ *psi* ability.

Name | Description |
---|---|
`gender` | The gender of the participant. |
`age` | The age of the participant. |
`stimulus_seeking` | The degree of stimulus seeking exhibited by the participant. |
`hour_of_day` | The hour of day at which the participant was tested. |
`condition` | The study condition: “erotic” or “control” (only psi1). |
`depicted_gender` | The gender on the presented photo: “Female” or “Male” (only psi1). |
`n_trials` | The number of choices made. |
`hit_rate` | Percent correct choices (with regard to showing psi). |

Package | Installation |
---|---|
`tidyverse` | `install.packages("tidyverse")` |
`pwr` | `install.packages("pwr")` |
`rstanarm` | `install.packages("rstanarm")` |
`BayesFactor` | `install.packages("BayesFactor")` |

Function | Package | Description |
---|---|---|
`pwr.t.test` | `pwr` | t-test power analyses |
`plot.power.htest` | `pwr` | Plot power analyses |
`qt`, `qnorm`, `qf`, etc. | `stats` | Quantile functions needed to determine confidence intervals |
`stan_glm` | `rstanarm` | Bayesian regression using `rstanarm` |
`posterior_interval` | `rstanarm` | Bayesian credible intervals using `rstanarm` |
`regressionBF` | `BayesFactor` | Bayesian regression using `BayesFactor` |
`posterior` | `BayesFactor` | Sample from the posterior using `BayesFactor` |

Find vignettes for these packages: pwr, rstanarm, BayesFactor.