aov()
TukeyHSD()
library(yarrr) # Load yarrr for the pirates dataframe
# ------------------
# ONE WAY ANOVA
# ------------------
# Do pirates from different colleges have different beard lengths?
col_beard_aov <- aov(formula = beard.length ~ college,
data = pirates)
# What is in the object?
class(col_beard_aov) # Result is of class 'aov' and 'lm'
names(col_beard_aov) # All of the named elements in the object
# Look at specific elements of the object
col_beard_aov$coefficients # Coeffients
col_beard_aov$residuals # Residuals (should be normally distributed)
# Look at results
summary(col_beard_aov) # We do find a significant effect of college on beard length!
TukeyHSD(col_beard_aov) # Post-hoc tests
# ------------------
# TWO WAY ANOVA
# ------------------
# Is there a relationship between sex and headband on weight
sexhead_weight_aov <- aov(formula = weight ~ sex + headband,
data = pirates)
summary(sexhead_weight_aov) # There is an effect of sex, but not headband
TukeyHSD(sexhead_weight_aov) # Post-hoc tests
# ------------------
# TWO WAY ANOVA WITH INTERACTIONS
# ------------------
# Is there an interaction between sex and headband on weight
sexhead_int_weight_aov <- aov(formula = weight ~ sex * headband, # Use * instead of + for interactions!
data = pirates)
summary(sexhead_int_weight_aov) # Nope, no interaction
# ------------------
# More fun!
# ------------------
# Plot an ANOVA object to visualize several statistics
plot(col_beard_aov)
# Use the papaja package to print apa style results
devtools::install_github("crsh/papaja", include_vignettes = TRUE) # Install the papaja package from github
library("papaja") # Load the papaja package
# Print apa style conclusions from aov objects
apa_print(col_beard_aov)
apa_print(sexhead_int_weight_aov)
# Easily plot group effects with yarrr::pirateplot()
library(yarrr)
pirateplot(formula = weight ~ sex + headband,
data = pirates)
# Or use ggplot2
library(tidyverse) # Contains ggplot2 and dplyr
# First, calculate aggregate data to be plotted
pirates_agg <- pirates %>%
group_by(headband, sex) %>%
summarise(
weight_mean = mean(weight),
weight_lb = t.test(weight)$conf.int[1],
weight_ub = t.test(weight)$conf.int[2]
)
ggplot(data = pirates_agg,
aes(x = headband, y = weight_mean, fill = sex)) +
geom_bar(stat = "identity", position = position_dodge(0.9), col = "white") +
geom_errorbar(aes(ymax = weight_lb,
ymin = weight_ub),
position = position_dodge(0.9),
width = 0.25) +
labs(y = "Treasure chests found")
In this WPA, you will analyze data from a (again…fake) study on attraction. In the study, 500 heterosexual University students viewed the Facebook profile of another student (the “target”) of the opposite sex. Based on a target’s profile, each participant made three judgments about the target - intelligence, attractiveness, and dateability. The primary judgement was a dateability rating indicating how dateable the person was on a scale of 0 to 100.
The data are located in a tab-delimited text file at https://raw.githubusercontent.com/ndphillips/IntroductionR_Course/master/assignments/wpa/data/facebook.txt. Here is how the first few rows of the data should look:
head(facebook)
## session sex age haircolor university education shirtless intelligence
## 1 1 m 22 blonde 3.Geneva 2.Bachelors 1.No 2.medium
## 2 1 m 24 brown 2.Zurich 3.Masters 2.Yes 2.medium
## 3 1 f 22 brown 3.Geneva 2.Bachelors 2.Yes 2.medium
## 4 1 f 23 brown 1.Basel 3.Masters 1.No 1.low
## 5 1 m 22 blonde 1.Basel 2.Bachelors 1.No 3.high
## 6 1 m 28 brown 1.Basel 4.PhD 1.No 3.high
## attractiveness dateability
## 1 1.low 45
## 2 3.high 63
## 3 2.medium 58
## 4 3.high 86
## 5 3.high 96
## 6 3.high 90
The data file has 500 rows and 10 columns. Here are the columns
session
: The experiment session in which the study was run. There were 50 total sessions.
sex
: The sex of the target
age
: The age of the target
haircolor
: The haircolor of the target
university
: The university that the target attended.
education
: The highest level of education obtained by the target.
shirtless
: Did the target have a shirtless profile picture? 1.No v 2.Yes
intelligence
: How intelligent do you find this target? 1.Low, 2.Medium, 3.High
attractiveness
: How physically attractive do you find this target? 1.Low, 2.Medium, 3.High
dateability
: How dateable is this target? 0 to 100.
Open your class R project. This project should have (at least) two folders, one called data
and one called R
. Open a new script and enter your name, date, and the wpa number at the top. Save the script in the R
folder in your project working directory as wpa_7_LastFirst.R
, where Last and First are your last and first names.
The data are stored in a tab–delimited text file located at https://raw.githubusercontent.com/ndphillips/IntroductionR_Course/master/assignments/wpa/data/facebook.txt. Using read.table()
load this data into R as a new object called facebook
facebook <- read.table(file = "https://raw.githubusercontent.com/ndphillips/IntroductionR_Course/master/assignments/wpa/data/facebook.txt",
header = TRUE,
sep = "\t")
head()
and View()
functions to make sure it loaded correctly.head(facebook)
names()
and str()
functions, look at the names and structure of the dataframe to make sure everything looks ok. If the data look strange, you did something wrong with read.table()
diagnose the problem!str(facebook)
write.table()
, save a local copy of the facebook data to a text file called facebook.txt
in the data folder of your project. Now, you’ll always have access to the data.write.table(facebook,
file = "data/facebook.txt",
sep = "\t")
For each question, conduct the appropriate ANOVA by creating an object called tX_aov
, where X is the task number. Look at the results using summary()
. Then, write the conclusion in APA style. To summarize an effect in an ANOVA, use the format F(XXX, YYY) = FFF, p = PPP, where XXX is the degrees of freedom of the variable you are testing, YYY is the degrees of freedom of the residuals, FFF is the F value for the variable you are testing, and PPP is the p-value (if the p-value is less than .01, just write p < .01).
If the p-value of the ANOVA is less than .05, conduct post-hoc tests. If you are only testing one independent variable, write APA conclusions for the post-hoc test. If you are testing more than one independent variable in your ANOVA, you do not need to write APA style conclusions for post-hoc tests.
For example, here is how I would analyze and answer the question: “Was there an effect of diets on Chicken Weights?”"
# ANOVA on Chicken Weights
# IV = Diet, DV = weight
# ANOVA
t0_aov <- aov(formula = weight ~ Diet,
data = ChickWeight)
# Look at summary results
summary(t0_aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Diet 3 155863 51954 10.81 6.43e-07 ***
## Residuals 574 2758693 4806
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ANOVA was significant (p < .01), so I'll conduct post-hoc tests
# Tukey post-hoc tests
TukeyHSD(t0_aov)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = weight ~ Diet, data = ChickWeight)
##
## $Diet
## diff lwr upr p adj
## 2-1 19.971212 -0.2998092 40.24223 0.0552271
## 3-1 40.304545 20.0335241 60.57557 0.0000025
## 4-1 32.617257 12.2353820 52.99913 0.0002501
## 3-2 20.333333 -2.7268370 43.39350 0.1058474
## 4-2 12.646045 -10.5116315 35.80372 0.4954239
## 4-3 -7.687288 -30.8449649 15.47039 0.8277810
# Conclusion
# There was a significant main effect of diets on chicken weights (F(3, 574) = 10.81, p < .01). Pairwise Tukey HSD tests showed significant differences between diets 1 and 3 (diff = 40.30, p < .01) and diets 1 and 4 (diff = 32.62, p < .01). All other pairwise differences were not significant at the 0.05 significance threshold.
t6_aov <- aov(formula = dateability ~ university,
data = facebook)
summary(t6_aov)
TukeyHSD(t6_aov)
#Answer: There was a significant main effect of university on dateability (F(2, 497) = 5.26, p < .01). Pairwise Tukey HSD tests showed significant differences between Geneva and Basel (diff = -9.23, p < .01). All other pairwise differences were not significant at the 0.05 significance threshold.*
t7_aov <- aov(formula = dateability ~ intelligence,
data = facebook)
summary(t7_aov)
TukeyHSD(t7_aov)
# Answer: There was not a significant main effect of intelligence on dateability (F(2, 497) = 1.59, p = 0.21).
t8_aov <- aov(formula = dateability ~ haircolor,
data = facebook)
summary(t8_aov)
# *Answer: There was no significant main effect of haircolor on dateability (F(1, 498) = 0.08, p = 0.77).*
t9_aov <- aov(formula = dateability ~ intelligence + university + haircolor,
data = facebook)
summary(t9_aov)
# Answer: Yes the results are (generally) the same. Only the effect of University is significant at the 0.05 level.
t10_aov <- aov(formula = dateability ~ sex + haircolor + university + education + shirtless + intelligence + attractiveness,
data = facebook)
summary(t10_aov)
# *Answer: There were significant effects of sex (F(1, 487) = 60.75, p < .01), university (F(487) = 6.11, p < .01), intelligence (F(2, 487) = 3.21, p = .04) and attractiveness (F(2, 487) = 49.35, p < .01).*
pirateplot()
, barplot()
, boxplot()
) showing the distribution of dateability based on two independent variables: sex and shirtless. Based on what you see in the plot, do you expect there to be an interaction between sex and shirtless? Why or why not?library(yarrr)
pirateplot(dateability ~ sex + shirtless,
data = facebook)
# Yes there looks like an intereaction!
t12_aov <- aov(dateability ~ sex * shirtless,
data = facebook)
summary(t12_aov)
# There is a sigificant interaction between sex and shirtless F(1, 496) = 66.15, p < .01.
data
argument). Do your conclusions change compared to when you analyzed the data from all sessions?t13_aov <- aov(formula = dateability ~ intelligence + university + haircolor,
data = subset(facebook, session > 30))
summary(t13_aov)
# Now no effects are significant!
knitr::include_graphics("https://www.mariowiki.com/images/thumb/6/65/CheckpointSM3DL.png/115px-CheckpointSM3DL.png")
ggplot
or the yarrr::pirateplot()
function shown in the examples above) showing the distribution of dateability based on two independent variables: university and education. Based on what you see in the plot, do you expect there to be an interaction between university and education? Why or why not?pirateplot(dateability ~ university + education,
data = facebook)
# No there doesn't look like an interaction to me. The effect of university is the same in each level of education.
t15_aov <- aov(dateability ~ university * education,
data = facebook)
summary(t15_aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## university 2 7602 3801 5.229 0.00566 **
## education 3 2335 778 1.071 0.36118
## university:education 6 2101 350 0.482 0.82215
## Residuals 488 354770 727
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Yep! No significant interaction between university and education F(6, 488) = 0.48, p = 0.82.
pirateplot(dateability ~ university + haircolor,
data = facebook)
# There does not look like an interaction
t17_aov <- aov(dateability ~ university * haircolor,
data = facebook)
summary(t17_aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## university 2 7602 3801 5.242 0.00559 **
## haircolor 1 259 259 0.357 0.55070
## university:haircolor 2 719 359 0.496 0.60944
## Residuals 494 358228 725
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Yep! no significant interaction! F(2, 494) = 0.49, p = 0.61
t18_aov <- aov(dateability ~ university * haircolor,
data = subset(facebook, sex == "m" & age > 25))
summary(t18_aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## university 2 250 125.0 0.275 0.762
## haircolor 1 1139 1138.6 2.503 0.128
## university:haircolor 2 2487 1243.6 2.734 0.087 .
## Residuals 22 10007 454.9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Yep! Still no interaction, F(2, 22) = 2.73, p = .09
aov
object to visualize things like the model residuals. Try plotting the results of your anova from question 17 (e.g.; plot(t17_aov)
) and look at the resulting plots.plot(t18_aov)
# Cool plots!
apa_print()
function from the papaja
package to print apa style conclusions from aov
objects. The papaja
package is on GitHub (not on CRAN), so to install it you’ll need to use the install_github()
function from the devtools
package as follows:install.packages("devtools") # Only if you don't have the devtools package
devtools::install_github("crsh/papaja") # Install the papaja package from GitHub
library("papaja") # Load the papaja package
Now that you’ve got it, try evaluating apa_print()
on some of your previous aov
objects to see what happens. You may notice that the results have special characters like $
and \\
. This is because the output contains formatting code for LaTeX.
papaja::apa_print(t18_aov)
## $estimate
## $estimate$university
## [1] "$\\eta^2_G = .024$"
##
## $estimate$haircolor
## [1] "$\\eta^2_G = .102$"
##
## $estimate$university_haircolor
## [1] "$\\eta^2_G = .199$"
##
##
## $statistic
## $statistic$university
## [1] "$F(2, 22) = 0.27$, $\\mathrm{MSE} = 454.85$, $p = .762$"
##
## $statistic$haircolor
## [1] "$F(1, 22) = 2.50$, $\\mathrm{MSE} = 454.85$, $p = .128$"
##
## $statistic$university_haircolor
## [1] "$F(2, 22) = 2.73$, $\\mathrm{MSE} = 454.85$, $p = .087$"
##
##
## $full_result
## $full_result$university
## [1] "$F(2, 22) = 0.27$, $\\mathrm{MSE} = 454.85$, $p = .762$, $\\eta^2_G = .024$"
##
## $full_result$haircolor
## [1] "$F(1, 22) = 2.50$, $\\mathrm{MSE} = 454.85$, $p = .128$, $\\eta^2_G = .102$"
##
## $full_result$university_haircolor
## [1] "$F(2, 22) = 2.73$, $\\mathrm{MSE} = 454.85$, $p = .087$, $\\eta^2_G = .199$"
##
##
## $table
## Effect $F$ $\\mathit{df}_1$ $\\mathit{df}_2$
## 1 University 0.27 2 22
## 2 Haircolor 2.50 1 22
## 3 University $\\times$ Haircolor 2.73 2 22
## $\\mathrm{MSE}$ $p$ $\\eta^2_G$
## 1 454.85 .762 .024
## 2 454.85 .128 .102
## 3 454.85 .087 .199
predict()
function to use a model (like an ANOVA) to predict the values of new data using the notation predict(MODEL, newdata)
. Using your ANOVA from question 8, predict the dateability of the following 5 students:newdata <- data.frame("id" = c(1, 2, 3, 4, 5),
"haircolor" = c("brown", "brown", "blonde", "blonde", "brown"),
"university" = c("1.Basel", "1.Basel", "2.Zurich", "3.Geneva", "3.Geneva"),
stringsAsFactors = FALSE)
predict(t18_aov, newdata)
## 1 2 3 4 5
## 61.83333 61.83333 55.00000 37.33333 49.22222
aov()
function in R calculates what is known as a type I ANOVA. If your data are severely imbalanced, that is, where the number of observations in each group are not similar, then using a Type I ANOVA can lead to misleading results. In this case, it’s better to use a Type II or Type III ANOVA. The Anova()
function from the car
package allows you to conduct these types of ANOVAs. Here’s how to use the Anova()
function:install.packages("car") # Only if you don't have the car package yet
library("car") # Load the car package
# Is there a relationship between sex and college on tattoos?
# First, create an lm() object, the Anova() function needs this:
model_lm <- lm(formula = tattoos ~ sex + college,
data = pirates)
# Type II anova
Anova(model_lm,
type = "II") # Type II
# Type III anova
Anova(model_lm,
type = "III") # Type III
# Type I anova using the aov() function
summary(aov(model_lm))
# In this case, all three tests give virtually the same answers (thankfully)
Now, answer the question: “Is there an interaction between shirtless and sex on dating desireability?” by conducting three separate ANOVAS, one that is Type I (using aov()
), one that is Type II (using Anova()
), and one that is Type III (using Anova()
). Do you get the same or different answers? To learn more about how the different ANOVA types work, look at this post by Falk Scholer: http://goanna.cs.rmit.edu.au/~fscholer/anova.php.
library(car)
summary(aov(dateability ~ shirtless + sex, data = facebook))
## Df Sum Sq Mean Sq F value Pr(>F)
## shirtless 1 1551 1551 2.313 0.129
## sex 1 31951 31951 47.643 1.57e-11 ***
## Residuals 497 333305 671
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
car::Anova(lm(formula = dateability ~ shirtless + sex,
data = facebook), type = "II")
## Anova Table (Type II tests)
##
## Response: dateability
## Sum Sq Df F value Pr(>F)
## shirtless 275 1 0.4099 0.5223
## sex 31951 1 47.6430 1.566e-11 ***
## Residuals 333305 497
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
car::Anova(lm(formula = dateability ~ shirtless + sex,
data = facebook), type = "III")
## Anova Table (Type III tests)
##
## Response: dateability
## Sum Sq Df F value Pr(>F)
## (Intercept) 906964 1 1352.3964 < 2.2e-16 ***
## shirtless 275 1 0.4099 0.5223
## sex 31951 1 47.6430 1.566e-11 ***
## Residuals 333305 497
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# The p-value for the type I test (using aov()) is much smaller than the p-value for the type II and type III tests (which are virtually identical).
# Add standardized dateability score to facebook dataframe
facebook$dateability_z <- (facebook$dateability - mean(facebook$dateability)) / sd(facebook$dateability)
# Anova for original (non-standardized) variable
summary(aov(formula = dateability ~ attractiveness, data = facebook))
## Df Sum Sq Mean Sq F value Pr(>F)
## attractiveness 2 61237 30619 49.8 <2e-16 ***
## Residuals 497 305571 615
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Anova for non-standardized variable
summary(aov(formula = dateability_z ~ attractiveness, data = facebook))
## Df Sum Sq Mean Sq F value Pr(>F)
## attractiveness 2 83.3 41.65 49.8 <2e-16 ***
## Residuals 497 415.7 0.84
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# The results are identical! This means that you can always standardize your dependent variable and it won't affect the test.
wpa_7_LastFirst.R
file to me at nathaniel.phillips@unibas.ch.