aov(): conduct an analysis of variance (ANOVA)
TukeyHSD(): conduct Tukey post-hoc tests on an aov object
library(yarrr) # Load yarrr for the pirates dataframe
# ------------------
# ONE WAY ANOVA
# ------------------
# Do pirates from different colleges have different beard lengths?
col_beard_aov <- aov(formula = beard.length ~ college,
                     data = pirates)
# What is in the object?
class(col_beard_aov) # Result is of class 'aov' and 'lm'
names(col_beard_aov) # All of the named elements in the object
# Look at specific elements of the object
col_beard_aov$coefficients # Coefficients
col_beard_aov$residuals # Residuals (should be normally distributed)
# Look at results
summary(col_beard_aov) # We do find a significant effect of college on beard length!
TukeyHSD(col_beard_aov) # Post-hoc tests
# ------------------
# TWO WAY ANOVA
# ------------------
# Are sex and headband related to weight?
sexhead_weight_aov <- aov(formula = weight ~ sex + headband,
                          data = pirates)
summary(sexhead_weight_aov) # There is an effect of sex, but not headband
TukeyHSD(sexhead_weight_aov) # Post-hoc tests
# ------------------
# TWO WAY ANOVA WITH INTERACTIONS
# ------------------
# Is there an interaction between sex and headband on weight?
sexhead_int_weight_aov <- aov(formula = weight ~ sex * headband, # Use * instead of + for interactions!
                              data = pirates)
summary(sexhead_int_weight_aov) # Nope, no interaction
# ------------------
# More fun!
# ------------------
# Plot an ANOVA object to visualize several statistics
plot(col_beard_aov)
# Use the papaja package to print apa style results
devtools::install_github("crsh/papaja") # Install the papaja package from GitHub
library("papaja") # Load the papaja package
# Print apa style conclusions from aov objects
apa_print(col_beard_aov)
apa_print(sexhead_int_weight_aov)
# Easily plot group effects with yarrr::pirateplot()
library(yarrr)
pirateplot(formula = weight ~ sex + headband,
data = pirates)
# Or use ggplot2
library(tidyverse) # Contains ggplot2 and dplyr
# First, calculate aggregate data to be plotted
pirates_agg <- pirates %>%
  group_by(headband, sex) %>%
  summarise(
    weight_mean = mean(weight),
    weight_lb = t.test(weight)$conf.int[1], # Lower bound of the 95% CI
    weight_ub = t.test(weight)$conf.int[2]  # Upper bound of the 95% CI
  )
# Then, plot the group means with 95% confidence intervals
ggplot(data = pirates_agg,
       aes(x = headband, y = weight_mean, fill = sex)) +
  geom_bar(stat = "identity", position = position_dodge(0.9), col = "white") +
  geom_errorbar(aes(ymin = weight_lb,
                    ymax = weight_ub),
                position = position_dodge(0.9),
                width = 0.25) +
  labs(y = "Mean weight (95% CI)")
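If you don't want to load the tidyverse, base R can build the same kind of table of group means with t-based 95% confidence intervals. The sketch below uses the built-in ToothGrowth data (len by supp and dose) as a stand-in for pirates, because that dataset ships with R. The object names `cells` and `ci_table` are only illustrative.

```r
# Base R sketch: one row per supp x dose cell, holding the mean and the 95% t-based CI of len.
# ToothGrowth is a built-in stand-in for the pirates data.
cells <- split(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose))

ci_table <- t(sapply(cells, function(x) {
  ci <- t.test(x)$conf.int   # 95% confidence interval by default
  c(mean = mean(x), lb = ci[1], ub = ci[2])
}))

ci_table   # row names such as "OJ.0.5" combine the two grouping variables
```

You could pass these columns to ggplot() in the same way as pirates_agg above.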
In this WPA, you will analyze data from a (again, fake) study on attraction. In the study, 500 heterosexual university students each viewed the Facebook profile of another student of the opposite sex (the “target”). Based on the target’s profile, each participant made three judgments about the target: intelligence, attractiveness, and dateability. The primary judgment was a dateability rating, which indicated how dateable the participant found the target on a scale from 0 to 100.
The data are located in a tab-delimited text file at https://raw.githubusercontent.com/ndphillips/IntroductionR_Course/master/assignments/wpa/data/facebook.txt. Here is how the first few rows of the data should look:
## session sex age haircolor university education shirtless intelligence
## 1 1 m 22 blonde 3.Geneva 2.Bachelors 1.No 2.medium
## 2 1 m 24 brown 2.Zurich 3.Masters 2.Yes 2.medium
## 3 1 f 22 brown 3.Geneva 2.Bachelors 2.Yes 2.medium
## 4 1 f 23 brown 1.Basel 3.Masters 1.No 1.low
## 5 1 m 22 blonde 1.Basel 2.Bachelors 1.No 3.high
## 6 1 m 28 brown 1.Basel 4.PhD 1.No 3.high
## attractiveness dateability
## 1 1.low 45
## 2 3.high 63
## 3 2.medium 58
## 4 3.high 86
## 5 3.high 96
## 6 3.high 90
The data file has 500 rows and 10 columns. Here are the columns:
session
: The experiment session in which the study was run. There were 50 total sessions.
sex
: The sex of the target
age
: The age of the target
haircolor
: The haircolor of the target
university
: The university that the target attended.
education
: The highest level of education obtained by the target.
shirtless
: Did the target have a shirtless profile picture? 1.No vs. 2.Yes
intelligence
: How intelligent do you find this target? 1.low, 2.medium, 3.high
attractiveness
: How physically attractive do you find this target? 1.low, 2.medium, 3.high
dateability
: How dateable is this target? 0 to 100.
Open your class R project. This project should have (at least) two folders: one called data and one called R. Open a new script and enter your name, the date, and the WPA number at the top. Save the script in the R folder of your project working directory as wpa_7_LastFirst.R, where Last and First are your last and first names.
The data are stored in a tab-delimited text file located at https://raw.githubusercontent.com/ndphillips/IntroductionR_Course/master/assignments/wpa/data/facebook.txt. Using read.table(), load the data into R as a new object called facebook.
Look at the first few rows of the dataframe with the head() and View() functions to make sure it loaded correctly.
Using the names() and str() functions, look at the names and structure of the dataframe to make sure everything looks OK. If the data look strange, you did something wrong with read.table(); diagnose the problem!
Using write.table(), save a local copy of the facebook data to a text file called facebook.txt in the data folder of your project. Now you’ll always have access to the data.
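The main read.table() and write.table() arguments for tab-delimited files are header and sep. The sketch below shows a full round trip with the built-in ToothGrowth data and a temporary file, so your project folder is left alone. For the real task, swap in the facebook URL and your data folder. The file path here is an assumption made only for illustration.

```r
# Sketch: tab-delimited round trip with a built-in dataset and a temporary file
tmp_file <- tempfile(fileext = ".txt")

# sep = "\t" writes tab separators; row.names = FALSE avoids an extra unnamed column
write.table(ToothGrowth, file = tmp_file, sep = "\t", row.names = FALSE)

# header = TRUE reads the first line as column names
tg <- read.table(file = tmp_file, header = TRUE, sep = "\t",
                 stringsAsFactors = TRUE)

head(tg)   # first rows, to check the columns lined up
str(tg)    # column names and types, to check nothing was misread
```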
For each question, conduct the appropriate ANOVA by creating an object called tX_aov, where X is the task number. Look at the results using summary(). Then write the conclusion in APA style. To summarize an effect in an ANOVA, use the format F(XXX, YYY) = FFF, p = PPP, where XXX is the degrees of freedom of the variable you are testing, YYY is the degrees of freedom of the residuals, FFF is the F value for the variable you are testing, and PPP is the p-value (if the p-value is less than .01, just write p < .01).
If the p-value of the ANOVA is less than .05, conduct post-hoc tests. If you are only testing one independent variable, write APA conclusions for the post-hoc test. If you are testing more than one independent variable in your ANOVA, you do not need to write APA style conclusions for post-hoc tests.
For example, here is how I would analyze and answer the question “Was there an effect of diet on chicken weights?”
# ANOVA on Chicken Weights
# IV = Diet, DV = weight
# ANOVA
t0_aov <- aov(formula = weight ~ Diet,
              data = ChickWeight)
# Look at summary results
summary(t0_aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Diet 3 155863 51954 10.81 6.43e-07 ***
## Residuals 574 2758693 4806
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ANOVA was significant (p < .01), so I'll conduct post-hoc tests
# Tukey post-hoc tests
TukeyHSD(t0_aov)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = weight ~ Diet, data = ChickWeight)
##
## $Diet
## diff lwr upr p adj
## 2-1 19.971212 -0.2998092 40.24223 0.0552271
## 3-1 40.304545 20.0335241 60.57557 0.0000025
## 4-1 32.617257 12.2353820 52.99913 0.0002501
## 3-2 20.333333 -2.7268370 43.39350 0.1058474
## 4-2 12.646045 -10.5116315 35.80372 0.4954239
## 4-3 -7.687288 -30.8449649 15.47039 0.8277810
# Conclusion
# There was a significant main effect of diets on chicken weights (F(3, 574) = 10.81, p < .01). Pairwise Tukey HSD tests showed significant differences between diets 1 and 3 (diff = 40.30, p < .01) and diets 1 and 4 (diff = 32.62, p < .01). All other pairwise differences were not significant at the 0.05 significance threshold.
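The F(XXX, YYY) = FFF, p = PPP string in that conclusion is taken straight from the summary() table. The minimal sketch below builds it programmatically so you can compare it with your hand-written results. `apa_f()` is a hypothetical helper written for this illustration and is not part of base R or papaja. It follows the rounding rule given above and writes p < .01 for small p-values.

```r
# Hypothetical helper: format one term of an aov() summary as
# "F(df_term, df_residuals) = F, p = p", writing "p < .01" for small p-values
apa_f <- function(fit, term) {
  tab <- summary(fit)[[1]]                 # the ANOVA table (a data frame)
  rownames(tab) <- trimws(rownames(tab))   # summary() pads row names with spaces
  df1   <- tab[term, "Df"]
  df2   <- tab["Residuals", "Df"]
  f_val <- tab[term, "F value"]
  p_val <- tab[term, "Pr(>F)"]
  p_txt <- if (p_val < .01) {
    "p < .01"
  } else {
    paste("p =", sub("^0", "", sprintf("%.2f", p_val)))  # APA style: no leading zero
  }
  sprintf("F(%d, %d) = %.2f, %s", as.integer(df1), as.integer(df2), f_val, p_txt)
}

t0_aov <- aov(formula = weight ~ Diet, data = ChickWeight)
apa_f(t0_aov, "Diet")   # "F(3, 574) = 10.81, p < .01", matching the summary above
```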
Was there a main effect of university on dateability? Conduct a one-way ANOVA. If the result is significant (p < .05), conduct post-hoc tests. Report full APA-style conclusions.
Was there a main effect of intelligence on dateability? Conduct a one-way ANOVA. If the result is significant (p < .05), conduct post-hoc tests. Report full APA-style conclusions.
Was there a main effect of haircolor on dateability? Conduct a one-way ANOVA. If the result is significant (p < .05), conduct post-hoc tests. Report full APA-style conclusions.
Conduct a three-way ANOVA on dateability with intelligence, university, and haircolor as IVs. Do your results for each variable change compared to your previous one-way ANOVAs on these variables? (You do not need to give APA results or conduct post-hoc tests; just answer the question verbally.)
Conduct a multi-way ANOVA with sex, haircolor, university, education, shirtless, intelligence, and attractiveness as independent variables predicting dateability. Which variables are significantly related to dateability? Write APA results for each variable, but do not conduct post-hoc tests.
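One reason a variable's result can change once other IVs are added is that each IV soaks up residual variance, which changes the denominator of every F ratio. The sketch below shows this with the built-in warpbreaks data (breaks by wool and tension), which here stands in for the facebook variables. `get_row()` is a hypothetical helper used only for illustration.

```r
# Sketch: the same IV tested alone and together with another IV (built-in warpbreaks data)
one_way   <- aov(breaks ~ wool, data = warpbreaks)
multi_way <- aov(breaks ~ wool + tension, data = warpbreaks)

# Hypothetical helper: pull one term's row out of an aov summary table
get_row <- function(fit, term) {
  tab <- summary(fit)[[1]]
  rownames(tab) <- trimws(rownames(tab))   # summary() pads row names
  tab[term, c("Df", "Sum Sq", "F value", "Pr(>F)")]
}

get_row(one_way, "wool")
# wool is entered first, so its (Type I) Sum Sq is unchanged, but the F value
# and p-value differ because tension changes the residual mean square
get_row(multi_way, "wool")
```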
Create a plot (e.g., with pirateplot(), barplot(), or boxplot()) showing the distribution of dateability based on two independent variables: sex and shirtless. Based on what you see in the plot, do you expect there to be an interaction between sex and shirtless? Why or why not?
Test your prediction with the appropriate ANOVA. Report full APA-style conclusions.
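A quick way to judge an interaction from a plot is to check whether the profiles of cell means run roughly parallel. If they do, you would not expect an interaction; if they cross or diverge, an interaction is plausible. The sketch below applies this plot-then-test approach to the built-in ToothGrowth data, where supp and dose stand in for sex and shirtless. It uses base R's interaction.plot() instead of pirateplot().

```r
# Sketch with built-in ToothGrowth data: supp and dose play the roles of the two IVs
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# Table of cell means: compare how the supp difference changes across doses
cell_means <- tapply(tg$len, list(tg$supp, tg$dose), mean)
cell_means

# One line per supp level; non-parallel lines hint at an interaction
interaction.plot(x.factor = tg$dose, trace.factor = tg$supp, response = tg$len,
                 xlab = "dose", ylab = "mean len", trace.label = "supp")

# "*" fits both main effects plus the supp:dose interaction term
int_aov <- aov(len ~ supp * dose, data = tg)
summary(int_aov)
```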
data argument). Do your conclusions change compared to when you analyzed the data from all sessions?
Create a plot (e.g., using ggplot or the yarrr::pirateplot() function shown in the examples above) showing the distribution of dateability based on two independent variables: university and education. Based on what you see in the plot, do you expect there to be an interaction between university and education? Why or why not?
Test your prediction with the appropriate ANOVA. Report full APA-style conclusions.
Create a plot showing the distribution of dateability based on two independent variables: university and haircolor. Based on what you see in the plot, do you expect there to be an interaction between university and haircolor? Why or why not?
Test your prediction with the appropriate ANOVA. Report full APA-style conclusions.
Repeat the test from the previous question, but only include males over the age of 25. Do you get the same answer?
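To restrict an ANOVA to a subset of participants, subset the dataframe first and pass the result to the data argument. The sketch below applies this to the built-in ToothGrowth data, keeping only doses above 0.5 as a stand-in for "males over the age of 25". For the facebook data, a comparable filter might be subset(facebook, sex == "m" & age > 25), based on the sex and age columns shown above.

```r
# Sketch: the same interaction ANOVA on all rows and on a subset (built-in ToothGrowth data)
full_aov <- aov(len ~ supp * factor(dose), data = ToothGrowth)

sub_data <- subset(ToothGrowth, dose > 0.5)   # keep only doses 1 and 2
sub_aov  <- aov(len ~ supp * factor(dose), data = sub_data)

summary(full_aov)
summary(sub_aov)   # fewer observations, so fewer residual df; effects may change
```

Subsetting before calling factor() means the subset model only sees the dose levels that remain, so the model does not carry empty factor levels.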
You can plot an aov object to visualize things like the model residuals. Try plotting the results of your ANOVA from question 17 (e.g., plot(t17_aov)) and look at the resulting plots.
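By default, plot() on an aov object draws four diagnostic plots, one after another. The sketch below puts all four on a single 2 x 2 page and then draws a normal Q-Q plot of the residuals by hand. It reuses the ChickWeight model from the example above instead of t17_aov.

```r
# Sketch: diagnostic plots for the ChickWeight ANOVA from the example above
t0_aov <- aov(formula = weight ~ Diet, data = ChickWeight)

op <- par(mfrow = c(2, 2))   # 2 x 2 grid so the four default plots share one page
plot(t0_aov)
par(op)                      # restore the previous layout

# Residuals pulled out by hand: points far from the line suggest non-normality
res <- residuals(t0_aov)
qqnorm(res)
qqline(res)
```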
You can use the apa_print() function from the papaja package to print APA-style conclusions from aov objects. The papaja package is hosted on GitHub, so you can install it with the install_github() function from the devtools package as follows:
install.packages("devtools") # Only if you don't have the devtools package
devtools::install_github("crsh/papaja") # Install the papaja package from GitHub
library("papaja") # Load the papaja package
Now that you’ve got it, try evaluating apa_print() on some of your previous aov objects to see what happens. You may notice that the results contain special characters like $ and \\. This is because the output contains formatting code for LaTeX.
You can use the predict() function to apply a model (like an ANOVA) to new data, using the notation predict(MODEL, newdata). Using your ANOVA from question 8, predict the dateability of the following 5 students:
newdata <- data.frame("id" = c(1, 2, 3, 4, 5),
                      "haircolor" = c("brown", "brown", "blonde", "blonde", "brown"),
                      "university" = c("1.Basel", "1.Basel", "2.Zurich", "3.Geneva", "3.Geneva"),
                      stringsAsFactors = FALSE)
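predict() works on aov objects because aov() fits an lm() underneath. The new data only need columns whose names match the model's IVs, and their values must be levels that appeared in the data used to fit the model. The sketch below uses the built-in ToothGrowth data, with supp and dose standing in for haircolor and university. `new_pigs` is an illustrative name.

```r
# Sketch: predict() with an additive ANOVA fit (built-in ToothGrowth data)
tg <- ToothGrowth
tg$dose <- factor(tg$dose)
fit <- aov(len ~ supp + dose, data = tg)

new_pigs <- data.frame(supp = c("OJ", "VC", "VC"),
                       dose = c("0.5", "1", "2"),
                       stringsAsFactors = FALSE)

# Character values are matched to the factor levels seen when fitting;
# a level that never appeared in tg would make predict() stop with an error
pred <- predict(fit, newdata = new_pigs)
pred
```

In an additive model like this, a prediction equals the fitted value of any original row with the same IV values.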
The aov() function in R calculates what is known as a Type I ANOVA, which uses sequential sums of squares: each term is tested after adjusting for the terms listed before it in the formula. If your data are severely unbalanced, that is, the number of observations differs substantially across groups, a Type I ANOVA can give misleading results, because the result for a variable then depends on the order in which the variables appear in the formula. In this case, it’s better to use a Type II or Type III ANOVA. The Anova() function from the car package allows you to conduct these types of ANOVAs. Here’s how to use the Anova() function:
install.packages("car") # Only if you don't have the car package yet
library("car") # Load the car package
# Are sex and college related to the number of tattoos?
# First, create an lm() object; the Anova() function needs this:
model_lm <- lm(formula = tattoos ~ sex + college,
               data = pirates)
# Type II anova
Anova(model_lm,
      type = "II") # Type II
# Type III anova
Anova(model_lm,
      type = "III") # Type III
# Type I anova using the aov() function
summary(aov(model_lm))
# In this case, all three tests give virtually the same answers (thankfully)
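To see why the ANOVA type matters, you only need base R. On unbalanced data, Type I sums of squares change when you reorder the terms. The sketch below uses the built-in mtcars data, which are unbalanced: manual cars (am = 1) are mostly 4-cylinder and automatic cars are mostly 8-cylinder. It does not use car; `ss_table()` is a hypothetical helper.

```r
# Sketch: Type I sums of squares depend on term order when data are unbalanced
mt <- mtcars
mt$cyl <- factor(mt$cyl)
mt$am  <- factor(mt$am)

# Hypothetical helper: named vector of Sum Sq from an aov summary
ss_table <- function(fit) {
  tab <- summary(fit)[[1]]
  setNames(tab[["Sum Sq"]], trimws(rownames(tab)))
}

ss_cyl_first <- ss_table(aov(mpg ~ cyl + am, data = mt))
ss_am_first  <- ss_table(aov(mpg ~ am + cyl, data = mt))

ss_cyl_first
ss_am_first   # cyl and am get different Sum Sq; Residuals is identical
```

In a two-factor additive model like this one, each term's Type I Sum Sq when it is entered last equals its Type II Sum Sq. This is why Type II tests do not depend on order.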
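Now, answer the question “Is there an interaction between shirtless and sex on dateability?” by conducting three separate ANOVAs: one Type I (using aov()), one Type II (using Anova()), and one Type III (using Anova()). Do you get the same or different answers? Note that in a model with an interaction, Type III main-effect tests from Anova() are generally only meaningful when the factors use sum-to-zero contrasts, for example after running options(contrasts = c("contr.sum", "contr.poly")) before fitting the lm() model. To learn more about how the different ANOVA types work, look at this post by Falk Scholer: http://goanna.cs.rmit.edu.au/~fscholer/anova.php.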
Save and email your wpa_7_LastFirst.R file to me at nathaniel.phillips@unibas.ch.