Readings

This assignment is based on the following readings:

Assignment Goals

Examples

library(yarrr) # Load yarrr for the pirates dataframe

# ------------------
#  ONE WAY ANOVA
# ------------------

# Do pirates from different colleges have different beard lengths?

col_beard_aov <- aov(formula = beard.length ~ college,
                     data = pirates)

# What is in the object?
class(col_beard_aov)     # Result is of class 'aov' and 'lm'
names(col_beard_aov)     # All of the named elements in the object

# Look at specific elements of the object

col_beard_aov$coefficients   # Coeffients
col_beard_aov$residuals      # Residuals (should be normally distributed)

# Look at results
summary(col_beard_aov)   # We do find a significant effect of college on beard length!
TukeyHSD(col_beard_aov)  # Post-hoc tests

# ------------------
# TWO WAY ANOVA
# ------------------

# Is there a relationship between sex and headband on weight

sexhead_weight_aov <- aov(formula = weight ~ sex + headband,
                     data = pirates)

summary(sexhead_weight_aov)   # There is an effect of sex, but not headband
TukeyHSD(sexhead_weight_aov)  # Post-hoc tests

# ------------------
# TWO WAY ANOVA WITH INTERACTIONS
# ------------------

# Is there an interaction between sex and headband on weight

sexhead_int_weight_aov <- aov(formula = weight ~ sex * headband,   # Use * instead of + for interactions!
                              data = pirates)

summary(sexhead_int_weight_aov)   # Nope, no interaction


# ------------------
# More fun!
# ------------------

# Plot an ANOVA object to visualize several statistics
plot(col_beard_aov)

# Use the papaja package to print apa style results
devtools::install_github("crsh/papaja", include_vignettes = TRUE)   # Install the papaja package from github
library("papaja")   # Load the papaja package

# Print apa style conclusions from aov objects
apa_print(col_beard_aov)
apa_print(sexhead_int_weight_aov)

# Easily plot group effects with yarrr::pirateplot()

library(yarrr)
pirateplot(formula = weight ~ sex + headband, 
           data = pirates)

# Or use ggplot2
library(tidyverse) # Contains ggplot2 and dplyr

# First, calculate aggregate data to be plotted

pirates_agg <- pirates %>% 
  group_by(headband, sex) %>%
  summarise(
    weight_mean = mean(weight),
    weight_lb = t.test(weight)$conf.int[1],
    weight_ub = t.test(weight)$conf.int[2]
  )

ggplot(data = pirates_agg,
       aes(x = headband, y = weight_mean, fill = sex)) + 
  geom_bar(stat = "identity", position = position_dodge(0.9), col = "white") + 
  geom_errorbar(aes(ymax = weight_lb,
                    ymin = weight_ub), 
                position = position_dodge(0.9), 
                width = 0.25) + 
  labs(y = "Treasure chests found")

Facebook Attraction

In this WPA, you will analyze data from a (again…fake) study on attraction. In the study, 500 heterosexual University students viewed the Facebook profile of another student (the “target”) of the opposite sex. Based on a target’s profile, each participant made three judgments about the target - intelligence, attractiveness, and dateability. The primary judgement was a dateability rating indicating how dateable the person was on a scale of 0 to 100.

The data are located in a tab-delimited text file at https://raw.githubusercontent.com/ndphillips/IntroductionR_Course/master/assignments/wpa/data/facebook.txt. Here is how the first few rows of the data should look:

head(facebook)
##   session sex age haircolor university   education shirtless intelligence
## 1       1   m  22    blonde   3.Geneva 2.Bachelors      1.No     2.medium
## 2       1   m  24     brown   2.Zurich   3.Masters     2.Yes     2.medium
## 3       1   f  22     brown   3.Geneva 2.Bachelors     2.Yes     2.medium
## 4       1   f  23     brown    1.Basel   3.Masters      1.No        1.low
## 5       1   m  22    blonde    1.Basel 2.Bachelors      1.No       3.high
## 6       1   m  28     brown    1.Basel       4.PhD      1.No       3.high
##   attractiveness dateability
## 1          1.low          45
## 2         3.high          63
## 3       2.medium          58
## 4         3.high          86
## 5         3.high          96
## 6         3.high          90

Datafile description

The data file has 500 rows and 10 columns. Here are the columns

Data loading and preparation

  1. Open your class R project. This project should have (at least) two folders, one called data and one called R. Open a new script and enter your name, date, and the wpa number at the top. Save the script in the R folder in your project working directory as wpa_7_LastFirst.R, where Last and First are your last and first names.

  2. The data are stored in a tab–delimited text file located at https://raw.githubusercontent.com/ndphillips/IntroductionR_Course/master/assignments/wpa/data/facebook.txt. Using read.table() load this data into R as a new object called facebook

facebook <- read.table(file = "https://raw.githubusercontent.com/ndphillips/IntroductionR_Course/master/assignments/wpa/data/facebook.txt",
                       header = TRUE, 
                       sep = "\t")

Understand the data

  1. Look at the first few rows of the dataframe with the head() and View() functions to make sure it loaded correctly.
head(facebook)
  1. Using the names() and str() functions, look at the names and structure of the dataframe to make sure everything looks ok. If the data look strange, you did something wrong with read.table() diagnose the problem!
str(facebook)
  1. Using write.table(), save a local copy of the facebook data to a text file called facebook.txt in the data folder of your project. Now, you’ll always have access to the data.
write.table(facebook, 
            file = "data/facebook.txt",
            sep = "\t")

Answer guidelines Read carefully to save yourself time!

For example, here is how I would analyze and answer the question: “Was there an effect of diets on Chicken Weights?”"

# ANOVA on Chicken Weights
#   IV = Diet, DV = weight

# ANOVA
t0_aov <- aov(formula = weight ~ Diet,
              data = ChickWeight)

# Look at summary results
summary(t0_aov)
##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## Diet          3  155863   51954   10.81 6.43e-07 ***
## Residuals   574 2758693    4806                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ANOVA was significant (p < .01), so I'll conduct post-hoc tests

# Tukey post-hoc tests
TukeyHSD(t0_aov)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = weight ~ Diet, data = ChickWeight)
## 
## $Diet
##          diff         lwr      upr     p adj
## 2-1 19.971212  -0.2998092 40.24223 0.0552271
## 3-1 40.304545  20.0335241 60.57557 0.0000025
## 4-1 32.617257  12.2353820 52.99913 0.0002501
## 3-2 20.333333  -2.7268370 43.39350 0.1058474
## 4-2 12.646045 -10.5116315 35.80372 0.4954239
## 4-3 -7.687288 -30.8449649 15.47039 0.8277810
# Conclusion
#  There was a significant main effect of diets on chicken weights (F(3, 574) = 10.81, p < .01). Pairwise Tukey HSD tests showed significant differences between diets 1 and 3 (diff = 40.30, p < .01) and diets 1 and 4 (diff = 32.62, p < .01). All other pairwise differences were not significant at the 0.05 significance threshold.

One-way ANOVAS

  1. Was there a main effect of the university on dateability? Conduct a one-way ANOVA. If the result is significant (p < .05), conduct post-hoc tests. Report full APA style conclusions
t6_aov <- aov(formula = dateability ~ university,
            data = facebook)

summary(t6_aov)

TukeyHSD(t6_aov)

#Answer: There was a significant main effect of university on dateability (F(2, 497) = 5.26, p < .01). Pairwise Tukey HSD tests showed significant differences between Geneva and Basel (diff = -9.23, p < .01). All other pairwise differences were not significant at the 0.05 significance threshold.*
  1. Was there a main effect of intelligence on dateability? Conduct a one-way ANOVA. If the result is significant (p < .05), conduct post-hoc tests. Report full APA style conclusions
t7_aov <- aov(formula = dateability ~ intelligence,
              data = facebook)

summary(t7_aov)

TukeyHSD(t7_aov)

# Answer: There was not a significant main effect of intelligence on dateability (F(2, 497) = 1.59, p = 0.21).
  1. Was there a main effect of haircolor on dateability? Conduct a one-way ANOVA. If the result is significant (p < .05), conduct post-hoc tests. Report full APA style conclusions
t8_aov <- aov(formula = dateability ~ haircolor,
              data = facebook)

summary(t8_aov)

# *Answer: There was no significant main effect of haircolor on dateability (F(1, 498) = 0.08, p = 0.77).*

Multi way, independent ANOVAs

  1. Conduct a three-way ANOVA on dateability with both intelligence, university and haircolor as IVs. Do your results for each variable change compared to your previous one-way ANOVAs on these variables? (You do not need to give APA results or conduct post-hoc tests, just answer the question verbally).
t9_aov <- aov(formula = dateability ~ intelligence + university + haircolor,
              data = facebook)

summary(t9_aov)

# Answer: Yes the results are (generally) the same. Only the effect of University is significant at the 0.05 level.
  1. Conduct a multi-way anova including sex, haircolor, university, education, shirtless, intelligence and attractiveness as independent variables predicting dateability. Which variables are significantly related to dateability? Do write APA results for each variable but do not conduct post-hoc tests.
t10_aov <- aov(formula = dateability ~ sex + haircolor + university + education + shirtless + intelligence + attractiveness,
               data = facebook)

summary(t10_aov)

# *Answer: There were significant effects of sex (F(1, 487) = 60.75, p < .01), university (F(487) = 6.11, p < .01), intelligence (F(2, 487) = 3.21, p = .04) and attractiveness (F(2, 487) = 49.35, p < .01).*

Interactions

  1. Create a plot (e.g.; pirateplot(), barplot(), boxplot()) showing the distribution of dateability based on two independent variables: sex and shirtless. Based on what you see in the plot, do you expect there to be an interaction between sex and shirtless? Why or why not?
library(yarrr)
pirateplot(dateability ~ sex + shirtless, 
           data = facebook)

# Yes there looks like an intereaction!
  1. Test your prediction with the appropriate ANOVA. Report full APA style conclusions
t12_aov <- aov(dateability ~ sex * shirtless, 
               data = facebook)
            
summary(t12_aov)

# There is a sigificant interaction between sex and shirtless F(1, 496) = 66.15, p < .01.

ANOVAs on subsets of data

  1. It turns out that the experimenter who ran sessions 1 through 30 (a man) was trying to score a date and slipped in his own profile picture into the study. We can’t trust these data. Repeat your multi anova from question 9 ONLY for sessions larger than 30 (Hint: Make the changes to the data argument). Do your conclusions change compared to when you analyzed the data from all sessions?
t13_aov <- aov(formula = dateability ~ intelligence + university + haircolor,
               data = subset(facebook, session > 30))

summary(t13_aov)

# Now no effects are significant!

CHECKPOINT!

knitr::include_graphics("https://www.mariowiki.com/images/thumb/6/65/CheckpointSM3DL.png/115px-CheckpointSM3DL.png")

More interactions

  1. Create a plot (e.g.; using ggplot or the yarrr::pirateplot() function shown in the examples above) showing the distribution of dateability based on two independent variables: university and education. Based on what you see in the plot, do you expect there to be an interaction between university and education? Why or why not?
pirateplot(dateability ~ university + education, 
           data = facebook)

# No there doesn't look like an interaction to me. The effect of university is the same in each level of education.
  1. Test your prediction with the appropriate ANOVA. Report full APA style conclusions
t15_aov <- aov(dateability ~ university * education, 
               data = facebook)

summary(t15_aov)
##                       Df Sum Sq Mean Sq F value  Pr(>F)   
## university             2   7602    3801   5.229 0.00566 **
## education              3   2335     778   1.071 0.36118   
## university:education   6   2101     350   0.482 0.82215   
## Residuals            488 354770     727                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Yep! No significant interaction between university and education F(6, 488) = 0.48, p = 0.82.
  1. Create a plot showing the distribution of dateability based on two independent variables: university and haircolor. Based on what you see in the plot, do you expect there to be an interaction between university and intelligence? Why or why not?
pirateplot(dateability ~ university + haircolor, 
           data = facebook)

# There does not look like an interaction
  1. Test your prediction with the appropriate ANOVA. Report full APA style conclusions
t17_aov <- aov(dateability ~ university * haircolor, 
               data = facebook)

summary(t17_aov)
##                       Df Sum Sq Mean Sq F value  Pr(>F)   
## university             2   7602    3801   5.242 0.00559 **
## haircolor              1    259     259   0.357 0.55070   
## university:haircolor   2    719     359   0.496 0.60944   
## Residuals            494 358228     725                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Yep! no significant interaction! F(2, 494) = 0.49, p = 0.61
  1. Repeat the test from the previous question, but only include males over the age of 25. Do you get the same answer?.
t18_aov <- aov(dateability ~ university * haircolor, 
               data = subset(facebook, sex == "m" & age > 25))

summary(t18_aov)
##                      Df Sum Sq Mean Sq F value Pr(>F)  
## university            2    250   125.0   0.275  0.762  
## haircolor             1   1139  1138.6   2.503  0.128  
## university:haircolor  2   2487  1243.6   2.734  0.087 .
## Residuals            22  10007   454.9                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Yep! Still no interaction, F(2, 22) = 2.73, p = .09

Learn more

  1. You can print an aov object to visualize things like the model residuals. Try plotting the results of your anova from question 17 (e.g.; plot(t17_aov)) and look at the resulting plots.
plot(t18_aov)
# Cool plots!
  1. You can use the apa_print() function from the papaja package to print apa style conclusions from aov objects. The papaja package is on GitHub (not on CRAN), so to install it you’ll need to use the install_github() function from the devtools package as follows:
install.packages("devtools")            # Only if you don't have the devtools package
devtools::install_github("crsh/papaja") # Install the papaja package from GitHub
library("papaja")                       # Load the papaja package

Now that you’ve got it, try evaluating apa_print() on some of your previous aov objects to see what happens. You may notice that the results have special characters like $ and \\. This is because the output contains formatting code for LaTeX.

papaja::apa_print(t18_aov)
## $estimate
## $estimate$university
## [1] "$\\eta^2_G = .024$"
## 
## $estimate$haircolor
## [1] "$\\eta^2_G = .102$"
## 
## $estimate$university_haircolor
## [1] "$\\eta^2_G = .199$"
## 
## 
## $statistic
## $statistic$university
## [1] "$F(2, 22) = 0.27$, $\\mathrm{MSE} = 454.85$, $p = .762$"
## 
## $statistic$haircolor
## [1] "$F(1, 22) = 2.50$, $\\mathrm{MSE} = 454.85$, $p = .128$"
## 
## $statistic$university_haircolor
## [1] "$F(2, 22) = 2.73$, $\\mathrm{MSE} = 454.85$, $p = .087$"
## 
## 
## $full_result
## $full_result$university
## [1] "$F(2, 22) = 0.27$, $\\mathrm{MSE} = 454.85$, $p = .762$, $\\eta^2_G = .024$"
## 
## $full_result$haircolor
## [1] "$F(1, 22) = 2.50$, $\\mathrm{MSE} = 454.85$, $p = .128$, $\\eta^2_G = .102$"
## 
## $full_result$university_haircolor
## [1] "$F(2, 22) = 2.73$, $\\mathrm{MSE} = 454.85$, $p = .087$, $\\eta^2_G = .199$"
## 
## 
## $table
##                           Effect  $F$ $\\mathit{df}_1$ $\\mathit{df}_2$
## 1                     University 0.27                2               22
## 2                      Haircolor 2.50                1               22
## 3 University $\\times$ Haircolor 2.73                2               22
##   $\\mathrm{MSE}$  $p$ $\\eta^2_G$
## 1          454.85 .762        .024
## 2          454.85 .128        .102
## 3          454.85 .087        .199
  1. You can use the predict() function to use a model (like an ANOVA) to predict the values of new data using the notation predict(MODEL, newdata). Using your ANOVA from question 8, predict the dateability of the following 5 students:
newdata <- data.frame("id" = c(1, 2, 3, 4, 5),
                      "haircolor" = c("brown", "brown", "blonde", "blonde", "brown"),
                      "university" = c("1.Basel", "1.Basel", "2.Zurich", "3.Geneva", "3.Geneva"),
                      stringsAsFactors = FALSE)
predict(t18_aov, newdata)
##        1        2        3        4        5 
## 61.83333 61.83333 55.00000 37.33333 49.22222
  1. There are different “types” of ANOVAS. The aov() function in R calculates what is known as a type I ANOVA. If your data are severely imbalanced, that is, where the number of observations in each group are not similar, then using a Type I ANOVA can lead to misleading results. In this case, it’s better to use a Type II or Type III ANOVA. The Anova() function from the car package allows you to conduct these types of ANOVAs. Here’s how to use the Anova() function:
install.packages("car")  # Only if you don't have the car package yet
library("car")           # Load the car package

# Is there a relationship between sex and college on tattoos?

# First, create an lm() object, the Anova() function needs this:

model_lm <- lm(formula = tattoos ~ sex + college, 
         data = pirates)

# Type II anova

Anova(model_lm, 
      type = "II")   # Type II

# Type III anova

Anova(model_lm, 
      type = "III")  # Type III

# Type I anova using the aov() function

summary(aov(model_lm))

# In this case, all three tests give virtually the same answers (thankfully)

Now, answer the question: “Is there an interaction between shirtless and sex on dating desireability?” by conducting three separate ANOVAS, one that is Type I (using aov()), one that is Type II (using Anova()), and one that is Type III (using Anova()). Do you get the same or different answers? To learn more about how the different ANOVA types work, look at this post by Falk Scholer: http://goanna.cs.rmit.edu.au/~fscholer/anova.php.

library(car)

summary(aov(dateability ~ shirtless + sex, data = facebook))
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## shirtless     1   1551    1551   2.313    0.129    
## sex           1  31951   31951  47.643 1.57e-11 ***
## Residuals   497 333305     671                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
car::Anova(lm(formula = dateability ~ shirtless + sex, 
           data = facebook), type = "II")
## Anova Table (Type II tests)
## 
## Response: dateability
##           Sum Sq  Df F value    Pr(>F)    
## shirtless    275   1  0.4099    0.5223    
## sex        31951   1 47.6430 1.566e-11 ***
## Residuals 333305 497                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
car::Anova(lm(formula = dateability ~ shirtless + sex, 
           data = facebook), type = "III")
## Anova Table (Type III tests)
## 
## Response: dateability
##             Sum Sq  Df   F value    Pr(>F)    
## (Intercept) 906964   1 1352.3964 < 2.2e-16 ***
## shirtless      275   1    0.4099    0.5223    
## sex          31951   1   47.6430 1.566e-11 ***
## Residuals   333305 497                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# The p-value for the type I test (using aov()) is much smaller than the p-value for the type II and type III tests (which are virtually identical).
  1. Does it affect an ANOVA to standardize the dependent variable? Test this by conducting an ANOVA testing the effect of attractiveness on dateability. Do this once using the raw dateability scores, and once again after standardizing the dateability scores (Hint: To standardize a variable, subtract its mean then divide by its standard deviation). Do you get the same or different results?
# Add standardized dateability score to facebook dataframe
facebook$dateability_z <- (facebook$dateability - mean(facebook$dateability)) / sd(facebook$dateability)

# Anova for original (non-standardized) variable
summary(aov(formula = dateability ~ attractiveness, data = facebook))
##                 Df Sum Sq Mean Sq F value Pr(>F)    
## attractiveness   2  61237   30619    49.8 <2e-16 ***
## Residuals      497 305571     615                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Anova for non-standardized variable
summary(aov(formula = dateability_z ~ attractiveness, data = facebook))
##                 Df Sum Sq Mean Sq F value Pr(>F)    
## attractiveness   2   83.3   41.65    49.8 <2e-16 ***
## Residuals      497  415.7    0.84                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# The results are identical! This means that you can always standardize your dependent variable and it won't affect the test.

Submit!