Readings

This assignment is based on the following readings:

Assignment Goals

Cheat Sheet

Before you start anything else, open the ggplot2 cheatsheet here in a new window: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf. You should keep this handy, and maybe even print one out for yourself, as it will help you to quickly find code to customise your plots.

Examples

# Install packages 
#
# install.packages('tidyverse')
# install.packages('yarrr')
# install.packages('jtools')

library(tidyverse)   # Load tidyverse packages
library(yarrr)       # Load yarrr package (for pirates data)
library(jtools)      # Load jtools (for theme_apa())

# Set theme (try running one of these themes and see how the following plots change)

# theme_set(theme_apa())
# theme_set(theme_bw())
# theme_set(theme_minimal())

# Histogram

## Default
ggplot(data = pirates,
        aes(x = height)) + 
        geom_histogram()

## With customisations
ggplot(data = pirates,
        aes(x = height)) + 
        geom_histogram(colour = "black", 
                       fill = "white", 
                       bins = 15) + 
  labs(x = "Height (cm)",
       y = "Frequency") + 
  scale_x_continuous(breaks = seq(100, 250, 10),  # Set custom x axis
                     limits = c(120, 250)) +     
  scale_y_continuous(breaks = seq(0, 300, 50),    # Set custom y axis
                     limits = c(0, 300)) +
  geom_vline(xintercept = mean(pirates$height),   # Add vertical line at mean
             col = "red") + 
  annotate("text",                                # Add a single "Mean" label
           x = mean(pirates$height) + 7,          # A bit to the right of the line
           y = 300, 
           label = "Mean")
  

# Continuous IV, Continuous DV
# Scatterplot

ggplot(data = pirates,
       aes(x = height, y = weight)) +
  geom_point()

# With customisations

ggplot(data = pirates,
       aes(x = height, y = weight, col = sex)) +
  geom_point(alpha = .2) + 
  geom_smooth(method = "lm", colour = "blue") +
  theme(panel.grid.major = element_line(colour = gray(.9)))

# More customizations

ggplot(data = pirates,
       aes(x = height, y = weight, col = sex)) +
  geom_point(alpha = .2) + 
  geom_smooth(method = "lm", colour = "black") +
  theme(panel.grid.major = element_line(colour = gray(.9))) +
  facet_wrap(~sex) +
  guides(col = "none")   # Turn off legend for colour

# Discrete IV, Continuous DV

### Violin plot
ggplot(data = pirates,
       aes(x = factor(fav.pixar), y = tchests)) + 
         geom_violin() + 
  labs(x = "Favorite Pixar Movie",
       y = "Treasure Chests")

### Boxplot
ggplot(data = pirates,
       aes(x = sword.type, y = tchests, fill = sword.type)) + 
         geom_boxplot() + 
  labs(x = "Favorite Sword",
       y = "Treasure chests") +
  guides(fill = "none") # Turn off legend for fill

### Barplot

# First, calculate aggregate data to be plotted

pirates_agg <- pirates %>% 
  group_by(headband, sex) %>%
  summarise(
    tchests_mean = mean(tchests),
    tchests_lb = t.test(tchests)$conf.int[1],
    tchests_ub = t.test(tchests)$conf.int[2]
  )

## Simple

ggplot(data = pirates_agg,
       aes(x = headband, y = tchests_mean)) + 
  geom_bar(stat = "identity") + 
  labs(y = "Mean treasure chests found")


# Grouped barplot with error bars

ggplot(data = pirates_agg,
       aes(x = headband, y = tchests_mean, fill = sex)) + 
  geom_bar(stat = "identity", position = position_dodge(0.9), col = "white") + 
  geom_errorbar(aes(ymax = tchests_ub,
                    ymin = tchests_lb), 
                position = position_dodge(0.9), 
                width = 0.25) + 
  labs(y = "Treasure chests found")

It’s personal: The effect of personal value on utilitarian moral judgments

In this WPA, we will analyze data from Millar et al. (2016), “It’s personal: The effect of personal value on utilitarian moral judgments”.

Here is the abstract (You can find the full paper at http://journal.sjdm.org/16/16428/jdm16428.pdf):

We investigated whether the personal importance of objects influences utilitarian decision-making in which damaging property is necessary to produce an overall positive outcome. In Experiment 1, participants judged saving five objects by destroying a sixth object to be less acceptable when the action required destroying the sixth object directly (rather than as a side-effect) and the objects were personally important (rather than unimportant). In Experiment 2, we demonstrated that utilitarian judgments were not influenced by the objects’ monetary worth. Together these findings suggest that personal importance underlies people’s sensitivity to damaging property as a means for utilitarian gains.

Data

The original data are stored as csv files at sjdm.org. However, the data needed some cleaning, so this assignment uses cleaned versions. These contain the original data, but with better labels and some minor corrections.

Study 1

Variable        Description
acceptability   How acceptable is the action?
important       Were the objects important to the owner or not?
direct          Was the destruction of an object a means of saving the others or a side-effect?
cover           Was the object a poster or a clock?
gender          Participant gender
age             Participant age
storycomp       Comprehension question 1
itemcomp        Comprehension question 2
ownercomp       Comprehension question 3
failed          Did participant fail an attention check?

Study 2

Variable          Description
acceptability     How acceptable is the action?
important         Were the objects important to the owner or not?
direct            Was the destruction of an object a means of saving the others or a side-effect?
expensive         Was the object expensive or not?
previoustrolley   Did participants complete a trolley problem in the past?
gender            Participant gender
age               Participant age
topiccomp         Comprehension question 1
expensivecomp     Comprehension question 2
importancecomp    Comprehension question 3
failed            Did participant fail an attention check?

  1. For this assignment, you’ll need both the yarrr package and the tidyverse packages. The tidyverse is actually a collection of many packages, including ggplot2 and dplyr. Install them (if you don’t have them already) and then load them with library(), as in the install and load code at the top of the Examples section.

  2. Open your R project from last week (I recommended calling it RCourse or something similar). There should be at least two folders in this working directory: data and R.

  3. Open a new R script and save it as wpa_5_LastFirst.R in the R folder in your project directory.

  4. The data are stored in two separate tab-delimited text files. Study 1 is at https://raw.githubusercontent.com/ndphillips/IntroductionR_Course/master/assignments/wpa/data/Millar_study1.txt and Study 2 is at https://raw.githubusercontent.com/ndphillips/IntroductionR_Course/master/assignments/wpa/data/Millar_study2.txt. Load the data into R with the following code:

study1 <- read.table(file = "https://raw.githubusercontent.com/ndphillips/IntroductionR_Course/master/assignments/wpa/data/Millar_study1.txt", 
                     sep = "\t",
                     header = TRUE, 
                     stringsAsFactors = FALSE)

study2 <- read.table(file = "https://raw.githubusercontent.com/ndphillips/IntroductionR_Course/master/assignments/wpa/data/Millar_study2.txt", 
                     sep = "\t",
                     header = TRUE, 
                     stringsAsFactors = FALSE)

  5. Look at the structure of each data frame using str() and head() to make sure they were loaded correctly.
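
To illustrate what read.table() does with sep = "\t" and header = TRUE, and what str() and head() then report, here is a minimal sketch that parses a tiny tab-delimited string instead of the real files. The object names toy_text and toy and all of the values are invented for illustration; only the column names are borrowed from the Study 1 table.

```r
# Hypothetical three-row stand-in for a tab-delimited file (values are invented)
toy_text <- "acceptability\timportant\tage\n5\t1\t23\n2\t0\t31\n7\t1\t45"

toy <- read.table(text = toy_text,            # Parse the string as if it were a file
                  sep = "\t",                 # Columns are separated by tabs
                  header = TRUE,              # First line holds the column names
                  stringsAsFactors = FALSE)

str(toy)    # One line per column: name, type, and first values
head(toy)   # First rows (up to six) of the data frame
```

If a column you expect to be numeric shows up as character in str(), that is usually a sign that the separator or header setting did not match the file.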

Histograms

  6. Create a histogram of the acceptability scores from study 1 using the following template. Add appropriate labels and colors as you see fit!
ggplot(data = __,
       aes(x = __)) + 
  geom_histogram(bins = __, 
                 col = "__") +
  scale_x_continuous(limits = c(__, __))
ggplot(data = study1,
       aes(x = acceptability)) + 
  geom_histogram(bins = 10, col = "white") +
  scale_x_continuous(limits = c(0, 10))

  7. Now do the same for study 2. For this plot, also add a vertical line at the mean of the distribution with geom_vline() (see http://www.sthda.com/english/wiki/ggplot2-line-types-how-to-change-line-types-of-a-graph-in-r-software for tips on creating different line types).
ggplot(data = __,
       aes(x = __)) + 
  geom_histogram(bins = 10, col = "__", fill = "__") +
  scale_x_continuous(limits = c(__, __)) +
  geom_vline(xintercept = mean(study2$acceptability, na.rm = TRUE), 
             linetype = "dashed")
ggplot(data = study2,
       aes(x = acceptability)) + 
  geom_histogram(bins = 10, col = "black", fill = "white") +
  scale_x_continuous(limits = c(0, 10)) +
  geom_vline(xintercept = mean(study2$acceptability, na.rm = TRUE), 
             linetype = "dashed")
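
The na.rm = TRUE argument matters whenever a column contains missing values: without it, mean() returns NA, which gives geom_vline() no usable position for the line. A minimal sketch with an invented vector (not the study data):

```r
scores <- c(4, 7, NA, 5)     # Invented values with one missing entry

mean(scores)                 # NA, because one value is missing
mean(scores, na.rm = TRUE)   # Mean of the three non-missing values
```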

Scatterplots

  8. Create a scatterplot showing the relationship between age and acceptability score in study 1.
ggplot(data = __,
       aes(x = __, y = __)) + 
  geom_point()
ggplot(data = study1,
       aes(x = age, y = acceptability)) + 
  geom_point()

  9. Now create the following plot using geom_count() instead of geom_point():
ggplot(data = __,
       mapping = aes(x = __, y = __)) + 
  geom_count()
ggplot(data = study1,
       mapping = aes(x = age, y = acceptability)) + 
  geom_count()

  10. Now add a regression line with geom_smooth(method = 'lm'):
ggplot(data = __,
       mapping = aes(x = __, y = __)) + 
  geom_count() + 
  geom_smooth(method = "__")
ggplot(data = study1,
       mapping = aes(x = age, y = acceptability)) +
  geom_count() + 
  geom_smooth(method = 'lm')

  11. Now add different colors for different genders by including col = factor(gender) in the aesthetic mapping:
ggplot(data = subset(study1, gender %in% c(1, 2)),
       mapping = aes(x = __, 
                     y = __, 
                     col = factor(__))) +
  geom_count() + 
  geom_smooth(method = '__', se = FALSE) +
  scale_colour_discrete(name = "__")
ggplot(data = subset(study1, gender %in% c(1, 2)),
       mapping = aes(x = age, 
                     y = acceptability, 
                     col = factor(gender))) +
  geom_count() + 
  geom_smooth(method = 'lm', se = FALSE) +
  scale_colour_discrete(name = "Gender")
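
The subset(..., gender %in% c(1, 2)) call keeps only rows whose gender code is 1 or 2, and factor() makes ggplot treat the numeric codes as discrete groups rather than as a continuous scale. Here is a sketch on an invented data frame; the codes and values are hypothetical and say nothing about how gender is coded in the real data.

```r
# Invented data: two rows (code 3 and NA) fall outside the codes of interest
toy <- data.frame(gender = c(1, 2, 3, 1, NA),
                  age    = c(20, 35, 41, 28, 33))

kept <- subset(toy, gender %in% c(1, 2))   # Drops the rows with gender 3 and NA

nrow(kept)                    # Number of rows that remain
levels(factor(kept$gender))   # Discrete levels that would be used for colouring
```

Note that %in% returns FALSE for NA, so rows with a missing gender are dropped along with any other codes.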

Barplot

  12. In this question you’ll create a barplot. But first, we need to aggregate some data. Do this by running the following code:
study1_agg_1 <- study1 %>%
  filter(complete.cases(study1)) %>%   # Only include rows without NAs
  group_by(important) %>%              # Group by important
  summarise(
    acceptability_mean = mean(acceptability, na.rm = TRUE),   # Mean
    acceptability_lb = t.test(acceptability)$conf.int[1],     # CI lower bound
    acceptability_ub = t.test(acceptability)$conf.int[2]      # CI upper bound
  )

Ok now we’re ready! Create the following barplot using this template:

ggplot(data = study1_agg_1,
       aes(x = factor(__), y = __)) + 
  geom_bar(stat = "identity", position = position_dodge(.9), col = "white") + 
  geom_errorbar(aes(ymax = __,
                    ymin = __), 
                position = position_dodge(.9), 
                width = 0.25) + 
  labs(x = "__", 
       y = "__")
ggplot(data = study1_agg_1,
       aes(x = factor(important), y = acceptability_mean)) + 
  geom_bar(stat = "identity", position = position_dodge(.9), col = "white") + 
  geom_errorbar(aes(ymax = acceptability_ub,
                    ymin = acceptability_lb), 
                position = position_dodge(.9), 
                width = 0.25) + 
  labs(x = "Important", 
       y = "Acceptability")
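
The lower and upper bounds used for the error bars come from the conf.int element of a one-sample t.test(), which by default is a 95% confidence interval for the mean. A minimal sketch with invented numbers:

```r
x <- c(3, 5, 4, 6, 2, 5, 4)   # Invented scores

ci <- t.test(x)$conf.int      # Length-2 vector: lower bound, then upper bound
lb <- ci[1]
ub <- ci[2]

c(lower = lb, mean = mean(x), upper = ub)
```

Because the lower bound is always below the upper bound, it belongs in ymin and the upper bound belongs in ymax.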

  13. Now we’ll make a barplot with two independent variables. First, we need to aggregate the data. Do this by running the following code:
study1_agg_2 <- study1 %>%
  filter(complete.cases(study1)) %>%   # Only include rows without NAs
  group_by(important, direct) %>%      # Group by important AND direct
  summarise(
    acceptability_mean = mean(acceptability, na.rm = TRUE),   # Mean
    acceptability_lb = t.test(acceptability)$conf.int[1],     # CI lower bound
    acceptability_ub = t.test(acceptability)$conf.int[2]      # CI upper bound
  )

Ok now we’re ready! Create the following barplot using this template:

ggplot(data = study1_agg_2,
       aes(x = factor(__), y = __, fill = factor(__))) + 
  geom_bar(stat = "identity", position = position_dodge(.9), col = "__") + 
  geom_errorbar(aes(ymax = __,
                    ymin = __), 
                position = position_dodge(.9), 
                width = 0.25) + 
  labs(x = "__", 
       y = "__") +
  scale_fill_discrete(name = "__")
ggplot(data = study1_agg_2,
       aes(x = factor(important), y = acceptability_mean, fill = factor(direct))) + 
  geom_bar(stat = "identity", position = position_dodge(.9), col = "white") + 
  geom_errorbar(aes(ymax = acceptability_ub,
                    ymin = acceptability_lb), 
                position = position_dodge(.9), 
                width = 0.25) + 
  labs(x = "Important", 
       y = "Acceptability") +
  scale_fill_discrete(name = "Direct")

Checkpoint!!!

  14. Create the following density plot using geom_density():
ggplot(data = study2,
       aes(x = age, fill = factor(__), alpha = .2)) + 
  geom_density() +
  scale_fill_discrete(name = "__") +
  labs(y = "") +
  guides(alpha = "none")
ggplot(data = study2,
       aes(x = age, fill = factor(important), alpha = .2)) + 
  geom_density() +
  scale_fill_discrete(name = "Important") +
  labs(y = "") +
  guides(alpha = "none") # Turn off legend for alpha

Create the following plots from the mpg dataset. The mpg dataset is contained in the ggplot2 package and you should have access to it once you load either the ggplot2 package, or the tidyverse package (which contains ggplot2). You should start by looking at the mpg dataset to see which variables it contains.

  15. Make this plot from the mpg dataset!
ggplot(data = mpg,
       aes(x = __, y = __)) + 
  geom_count(alpha = .3) +
  labs(x = "__",
       y = "__")
ggplot(data = mpg,
       aes(x = cty, y = hwy)) + 
  geom_count(alpha = .3) +
  labs(x = "City Miles per Gallon",
       y = "Highway Miles per Gallon")

  16. Now this one!
ggplot(data = __,
       aes(x = __, y = __, col = __)) + 
  geom_count(alpha = .3) +
  facet_wrap( ~ __) +
  labs(x = "__",
       y = "__") +
  guides(col = "none",
         size = "none")
ggplot(data = mpg,
       aes(x = cty, y = hwy, col = trans)) + 
  geom_count(alpha = .3) +
  facet_wrap( ~ trans) +
  labs(x = "City Miles per Gallon",
       y = "Highway Miles per Gallon") +
  guides(col = "none", 
         size = "none")

  17. You know what to do…
ggplot(data = __,
       aes(x = factor(__), y = __, fill = factor(__))) + 
  geom_violin() + 
  geom_count()+
  guides(fill = "none", 
         size = "none") +
  labs(x = "__",
       y = "__")
ggplot(data = mpg,
       aes(x = factor(class), y = hwy, fill = factor(class))) + 
  geom_violin() + 
  geom_count()+
  guides(fill = "none", 
         size = "none") +
  labs(x = "Class",
       y = "Highway Miles per Gallon")

Submit!