This assignment is based on the following readings:
Before you start anything else, open the ggplot2 cheatsheet here in a new window: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf. You should keep this handy, and maybe even print one out for yourself, as it will help you to quickly find code to customise your plots.
Check out the following links for additional guides, examples, and inspiration on how to make more interesting and complex plots. All of these links contain complete replicable code. I highly recommend spending a few minutes browsing them!
# Install packages
#
# install.packages('tidyverse')
# install.packages('yarrr')
# install.packages('jtools')
library(tidyverse) # Load tidyverse packages
library(yarrr) # Load yarrr package (for pirates data)
library(jtools) # Load jtools (for theme_apa())
# Set theme (try running one of these themes and see how the following plots change)
# theme_set(theme_apa())
# theme_set(theme_bw())
# theme_set(theme_minimal())
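theme_set() changes the default theme for all plots drawn afterwards and returns the theme that was active before, so the change can be undone later. A minimal sketch, assuming only ggplot2 is loaded (the name old_theme is just for illustration):

```r
library(ggplot2)

# theme_set() returns the previously active theme, so store it
old_theme <- theme_set(theme_bw())

# ... plots drawn here use theme_bw() unless a plot adds its own theme ...

# Restore the theme that was active before
theme_set(old_theme)
```

Storing the old theme is optional, but it makes it easy to compare the same plot under two themes.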
# Histogram
## Default
ggplot(data = pirates,
aes(x = height)) +
geom_histogram()
## With customisations
ggplot(data = pirates,
aes(x = height)) +
geom_histogram(colour = "black",
fill = "white",
bins = 15) +
labs(x = "Height (cm)",
y = "Frequency") +
scale_x_continuous(breaks = seq(100, 250, 10), # Set custom x axis
limits = c(120, 250)) +
scale_y_continuous(breaks = seq(0, 300, 50), # Set custom y axis
limits = c(0, 300)) +
geom_vline(xintercept = mean(pirates$height), # Add vertical line at mean
col = "red") +
annotate("text", # Draw the label once instead of once per row
x = mean(pirates$height) + 7, # Move text a bit to the right of the line
y = 300,
label = "Mean")
# Continuous IV, Continuous DV
# Scatterplot
ggplot(data = pirates,
aes(x = height, y = weight)) +
geom_point()
# With customisations
ggplot(data = pirates,
aes(x = height, y = weight, col = sex)) +
geom_point(alpha = .2) +
geom_smooth(method = "lm", colour = "blue") +
theme(panel.grid.major = element_line(colour = gray(.9)))
# More customizations
ggplot(data = pirates,
aes(x = height, y = weight, col = sex)) +
geom_point(alpha = .2) +
geom_smooth(method = "lm", colour = "black") +
theme(panel.grid.major = element_line(colour = gray(.9))) +
facet_wrap(~sex) +
guides(col = "none") # Turn off legend for colour
# Discrete IV, Continuous DV
### Violin plot
ggplot(data = pirates,
aes(x = factor(fav.pixar), y = tchests)) +
geom_violin() +
labs(x = "Favorite Pixar Movie",
y = "Treasure Chests")
### Boxplot
ggplot(data = pirates,
aes(x = sword.type, y = tchests, fill = sword.type)) +
geom_boxplot() +
labs(x = "Favorite Sword",
y = "Treasure chests") +
guides(fill = "none") # Turn off legend for filling
### Barplot
# First, calculate aggregate data to be plotted
pirates_agg <- pirates %>%
group_by(headband, sex) %>%
summarise(
tchests_mean = mean(tchests),
tchests_lb = t.test(tchests)$conf.int[1],
tchests_ub = t.test(tchests)$conf.int[2]
)
## Simple
ggplot(data = pirates_agg,
aes(x = headband, y = tchests_mean)) +
geom_bar(stat = "identity") +
labs(y = "Mean treasure chests found")
# Grouped barplot with error bars
ggplot(data = pirates_agg,
aes(x = headband, y = tchests_mean, fill = sex)) +
geom_bar(stat = "identity", position = position_dodge(0.9), col = "white") +
geom_errorbar(aes(ymax = tchests_ub,
ymin = tchests_lb),
position = position_dodge(0.9),
width = 0.25) +
labs(y = "Treasure chests found")
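The lower and upper bounds stored in pirates_agg come from t.test(), which returns a 95% confidence interval for the mean by default. As a rough sketch of what those two numbers are, the example below recomputes the interval by hand for a small made-up vector (the values in x are hypothetical, not taken from the pirates data):

```r
# Hypothetical sample values, only for illustration
x <- c(12, 15, 9, 20, 17, 11, 14, 18)

n <- length(x)
se <- sd(x) / sqrt(n)            # Standard error of the mean
t_crit <- qt(0.975, df = n - 1)  # Critical t value for a 95% interval

# Mean plus/minus the critical value times the standard error
manual_ci <- c(mean(x) - t_crit * se, mean(x) + t_crit * se)

# Interval reported by t.test(), with its conf.level attribute dropped
ttest_ci <- as.numeric(t.test(x)$conf.int)

all.equal(manual_ci, ttest_ci)
```

The first element of conf.int is the lower bound and the second is the upper bound.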
In this WPA, we will analyze data from Millar et al. (2016): It’s personal: The effect of personal value on utilitarian moral judgments.
Here is the abstract (You can find the full paper at http://journal.sjdm.org/16/16428/jdm16428.pdf):
We investigated whether the personal importance of objects influences utilitarian decision-making in which damaging property is necessary to produce an overall positive outcome. In Experiment 1, participants judged saving five objects by destroying a sixth object to be less acceptable when the action required destroying the sixth object directly (rather than as a side-effect) and the objects were personally important (rather than unimportant). In Experiment 2, we demonstrated that utilitarian judgments were not influenced by the objects’ monetary worth. Together these findings suggest that personal importance underlies people’s sensitivity to damaging property as a means for utilitarian gains.
The original data are stored as csv files at sjdm.org. However, the data needed some cleaning. The cleaned versions contain the original data, but with better labels and some minor corrections.
Study 1 variables:

Variable | Description |
---|---|
acceptability | How acceptable is the action? |
important | Were the objects important to the owner or not? |
direct | Was the destruction of an object a means of saving the others or a side-effect? |
cover | Was the object a poster or a clock? |
gender | Participant gender |
age | Participant Age |
storycomp | Comprehension question 1 |
itemcomp | Comprehension question 2 |
ownercomp | Comprehension question 3 |
failed | Did participant fail an attention check? |

Study 2 variables:

Variable | Description |
---|---|
acceptability | How acceptable is the action? |
important | Were the objects important to the owner or not? |
direct | Was the destruction of an object a means of saving the others or a side-effect? |
expensive | Was the object expensive or not? |
previoustrolley | Did participants complete a trolley problem in the past? |
gender | Participant gender |
age | Participant Age |
topiccomp | Comprehension question 1 |
expensivecomp | Comprehension question 2 |
importancecomp | Comprehension question 3 |
failed | Did participant fail an attention check? |
For this assignment, you’ll need both the yarrr package and the tidyverse package. The tidyverse package is actually a collection of many packages, including ggplot2 and dplyr. Install them (if you don’t have them already) and then load the packages using the following code:
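A minimal sketch of the install-then-load step, assuming a CRAN mirror is already configured for install.packages(). The names needed and missing_pkgs are just for illustration:

```r
# Install any of the required packages that are not yet installed
needed <- c("tidyverse", "yarrr")
missing_pkgs <- needed[!(needed %in% rownames(installed.packages()))]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

library(tidyverse)  # Loads ggplot2, dplyr and other tidyverse packages
library(yarrr)      # Provides the pirates data
```

Installing is only needed once per computer, while the library() calls have to be run in every new R session.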
Open your R project from last week (I recommended calling it RCourse or something similar). There should be at least two folders in this working directory: data and R.
Open a new R script and save it as wpa_5_LastFirst.R in the R folder in your project directory.
The data are stored in two separate tab-delimited text files. Study 1 is at https://raw.githubusercontent.com/ndphillips/IntroductionR_Course/master/assignments/wpa/data/Millar_study1.txt and Study 2 is at https://raw.githubusercontent.com/ndphillips/IntroductionR_Course/master/assignments/wpa/data/Millar_study2.txt. Load the data into R with the following code:
study1 <- read.table(file = "https://raw.githubusercontent.com/ndphillips/IntroductionR_Course/master/assignments/wpa/data/Millar_study1.txt",
sep = "\t",
header = TRUE,
stringsAsFactors = FALSE)
study2 <- read.table(file = "https://raw.githubusercontent.com/ndphillips/IntroductionR_Course/master/assignments/wpa/data/Millar_study2.txt",
sep = "\t",
header = TRUE,
stringsAsFactors = FALSE)
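read.table() with sep = "\t" expects tab-delimited text with one header row. As a sketch of what that call does and how the result can be checked, the toy text below (made-up values, not the Millar data) is read the same way and then inspected:

```r
# Made-up tab-delimited text standing in for a downloaded file
toy_text <- "acceptability\timportant\tage\n7\t1\t23\n3\t0\t31\n5\t1\t27"

toy <- read.table(text = toy_text,
                  sep = "\t",
                  header = TRUE,
                  stringsAsFactors = FALSE)

str(toy)   # Column names, types and first values
head(toy)  # First rows of the data frame
```

If a column that should be numeric shows up as character in str(), the separator or header setting is usually the first thing to check.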
Inspect both datasets with str() and head() to make sure they were loaded correctly.
ggplot(data = __,
aes(x = __)) +
geom_histogram(bins = __,
col = "__") +
scale_x_continuous(limits = c(__, __))
ggplot(data = study1,
aes(x = acceptability)) +
geom_histogram(bins = 10, col = "white") +
scale_x_continuous(limits = c(0, 10))
Add the vertical line with geom_vline (look at http://www.sthda.com/english/wiki/ggplot2-line-types-how-to-change-line-types-of-a-graph-in-r-software for tips on creating different line types):
ggplot(data = __,
aes(x = __)) +
geom_histogram(bins = 10, col = "__", fill = "__") +
scale_x_continuous(limits = c(__, __)) +
geom_vline(xintercept = mean(study2$acceptability, na.rm = TRUE),
linetype = "dashed")
ggplot(data = study2,
aes(x = acceptability)) +
geom_histogram(bins = 10, col = "black", fill = "white") +
scale_x_continuous(limits = c(0, 10)) +
geom_vline(xintercept = mean(study2$acceptability, na.rm = TRUE),
linetype = "dashed")
ggplot(data = __,
aes(x = __, y = __)) +
geom_point()
ggplot(data = study1,
aes(x = age, y = acceptability)) +
geom_point()
Now use geom_count() instead of geom_point():
ggplot(data = __,
mapping = aes(x = __, y = __)) +
geom_count()
ggplot(data = study1,
mapping = aes(x = age, y = acceptability)) +
geom_count()
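geom_count() sizes each point by the number of observations that fall on exactly the same x and y values, which helps when many points overlap. A rough sketch of the tally behind it, using the mpg data that ships with ggplot2 rather than study1 so that it runs without a download (the name overlap_counts is just for illustration):

```r
library(ggplot2)
library(dplyr)

# Number of cars at each (cty, hwy) combination in the mpg data
overlap_counts <- mpg %>%
  count(cty, hwy)

head(overlap_counts)

# geom_count() computes the same kind of tally and maps it to point size
ggplot(data = mpg,
       aes(x = cty, y = hwy)) +
  geom_count()
```

The column n in overlap_counts corresponds to the size legend that geom_count() adds to the plot.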
Add a regression line with geom_smooth(method = 'lm'):
ggplot(data = __,
mapping = aes(x = __, y = __)) +
geom_count() +
geom_smooth(method = "__")
ggplot(data = study1,
mapping = aes(x = age, y = acceptability)) +
geom_count() +
geom_smooth(method = 'lm')
Include col = factor(gender) in the aesthetic mapping:
ggplot(data = subset(study1, gender %in% c(1, 2)),
mapping = aes(x = __,
y = __,
col = factor(__))) +
geom_count() +
geom_smooth(method = '__', se = FALSE) +
scale_colour_discrete(name = "__")
ggplot(data = subset(study1, gender %in% c(1, 2)),
mapping = aes(x = age, y = acceptability, col = factor(gender))) +
geom_count() + geom_smooth(method = 'lm', se = FALSE) +
scale_colour_discrete(name = "Gender")
study1_agg_1 <- study1 %>%
filter(complete.cases(study1)) %>% # Only include rows without NAs
group_by(important) %>% # Group by important
summarise(
acceptability_mean = mean(acceptability, na.rm = TRUE), # Mean
acceptability_lb = t.test(acceptability)$conf.int[1], # CI lower bound
acceptability_ub = t.test(acceptability)$conf.int[2] # CI upper bound
)
Ok now we’re ready! Create the following barplot using this template:
ggplot(data = study1_agg_1,
aes(x = factor(__), y = __)) +
geom_bar(stat = "identity", position = position_dodge(.9), col = "white") +
geom_errorbar(aes(ymax = __,
ymin = __),
position = position_dodge(.9),
width = 0.25) +
labs(x = "__",
y = "__")
ggplot(data = study1_agg_1,
aes(x = factor(important), y = acceptability_mean)) +
geom_bar(stat = "identity", position = position_dodge(.9), col = "white") +
geom_errorbar(aes(ymax = acceptability_ub,
ymin = acceptability_lb),
position = position_dodge(.9),
width = 0.25) +
labs(x = "Important",
y = "Acceptability")
study1_agg_2 <- study1 %>%
filter(complete.cases(study1)) %>% # Only include rows without NAs
group_by(important, direct) %>% # Group by important AND direct
summarise(
acceptability_mean = mean(acceptability, na.rm = TRUE), # Mean
acceptability_lb = t.test(acceptability)$conf.int[1], # CI lower bound
acceptability_ub = t.test(acceptability)$conf.int[2] # CI upper bound
)
Ok now we’re ready! Create the following barplot using this template:
ggplot(data = study1_agg_2,
aes(x = factor(__), y = __, fill = factor(__))) +
geom_bar(stat = "identity", position = position_dodge(.9), col = "__") +
geom_errorbar(aes(ymax = __,
ymin = __),
position = position_dodge(.9),
width = 0.25) +
labs(x = "__",
y = "__") +
scale_fill_discrete(name = "__")
ggplot(data = study1_agg_2,
aes(x = factor(important), y = acceptability_mean, fill = factor(direct))) +
geom_bar(stat = "identity", position = position_dodge(.9), col = "white") +
geom_errorbar(aes(ymax = acceptability_ub,
ymin = acceptability_lb),
position = position_dodge(.9),
width = 0.25) +
labs(x = "Important",
y = "Acceptability") +
scale_fill_discrete(name = "Direct")
Use geom_density() for this plot:
ggplot(data = study2,
aes(x = age, fill = factor(__))) +
geom_density(alpha = .2) + # Constant transparency belongs outside aes()
scale_fill_discrete(name = "__") +
labs(y = "")
ggplot(data = study2,
aes(x = age, fill = factor(important))) +
geom_density(alpha = .2) + # Constant transparency belongs outside aes()
scale_fill_discrete(name = "Important") +
labs(y = "")
Create the following plots from the mpg dataset. The mpg dataset is contained in the ggplot2 package and you should have access to it once you load either the ggplot2 package, or the tidyverse package (which contains ggplot2). You should start by looking at the mpg dataset to see which variables it contains.
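One possible way to take that first look, sketched here with base R functions. The columns class and trans are picked only because they are categorical variables in mpg:

```r
library(ggplot2)

names(mpg)        # All variable names in the dataset
dim(mpg)          # Number of rows and columns

# Frequency tables for two of the categorical variables
table(mpg$class)
table(mpg$trans)
```

The help page opened by ?mpg describes what each variable means.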
Create this plot from the mpg dataset!
ggplot(data = mpg,
aes(x = __, y = __)) +
geom_count(alpha = .3) +
labs(x = "__",
y = "__")
ggplot(data = mpg,
aes(x = cty, y = hwy)) +
geom_count(alpha = .3) +
labs(x = "City Miles per Gallon",
y = "Highway Miles per Gallon")
ggplot(data = __,
aes(x = __, y = __, col = __)) +
geom_count(alpha = .3) +
facet_wrap( ~ __) +
labs(x = "__",
y = "__") +
guides(col = "none",
size = "none")
ggplot(data = mpg,
aes(x = cty, y = hwy, col = trans)) +
geom_count(alpha = .3) +
facet_wrap( ~ trans) +
labs(x = "City Miles per Gallon",
y = "Highway Miles per Gallon") +
guides(col = "none",
size = "none")
ggplot(data = __,
aes(x = factor(__), y = __, fill = factor(__))) +
geom_violin() +
geom_count() +
guides(fill = "none",
size = "none") +
labs(x = "__",
y = "__")
ggplot(data = mpg,
aes(x = factor(class), y = hwy, fill = factor(class))) +
geom_violin() +
geom_count() +
guides(fill = "none",
size = "none") +
labs(x = "Class",
y = "Highway Miles per Gallon")
Send your wpa_5_LastFirst.R file to me at nathaniel.phillips@unibas.ch.