Indexing with [], logical vectors (<, ==, >, !=), counts with sum() and percentages with mean(), and assignment with <-.
# Create some vectors of data
height <- c(168, 157, 189, 147, 172, 166, 201)
initials <- c("NP", "HT", "AM", "MP", "RH", "MS", "JT")
sex <- c("m", "f", "m", "f", "m", "f", "m")
age <- c(28, 19, 23, 24, 25, 22, 26)
# What was the height of the first entry?
height[1]
# What were the sexes of the first 5 entries?
sex[1:5]
# What was the age of the last entry?
age[length(age)]
# How many were women?
sum(sex == "f")
# How many people were taller than 160cm?
sum(height > 160)
# What proportion of people were younger than 25? (multiply by 100 for a percent)
mean(age < 25)
# What were the initials of the females?
initials[sex == "f"]
# What were the initials of females older than 22?
initials[sex == "f" & age > 22]
# What was the height, sex, and age of the person with initials "JT"?
height[initials == "JT"]
sex[initials == "JT"]
age[initials == "JT"]
# The height of person 'NP' is incorrect, it should be 172. Change it
height[initials == "NP"] <- 172
# The ages of all females are one year too low, add 1 to each age
age[sex == "f"] <- age[sex == "f"] + 1
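The counting and proportion lines above work because R treats TRUE as 1 and FALSE as 0 when a logical vector is passed to sum() or mean(). Here is a minimal sketch of that idea, using a small made-up vector (toy_age is hypothetical and not part of the assignment data):

```r
# Hypothetical toy data, for illustration only
toy_age <- c(28, 19, 23, 24)

# A comparison returns one TRUE/FALSE value per element
is_young <- toy_age < 25

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
n_young <- sum(is_young)

# mean() gives the proportion of TRUE values;
# multiply by 100 to express it as a percentage
prop_young <- mean(is_young)
pct_young <- 100 * prop_young
```

Three of the four toy ages are below 25, so n_young is 3 and prop_young is 0.75.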
Save your work as an R script named wpa_2_LastFirst.R (where Last and First are your last and first names). At the top of your script, write the assignment number, your name, and the date (as comments!). For the rest of the assignment, when you answer a task, indicate which task you are answering with appropriate comments.
In this assignment you will analyse (fictional!) data from a survey of 300 people at one of two bars in Basel (Grenzwert and Paddy’s) last Friday night at 3:00am. The goal of the survey is to see if there is an effect of cologne on how long people talk with others at the bar. As each of the 300 people entered the bar, they were secretly given a spray of one of two types of cologne (Acqua Di Gio or CK One) or no cologne at all. For the rest of the night, two (very busy) researchers recorded how long each person spent talking to people at the bar. The data are stored in the following 5 vector objects:
id: An id indicating the participant in the form x.n, where x is the name of the bar the participant was at and n is a random indexing number
sex: The person’s sex: male or female
cologne: Which cologne did the person receive? "gio", "ckone", or "none"
bar: The bar: grenzwert or paddys
time: The amount of time the person spent talking to people, in minutes
A. First, get the data objects into your R session. Thankfully, you don’t need to type in the data yourself! The objects are stored in an RData file online at https://github.com/ndphillips/IntroductionR_Course/blob/master/data/wpa2.RData?raw=true. Run the following code to load the vectors into your R session.
# Load the data into my current session
load(file = url("https://github.com/ndphillips/IntroductionR_Course/blob/master/data/wpa2.RData?raw=true"))
B. Make sure the objects (id, sex, cologne, bar, time) were loaded correctly and get to know them by running the str() function on each of the 5 vectors.
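As a rough sketch of what str() reports, here it is applied to two small stand-in vectors. The names toy_id and toy_time and their values are made up for illustration; with the real data you would pass id, sex, cologne, bar, and time instead.

```r
# Hypothetical stand-ins for two of the loaded vectors
toy_id <- c("g.88", "p.12", "g.11")
toy_time <- c(302, 45, 120)

# str() prints the type, the length, and the first few values of an object
str(toy_id)
str(toy_time)

# The same information can also be checked piece by piece
class(toy_id)     # the type of the vector
length(toy_time)  # how many elements it has
```

If a vector shows an unexpected type or length, that is usually a sign the data did not load the way you intended.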
# Count how many people received each cologne with table()
table(cologne)
## cologne
## ckone gio none
## 100 100 100
mean(time)
## [1] 143.4
sd(time)
## [1] 64.82825
Create a new vector time_z that is a z-score transformation of time. (Hint: z-score is defined as (x - mean(x)) / sd(x))
time_z <- (time - mean(time)) / sd(time)
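One way to check a z-score transformation is to confirm that the result has a mean of 0 and a standard deviation of 1. The sketch below does this on a small made-up vector (toy_time is hypothetical); the same check should work on time_z.

```r
# Hypothetical toy data, for illustration only
toy_time <- c(10, 20, 30, 40, 50)

# z-score: subtract the mean, then divide by the standard deviation
toy_z <- (toy_time - mean(toy_time)) / sd(toy_time)

# After the transformation the mean should be 0 and the sd should be 1
# (up to tiny floating point rounding error)
mean(toy_z)
sd(toy_z)
```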
time[1]
## [1] 302
cologne[1]
## [1] "gio"
sex[1]
## [1] "f"
sex[1:5]
## [1] "f" "f" "f" "f" "m"
# Colognes of participants 10 through 20 (index a range with a:b)
cologne[10:20]
## [1] "ckone" "ckone" "ckone" "ckone" "ckone" "none" "ckone" "gio"
## [9] "ckone" "gio" "gio"
# Bar of the last participant (use the length() function with the appropriate argument as the index)
bar[length(bar)]
## [1] "grenzwert"
sum(cologne == "gio")
## [1] 100
sum(cologne == "ckone")
## [1] 100
sum(cologne == "none")
## [1] 100
sum(bar == "grenzwert")
## [1] 150
sum(bar == "paddys")
## [1] 150
# Proportion of participants who were at grenzwert (mean() combined with a logical vector)
mean(bar == "grenzwert")
## [1] 0.5
sum(time > 30)
## [1] 267
mean(time > 30)
## [1] 0.89
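# Proportion of participants who talked for more than 20 but less than 40 minutes (combine conditions with &)
mean(time > 20 & time < 40)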
## [1] 0.1366667
# Ids of the participants at grenzwert (they should all start with g)
id[bar == "grenzwert"]
## [1] "g.88" "g.92" "g.11" "g.69" "g.68" "g.76" "g.14" "g.67"
## [9] "g.12" "g.83" "g.95" "g.20" "g.91" "g.91" "g.52" "g.72"
## [17] "g.56" "g.86" "g.57" "g.58" "g.50" "g.93" "g.76" "g.90"
## [25] "g.79" "g.65" "g.55" "g.60" "g.73" "g.81" "g.62" "g.64"
## [33] "g.35" "g.73" "g.41" "g.59" "g.90" "g.97" "g.94" "g.61"
## [41] "g.93" "g.30" "g.66" "g.78" "g.60" "g.59" "g.15" "g.23"
## [49] "g.64" "g.36" "g.95" "g.34" "g.94" "g.74" "g.52" "g.82"
## [57] "g.96" "g.87" "g.63" "g.63" "g.49" "g.31" "g.100" "g.47"
## [65] "g.38" "g.86" "g.28" "g.72" "g.26" "g.99" "g.96" "g.99"
## [73] "g.98" "g.69" "g.74" "g.89" "g.78" "g.77" "g.19" "g.79"
## [81] "g.75" "g.27" "g.54" "g.48" "g.77" "g.16" "g.82" "g.39"
## [89] "g.94" "g.29" "g.81" "g.92" "g.51" "g.80" "g.70" "g.53"
## [97] "g.83" "g.13" "g.100" "g.42" "g.70" "g.80" "g.61" "g.51"
## [105] "g.44" "g.93" "g.45" "g.65" "g.55" "g.87" "g.84" "g.62"
## [113] "g.66" "g.33" "g.53" "g.54" "g.67" "g.68" "g.95" "g.71"
## [121] "g.97" "g.40" "g.91" "g.100" "g.85" "g.17" "g.96" "g.46"
## [129] "g.56" "g.85" "g.25" "g.24" "g.92" "g.98" "g.18" "g.75"
## [137] "g.21" "g.57" "g.97" "g.43" "g.88" "g.71" "g.37" "g.22"
## [145] "g.58" "g.84" "g.89" "g.99" "g.32" "g.98"
sex[bar == "paddys"]
## [1] "f" "f" "m" "m" "f" "m" "m" "m" "m" "m" "f" "f" "m" "m" "m" "f" "m"
## [18] "f" "m" "f" "m" "m" "f" "f" "m" "m" "m" "m" "f" "m" "m" "m" "f" "f"
## [35] "m" "m" "m" "f" "m" "f" "m" "m" "m" "m" "f" "f" "m" "f" "m" "m" "f"
## [52] "f" "m" "m" "f" "f" "f" "f" "m" "f" "m" "m" "f" "f" "m" "f" "m" "m"
## [69] "f" "f" "f" "f" "f" "m" "f" "m" "m" "f" "f" "m" "f" "f" "m" "m" "f"
## [86] "f" "f" "f" "m" "f" "m" "m" "m" "m" "f" "m" "f" "m" "m" "f" "m" "f"
## [103] "f" "f" "m" "m" "m" "f" "m" "m" "f" "m" "m" "m" "m" "f" "m" "f" "f"
## [120] "f" "f" "f" "m" "m" "m" "f" "f" "m" "f" "f" "f" "f" "f" "m" "m" "m"
## [137] "f" "f" "m" "f" "f" "m" "m" "m" "m" "m" "m" "m" "f" "f"
mean(sex[bar == "paddys"] == "m")
## [1] 0.5333333
mean(time[sex == "m"])
## [1] 143.0962
mean(time[sex == "f"])
## [1] 143.7292
mean(time[bar == "grenzwert"])
## [1] 98.24
mean(time[bar == "paddys"])
## [1] 188.56
mean(time[cologne == "gio"])
## [1] 159.98
mean(time[cologne == "ckone"])
## [1] 170.13
mean(time[cologne == "none"])
## [1] 100.09
# They should wear ckone!
a[] <- b
In the next questions, we’ll use indexing and assignment to change the values within a vector. To do this, we’ll start by creating copies of the original data so we can easily recover the data if we screw something up.
Create new objects bar.r, cologne.r and time.r that are copies of the original bar, cologne and time objects (Hint: Just assign the existing vectors to new objects)
bar.r <- bar
cologne.r <- cologne
time.r <- time
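Copying a vector like this is safe because changing the copy leaves the original untouched. Here is a small sketch of that behaviour with a made-up vector (toy_bar and toy_bar.r are hypothetical names, not part of the assignment data):

```r
# Hypothetical toy data, for illustration only
toy_bar <- c("grenzwert", "paddys", "grenzwert")

# Make a copy by assigning the existing vector to a new object
toy_bar.r <- toy_bar

# Change values in the copy only
toy_bar.r[toy_bar.r == "grenzwert"] <- "g"

# The copy has changed, the original has not
toy_bar.r
toy_bar
```

If something goes wrong while recoding, you can always start over by assigning the original to the copy again.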
In the bar.r vector, change the "grenzwert" values to "g". Now change the "paddys" values to "p".
bar.r[bar.r == "grenzwert"] <- "g"
bar.r[bar.r == "paddys"] <- "p"
In the cologne.r vector, change the "gio" values to "G". Now change the "ckone" values to "C". Now change "none" to "N".
cologne.r[cologne.r == "gio"] <- "G"
cologne.r[cologne.r == "ckone"] <- "C"
cologne.r[cologne.r == "none"] <- "N"
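When there are several values to recode, an alternative to one assignment per value is a named lookup vector. This is only a sketch of a different technique, not the approach the assignment asks for, and toy_cologne and lookup are hypothetical names:

```r
# Hypothetical toy data, for illustration only
toy_cologne <- c("gio", "none", "ckone", "gio")

# Named vector: the names are the old values, the elements are the new values
lookup <- c(gio = "G", ckone = "C", none = "N")

# Indexing the lookup by the old values returns the matching new values;
# unname() drops the leftover names from the result
toy_cologne.r <- unname(lookup[toy_cologne])
toy_cologne.r
```

The logical indexing approach used in this assignment is easier to read for two or three values; the lookup approach scales better when there are many.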
In the time.r vector, change all time values greater than 280 to 280. Confirm that you did it correctly by calculating the maximum time in time.r.
time.r[time > 280] <- 280
max(time.r)
## [1] 280
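Capping values at a maximum can also be done with the base R function pmin(), which takes the element-wise minimum of its arguments. The sketch below compares it with the indexing approach on a made-up vector (toy_time is hypothetical):

```r
# Hypothetical toy data, for illustration only
toy_time <- c(100, 302, 280, 295)

# Approach used in this assignment: logical indexing plus assignment
capped_index <- toy_time
capped_index[capped_index > 280] <- 280

# Alternative: pmin() keeps the smaller of each value and 280
capped_pmin <- pmin(toy_time, 280)

# Both approaches should give the same result
identical(capped_index, capped_pmin)
max(capped_pmin)
```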
# They should wear ckone!
Let’s see if your prediction holds up!
mean(time[bar == "grenzwert" & cologne == "gio"])
## [1] 145.0111
mean(time[bar == "grenzwert" & cologne == "ckone"])
## [1] 38
mean(time[bar == "paddys" & cologne == "gio"])
## [1] 294.7
mean(time[bar == "paddys" & cologne == "ckone"])
## [1] 184.8111
# They should wear gio!!
You can visualize the data using the following code
# Combine vectors in a dataframe
survey.df <- data.frame(bar, cologne, time)
# Create a pirateplot of the data
# pirateplot() is exported by the yarrr package (run install.packages("yarrr") first if needed)
yarrr::pirateplot(time ~ cologne + bar,
                  data = survey.df)
What you’ve just seen is an example of Simpson’s Paradox. If you want to learn more, check out the Wikipedia page.
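To see how the reversal can happen, it helps to look at how many people in each cologne group were at each bar. The group sizes in the sketch below are an assumption: they are not stated in the assignment, but they are sizes that reproduce the overall means reported earlier from the per-bar means. All object names are made up for illustration.

```r
# Mean talking times per bar, as reported in the output above
gio_means   <- c(grenzwert = 145.0111, paddys = 294.7)
ckone_means <- c(grenzwert = 38,       paddys = 184.8111)

# ASSUMED group sizes (inferred from the reported means, not given in the text)
gio_n   <- c(grenzwert = 90, paddys = 10)
ckone_n <- c(grenzwert = 10, paddys = 90)

# Within each bar, gio has the higher mean time
gio_means > ckone_means

# The overall mean is the size-weighted average of the per-bar means
gio_overall   <- weighted.mean(gio_means, gio_n)
ckone_overall <- weighted.mean(ckone_means, ckone_n)

# Overall, ckone comes out ahead
ckone_overall > gio_overall
```

Under these assumed sizes, most ckone wearers were at paddys, where everyone talked longer, and most gio wearers were at grenzwert. The overall comparison therefore mostly reflects the difference between bars rather than the difference between colognes.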
mean(cologne[sex == "f"] == "ckone")
## [1] 0.3263889
median(time[bar == "grenzwert" & cologne == "gio" & time > 100])
## [1] 144.5
mean((bar == "grenzwert" & time < 220) | (bar == "paddys" & time > 150 & time <= 250))
## [1] 0.9633333
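# Add random noise (normal, mean 30, sd 5, generated with rnorm()) to the time of everyone who received ckone
time[cologne == "ckone"] <- time[cologne == "ckone"] + rnorm(n = sum(cologne == "ckone"), mean = 30, sd = 5)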
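Because rnorm() draws random numbers, the line above gives different values every time it is run. If you want results that can be reproduced, one option is to call set.seed() first. The sketch below shows this on made-up data; the seed value 100 is arbitrary and toy_time is hypothetical.

```r
# Hypothetical toy data, for illustration only
toy_time <- c(100, 150, 200)

# Setting the same seed before each draw gives the same random numbers
set.seed(100)  # arbitrary seed value
noise_a <- rnorm(n = length(toy_time), mean = 30, sd = 5)

set.seed(100)
noise_b <- rnorm(n = length(toy_time), mean = 30, sd = 5)

identical(noise_a, noise_b)

# Add one random value to each element of the vector
toy_time_new <- toy_time + noise_a
```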
Send your wpa_2_LastFirst.R file to me at nathaniel.phillips@unibas.ch.