# code to generated and clean data from data_transformation_1:
library(truffle)
set.seed(42)
dat_demographics_messy <- data.frame(id = 1:500) %>%
truffle::truffle_demographics() %>%
truffle::dirt_demographics()
dat_gender_tidier <- dat_demographics_messy %>%
# convert to lower case and remove non-letters
mutate(gender = str_to_lower(gender),
gender = str_remove_all(gender, "[^A-Za-z]"),
# added age processing
age = str_remove_all(age, "![^A-Za-z]"),
age = if_else(age == "", NA_integer_, as.numeric(age))) %>%
# tidy up cases
mutate(gender = case_when(gender %in% c("female", "male", "nonbinary") ~ gender,
gender %in% c("femail", "fem ale", "femle", "femlae") ~ "female",
gender %in% c("malee", "mal", "ma le") ~ "male",
gender %in% c("non binary", "nonbinary") ~ "non-binary",
gender == "" ~ NA_character_, # if an empty character string, change to NA_character_
TRUE ~ gender)) # if none of the above apply, keep original value8 Data transformation III
This chapter covers “aggregating” functions. They aggregate results across rows to create a new data frame that does not contain any of the rows of the original. For example, to convert the ‘age’ column that contains participants’ ages to ‘age_mean’, which only has one row containing the average age.
8.1 Summarizing across rows
It is very common that we need to create summaries across rows. For example, to create the mean and standard deviation of a column like age. This can be done with summarize(). Remember: mutate() creates new columns or modifies the contents of existing columns, but does not change the number of rows. Whereas summarize() reduces a data frame down to one row.
library(dplyr)
library(tidyr)
library(janitor)
# mean
dat_gender_tidier %>%
summarize(mean_age = mean(age, na.rm = TRUE))| mean_age |
|---|
| 31.72093 |
dat_gender_tidier %>%
summarize(mean_age = mean(age))| mean_age |
|---|
| NA |
# SD
dat_gender_tidier %>%
summarize(sd_age = sd(age, na.rm = TRUE))| sd_age |
|---|
| 8.157242 |
# mean and SD with rounding, illustrating how multiple summarizes can be done in one function call
dat_gender_tidier %>%
summarize(mean_age = mean(age, na.rm = TRUE),
sd_age = sd(age, na.rm = TRUE)) %>%
mutate(mean_age = round_half_up(mean_age, digits = 2),
sd_age = round_half_up(sd_age, digits = 2))| mean_age | sd_age |
|---|---|
| 31.72 | 8.16 |
8.1.1 group_by()
Often, we don’t want to reduce a data frame down to a single row / summarize the whole dataset, but instead we want to create a summary for each (sub)group. For example
# illustrate use of group_by() and summarize()
dat_gender_tidier %>%
summarize(mean_age = mean(age, na.rm = TRUE),
sd_age = sd(age, na.rm = TRUE))| mean_age | sd_age |
|---|---|
| 31.72093 | 8.157242 |
dat_gender_tidier %>%
group_by(gender) %>%
summarize(mean_age = round_half_up(mean(age, na.rm = TRUE), digits = 2),
sd_age = round_half_up(sd(age, na.rm = TRUE), digits = 2))| gender | mean_age | sd_age |
|---|---|---|
| female | 32.30 | 8.40 |
| male | 30.89 | 7.69 |
| nonbinary | 29.56 | 7.94 |
| NA | 30.39 | 7.23 |
dat <- dat_gender_tidier %>%
group_by(gender)
dat %>%
summarize(mean_age = round_half_up(mean(age, na.rm = TRUE), digits = 2),
sd_age = round_half_up(sd(age, na.rm = TRUE), digits = 2))| gender | mean_age | sd_age |
|---|---|---|
| female | 32.30 | 8.40 |
| male | 30.89 | 7.69 |
| nonbinary | 29.56 | 7.94 |
| NA | 30.39 | 7.23 |
8.1.2 ungroup()
Grouping is an invisible property of a data frame or tibble. You can’t find a value inside it that tells you it’s grouped, but the grouping can persist unless you ungroup(). This can cause unexpected behavior, so watch out for it.
8.1.2.1 group_by() applied to filter and slice functions
TODO
8.1.3 n()
n() calculates the number of rows, i.e., the N. It can be useful in summarize.
# summarize n
dat_gender_tidier %>%
summarize(n_age = n())| n_age |
|---|
| 500 |
# summarize n per gender group
dat_gender_tidier %>%
group_by(gender) %>%
summarize(n_age = n())| gender | n_age |
|---|---|
| female | 323 |
| male | 131 |
| nonbinary | 16 |
| NA | 30 |
# dat_gender_tidier %>%
# summarize(n_age = n())
#
# dat_gender_tidier %>%
# n()
#
# dat_gender_tidier %>%
# mutate(n_age = n())8.1.4 Realise that count() is just a wrapper function for summarize()
count() is just the combination of group_by() and summarize() and n(). They produce the same results as above.
# summarize n
dat_gender_tidier %>%
count()| n |
|---|
| 500 |
# summarize n per gender group
dat_gender_tidier %>%
count(gender)| gender | n |
|---|---|
| female | 323 |
| male | 131 |
| nonbinary | 16 |
| NA | 30 |
8.2 distinct()
Distinct is a variation on count() that doesn’t return the counts, just the distinct values.
# summarize n per gender group
dat_gender_tidier %>%
count(gender)| gender | n |
|---|---|
| female | 323 |
| male | 131 |
| nonbinary | 16 |
| NA | 30 |
# summarize n per gender group
dat_gender_tidier %>%
distinct(gender)| gender |
|---|
| female |
| male |
| NA |
| nonbinary |
8.2.1 Handling duplicates with multiple-counts() and distinct()
It can be used to remove duplicates, but it’s important to remember that you don’t know which row is retained vs dropped.
# create duplicates that need cleaning
dat_gender_tidier_with_duplicates <- dat_gender_tidier %>%
dirt_duplicates(prop = 0.10)
# count rows
dat_gender_tidier_with_duplicates %>%
count()| n |
|---|
| 550 |
# unique participant ids
dat_gender_tidier_with_duplicates %>%
count(id, name = "n_rows")| id | n_rows |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 1 |
| 6 | 1 |
| 7 | 1 |
| 8 | 1 |
| 9 | 1 |
| 10 | 1 |
| 11 | 1 |
| 12 | 1 |
| 13 | 1 |
| 14 | 1 |
| 15 | 1 |
| 16 | 2 |
| 17 | 1 |
| 18 | 2 |
| 19 | 1 |
| 20 | 1 |
| 21 | 1 |
| 22 | 1 |
| 23 | 1 |
| 24 | 1 |
| 25 | 1 |
| 26 | 1 |
| 27 | 1 |
| 28 | 1 |
| 29 | 1 |
| 30 | 1 |
| 31 | 1 |
| 32 | 1 |
| 33 | 1 |
| 34 | 1 |
| 35 | 1 |
| 36 | 1 |
| 37 | 1 |
| 38 | 1 |
| 39 | 1 |
| 40 | 1 |
| 41 | 1 |
| 42 | 2 |
| 43 | 2 |
| 44 | 1 |
| 45 | 1 |
| 46 | 2 |
| 47 | 1 |
| 48 | 1 |
| 49 | 1 |
| 50 | 1 |
| 51 | 1 |
| 52 | 1 |
| 53 | 1 |
| 54 | 1 |
| 55 | 1 |
| 56 | 2 |
| 57 | 1 |
| 58 | 1 |
| 59 | 1 |
| 60 | 1 |
| 61 | 1 |
| 62 | 1 |
| 63 | 1 |
| 64 | 1 |
| 65 | 1 |
| 66 | 1 |
| 67 | 2 |
| 68 | 1 |
| 69 | 1 |
| 70 | 1 |
| 71 | 1 |
| 72 | 1 |
| 73 | 1 |
| 74 | 1 |
| 75 | 1 |
| 76 | 1 |
| 77 | 1 |
| 78 | 1 |
| 79 | 1 |
| 80 | 2 |
| 81 | 2 |
| 82 | 1 |
| 83 | 1 |
| 84 | 1 |
| 85 | 1 |
| 86 | 1 |
| 87 | 1 |
| 88 | 1 |
| 89 | 1 |
| 90 | 1 |
| 91 | 1 |
| 92 | 1 |
| 93 | 1 |
| 94 | 1 |
| 95 | 1 |
| 96 | 1 |
| 97 | 1 |
| 98 | 1 |
| 99 | 1 |
| 100 | 1 |
| 101 | 1 |
| 102 | 1 |
| 103 | 1 |
| 104 | 1 |
| 105 | 1 |
| 106 | 1 |
| 107 | 1 |
| 108 | 1 |
| 109 | 1 |
| 110 | 1 |
| 111 | 1 |
| 112 | 1 |
| 113 | 1 |
| 114 | 1 |
| 115 | 1 |
| 116 | 1 |
| 117 | 1 |
| 118 | 1 |
| 119 | 1 |
| 120 | 1 |
| 121 | 1 |
| 122 | 1 |
| 123 | 1 |
| 124 | 1 |
| 125 | 2 |
| 126 | 2 |
| 127 | 2 |
| 128 | 1 |
| 129 | 1 |
| 130 | 1 |
| 131 | 1 |
| 132 | 2 |
| 133 | 1 |
| 134 | 1 |
| 135 | 1 |
| 136 | 1 |
| 137 | 1 |
| 138 | 1 |
| 139 | 2 |
| 140 | 1 |
| 141 | 1 |
| 142 | 1 |
| 143 | 1 |
| 144 | 1 |
| 145 | 2 |
| 146 | 1 |
| 147 | 1 |
| 148 | 1 |
| 149 | 1 |
| 150 | 1 |
| 151 | 1 |
| 152 | 1 |
| 153 | 1 |
| 154 | 2 |
| 155 | 1 |
| 156 | 1 |
| 157 | 1 |
| 158 | 1 |
| 159 | 1 |
| 160 | 1 |
| 161 | 1 |
| 162 | 1 |
| 163 | 1 |
| 164 | 1 |
| 165 | 1 |
| 166 | 1 |
| 167 | 2 |
| 168 | 1 |
| 169 | 1 |
| 170 | 1 |
| 171 | 1 |
| 172 | 1 |
| 173 | 1 |
| 174 | 2 |
| 175 | 1 |
| 176 | 1 |
| 177 | 1 |
| 178 | 1 |
| 179 | 1 |
| 180 | 1 |
| 181 | 1 |
| 182 | 1 |
| 183 | 1 |
| 184 | 1 |
| 185 | 1 |
| 186 | 1 |
| 187 | 1 |
| 188 | 1 |
| 189 | 1 |
| 190 | 1 |
| 191 | 2 |
| 192 | 1 |
| 193 | 1 |
| 194 | 1 |
| 195 | 1 |
| 196 | 1 |
| 197 | 1 |
| 198 | 1 |
| 199 | 1 |
| 200 | 1 |
| 201 | 1 |
| 202 | 1 |
| 203 | 1 |
| 204 | 1 |
| 205 | 1 |
| 206 | 1 |
| 207 | 1 |
| 208 | 1 |
| 209 | 1 |
| 210 | 1 |
| 211 | 1 |
| 212 | 1 |
| 213 | 1 |
| 214 | 1 |
| 215 | 2 |
| 216 | 1 |
| 217 | 1 |
| 218 | 1 |
| 219 | 1 |
| 220 | 1 |
| 221 | 1 |
| 222 | 1 |
| 223 | 1 |
| 224 | 1 |
| 225 | 1 |
| 226 | 1 |
| 227 | 1 |
| 228 | 1 |
| 229 | 2 |
| 230 | 1 |
| 231 | 1 |
| 232 | 2 |
| 233 | 1 |
| 234 | 1 |
| 235 | 1 |
| 236 | 1 |
| 237 | 1 |
| 238 | 1 |
| 239 | 1 |
| 240 | 1 |
| 241 | 1 |
| 242 | 2 |
| 243 | 1 |
| 244 | 1 |
| 245 | 1 |
| 246 | 1 |
| 247 | 1 |
| 248 | 1 |
| 249 | 1 |
| 250 | 1 |
| 251 | 1 |
| 252 | 1 |
| 253 | 1 |
| 254 | 1 |
| 255 | 1 |
| 256 | 1 |
| 257 | 1 |
| 258 | 1 |
| 259 | 3 |
| 260 | 1 |
| 261 | 1 |
| 262 | 1 |
| 263 | 1 |
| 264 | 1 |
| 265 | 1 |
| 266 | 1 |
| 267 | 1 |
| 268 | 1 |
| 269 | 1 |
| 270 | 1 |
| 271 | 1 |
| 272 | 1 |
| 273 | 1 |
| 274 | 1 |
| 275 | 1 |
| 276 | 1 |
| 277 | 1 |
| 278 | 1 |
| 279 | 1 |
| 280 | 2 |
| 281 | 1 |
| 282 | 1 |
| 283 | 1 |
| 284 | 1 |
| 285 | 1 |
| 286 | 1 |
| 287 | 1 |
| 288 | 1 |
| 289 | 2 |
| 290 | 1 |
| 291 | 1 |
| 292 | 1 |
| 293 | 2 |
| 294 | 1 |
| 295 | 1 |
| 296 | 1 |
| 297 | 2 |
| 298 | 1 |
| 299 | 1 |
| 300 | 1 |
| 301 | 1 |
| 302 | 1 |
| 303 | 1 |
| 304 | 1 |
| 305 | 1 |
| 306 | 1 |
| 307 | 1 |
| 308 | 1 |
| 309 | 1 |
| 310 | 1 |
| 311 | 1 |
| 312 | 1 |
| 313 | 1 |
| 314 | 1 |
| 315 | 1 |
| 316 | 1 |
| 317 | 1 |
| 318 | 1 |
| 319 | 1 |
| 320 | 1 |
| 321 | 1 |
| 322 | 1 |
| 323 | 1 |
| 324 | 1 |
| 325 | 1 |
| 326 | 2 |
| 327 | 1 |
| 328 | 1 |
| 329 | 1 |
| 330 | 1 |
| 331 | 1 |
| 332 | 1 |
| 333 | 1 |
| 334 | 1 |
| 335 | 1 |
| 336 | 1 |
| 337 | 2 |
| 338 | 1 |
| 339 | 1 |
| 340 | 1 |
| 341 | 1 |
| 342 | 1 |
| 343 | 1 |
| 344 | 1 |
| 345 | 1 |
| 346 | 1 |
| 347 | 1 |
| 348 | 1 |
| 349 | 1 |
| 350 | 1 |
| 351 | 1 |
| 352 | 1 |
| 353 | 1 |
| 354 | 1 |
| 355 | 1 |
| 356 | 1 |
| 357 | 1 |
| 358 | 1 |
| 359 | 1 |
| 360 | 1 |
| 361 | 1 |
| 362 | 1 |
| 363 | 2 |
| 364 | 1 |
| 365 | 1 |
| 366 | 2 |
| 367 | 1 |
| 368 | 1 |
| 369 | 2 |
| 370 | 1 |
| 371 | 1 |
| 372 | 1 |
| 373 | 1 |
| 374 | 1 |
| 375 | 1 |
| 376 | 1 |
| 377 | 2 |
| 378 | 1 |
| 379 | 1 |
| 380 | 1 |
| 381 | 2 |
| 382 | 1 |
| 383 | 1 |
| 384 | 1 |
| 385 | 1 |
| 386 | 1 |
| 387 | 1 |
| 388 | 1 |
| 389 | 1 |
| 390 | 1 |
| 391 | 2 |
| 392 | 1 |
| 393 | 2 |
| 394 | 1 |
| 395 | 1 |
| 396 | 1 |
| 397 | 1 |
| 398 | 1 |
| 399 | 2 |
| 400 | 1 |
| 401 | 1 |
| 402 | 1 |
| 403 | 1 |
| 404 | 1 |
| 405 | 2 |
| 406 | 1 |
| 407 | 1 |
| 408 | 1 |
| 409 | 1 |
| 410 | 1 |
| 411 | 1 |
| 412 | 1 |
| 413 | 1 |
| 414 | 2 |
| 415 | 1 |
| 416 | 1 |
| 417 | 1 |
| 418 | 1 |
| 419 | 1 |
| 420 | 1 |
| 421 | 1 |
| 422 | 1 |
| 423 | 1 |
| 424 | 1 |
| 425 | 1 |
| 426 | 1 |
| 427 | 1 |
| 428 | 1 |
| 429 | 1 |
| 430 | 1 |
| 431 | 1 |
| 432 | 1 |
| 433 | 1 |
| 434 | 1 |
| 435 | 1 |
| 436 | 1 |
| 437 | 1 |
| 438 | 1 |
| 439 | 1 |
| 440 | 1 |
| 441 | 1 |
| 442 | 2 |
| 443 | 1 |
| 444 | 1 |
| 445 | 2 |
| 446 | 1 |
| 447 | 2 |
| 448 | 1 |
| 449 | 1 |
| 450 | 1 |
| 451 | 1 |
| 452 | 1 |
| 453 | 1 |
| 454 | 1 |
| 455 | 1 |
| 456 | 1 |
| 457 | 1 |
| 458 | 2 |
| 459 | 1 |
| 460 | 1 |
| 461 | 1 |
| 462 | 2 |
| 463 | 1 |
| 464 | 1 |
| 465 | 1 |
| 466 | 1 |
| 467 | 1 |
| 468 | 1 |
| 469 | 1 |
| 470 | 1 |
| 471 | 1 |
| 472 | 1 |
| 473 | 1 |
| 474 | 1 |
| 475 | 1 |
| 476 | 1 |
| 477 | 1 |
| 478 | 1 |
| 479 | 1 |
| 480 | 2 |
| 481 | 1 |
| 482 | 1 |
| 483 | 1 |
| 484 | 1 |
| 485 | 1 |
| 486 | 1 |
| 487 | 1 |
| 488 | 1 |
| 489 | 1 |
| 490 | 1 |
| 491 | 1 |
| 492 | 1 |
| 493 | 1 |
| 494 | 1 |
| 495 | 1 |
| 496 | 3 |
| 497 | 1 |
| 498 | 1 |
| 499 | 1 |
| 500 | 1 |
# count unique participant ids and the number of rows they have
dat_gender_tidier_with_duplicates %>%
count(id, name = "n_rows") %>%
count(n_rows, name = "n_ids_with_n_rows")| n_rows | n_ids_with_n_rows |
|---|---|
| 1 | 452 |
| 2 | 46 |
| 3 | 2 |
# count unique participant ids and the number of rows they have
dat_gender_tidier_duplicates_removed <- dat_gender_tidier_with_duplicates %>%
distinct(id, .keep_all = TRUE) # keep all other columns
dat_gender_tidier_duplicates_removed %>%
count(id, name = "n_rows") %>%
count(n_rows, name = "n_ids_with_n_rows")| n_rows | n_ids_with_n_rows |
|---|---|
| 1 | 500 |
# alternative, fewer assumptions
dat_gender_tidier_duplicates_removed2 <- dat_gender_tidier_with_duplicates %>%
distinct(id, age, gender)
dat_gender_tidier_duplicates_removed2 %>%
count(id, name = "n_rows") %>%
count(n_rows, name = "n_ids_with_n_rows")| n_rows | n_ids_with_n_rows |
|---|---|
| 1 | 500 |
8.2.2 More complex summarizations
Like mutate, the operation you do to summarize can also be more complex, such as finding the mean result of a logical test to calculate a proportion. For example, the proportion of participants who are less than 25 years old:
dat_gender_tidier %>%
summarize(proportion_less_than_25 = mean(age < 25, na.rm = TRUE)) %>%
mutate(percent_less_than_25 = round_half_up(proportion_less_than_25 * 100, 1))| proportion_less_than_25 | percent_less_than_25 |
|---|---|
| 0.2367865 | 23.7 |
dat_gender_tidier %>%
mutate(less_than_25 = if_else(age < 25, 1, 0)) %>%
summarize(percent_below_25 = mean(less_than_25, na.rm = TRUE)*100)| percent_below_25 |
|---|
| 23.67865 |
dat_gender_tidier %>%
mutate(less_than_25 = age < 25)| id | age | gender | less_than_25 |
|---|---|---|---|
| 1 | 39 | female | FALSE |
| 2 | 35 | female | FALSE |
| 3 | 37 | female | FALSE |
| 4 | 44 | female | FALSE |
| 5 | 28 | female | FALSE |
| 6 | NA | male | NA |
| 7 | 27 | female | FALSE |
| 8 | 22 | male | TRUE |
| 9 | 30 | male | FALSE |
| 10 | 38 | male | FALSE |
| 11 | 27 | female | FALSE |
| 12 | NA | female | NA |
| 13 | 34 | male | FALSE |
| 14 | 43 | female | FALSE |
| 15 | 40 | female | FALSE |
| 16 | 20 | female | TRUE |
| 17 | 31 | NA | FALSE |
| 18 | 27 | male | FALSE |
| 19 | 41 | female | FALSE |
| 20 | NA | male | NA |
| 21 | 45 | female | FALSE |
| 22 | 19 | NA | TRUE |
| 23 | 39 | male | FALSE |
| 24 | NA | female | NA |
| 25 | 43 | female | FALSE |
| 26 | NA | female | NA |
| 27 | 22 | female | TRUE |
| 28 | 41 | male | FALSE |
| 29 | 29 | female | FALSE |
| 30 | 32 | female | FALSE |
| 31 | 43 | female | FALSE |
| 32 | 23 | nonbinary | TRUE |
| 33 | 18 | female | TRUE |
| 34 | 45 | nonbinary | FALSE |
| 35 | 40 | female | FALSE |
| 36 | 30 | male | FALSE |
| 37 | 26 | female | FALSE |
| 38 | 31 | female | FALSE |
| 39 | 29 | female | FALSE |
| 40 | 35 | male | FALSE |
| 41 | 24 | female | TRUE |
| 42 | 33 | female | FALSE |
| 43 | 29 | female | FALSE |
| 44 | 29 | female | FALSE |
| 45 | 23 | female | TRUE |
| 46 | 36 | female | FALSE |
| 47 | 35 | female | FALSE |
| 48 | 19 | female | TRUE |
| 49 | 25 | female | FALSE |
| 50 | 41 | female | FALSE |
| 51 | 38 | male | FALSE |
| 52 | 18 | female | TRUE |
| 53 | 42 | female | FALSE |
| 54 | 32 | male | FALSE |
| 55 | 45 | female | FALSE |
| 56 | 35 | male | FALSE |
| 57 | 31 | female | FALSE |
| 58 | 42 | male | FALSE |
| 59 | 18 | female | TRUE |
| 60 | 23 | female | TRUE |
| 61 | 36 | female | FALSE |
| 62 | 21 | male | TRUE |
| 63 | 38 | female | FALSE |
| 64 | 18 | female | TRUE |
| 65 | NA | NA | NA |
| 66 | 18 | male | TRUE |
| 67 | 18 | female | TRUE |
| 68 | NA | female | NA |
| 69 | 32 | female | FALSE |
| 70 | 43 | female | FALSE |
| 71 | 44 | nonbinary | FALSE |
| 72 | 42 | female | FALSE |
| 73 | 35 | female | FALSE |
| 74 | 38 | male | FALSE |
| 75 | 44 | female | FALSE |
| 76 | 28 | female | FALSE |
| 77 | 18 | female | TRUE |
| 78 | 27 | female | FALSE |
| 79 | 37 | female | FALSE |
| 80 | 26 | female | FALSE |
| 81 | NA | male | NA |
| 82 | 22 | female | TRUE |
| 83 | 41 | female | FALSE |
| 84 | 23 | female | TRUE |
| 85 | 28 | female | FALSE |
| 86 | NA | female | NA |
| 87 | 22 | female | TRUE |
| 88 | 36 | female | FALSE |
| 89 | 23 | male | TRUE |
| 90 | 25 | female | FALSE |
| 91 | 33 | male | FALSE |
| 92 | 41 | female | FALSE |
| 93 | 34 | female | FALSE |
| 94 | 29 | female | FALSE |
| 95 | 27 | male | FALSE |
| 96 | 25 | female | FALSE |
| 97 | 41 | female | FALSE |
| 98 | 45 | female | FALSE |
| 99 | NA | male | NA |
| 100 | 25 | female | FALSE |
| 101 | 38 | female | FALSE |
| 102 | 42 | female | FALSE |
| 103 | 24 | female | TRUE |
| 104 | 31 | male | FALSE |
| 105 | 32 | male | FALSE |
| 106 | 32 | female | FALSE |
| 107 | 28 | female | FALSE |
| 108 | 27 | male | FALSE |
| 109 | 36 | male | FALSE |
| 110 | NA | female | NA |
| 111 | 35 | NA | FALSE |
| 112 | 32 | male | FALSE |
| 113 | 32 | female | FALSE |
| 114 | 36 | female | FALSE |
| 115 | 30 | male | FALSE |
| 116 | 37 | female | FALSE |
| 117 | 40 | male | FALSE |
| 118 | 40 | female | FALSE |
| 119 | 42 | female | FALSE |
| 120 | 20 | female | TRUE |
| 121 | 45 | female | FALSE |
| 122 | 27 | female | FALSE |
| 123 | 33 | female | FALSE |
| 124 | 38 | female | FALSE |
| 125 | 30 | female | FALSE |
| 126 | 44 | male | FALSE |
| 127 | 35 | male | FALSE |
| 128 | 18 | female | TRUE |
| 129 | 43 | female | FALSE |
| 130 | 26 | female | FALSE |
| 131 | NA | female | NA |
| 132 | 28 | NA | FALSE |
| 133 | 22 | female | TRUE |
| 134 | 31 | male | FALSE |
| 135 | 44 | female | FALSE |
| 136 | 20 | female | TRUE |
| 137 | 22 | female | TRUE |
| 138 | 45 | male | FALSE |
| 139 | 28 | male | FALSE |
| 140 | 31 | female | FALSE |
| 141 | 33 | female | FALSE |
| 142 | 22 | male | TRUE |
| 143 | 25 | female | FALSE |
| 144 | 23 | female | TRUE |
| 145 | 38 | male | FALSE |
| 146 | 21 | female | TRUE |
| 147 | 22 | female | TRUE |
| 148 | 41 | female | FALSE |
| 149 | 43 | female | FALSE |
| 150 | 29 | female | FALSE |
| 151 | 35 | NA | FALSE |
| 152 | 24 | male | TRUE |
| 153 | NA | female | NA |
| 154 | 45 | male | FALSE |
| 155 | NA | male | NA |
| 156 | 30 | male | FALSE |
| 157 | 27 | female | FALSE |
| 158 | 42 | nonbinary | FALSE |
| 159 | 33 | male | FALSE |
| 160 | 21 | male | TRUE |
| 161 | 19 | female | TRUE |
| 162 | 29 | female | FALSE |
| 163 | 30 | female | FALSE |
| 164 | 18 | female | TRUE |
| 165 | 45 | female | FALSE |
| 166 | NA | female | NA |
| 167 | 32 | female | FALSE |
| 168 | 37 | male | FALSE |
| 169 | 29 | nonbinary | FALSE |
| 170 | 32 | female | FALSE |
| 171 | 33 | NA | FALSE |
| 172 | 34 | female | FALSE |
| 173 | 34 | female | FALSE |
| 174 | 18 | male | TRUE |
| 175 | 34 | NA | FALSE |
| 176 | 27 | female | FALSE |
| 177 | 20 | male | TRUE |
| 178 | 34 | female | FALSE |
| 179 | 21 | male | TRUE |
| 180 | 42 | male | FALSE |
| 181 | 28 | nonbinary | FALSE |
| 182 | 24 | nonbinary | TRUE |
| 183 | 27 | female | FALSE |
| 184 | 33 | male | FALSE |
| 185 | 24 | nonbinary | TRUE |
| 186 | 20 | male | TRUE |
| 187 | 45 | female | FALSE |
| 188 | 35 | female | FALSE |
| 189 | 38 | female | FALSE |
| 190 | NA | female | NA |
| 191 | 43 | female | FALSE |
| 192 | 28 | nonbinary | FALSE |
| 193 | 22 | male | TRUE |
| 194 | 33 | male | FALSE |
| 195 | 30 | female | FALSE |
| 196 | 21 | male | TRUE |
| 197 | 36 | female | FALSE |
| 198 | 41 | male | FALSE |
| 199 | 32 | male | FALSE |
| 200 | 25 | male | FALSE |
| 201 | 20 | female | TRUE |
| 202 | 40 | male | FALSE |
| 203 | 26 | female | FALSE |
| 204 | 40 | female | FALSE |
| 205 | 30 | female | FALSE |
| 206 | 22 | female | TRUE |
| 207 | 29 | female | FALSE |
| 208 | 21 | female | TRUE |
| 209 | 42 | female | FALSE |
| 210 | 27 | female | FALSE |
| 211 | 43 | male | FALSE |
| 212 | 20 | female | TRUE |
| 213 | 24 | male | TRUE |
| 214 | 23 | NA | TRUE |
| 215 | 34 | female | FALSE |
| 216 | 44 | female | FALSE |
| 217 | 43 | female | FALSE |
| 218 | 37 | female | FALSE |
| 219 | NA | female | NA |
| 220 | 27 | NA | FALSE |
| 221 | 37 | female | FALSE |
| 222 | 29 | female | FALSE |
| 223 | 37 | female | FALSE |
| 224 | 45 | male | FALSE |
| 225 | 20 | female | TRUE |
| 226 | 35 | female | FALSE |
| 227 | NA | male | NA |
| 228 | 20 | male | TRUE |
| 229 | 32 | female | FALSE |
| 230 | 29 | male | FALSE |
| 231 | 32 | male | FALSE |
| 232 | 20 | female | TRUE |
| 233 | 37 | male | FALSE |
| 234 | 40 | female | FALSE |
| 235 | 23 | female | TRUE |
| 236 | 27 | female | FALSE |
| 237 | 43 | female | FALSE |
| 238 | 23 | female | TRUE |
| 239 | 28 | female | FALSE |
| 240 | 18 | female | TRUE |
| 241 | 27 | female | FALSE |
| 242 | 43 | male | FALSE |
| 243 | 20 | female | TRUE |
| 244 | 30 | female | FALSE |
| 245 | 22 | female | TRUE |
| 246 | 42 | female | FALSE |
| 247 | 27 | female | FALSE |
| 248 | 33 | female | FALSE |
| 249 | 31 | female | FALSE |
| 250 | 44 | female | FALSE |
| 251 | 44 | female | FALSE |
| 252 | 18 | female | TRUE |
| 253 | 18 | female | TRUE |
| 254 | 28 | male | FALSE |
| 255 | 31 | male | FALSE |
| 256 | 18 | male | TRUE |
| 257 | 37 | male | FALSE |
| 258 | 20 | NA | TRUE |
| 259 | 40 | male | FALSE |
| 260 | 20 | female | TRUE |
| 261 | 40 | female | FALSE |
| 262 | 44 | male | FALSE |
| 263 | 34 | NA | FALSE |
| 264 | 36 | female | FALSE |
| 265 | 36 | female | FALSE |
| 266 | 40 | male | FALSE |
| 267 | 39 | female | FALSE |
| 268 | 31 | female | FALSE |
| 269 | 18 | female | TRUE |
| 270 | 23 | male | TRUE |
| 271 | 23 | female | TRUE |
| 272 | 31 | male | FALSE |
| 273 | 34 | female | FALSE |
| 274 | 37 | female | FALSE |
| 275 | 37 | female | FALSE |
| 276 | 25 | male | FALSE |
| 277 | 26 | female | FALSE |
| 278 | 21 | nonbinary | TRUE |
| 279 | 24 | female | TRUE |
| 280 | 21 | female | TRUE |
| 281 | 32 | male | FALSE |
| 282 | 43 | male | FALSE |
| 283 | 24 | male | TRUE |
| 284 | 32 | female | FALSE |
| 285 | 33 | female | FALSE |
| 286 | 44 | female | FALSE |
| 287 | 30 | male | FALSE |
| 288 | 34 | male | FALSE |
| 289 | 32 | female | FALSE |
| 290 | NA | male | NA |
| 291 | 18 | female | TRUE |
| 292 | 43 | female | FALSE |
| 293 | 40 | female | FALSE |
| 294 | 36 | female | FALSE |
| 295 | 25 | male | FALSE |
| 296 | 33 | male | FALSE |
| 297 | 31 | female | FALSE |
| 298 | 42 | female | FALSE |
| 299 | 28 | NA | FALSE |
| 300 | 27 | male | FALSE |
| 301 | 32 | female | FALSE |
| 302 | 44 | female | FALSE |
| 303 | 45 | female | FALSE |
| 304 | 35 | nonbinary | FALSE |
| 305 | 19 | nonbinary | TRUE |
| 306 | 43 | female | FALSE |
| 307 | 25 | female | FALSE |
| 308 | 45 | female | FALSE |
| 309 | 33 | female | FALSE |
| 310 | 40 | male | FALSE |
| 311 | 40 | female | FALSE |
| 312 | 22 | female | TRUE |
| 313 | 41 | female | FALSE |
| 314 | 31 | male | FALSE |
| 315 | 24 | male | TRUE |
| 316 | 22 | female | TRUE |
| 317 | 45 | female | FALSE |
| 318 | 29 | female | FALSE |
| 319 | 23 | male | TRUE |
| 320 | 38 | NA | FALSE |
| 321 | 34 | female | FALSE |
| 322 | 38 | female | FALSE |
| 323 | 33 | female | FALSE |
| 324 | 40 | female | FALSE |
| 325 | 22 | female | TRUE |
| 326 | 23 | NA | TRUE |
| 327 | 45 | female | FALSE |
| 328 | 22 | female | TRUE |
| 329 | NA | female | NA |
| 330 | 18 | male | TRUE |
| 331 | 39 | male | FALSE |
| 332 | NA | female | NA |
| 333 | 37 | female | FALSE |
| 334 | 38 | female | FALSE |
| 335 | 35 | NA | FALSE |
| 336 | 38 | female | FALSE |
| 337 | 36 | male | FALSE |
| 338 | 18 | female | TRUE |
| 339 | 23 | NA | TRUE |
| 340 | 28 | female | FALSE |
| 341 | 29 | female | FALSE |
| 342 | 20 | male | TRUE |
| 343 | 32 | male | FALSE |
| 344 | 31 | male | FALSE |
| 345 | 18 | female | TRUE |
| 346 | 41 | male | FALSE |
| 347 | 34 | female | FALSE |
| 348 | 20 | female | TRUE |
| 349 | 18 | female | TRUE |
| 350 | 30 | female | FALSE |
| 351 | 19 | female | TRUE |
| 352 | 45 | male | FALSE |
| 353 | 30 | male | FALSE |
| 354 | 28 | male | FALSE |
| 355 | 23 | female | TRUE |
| 356 | 40 | female | FALSE |
| 357 | 23 | female | TRUE |
| 358 | 39 | female | FALSE |
| 359 | 24 | female | TRUE |
| 360 | 29 | female | FALSE |
| 361 | 44 | female | FALSE |
| 362 | 39 | female | FALSE |
| 363 | 18 | male | TRUE |
| 364 | 36 | NA | FALSE |
| 365 | 30 | male | FALSE |
| 366 | 20 | male | TRUE |
| 367 | 42 | male | FALSE |
| 368 | 42 | female | FALSE |
| 369 | 25 | female | FALSE |
| 370 | 39 | female | FALSE |
| 371 | 42 | female | FALSE |
| 372 | 40 | male | FALSE |
| 373 | 32 | female | FALSE |
| 374 | 22 | male | TRUE |
| 375 | 28 | female | FALSE |
| 376 | 30 | female | FALSE |
| 377 | 32 | male | FALSE |
| 378 | 31 | female | FALSE |
| 379 | 41 | female | FALSE |
| 380 | 36 | female | FALSE |
| 381 | 20 | female | TRUE |
| 382 | 45 | male | FALSE |
| 383 | 18 | female | TRUE |
| 384 | 30 | male | FALSE |
| 385 | 26 | male | FALSE |
| 386 | 34 | female | FALSE |
| 387 | 43 | female | FALSE |
| 388 | 33 | female | FALSE |
| 389 | 42 | female | FALSE |
| 390 | 34 | female | FALSE |
| 391 | 24 | NA | TRUE |
| 392 | NA | female | NA |
| 393 | 29 | nonbinary | FALSE |
| 394 | 28 | female | FALSE |
| 395 | 29 | female | FALSE |
| 396 | 28 | female | FALSE |
| 397 | 22 | male | TRUE |
| 398 | 30 | female | FALSE |
| 399 | 44 | female | FALSE |
| 400 | 38 | female | FALSE |
| 401 | 45 | female | FALSE |
| 402 | 38 | female | FALSE |
| 403 | 38 | female | FALSE |
| 404 | 22 | female | TRUE |
| 405 | 41 | female | FALSE |
| 406 | 36 | female | FALSE |
| 407 | 43 | female | FALSE |
| 408 | 44 | female | FALSE |
| 409 | 44 | female | FALSE |
| 410 | 27 | nonbinary | FALSE |
| 411 | 32 | male | FALSE |
| 412 | 19 | NA | TRUE |
| 413 | 30 | female | FALSE |
| 414 | 44 | female | FALSE |
| 415 | 29 | female | FALSE |
| 416 | 21 | female | TRUE |
| 417 | 45 | female | FALSE |
| 418 | 24 | female | TRUE |
| 419 | 45 | female | FALSE |
| 420 | 30 | male | FALSE |
| 421 | NA | female | NA |
| 422 | 36 | NA | FALSE |
| 423 | NA | female | NA |
| 424 | 30 | female | FALSE |
| 425 | 43 | female | FALSE |
| 426 | 33 | female | FALSE |
| 427 | 27 | male | FALSE |
| 428 | 18 | NA | TRUE |
| 429 | 33 | NA | FALSE |
| 430 | 26 | female | FALSE |
| 431 | 20 | male | TRUE |
| 432 | 41 | NA | FALSE |
| 433 | 18 | female | TRUE |
| 434 | 29 | female | FALSE |
| 435 | 42 | female | FALSE |
| 436 | 24 | female | TRUE |
| 437 | NA | female | NA |
| 438 | 32 | female | FALSE |
| 439 | 26 | female | FALSE |
| 440 | 19 | male | TRUE |
| 441 | 44 | female | FALSE |
| 442 | 31 | NA | FALSE |
| 443 | NA | female | NA |
| 444 | 38 | female | FALSE |
| 445 | 29 | nonbinary | FALSE |
| 446 | NA | NA | NA |
| 447 | 35 | female | FALSE |
| 448 | 29 | female | FALSE |
| 449 | 39 | male | FALSE |
| 450 | 45 | female | FALSE |
| 451 | 28 | female | FALSE |
| 452 | 42 | female | FALSE |
| 453 | 26 | male | FALSE |
| 454 | 20 | male | TRUE |
| 455 | 41 | female | FALSE |
| 456 | 45 | female | FALSE |
| 457 | 43 | NA | FALSE |
| 458 | 26 | NA | FALSE |
| 459 | 18 | female | TRUE |
| 460 | 31 | male | FALSE |
| 461 | 44 | female | FALSE |
| 462 | 36 | male | FALSE |
| 463 | 29 | male | FALSE |
| 464 | 41 | male | FALSE |
| 465 | 39 | female | FALSE |
| 466 | 31 | female | FALSE |
| 467 | 29 | male | FALSE |
| 468 | 37 | female | FALSE |
| 469 | 33 | female | FALSE |
| 470 | 40 | female | FALSE |
| 471 | 39 | NA | FALSE |
| 472 | 21 | male | TRUE |
| 473 | 30 | female | FALSE |
| 474 | 22 | female | TRUE |
| 475 | 21 | female | TRUE |
| 476 | 37 | female | FALSE |
| 477 | 18 | male | TRUE |
| 478 | 30 | female | FALSE |
| 479 | 26 | male | FALSE |
| 480 | 26 | female | FALSE |
| 481 | 39 | NA | FALSE |
| 482 | 30 | male | FALSE |
| 483 | 30 | female | FALSE |
| 484 | 44 | female | FALSE |
| 485 | 31 | male | FALSE |
| 486 | 26 | nonbinary | FALSE |
| 487 | 19 | female | TRUE |
| 488 | 44 | female | FALSE |
| 489 | 26 | female | FALSE |
| 490 | 45 | female | FALSE |
| 491 | 41 | female | FALSE |
| 492 | 32 | female | FALSE |
| 493 | 29 | male | FALSE |
| 494 | 35 | female | FALSE |
| 495 | 29 | male | FALSE |
| 496 | 43 | female | FALSE |
| 497 | 32 | male | FALSE |
| 498 | 33 | female | FALSE |
| 499 | 40 | female | FALSE |
| 500 | 29 | female | FALSE |
vec <- c(TRUE, FALSE, FALSE, FALSE, FALSE)
vec <- c(1, 0, 0, 0, 0)
mean(vec)*100[1] 20
8.2.3 Practice using summarize()
8.2.3.1 1
Calculate the N, min, max, mean, and SD of age in dat_gender_tidier.
bind_rows(
dat_gender_tidier %>%
drop_na(age) %>%
summarize(N = n(),
min = min(age),
max = max(age),
mean = mean(age),
sd = sd(age)) %>%
round_half_up(2),
dat_gender_tidier %>%
summarize(N = n(),
min = min(age, na.rm = TRUE),
max = max(age, na.rm = TRUE),
mean = mean(age, na.rm = TRUE),
sd = sd(age, na.rm = TRUE)) %>%
round_half_up(2)
)| N | min | max | mean | sd |
|---|---|---|---|---|
| 473 | 18 | 45 | 31.72 | 8.16 |
| 500 | 18 | 45 | 31.72 | 8.16 |
8.2.3.2 2 TODO
Currently each participant has up to three responses on the self-report scales (three item scale: like, positive, and prefer). Create a new dataframe containing each unique_id’s mean score across the items. Also calculate how many items each participant has data for, and whether they have complete data (i.e., data for three items).
8.2.3.3 3 TODO
Using only participants with complete, calculate the mean and SD of all participant’s mean scores on the self-reports.
8.2.3.4 4 TODO
Create a new data frame that calculates the proportion of prime-congruent trials for each participant on the AMP (i.e., the mean of the ‘correct’ column), their proportion of too-fast trials, and their number of trials.
Also add to that data frame a new column called “exclude_amp” and set it to “exclude” if more than 10% of a participant’s trials are too-fast trials and “include” if not.
8.2.3.5 5 TODO
Calculate the proportion of participants who are to be excluded.
# data_amp_scored %>%8.3 Frequncies of sets of columns
Note that it is also possible to use count to obtain the frequencies of sets of unique values across columns, e.g., unique combinations of item and response.
# data_demographics_trimmed %>%
# count(item)
#
# data_demographics_trimmed %>%
# count(response)
#
# data_demographics_trimmed %>%
# count(item, response)It can be useful to arrange the output by the frequencies.
# data_demographics_trimmed %>%
# count(item, response) %>%
# arrange(desc(n)) # arrange in descending order8.3.1 summarize(across())
TODO here or later in
8.4 Check your learning
What is the difference between mutate() and summarize()? If I use the wrong one, will I get the same answer? E.g., mutate(mean_age = mean(age, na.rm = TRUE)) vs. summarize(mean_age = mean(age, na.rm = TRUE))