8  Data transformation III

This chapter covers “aggregating” functions. They aggregate results across rows to create a new data frame that does not contain any of the rows of the original. For example, to convert the ‘age’ column that contains participants’ ages to ‘age_mean’, which only has one row containing the average age.

8.1 Summarizing across rows

It is very common that we need to create summaries across rows. For example, to create the mean and standard deviation of a column like age. This can be done with summarize(). Remember: mutate() creates new columns or modifies the contents of existing columns, but does not change the number of rows. Whereas summarize() reduces a data frame down to one row.

# code to generated and clean data from data_transformation_1:
library(truffle)

set.seed(42)

dat_demographics_messy <- data.frame(id = 1:500) %>%
  truffle::truffle_demographics() %>%
  truffle::dirt_demographics()

dat_gender_tidier <- dat_demographics_messy %>%
  # convert to lower case and remove non-letters
  mutate(gender = str_to_lower(gender),
         gender = str_remove_all(gender, "[^A-Za-z]"),
         # added age processing
         age = str_remove_all(age, "![^A-Za-z]"),
         age = if_else(age == "", NA_integer_, as.numeric(age))) %>%
  # tidy up cases
  mutate(gender = case_when(gender %in% c("female", "male", "nonbinary") ~ gender, 
                            gender %in% c("femail", "fem ale", "femle", "femlae") ~ "female",
                            gender %in% c("malee", "mal", "ma le") ~ "male",
                            gender %in% c("non binary", "nonbinary") ~ "non-binary",
                            gender == "" ~ NA_character_, # if an empty character string, change to NA_character_
                            TRUE ~ gender)) # if none of the above apply, keep original value
library(dplyr)
library(tidyr)
library(janitor)

# mean
dat_gender_tidier %>%
  summarize(mean_age = mean(age, na.rm = TRUE))
mean_age
31.72093
dat_gender_tidier %>%
  summarize(mean_age = mean(age))
mean_age
NA
# SD
dat_gender_tidier %>%
  summarize(sd_age = sd(age, na.rm = TRUE))
sd_age
8.157242
# mean and SD with rounding, illustrating how multiple summarizes can be done in one function call
dat_gender_tidier %>%
  summarize(mean_age = mean(age, na.rm = TRUE),
            sd_age = sd(age, na.rm = TRUE)) %>%
  mutate(mean_age = round_half_up(mean_age, digits = 2),
         sd_age = round_half_up(sd_age, digits = 2))
mean_age sd_age
31.72 8.16

8.1.1 group_by()

Often, we don’t want to reduce a data frame down to a single row / summarize the whole dataset, but instead we want to create a summary for each (sub)group. For example

# illustrate use of group_by() and summarize()
dat_gender_tidier %>%
  summarize(mean_age = mean(age, na.rm = TRUE),
            sd_age = sd(age, na.rm = TRUE))
mean_age sd_age
31.72093 8.157242
dat_gender_tidier %>%
  group_by(gender) %>%
  summarize(mean_age = round_half_up(mean(age, na.rm = TRUE), digits = 2),
            sd_age = round_half_up(sd(age, na.rm = TRUE), digits = 2))
gender mean_age sd_age
female 32.30 8.40
male 30.89 7.69
nonbinary 29.56 7.94
NA 30.39 7.23
dat <- dat_gender_tidier %>%
  group_by(gender)

dat  %>%
  summarize(mean_age = round_half_up(mean(age, na.rm = TRUE), digits = 2),
            sd_age = round_half_up(sd(age, na.rm = TRUE), digits = 2))
gender mean_age sd_age
female 32.30 8.40
male 30.89 7.69
nonbinary 29.56 7.94
NA 30.39 7.23

8.1.2 ungroup()

Grouping is an invisible property of a data frame or tibble. You can’t find a value inside it that tells you it’s grouped, but the grouping can persist unless you ungroup(). This can cause unexpected behavior, so watch out for it.

8.1.2.1 group_by() applied to filter and slice functions

TODO

8.1.3 n()

n() calculates the number of rows, i.e., the N. It can be useful in summarize.

# summarize n
dat_gender_tidier %>%
  summarize(n_age = n())
n_age
500
# summarize n per gender group
dat_gender_tidier %>%
  group_by(gender) %>%
  summarize(n_age = n())
gender n_age
female 323
male 131
nonbinary 16
NA 30
# dat_gender_tidier %>%
#   summarize(n_age = n())
# 
# dat_gender_tidier %>%
#   n()
# 
# dat_gender_tidier %>%
#   mutate(n_age = n())

8.1.4 Realise that count() is just a wrapper function for summarize()

count() is just the combination of group_by() and summarize() and n(). They produce the same results as above.

# summarize n
dat_gender_tidier %>%
  count()
n
500
# summarize n per gender group
dat_gender_tidier %>%
  count(gender)
gender n
female 323
male 131
nonbinary 16
NA 30

8.2 distinct()

Distinct is a variation on count() that doesn’t return the counts, just the distinct values.

# summarize n per gender group
dat_gender_tidier %>%
  count(gender)
gender n
female 323
male 131
nonbinary 16
NA 30
# summarize n per gender group
dat_gender_tidier %>%
  distinct(gender)
gender
female
male
NA
nonbinary

8.2.1 Handling duplicates with multiple-counts() and distinct()

It can be used to remove duplicates, but it’s important to remember that you don’t know which row is retained vs dropped.

# create duplicates that need cleaning
dat_gender_tidier_with_duplicates <- dat_gender_tidier %>%
  dirt_duplicates(prop = 0.10)
  
# count rows
dat_gender_tidier_with_duplicates %>%
  count()
n
550
# unique participant ids
dat_gender_tidier_with_duplicates %>%
  count(id, name = "n_rows")
id n_rows
1 1
2 1
3 1
4 2
5 1
6 1
7 1
8 1
9 1
10 1
11 1
12 1
13 1
14 1
15 1
16 2
17 1
18 2
19 1
20 1
21 1
22 1
23 1
24 1
25 1
26 1
27 1
28 1
29 1
30 1
31 1
32 1
33 1
34 1
35 1
36 1
37 1
38 1
39 1
40 1
41 1
42 2
43 2
44 1
45 1
46 2
47 1
48 1
49 1
50 1
51 1
52 1
53 1
54 1
55 1
56 2
57 1
58 1
59 1
60 1
61 1
62 1
63 1
64 1
65 1
66 1
67 2
68 1
69 1
70 1
71 1
72 1
73 1
74 1
75 1
76 1
77 1
78 1
79 1
80 2
81 2
82 1
83 1
84 1
85 1
86 1
87 1
88 1
89 1
90 1
91 1
92 1
93 1
94 1
95 1
96 1
97 1
98 1
99 1
100 1
101 1
102 1
103 1
104 1
105 1
106 1
107 1
108 1
109 1
110 1
111 1
112 1
113 1
114 1
115 1
116 1
117 1
118 1
119 1
120 1
121 1
122 1
123 1
124 1
125 2
126 2
127 2
128 1
129 1
130 1
131 1
132 2
133 1
134 1
135 1
136 1
137 1
138 1
139 2
140 1
141 1
142 1
143 1
144 1
145 2
146 1
147 1
148 1
149 1
150 1
151 1
152 1
153 1
154 2
155 1
156 1
157 1
158 1
159 1
160 1
161 1
162 1
163 1
164 1
165 1
166 1
167 2
168 1
169 1
170 1
171 1
172 1
173 1
174 2
175 1
176 1
177 1
178 1
179 1
180 1
181 1
182 1
183 1
184 1
185 1
186 1
187 1
188 1
189 1
190 1
191 2
192 1
193 1
194 1
195 1
196 1
197 1
198 1
199 1
200 1
201 1
202 1
203 1
204 1
205 1
206 1
207 1
208 1
209 1
210 1
211 1
212 1
213 1
214 1
215 2
216 1
217 1
218 1
219 1
220 1
221 1
222 1
223 1
224 1
225 1
226 1
227 1
228 1
229 2
230 1
231 1
232 2
233 1
234 1
235 1
236 1
237 1
238 1
239 1
240 1
241 1
242 2
243 1
244 1
245 1
246 1
247 1
248 1
249 1
250 1
251 1
252 1
253 1
254 1
255 1
256 1
257 1
258 1
259 3
260 1
261 1
262 1
263 1
264 1
265 1
266 1
267 1
268 1
269 1
270 1
271 1
272 1
273 1
274 1
275 1
276 1
277 1
278 1
279 1
280 2
281 1
282 1
283 1
284 1
285 1
286 1
287 1
288 1
289 2
290 1
291 1
292 1
293 2
294 1
295 1
296 1
297 2
298 1
299 1
300 1
301 1
302 1
303 1
304 1
305 1
306 1
307 1
308 1
309 1
310 1
311 1
312 1
313 1
314 1
315 1
316 1
317 1
318 1
319 1
320 1
321 1
322 1
323 1
324 1
325 1
326 2
327 1
328 1
329 1
330 1
331 1
332 1
333 1
334 1
335 1
336 1
337 2
338 1
339 1
340 1
341 1
342 1
343 1
344 1
345 1
346 1
347 1
348 1
349 1
350 1
351 1
352 1
353 1
354 1
355 1
356 1
357 1
358 1
359 1
360 1
361 1
362 1
363 2
364 1
365 1
366 2
367 1
368 1
369 2
370 1
371 1
372 1
373 1
374 1
375 1
376 1
377 2
378 1
379 1
380 1
381 2
382 1
383 1
384 1
385 1
386 1
387 1
388 1
389 1
390 1
391 2
392 1
393 2
394 1
395 1
396 1
397 1
398 1
399 2
400 1
401 1
402 1
403 1
404 1
405 2
406 1
407 1
408 1
409 1
410 1
411 1
412 1
413 1
414 2
415 1
416 1
417 1
418 1
419 1
420 1
421 1
422 1
423 1
424 1
425 1
426 1
427 1
428 1
429 1
430 1
431 1
432 1
433 1
434 1
435 1
436 1
437 1
438 1
439 1
440 1
441 1
442 2
443 1
444 1
445 2
446 1
447 2
448 1
449 1
450 1
451 1
452 1
453 1
454 1
455 1
456 1
457 1
458 2
459 1
460 1
461 1
462 2
463 1
464 1
465 1
466 1
467 1
468 1
469 1
470 1
471 1
472 1
473 1
474 1
475 1
476 1
477 1
478 1
479 1
480 2
481 1
482 1
483 1
484 1
485 1
486 1
487 1
488 1
489 1
490 1
491 1
492 1
493 1
494 1
495 1
496 3
497 1
498 1
499 1
500 1
# count unique participant ids and the number of rows they have
dat_gender_tidier_with_duplicates %>%
  count(id, name = "n_rows") %>%
  count(n_rows, name = "n_ids_with_n_rows")
n_rows n_ids_with_n_rows
1 452
2 46
3 2
# count unique participant ids and the number of rows they have
dat_gender_tidier_duplicates_removed <- dat_gender_tidier_with_duplicates %>%
  distinct(id, .keep_all = TRUE)  # keep all other columns

dat_gender_tidier_duplicates_removed %>% 
  count(id, name = "n_rows") %>%
  count(n_rows, name = "n_ids_with_n_rows")
n_rows n_ids_with_n_rows
1 500
# alternative, fewer assumptions
dat_gender_tidier_duplicates_removed2 <- dat_gender_tidier_with_duplicates %>%
  distinct(id, age, gender)

dat_gender_tidier_duplicates_removed2 %>% 
  count(id, name = "n_rows") %>%
  count(n_rows, name = "n_ids_with_n_rows")
n_rows n_ids_with_n_rows
1 500

8.2.2 More complex summarizations

Like mutate, the operation you do to summarize can also be more complex, such as finding the mean result of a logical test to calculate a proportion. For example, the proportion of participants who are less than 25 years old:

dat_gender_tidier %>%
  summarize(proportion_less_than_25 = mean(age < 25, na.rm = TRUE)) %>%
  mutate(percent_less_than_25 = round_half_up(proportion_less_than_25 * 100, 1))
proportion_less_than_25 percent_less_than_25
0.2367865 23.7
dat_gender_tidier %>%
  mutate(less_than_25 = if_else(age < 25, 1, 0)) %>%
  summarize(percent_below_25 = mean(less_than_25, na.rm = TRUE)*100)
percent_below_25
23.67865
dat_gender_tidier %>%
  mutate(less_than_25 = age < 25)
id age gender less_than_25
1 39 female FALSE
2 35 female FALSE
3 37 female FALSE
4 44 female FALSE
5 28 female FALSE
6 NA male NA
7 27 female FALSE
8 22 male TRUE
9 30 male FALSE
10 38 male FALSE
11 27 female FALSE
12 NA female NA
13 34 male FALSE
14 43 female FALSE
15 40 female FALSE
16 20 female TRUE
17 31 NA FALSE
18 27 male FALSE
19 41 female FALSE
20 NA male NA
21 45 female FALSE
22 19 NA TRUE
23 39 male FALSE
24 NA female NA
25 43 female FALSE
26 NA female NA
27 22 female TRUE
28 41 male FALSE
29 29 female FALSE
30 32 female FALSE
31 43 female FALSE
32 23 nonbinary TRUE
33 18 female TRUE
34 45 nonbinary FALSE
35 40 female FALSE
36 30 male FALSE
37 26 female FALSE
38 31 female FALSE
39 29 female FALSE
40 35 male FALSE
41 24 female TRUE
42 33 female FALSE
43 29 female FALSE
44 29 female FALSE
45 23 female TRUE
46 36 female FALSE
47 35 female FALSE
48 19 female TRUE
49 25 female FALSE
50 41 female FALSE
51 38 male FALSE
52 18 female TRUE
53 42 female FALSE
54 32 male FALSE
55 45 female FALSE
56 35 male FALSE
57 31 female FALSE
58 42 male FALSE
59 18 female TRUE
60 23 female TRUE
61 36 female FALSE
62 21 male TRUE
63 38 female FALSE
64 18 female TRUE
65 NA NA NA
66 18 male TRUE
67 18 female TRUE
68 NA female NA
69 32 female FALSE
70 43 female FALSE
71 44 nonbinary FALSE
72 42 female FALSE
73 35 female FALSE
74 38 male FALSE
75 44 female FALSE
76 28 female FALSE
77 18 female TRUE
78 27 female FALSE
79 37 female FALSE
80 26 female FALSE
81 NA male NA
82 22 female TRUE
83 41 female FALSE
84 23 female TRUE
85 28 female FALSE
86 NA female NA
87 22 female TRUE
88 36 female FALSE
89 23 male TRUE
90 25 female FALSE
91 33 male FALSE
92 41 female FALSE
93 34 female FALSE
94 29 female FALSE
95 27 male FALSE
96 25 female FALSE
97 41 female FALSE
98 45 female FALSE
99 NA male NA
100 25 female FALSE
101 38 female FALSE
102 42 female FALSE
103 24 female TRUE
104 31 male FALSE
105 32 male FALSE
106 32 female FALSE
107 28 female FALSE
108 27 male FALSE
109 36 male FALSE
110 NA female NA
111 35 NA FALSE
112 32 male FALSE
113 32 female FALSE
114 36 female FALSE
115 30 male FALSE
116 37 female FALSE
117 40 male FALSE
118 40 female FALSE
119 42 female FALSE
120 20 female TRUE
121 45 female FALSE
122 27 female FALSE
123 33 female FALSE
124 38 female FALSE
125 30 female FALSE
126 44 male FALSE
127 35 male FALSE
128 18 female TRUE
129 43 female FALSE
130 26 female FALSE
131 NA female NA
132 28 NA FALSE
133 22 female TRUE
134 31 male FALSE
135 44 female FALSE
136 20 female TRUE
137 22 female TRUE
138 45 male FALSE
139 28 male FALSE
140 31 female FALSE
141 33 female FALSE
142 22 male TRUE
143 25 female FALSE
144 23 female TRUE
145 38 male FALSE
146 21 female TRUE
147 22 female TRUE
148 41 female FALSE
149 43 female FALSE
150 29 female FALSE
151 35 NA FALSE
152 24 male TRUE
153 NA female NA
154 45 male FALSE
155 NA male NA
156 30 male FALSE
157 27 female FALSE
158 42 nonbinary FALSE
159 33 male FALSE
160 21 male TRUE
161 19 female TRUE
162 29 female FALSE
163 30 female FALSE
164 18 female TRUE
165 45 female FALSE
166 NA female NA
167 32 female FALSE
168 37 male FALSE
169 29 nonbinary FALSE
170 32 female FALSE
171 33 NA FALSE
172 34 female FALSE
173 34 female FALSE
174 18 male TRUE
175 34 NA FALSE
176 27 female FALSE
177 20 male TRUE
178 34 female FALSE
179 21 male TRUE
180 42 male FALSE
181 28 nonbinary FALSE
182 24 nonbinary TRUE
183 27 female FALSE
184 33 male FALSE
185 24 nonbinary TRUE
186 20 male TRUE
187 45 female FALSE
188 35 female FALSE
189 38 female FALSE
190 NA female NA
191 43 female FALSE
192 28 nonbinary FALSE
193 22 male TRUE
194 33 male FALSE
195 30 female FALSE
196 21 male TRUE
197 36 female FALSE
198 41 male FALSE
199 32 male FALSE
200 25 male FALSE
201 20 female TRUE
202 40 male FALSE
203 26 female FALSE
204 40 female FALSE
205 30 female FALSE
206 22 female TRUE
207 29 female FALSE
208 21 female TRUE
209 42 female FALSE
210 27 female FALSE
211 43 male FALSE
212 20 female TRUE
213 24 male TRUE
214 23 NA TRUE
215 34 female FALSE
216 44 female FALSE
217 43 female FALSE
218 37 female FALSE
219 NA female NA
220 27 NA FALSE
221 37 female FALSE
222 29 female FALSE
223 37 female FALSE
224 45 male FALSE
225 20 female TRUE
226 35 female FALSE
227 NA male NA
228 20 male TRUE
229 32 female FALSE
230 29 male FALSE
231 32 male FALSE
232 20 female TRUE
233 37 male FALSE
234 40 female FALSE
235 23 female TRUE
236 27 female FALSE
237 43 female FALSE
238 23 female TRUE
239 28 female FALSE
240 18 female TRUE
241 27 female FALSE
242 43 male FALSE
243 20 female TRUE
244 30 female FALSE
245 22 female TRUE
246 42 female FALSE
247 27 female FALSE
248 33 female FALSE
249 31 female FALSE
250 44 female FALSE
251 44 female FALSE
252 18 female TRUE
253 18 female TRUE
254 28 male FALSE
255 31 male FALSE
256 18 male TRUE
257 37 male FALSE
258 20 NA TRUE
259 40 male FALSE
260 20 female TRUE
261 40 female FALSE
262 44 male FALSE
263 34 NA FALSE
264 36 female FALSE
265 36 female FALSE
266 40 male FALSE
267 39 female FALSE
268 31 female FALSE
269 18 female TRUE
270 23 male TRUE
271 23 female TRUE
272 31 male FALSE
273 34 female FALSE
274 37 female FALSE
275 37 female FALSE
276 25 male FALSE
277 26 female FALSE
278 21 nonbinary TRUE
279 24 female TRUE
280 21 female TRUE
281 32 male FALSE
282 43 male FALSE
283 24 male TRUE
284 32 female FALSE
285 33 female FALSE
286 44 female FALSE
287 30 male FALSE
288 34 male FALSE
289 32 female FALSE
290 NA male NA
291 18 female TRUE
292 43 female FALSE
293 40 female FALSE
294 36 female FALSE
295 25 male FALSE
296 33 male FALSE
297 31 female FALSE
298 42 female FALSE
299 28 NA FALSE
300 27 male FALSE
301 32 female FALSE
302 44 female FALSE
303 45 female FALSE
304 35 nonbinary FALSE
305 19 nonbinary TRUE
306 43 female FALSE
307 25 female FALSE
308 45 female FALSE
309 33 female FALSE
310 40 male FALSE
311 40 female FALSE
312 22 female TRUE
313 41 female FALSE
314 31 male FALSE
315 24 male TRUE
316 22 female TRUE
317 45 female FALSE
318 29 female FALSE
319 23 male TRUE
320 38 NA FALSE
321 34 female FALSE
322 38 female FALSE
323 33 female FALSE
324 40 female FALSE
325 22 female TRUE
326 23 NA TRUE
327 45 female FALSE
328 22 female TRUE
329 NA female NA
330 18 male TRUE
331 39 male FALSE
332 NA female NA
333 37 female FALSE
334 38 female FALSE
335 35 NA FALSE
336 38 female FALSE
337 36 male FALSE
338 18 female TRUE
339 23 NA TRUE
340 28 female FALSE
341 29 female FALSE
342 20 male TRUE
343 32 male FALSE
344 31 male FALSE
345 18 female TRUE
346 41 male FALSE
347 34 female FALSE
348 20 female TRUE
349 18 female TRUE
350 30 female FALSE
351 19 female TRUE
352 45 male FALSE
353 30 male FALSE
354 28 male FALSE
355 23 female TRUE
356 40 female FALSE
357 23 female TRUE
358 39 female FALSE
359 24 female TRUE
360 29 female FALSE
361 44 female FALSE
362 39 female FALSE
363 18 male TRUE
364 36 NA FALSE
365 30 male FALSE
366 20 male TRUE
367 42 male FALSE
368 42 female FALSE
369 25 female FALSE
370 39 female FALSE
371 42 female FALSE
372 40 male FALSE
373 32 female FALSE
374 22 male TRUE
375 28 female FALSE
376 30 female FALSE
377 32 male FALSE
378 31 female FALSE
379 41 female FALSE
380 36 female FALSE
381 20 female TRUE
382 45 male FALSE
383 18 female TRUE
384 30 male FALSE
385 26 male FALSE
386 34 female FALSE
387 43 female FALSE
388 33 female FALSE
389 42 female FALSE
390 34 female FALSE
391 24 NA TRUE
392 NA female NA
393 29 nonbinary FALSE
394 28 female FALSE
395 29 female FALSE
396 28 female FALSE
397 22 male TRUE
398 30 female FALSE
399 44 female FALSE
400 38 female FALSE
401 45 female FALSE
402 38 female FALSE
403 38 female FALSE
404 22 female TRUE
405 41 female FALSE
406 36 female FALSE
407 43 female FALSE
408 44 female FALSE
409 44 female FALSE
410 27 nonbinary FALSE
411 32 male FALSE
412 19 NA TRUE
413 30 female FALSE
414 44 female FALSE
415 29 female FALSE
416 21 female TRUE
417 45 female FALSE
418 24 female TRUE
419 45 female FALSE
420 30 male FALSE
421 NA female NA
422 36 NA FALSE
423 NA female NA
424 30 female FALSE
425 43 female FALSE
426 33 female FALSE
427 27 male FALSE
428 18 NA TRUE
429 33 NA FALSE
430 26 female FALSE
431 20 male TRUE
432 41 NA FALSE
433 18 female TRUE
434 29 female FALSE
435 42 female FALSE
436 24 female TRUE
437 NA female NA
438 32 female FALSE
439 26 female FALSE
440 19 male TRUE
441 44 female FALSE
442 31 NA FALSE
443 NA female NA
444 38 female FALSE
445 29 nonbinary FALSE
446 NA NA NA
447 35 female FALSE
448 29 female FALSE
449 39 male FALSE
450 45 female FALSE
451 28 female FALSE
452 42 female FALSE
453 26 male FALSE
454 20 male TRUE
455 41 female FALSE
456 45 female FALSE
457 43 NA FALSE
458 26 NA FALSE
459 18 female TRUE
460 31 male FALSE
461 44 female FALSE
462 36 male FALSE
463 29 male FALSE
464 41 male FALSE
465 39 female FALSE
466 31 female FALSE
467 29 male FALSE
468 37 female FALSE
469 33 female FALSE
470 40 female FALSE
471 39 NA FALSE
472 21 male TRUE
473 30 female FALSE
474 22 female TRUE
475 21 female TRUE
476 37 female FALSE
477 18 male TRUE
478 30 female FALSE
479 26 male FALSE
480 26 female FALSE
481 39 NA FALSE
482 30 male FALSE
483 30 female FALSE
484 44 female FALSE
485 31 male FALSE
486 26 nonbinary FALSE
487 19 female TRUE
488 44 female FALSE
489 26 female FALSE
490 45 female FALSE
491 41 female FALSE
492 32 female FALSE
493 29 male FALSE
494 35 female FALSE
495 29 male FALSE
496 43 female FALSE
497 32 male FALSE
498 33 female FALSE
499 40 female FALSE
500 29 female FALSE
vec <- c(TRUE, FALSE, FALSE, FALSE, FALSE)
vec <- c(1, 0, 0, 0, 0)

mean(vec)*100
[1] 20

8.2.3 Practice using summarize()

8.2.3.1 1

Calculate the N, min, max, mean, and SD of age in dat_gender_tidier.

bind_rows(
dat_gender_tidier %>%
  drop_na(age) %>%
  summarize(N = n(),
            min = min(age),
            max = max(age),
            mean = mean(age),
            sd = sd(age)) %>%
  round_half_up(2),

dat_gender_tidier %>%
  summarize(N = n(),
            min = min(age, na.rm = TRUE),
            max = max(age, na.rm = TRUE),
            mean = mean(age, na.rm = TRUE),
            sd = sd(age, na.rm = TRUE)) %>%
  round_half_up(2)
)
N min max mean sd
473 18 45 31.72 8.16
500 18 45 31.72 8.16

8.2.3.2 2 TODO

Currently each participant has up to three responses on the self-report scales (three item scale: like, positive, and prefer). Create a new dataframe containing each unique_id’s mean score across the items. Also calculate how many items each participant has data for, and whether they have complete data (i.e., data for three items).

8.2.3.3 3 TODO

Using only participants with complete, calculate the mean and SD of all participant’s mean scores on the self-reports.

8.2.3.4 4 TODO

Create a new data frame that calculates the proportion of prime-congruent trials for each participant on the AMP (i.e., the mean of the ‘correct’ column), their proportion of too-fast trials, and their number of trials.

Also add to that data frame a new column called “exclude_amp” and set it to “exclude” if more than 10% of a participant’s trials are too-fast trials and “include” if not.

8.2.3.5 5 TODO

Calculate the proportion of participants who are to be excluded.

# data_amp_scored %>%

8.3 Frequncies of sets of columns

Note that it is also possible to use count to obtain the frequencies of sets of unique values across columns, e.g., unique combinations of item and response.

# data_demographics_trimmed %>%
#   count(item)
# 
# data_demographics_trimmed %>%
#   count(response)
# 
# data_demographics_trimmed %>%
#   count(item, response)

It can be useful to arrange the output by the frequencies.

# data_demographics_trimmed %>%
#   count(item, response) %>%
#   arrange(desc(n)) # arrange in descending order

8.3.1 summarize(across())

TODO here or later in

8.4 Check your learning

What is the difference between mutate() and summarize()? If I use the wrong one, will I get the same answer? E.g., mutate(mean_age = mean(age, na.rm = TRUE)) vs. summarize(mean_age = mean(age, na.rm = TRUE))