10  Data types

10.1 Data types

  • integers, numeric/floats, factors, strings, booleans/logicals, NA.

10.2 Working with numeric/floats

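Floating point math is weird: most decimal fractions cannot be stored exactly in binary, so results that should be equal on paper can differ by a tiny amount. {dplyr}'s near() helps when comparing such values.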

10.2.1 near()

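Testing floats for exact equality with == is unreliable because of these tiny representation errors. near() instead checks whether two numbers are equal within a small tolerance.

sqrt(2) ^ 2 == 2
near(sqrt(2) ^ 2, 2)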

2
[1] 2
sqrt(2)^2
[1] 2
2 == sqrt(2)^2
[1] FALSE
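The comparison above can be made explicit with a short sketch. It assumes {dplyr} is installed; the hand-written tolerance of 1e-8 is an illustrative choice, not a recommendation.

```r
library(dplyr)  # provides near()

x <- sqrt(2)^2

print(x, digits = 17)     # shows the tiny error hidden by default printing
x == 2                    # FALSE: exact comparison fails
near(x, 2)                # TRUE: equal within a small tolerance
abs(x - 2) < 1e-8         # TRUE: the same idea written by hand
isTRUE(all.equal(x, 2))   # TRUE: a base R alternative
```

Default printing shows only 7 significant digits, which is why both values print as 2 even though they are not identical.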

10.2.2 between()

TODO
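Until this section is written, here is a minimal sketch of what {dplyr}'s between() does. The age vector and the cut-offs are made up for illustration.

```r
library(dplyr)  # provides between()

# Hypothetical ages, for illustration only
age <- c(17, 18, 40, 65, 66)

between(age, 18, 65)     # FALSE TRUE TRUE TRUE FALSE (both bounds are inclusive)
age >= 18 & age <= 65    # the equivalent base R expression
```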

10.2.3 Rounding: round() probably doesn’t do what you think

It is extremely common to round statistical results before including them in text and tables.

However, did you know that R doesn’t use the rounding method most of us are taught in school, where .5 is rounded up to the next integer? Instead it rounds halves to the nearest even number (“banker’s rounding”), which avoids a systematic upward bias when you round a very large set of numbers, but is worse for reporting the results of specific analyses.

This is easier to show than explain. The round() function rounds each of the numbers passed to it. What do you expect the output to be?

round(c(0.5, 
        1.5, 
        2.5, 
        3.5, 
        4.5, 
        5.5), digits = 0)
round(c(0.5, 
        1.5, 
        2.5, 
        3.5, 
        4.5, 
        5.5))
[1] 0 2 2 4 4 6

Why is this? Because R’s round() function uses “banker’s rounding”, which rounds a trailing 5 to whichever neighbor is even: 0.5 becomes 0 and 2.5 becomes 2, but 1.5 becomes 2 and 3.5 becomes 4. This is a good thing in many contexts like accounting, but it’s usually not what we want or expect when rounding specific statistical results for inclusion in a report or manuscript.

In most of your R scripts, you should instead use the {roundwork} package’s round_up(), written by Lukas Jung, which produces the round-.5-upwards behavior most of us expect.

library(roundwork) 

roundwork::round_up(c(0.5, 
                      1.5, 
                      2.5, 
                      3.5, 
                      4.5, 
                      5.5))
[1] 1 2 3 4 5 6
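To build intuition for the difference, the round-half-up rule can be sketched in base R. The helper below is hypothetical and written only for illustration; it is not how {roundwork} implements round_up(), and it assumes non-negative inputs.

```r
# Hypothetical helper for illustration only; not the {roundwork} implementation.
# Assumes non-negative inputs.
round_half_up_sketch <- function(x, digits = 0) {
  scale <- 10^digits
  floor(x * scale + 0.5) / scale  # shift by half a unit, then drop the remainder
}

halves <- c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5)

round(halves)                  # 0 2 2 4 4 6 (half to even)
round_half_up_sketch(halves)   # 1 2 3 4 5 6 (half up)
```

A naive version like this can still misbehave with digits > 0, because many decimal values are not stored exactly as floats. That is one reason to prefer a tested package function in real scripts.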

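round_up() will typically be used inside a pipe workflow. The example below assumes that {dplyr}, {knitr}, and {kableExtra} are loaded and that dat_regression_betas holds the regression estimates: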

dat_regression_betas_rounded <- dat_regression_betas %>%
  mutate(beta_estimate = round_up(beta_estimate, 2),
         beta_ci_lower = round_up(beta_ci_lower, 2),
         beta_ci_upper = round_up(beta_ci_upper, 2)) 

dat_regression_betas_rounded %>%
  kable() %>%
  kable_classic(full_width = FALSE)
beta_estimate  beta_ci_lower  beta_ci_upper          p
         0.37           0.17           0.57  0.0009180
         0.30           0.10           0.50  0.0000014
         0.12          -0.08           0.32  0.0082030
         0.29           0.09           0.49  0.0014797
         0.18          -0.02           0.38  0.0043528

10.2.4 ‘Rounding’ of p-values using APA style

The one thing that psychologists don’t round using the round-half-up rule is p-values. These are instead usually truncated following the APA style guide’s conventions, so that p-values smaller than .001 are reported as “< .001” and the leading zero is dropped.

# install.packages("devtools"); devtools::install_github("ianhussey/truffle")
library(truffle)

dat_regression_betas_rounded <- dat_regression_betas %>%
  mutate(beta_estimate = round_up(beta_estimate, 2),
         beta_ci_lower = round_up(beta_ci_lower, 2),
         beta_ci_upper = round_up(beta_ci_upper, 2),
         p = round_p_value(p)) 

dat_regression_betas_rounded %>%
  kable(align = 'r') %>%
  kable_classic(full_width = FALSE)
beta_estimate  beta_ci_lower  beta_ci_upper       p
         0.37           0.17           0.57  < .001
         0.30           0.10           0.50  < .001
         0.12          -0.08           0.32    .008
         0.29           0.09           0.49    .001
         0.18          -0.02           0.38    .004
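For readers who want to see the convention spelled out, here is a minimal base R sketch. The function name is hypothetical and this is not the {truffle} implementation. It rounds to three decimals with formatC(), which may differ from round_p_value() in edge cases.

```r
# Hypothetical helper for illustration; not the {truffle} implementation.
format_p_apa_sketch <- function(p) {
  three_dp <- formatC(p, format = "f", digits = 3)  # e.g. "0.008"
  no_leading_zero <- sub("^0", "", three_dp)        # e.g. ".008"
  ifelse(p < .001, "< .001", no_leading_zero)       # APA floor for tiny p-values
}

# The p-values from the table above
p_values <- c(0.0009180, 0.0000014, 0.0082030, 0.0014797, 0.0043528)

format_p_apa_sketch(p_values)
# "< .001" "< .001" ".008" ".001" ".004"
```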

10.3 Working with strings

10.3.1 Case conversion

str_to_lower(), str_to_upper(), str_to_sentence(), str_to_title()
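A quick sketch of the four case-conversion functions, assuming {stringr} is installed. The example string is made up.

```r
library(stringr)

x <- "the QUICK brown fox"

str_to_lower(x)      # "the quick brown fox"
str_to_upper(x)      # "THE QUICK BROWN FOX"
str_to_sentence(x)   # "The quick brown fox"
str_to_title(x)      # "The Quick Brown Fox"
```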

10.3.2 Substring searches

str_detect(), including case-insensitive matching via regex(ignore_case = TRUE); str_starts(); str_ends()

str_locate(), str_locate_all()
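The search functions can be sketched as follows, assuming {stringr} is installed. The example strings are made up.

```r
library(stringr)

x <- c("Apple pie", "banana split", "Cherry tart")

str_detect(x, "apple")                             # FALSE FALSE FALSE (case-sensitive)
str_detect(x, regex("apple", ignore_case = TRUE))  # TRUE FALSE FALSE
str_starts(x, "ban")                               # FALSE TRUE FALSE
str_ends(x, "tart")                                # FALSE FALSE TRUE

# Positions of matches: first match only vs. all matches
first_match <- str_locate("banana", "an")          # start = 2, end = 3
all_matches <- str_locate_all("banana", "an")      # list with rows (2, 3) and (4, 5)
```

str_locate() returns a matrix with one row per input string, whereas str_locate_all() returns a list with one matrix per input string.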

10.3.3 Removal

str_remove(), str_remove_all()

str_squish(): trims leading and trailing whitespace and collapses repeated internal whitespace to a single space
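A short sketch of the removal functions, assuming {stringr} is installed. The inputs are made up.

```r
library(stringr)

str_remove("2024-01-15", "-")        # "202401-15" (first match only)
str_remove_all("2024-01-15", "-")    # "20240115"  (every match)

str_squish("  too   many    spaces  ")   # "too many spaces"
```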

10.3.4 Replacement

str_replace(), str_replace_all()
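As with removal, the difference between the two functions is first match versus all matches. A minimal sketch, assuming {stringr} is installed:

```r
library(stringr)

str_replace("a-b-c", "-", "_")       # "a_b-c" (first match only)
str_replace_all("a-b-c", "-", "_")   # "a_b_c" (every match)
```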

10.3.5 Separation

str_split(), and its relationship with {tidyr}'s separate()
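One way to see the relationship: str_split() works on a character vector and returns a list, while separate() splits a data frame column into new columns. The sketch below assumes {stringr} and {tidyr} are installed. The data and column names are hypothetical.

```r
library(stringr)
library(tidyr)

# Hypothetical identifiers combining participant and timepoint
ids <- c("p1_pre", "p2_post")

# Vector in, list of character vectors out
pieces <- str_split(ids, "_")

# Data frame column in, multiple columns out
dat <- data.frame(id = ids)
dat_separated <- separate(dat, id, into = c("participant", "timepoint"), sep = "_")
dat_separated
```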

10.3.6 Extraction

word()
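A minimal sketch of word(), assuming {stringr} is installed. It extracts words by position, and negative positions count from the end.

```r
library(stringr)

x <- "the quick brown fox"

word(x, 2)      # "quick"
word(x, -1)     # "fox" (last word)
word(x, 1, 2)   # "the quick" (words 1 through 2)
```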

10.3.7 Regex

TODO

10.4 Working with factors

forcats with plot examples, regression examples

converting a factor of numbers back to numeric via character, i.e. as.numeric(as.character(x)), because as.numeric(x) alone returns the internal level codes

fct_rev

fct_reorder() - contrast with arrange()

fct_relevel()

fct_drop

fct_lump
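The notes above can be sketched as follows, assuming {forcats} is installed. All data are made up. The sketch makes two assumptions: the conversion note is read as the factor-to-numeric gotcha, and fct_reorder() is used for reordering levels by another variable.

```r
library(forcats)

# Factor-to-numeric gotcha: as.numeric() returns the internal level codes
f_num <- factor(c("10", "20", "10"))
as.numeric(f_num)                  # 1 2 1 (level codes, not the values)
as.numeric(as.character(f_num))    # 10 20 10

f <- factor(c("low", "high", "medium", "low"),
            levels = c("low", "medium", "high"))

levels(fct_rev(f))                 # "high" "medium" "low"
levels(fct_relevel(f, "high"))     # "high" "low" "medium"

# Reorder levels by another variable (ascending, by median by default)
levels(fct_reorder(factor(c("a", "b", "c")), c(3, 1, 2)))   # "b" "c" "a"

# Drop levels that do not occur in the data
levels(fct_drop(factor(c("a", "b"), levels = c("a", "b", "c"))))   # "a" "b"

# Keep the most frequent level and lump the rest into "Other"
levels(fct_lump(factor(c("a", "a", "a", "b", "b", "c")), n = 1))   # "a" "Other"
```

Unlike arrange(), which sorts the rows of a data frame, these functions change only the order of a factor's levels. Level order is what plots and regression contrasts use.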