Chapter 5 String functions
5.1 Stringr
The stringr package provides functions for manipulating strings. It is part of the tidyverse collection of packages, so its functions are designed to work with the pipe operator (%>%).
Base R also has a range of string manipulation functions, but they tend not to work as well with the pipe, because the string is often not the first argument (for example gsub(pattern, replacement, x)). The stringr functions are therefore usually preferable.
The following examples use a dataset of six UK food store locations, locations, which includes location and postcode columns.
5.2 Example functions
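library(dplyr)
library(stringr)

locations %>%
  # Remove the space from the postcode
  mutate(pc_no_spc = str_replace(postcode, " ", "")) %>%
  # Count the characters in the space-free postcode
  mutate(pc_length = str_length(pc_no_spc)) %>%
  # Keep the district: the characters before the space
  mutate(pc_district = case_when(
    pc_length == 5 ~ str_sub(postcode, 1, 2),
    pc_length == 6 ~ str_sub(postcode, 1, 3),
    pc_length == 7 ~ str_sub(postcode, 1, 4)
  )) %>%
  # Combine the location name and the district, separated by " - "
  mutate(location_pc = paste(location, "-", pc_district))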
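To try the pipeline end to end, a small stand-in for locations can be built by hand. This is only a sketch: the column names location and postcode come from the code above, but the three store names and postcodes below are illustrative placeholders, not the six-row dataset used in this chapter.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in data: names and postcodes are placeholders
locations <- tibble(
  location = c("Store A", "Store B", "Store C"),
  postcode = c("M1 1AE", "B33 8TH", "SW1A 1AA")
)

result <- locations %>%
  mutate(
    pc_no_spc = str_replace(postcode, " ", ""),
    pc_length = str_length(pc_no_spc),
    pc_district = case_when(
      pc_length == 5 ~ str_sub(postcode, 1, 2),
      pc_length == 6 ~ str_sub(postcode, 1, 3),
      pc_length == 7 ~ str_sub(postcode, 1, 4)
    ),
    location_pc = paste(location, "-", pc_district)
  )

result$location_pc
# "Store A - M1" "Store B - B33" "Store C - SW1A"
```

The steps are combined into a single mutate call here, which works because each new column can refer to the columns created before it in the same call.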
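The str_replace function replaces the first match of a pattern within each string; here it removes the space from the postcode. Use str_replace_all to replace every match.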
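The difference between replacing the first match and replacing every match can be seen in a short sketch. The input values are made up for illustration; the second one deliberately contains two spaces.

```r
library(stringr)

# Illustrative values; the second has two consecutive spaces
pc <- c("M1 1AE", "SW1A  1AA")

first_only <- str_replace(pc, " ", "")      # removes only the first space
all_spaces <- str_replace_all(pc, " ", "")  # removes every space

first_only
# "M11AE" "SW1A 1AA"
all_spaces
# "M11AE" "SW1A1AA"
```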
The str_length function returns the number of characters in each string.
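As a minimal sketch with made-up values, str_length works on a whole vector at once, counts spaces as characters, and returns NA for missing values:

```r
library(stringr)

# Illustrative values, including a missing one
lengths <- str_length(c("M1 1AE", "M11AE", NA))

lengths
# 6 5 NA
```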
The str_sub function extracts a substring by start and end position. It is the equivalent of the LEFT, MID, and RIGHT functions in SQL and similar languages.
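The three cases can be sketched with a single illustrative postcode. Negative positions count back from the end of the string, which is how str_sub covers the RIGHT case.

```r
library(stringr)

pc <- "SW1A 1AA"  # illustrative value

left_part  <- str_sub(pc, 1, 4)    # first four characters, like LEFT
mid_part   <- str_sub(pc, 3, 6)    # characters 3 to 6, like MID
right_part <- str_sub(pc, -3, -1)  # last three characters, like RIGHT

left_part
# "SW1A"
mid_part
# "1A 1"
right_part
# "1AA"
```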
The paste functions are part of base R (not stringr) but are useful for concatenation. By default, paste separates the values with a space. Use the sep argument to choose a different separator or, if no separator is required, use paste0.
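A short sketch of the three variants, using placeholder values rather than rows from the dataset:

```r
# Base R only; no packages are needed
with_space <- paste("Store A", "M1")             # default separator is a space
with_sep   <- paste("Store A", "M1", sep = "_")  # custom separator
no_sep     <- paste0("Store A", "M1")            # no separator

with_space
# "Store A M1"
with_sep
# "Store A_M1"
no_sep
# "Store AM1"
```

In the pipeline earlier in this section, the "-" is passed as a value rather than as sep, so the default space is placed on either side of it.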
To learn more about other available stringr functions, a useful “cheat sheet” can be found here: