BSOL R Guide
2024-09-30
Chapter 1 Loading data from file
1.1 Read from CSV
read_csv() is found in the readr package (part of the tidyverse). It is an alternative to the base R function read.csv() that returns a tibble and is generally faster on large files.
library(readr)

csv_data <- read_csv(
  "data.csv",
  col_types = cols( # specify data types
    col1 = col_character(),
    col2 = col_double(),
    col3 = col_date(),
    col4 = col_datetime(),
    col5 = col_time(),
    col6 = col_logical()
  )
)
Use the col_types argument to specify data types. See the documentation for cols() to see the possible types.
If the col_types argument is omitted, read_csv() guesses a data type for each column, using up to 1000 rows of data by default. Set the guess_max argument to change the number of rows used for guessing.
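As a minimal sketch of the guess_max argument, the example below writes a small temporary file and then lets every row contribute to the type guess. The file contents and the column names id and value are made up for illustration and do not come from this guide's data.

```r
library(readr)

# Hypothetical data written to a temporary file purely for illustration:
# 1,500 rows whose 'value' column is numeric except for the very last row.
path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(id = 1:1500, value = c(as.character(1:1499), "unknown")),
  path
)

# Raise guess_max to the number of rows so every row is used for guessing
csv_data <- read_csv(path, guess_max = 1500)

# Inspect the column specification that readr settled on
spec(csv_data)
```

Raising guess_max makes the read slower on large files, so where the types are already known it is usually simpler to state them with col_types.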
Use col_types = cols() to suppress the column specification message that is otherwise printed to the console. Column types are still guessed.
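Use cols_only() instead of cols() to read only the columns you specify. Any column not listed is dropped. To set the type of some columns and still read the rest, list just those columns in cols(); the remaining columns are guessed.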
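The difference between cols() and cols_only() can be sketched as follows. The temporary file and its three columns are hypothetical and exist only to make the example runnable.

```r
library(readr)

# Hypothetical three-column file, created only for this illustration
path <- tempfile(fileext = ".csv")
writeLines(
  c("col1,col2,col3",
    "a,1.5,2024-09-30",
    "b,2.5,2024-10-01"),
  path
)

# cols(): col2 is read as double, the other columns are still read and guessed
partial_spec <- read_csv(path, col_types = cols(col2 = col_double()))

# cols_only(): only col1 and col3 are read, col2 is dropped
subset_data <- read_csv(
  path,
  col_types = cols_only(
    col1 = col_character(),
    col3 = col_date()
  )
)

names(partial_spec)
names(subset_data)
```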
1.2 Read from Excel
readxl is installed with the tidyverse, but it is not attached by library(tidyverse). Load it explicitly with library(readxl) before using its functions.
By default, readxl will load in the first sheet of the workbook.
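To read .xls files, use either read_excel() or read_xls(). To read .xlsx files, use either read_excel() or read_xlsx(). read_excel() detects the format from the file itself.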
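To check which sheets a workbook contains before choosing one, something like the following may help. It is a sketch that uses datasets.xlsx, an example workbook shipped with readxl, rather than any file from this guide.

```r
library(readxl)

# Example workbook bundled with readxl (not a file from this guide)
path <- readxl_example("datasets.xlsx")

# List the sheet names in the workbook
sheet_names <- excel_sheets(path)
sheet_names

# With no sheet argument, the first sheet is read
first_sheet <- read_excel(path)

# A sheet can be selected by name (or by position, e.g. sheet = 2)
mtcars_sheet <- read_excel(path, sheet = "mtcars")
```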
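library(readxl)

# Read a fixed cell range from a named sheet
excel_data <- read_xlsx(
  "data.xlsx",
  sheet = "Sheet1",
  range = "A1:D20"
)

# Skip the first 3 rows, then read at most 100 rows of data
excel_data <- read_xlsx(
  "data.xlsx",
  sheet = "Sheet1",
  skip = 3,
  n_max = 100
)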
Use arguments such as sheet, range, skip and n_max to specify which part of the workbook to read. If range is supplied, it takes precedence over skip and n_max.
excel_data <- read_xlsx(
  "data.xlsx",
  sheet = "Sheet1",
  range = "A1:D20",
  col_types = c("text", "numeric", "date", "guess")
)
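Use the col_types argument to specify data types, giving either one type per column or a single type that is applied to every column. The possible types are "skip", "guess", "logical", "numeric", "date", "text" and "list"; see the read_excel() documentation for details.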
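The sketch below shows two less obvious uses of col_types: dropping columns with "skip" and applying one type to every column. It assumes the iris sheet of datasets.xlsx, an example workbook bundled with readxl, which has five columns.

```r
library(readxl)

# Example workbook bundled with readxl (not a file from this guide)
path <- readxl_example("datasets.xlsx")

# One type per column: "skip" drops the second and fourth columns
trimmed <- read_excel(
  path,
  sheet = "iris",
  col_types = c("numeric", "skip", "numeric", "skip", "text")
)

# A single type is recycled across all columns
all_text <- read_excel(path, sheet = "iris", col_types = "text")

names(trimmed)
```

Reading everything as "text" can be a useful first step when a column mixes numbers and free text, since it avoids values being turned into NA during import.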