---
title: "Working with multiple files"
author: "Tristan Mahr"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Working with multiple files}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, echo = FALSE, message = FALSE}
library("rprime")
library("knitr")
opts_chunk$set(
  comment = "#>",
  error = FALSE,
  tidy = FALSE,
  collapse = TRUE)
```

Suppose we want to load all the Eprime files in a directory and combine the
results in a single dataframe.

My strategy in this scenario is to figure out what I need to do for a single
file and then wrap those steps in a function that takes a filepath to a txt file
and returns a dataframe. After some exploration and interactive programming, I
came up with the following function.

```{r}
library("plyr")

reduce_sails <- function(sails_path) {
  sails_lines <- read_eprime(sails_path)
  sails_frames <- FrameList(sails_lines)

  # Trials occur at level 3
  sails_frames <- keep_levels(sails_frames, 3)
  sails <- to_data_frame(sails_frames)

  # Tidy up: keep only the columns of interest and relabel the blocks
  to_pick <- c("Eprime.Basename", "Running", "Module", "Sound",
               "Sample", "Correct", "Response")
  sails <- sails[to_pick]
  running_map <- c(TrialLists = "Trial", PracticeBlock = "Practice")
  sails$Running <- running_map[sails$Running]

  # Renumber trials in the practice and experimental blocks separately.
  # Numerically code correct response.
  sails <- ddply(sails, .(Running), mutate,
                 TrialNumber = seq_along(Running),
                 CorrectResponse = ifelse(Correct == Response, 1, 0))
  sails$Sample <- NULL

  # Optionally, one might save the processed file via:
  # csv <- paste0(tools::file_path_sans_ext(sails_path), ".csv")
  # write.csv(sails, csv, row.names = FALSE)
  sails
}
```

Here's a preview of what the function returns when given a filepath.

```{r}
head(reduce_sails("data/SAILS/SAILS_001X00XS1.txt"))
```

Now that the function works on one file, I can use `ldply` to apply the function
to several files, returning results in a single dataframe.
(For `dplyr`, I would `lapply` the function to each path to get a list of
dataframes, then use `bind_rows` to combine them into a single dataframe.)

```{r}
# `pattern` is a regular expression, so escape the dot and anchor the extension
sails_paths <- list.files("data/SAILS/", pattern = "\\.txt$", full.names = TRUE)
sails_paths
ensemble <- ldply(sails_paths, reduce_sails)
```

Finally, with all of the subjects' data contained in a single dataframe, I can
use `ddply` plus `summarise` to compute summary scores at different levels of
aggregation within each subject.

```{r}
# Score trials within subjects
overall <- ddply(ensemble, .(Eprime.Basename, Running), summarise,
                 Score = sum(CorrectResponse),
                 PropCorrect = Score / length(CorrectResponse))
overall

# Score modules within subjects
modules <- ddply(ensemble, .(Eprime.Basename, Running, Module), summarise,
                 Score = sum(CorrectResponse),
                 PropCorrect = mean(CorrectResponse))
modules
```
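
The `dplyr` route mentioned above (`lapply` followed by `bind_rows`) might look
roughly like the sketch below. So that it runs without any Eprime files on disk,
it uses a hypothetical stand-in function, `toy_reduce()`, and made-up paths in
place of `reduce_sails()` and the real SAILS files. It also assumes a version of
`dplyr` recent enough to support the `.groups` argument of `summarise()`. Treat
it as an illustration of the pattern rather than a tested replacement for the
`plyr` code.

```r
library("dplyr")

# Hypothetical stand-in for reduce_sails(): builds a small dataframe from a
# path so that the sketch does not depend on any files on disk.
toy_reduce <- function(path) {
  data.frame(
    Eprime.Basename = tools::file_path_sans_ext(basename(path)),
    Running = c("Practice", "Trial", "Trial", "Trial"),
    CorrectResponse = c(1, 1, 0, 1),
    stringsAsFactors = FALSE)
}

# Made-up paths; with real data these would come from list.files()
toy_paths <- c("data/toy_001.txt", "data/toy_002.txt")

# lapply() returns a list of dataframes; bind_rows() stacks them into one
toy_list <- lapply(toy_paths, toy_reduce)
toy_ensemble <- bind_rows(toy_list)

# group_by() plus summarise() plays the role of ddply() plus summarise
toy_overall <- toy_ensemble %>%
  group_by(Eprime.Basename, Running) %>%
  summarise(
    Score = sum(CorrectResponse),
    PropCorrect = mean(CorrectResponse),
    .groups = "drop")
toy_overall
```

With the real data, `toy_reduce` and `toy_paths` would simply be swapped for
`reduce_sails` and `sails_paths`.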