Using dplyr and pipes to rename variables
Jun 14, 2016 · 2 minute read · R
One of the primary things that slows me down in R is data management. In a typical analysis, I’ll import a raw CSV file, perform some data management tasks (anonymizing survey responses, renaming variables, recoding, etc.), and export a new CSV file with the cleaned data. I still struggle to do this as efficiently in R as I can in Stata.
dplyr comes to the rescue when it comes to renaming variables. Its rename() function, chained with the pipe operator (%>%), lets me rename several variables in a way that is surprisingly readable. You can also stack other dplyr commands in the same pipeline, knocking out several data management tasks at once.
Here’s a quick example. First, I’ll load dplyr and create a test dataframe. For grins, I’ll use Purdue and New Orleans Saints star Drew Brees’ NFL passing stats from 2006–2015, which I happen to have sitting on my hard drive:
library(dplyr)

breesus <- data.frame(
  V1 = c(2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015),
  V2 = c(356, 440, 413, 363, 448, 468, 422, 446, 456, 428),
  V3 = c(554, 652, 635, 514, 658, 657, 670, 650, 659, 627),
  V4 = c(26, 28, 34, 34, 33, 46, 43, 39, 33, 32)
)
Now, use the pipe operator to rename all four variables to something more descriptive. Inside rename(), the new name goes on the left and the existing name on the right. Notice how readable and straightforward this is.
breesus <- breesus %>%
  rename(year = V1) %>%
  rename(completions = V2) %>%
  rename(attempts = V3) %>%
  rename(touchdowns = V4)
Check breesus to see if it worked:
names(breesus)
## [1] "year" "completions" "attempts" "touchdowns"
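As mentioned earlier, other dplyr verbs can be stacked in the same pipeline. The following is only a rough sketch of that idea, using the first three seasons from the data above. The `completion_pct` column and the `touchdowns >= 28` cutoff are made up for illustration and are not part of the original analysis.

```r
library(dplyr)

# First three seasons from the data frame above, with the raw column names
raw <- data.frame(
  V1 = c(2006, 2007, 2008),
  V2 = c(356, 440, 413),
  V3 = c(554, 652, 635),
  V4 = c(26, 28, 34)
)

# Rename, derive a new column, and filter rows in a single pipeline
cleaned <- raw %>%
  rename(year = V1, completions = V2, attempts = V3, touchdowns = V4) %>%
  mutate(completion_pct = 100 * completions / attempts) %>%  # hypothetical derived column
  filter(touchdowns >= 28)                                   # hypothetical cutoff

cleaned
```

Each verb takes the data frame produced by the previous step, so the whole cleanup reads top to bottom.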
UPDATE 2017-03-23
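Of course, you don’t need a separate pipe for each rename. A single rename() call accepts several new = old pairs separated by commas, which stays readable with less typing. Starting again from the original data frame (columns V1 through V4):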
breesus <- breesus %>%
  rename(
    year = V1,
    completions = V2,
    attempts = V3,
    touchdowns = V4
  )
I try to get less dumb about R each year :)
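When the list of names gets long, another option is to keep the new-to-old mapping in a named character vector and hand it to rename() in one go. This is only a sketch, and it assumes a reasonably recent dplyr (1.0.0 or later), where `all_of()` is available. The `lookup` and `renamed` names are hypothetical, and the data is just the first three seasons from above.

```r
library(dplyr)

# First three seasons from the data frame above, with the raw column names
raw <- data.frame(
  V1 = c(2006, 2007, 2008),
  V2 = c(356, 440, 413),
  V3 = c(554, 652, 635),
  V4 = c(26, 28, 34)
)

# Named character vector: names are the new column names,
# values are the existing ones (hypothetical helper object)
lookup <- c(year = "V1", completions = "V2", attempts = "V3", touchdowns = "V4")

# all_of() tells rename() to treat the vector as column names, and errors
# if any of the old names is missing from the data frame
renamed <- raw %>%
  rename(all_of(lookup))

names(renamed)
```

Keeping the mapping in one object also makes it easy to reuse across several files that share the same raw layout.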
And, for the fun of it:
library(ggplot2)

ggplot(breesus, aes(x = year, y = 100 * completions / attempts)) +
  geom_line(linetype = "dashed") +
  geom_point(aes(color = touchdowns), size = 3) +
  ylab("Completion Percentage") +
  xlab("Year") +
  scale_x_continuous(breaks = 2006:2015) +
  theme_minimal()