Hadley Wickham announced on Twitter that RStudio now provides CRAN package download logs. I was curious about the download numbers of my own package, so I wrote some code to extract that information from the logs…
The first code snippet is taken from the log website itself:
# Here's an easy way to get all the URLs in R
start <- as.Date('2013-11-28')
today <- as.Date('2015-03-04')
all_days <- seq(start, today, by = 'day')
year <- as.POSIXlt(all_days)$year + 1900
urls <- paste0('http://cran-logs.rstudio.com/', year, '/', all_days, '.csv.gz')
Then I downloaded all files into a folder:
for (i in seq_along(urls)) {
  download.file(urls[i], sprintf("~/Desktop/rstats/temp%i.csv.gz", i))
}
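With more than a year of daily files, a single failed request aborts the loop above. The following is only a sketch of a more forgiving variant, not what I actually ran: `download_logs` is a hypothetical helper, and its `fetch` argument exists only so the function can be tried without network access. Note that it names the files after their date (the last part of the URL) instead of `temp%i`.

```r
# Hypothetical helper: download each URL unless the target file already exists.
# `fetch` defaults to download.file(); it is a parameter only so that the
# function can be exercised offline with a stand-in.
download_logs <- function(urls, dest_dir, fetch = utils::download.file) {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(dest_dir, basename(urls))
  ok <- logical(length(urls))
  for (i in seq_along(urls)) {
    if (file.exists(dest[i])) {
      # Already downloaded in an earlier run, skip it.
      ok[i] <- TRUE
      next
    }
    ok[i] <- tryCatch({
      fetch(urls[i], dest[i], mode = "wb", quiet = TRUE)
      file.exists(dest[i])
    }, error = function(e) {
      unlink(dest[i])  # remove a possibly incomplete file
      FALSE
    }, warning = function(w) {
      unlink(dest[i])
      FALSE
    })
  }
  # One row per URL, so failed days can be inspected or retried later.
  data.frame(url = urls, file = dest, ok = ok, stringsAsFactors = FALSE)
}
```

Calling `download_logs(urls, "~/Desktop/rstats")` a second time would then fetch only the days that are still missing.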
Unzipping did not work with unzip, so I just “opened” all files with the OS X unarchiver, which was quite convenient.
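The likely reason is that unzip handles zip archives, while the log files are gzip-compressed. As a small sketch (using a made-up two-row file rather than a real log), R can read such a file directly through a `gzfile()` connection, so no manual extraction step would be needed:

```r
# Build a tiny gzip-compressed CSV so the example runs without the real logs.
# The rows are invented; only the column names follow the ones used in this post.
tmp <- tempfile(fileext = ".csv.gz")
con <- gzfile(tmp, "w")
writeLines(c("date,package,version,country",
             "2015-03-04,sjPlot,1.7,DE",
             "2015-03-04,ggplot2,1.0.0,US"), con)
close(con)

# gzfile() decompresses on the fly; read.csv() opens and closes the connection.
log_day <- read.csv(gzfile(tmp), stringsAsFactors = FALSE)
```

Applied to the downloaded files, the same idea would be `read.csv(gzfile(path))` with `path` pointing at one of the `.csv.gz` files.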
Then I read all CSV files, extracted the rows for my package, sjPlot, from each of them, and merged everything into one data frame:
sjPlot.df <- data.frame()
library(dplyr)
pb <- txtProgressBar(min = 0, max = length(urls), style = 3)
for (i in seq_along(urls)) {
  df.csv <- read.csv(sprintf("~/Desktop/rstats/temp%i.csv", i))
  pack <- tolower(as.character(df.csv$package))
  my.package <- which(pack == "sjplot")
  if (length(my.package) > 0) {
    dummy.df <- df.csv %>%
      dplyr::slice(my.package) %>%
      dplyr::select(date, package, version, country)
    sjPlot.df <- dplyr::bind_rows(sjPlot.df, dummy.df)
  }
  setTxtProgressBar(pb, i)
}
close(pb)
sjPlot.df$date.short <- strftime(sjPlot.df$date, format = "%Y-%m")
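The same filter-and-merge step can also be written without growing a data frame inside a loop. This is just a base R sketch on two made-up daily logs; `filter_package` is a hypothetical helper, not part of any package:

```r
# Hypothetical helper: keep only the rows for one package (case-insensitive)
# and the four columns used in this post.
filter_package <- function(df, pkg) {
  keep <- tolower(as.character(df$package)) == tolower(pkg)
  df[keep, c("date", "package", "version", "country"), drop = FALSE]
}

# Two invented daily logs standing in for the real files.
day1 <- data.frame(date = "2015-02-27",
                   package = c("sjPlot", "dplyr"),
                   version = c("1.7", "0.4.1"),
                   country = c("DE", "US"),
                   stringsAsFactors = FALSE)
day2 <- data.frame(date = "2015-03-01",
                   package = c("sjPlot", "sjPlot", "ggplot2"),
                   version = c("1.7", "1.6.9", "1.0.0"),
                   country = c("GB", "DE", "FR"),
                   stringsAsFactors = FALSE)

# Filter each day, then bind all pieces together in one step.
parts <- lapply(list(day1, day2), filter_package, pkg = "sjplot")
combined <- do.call(rbind, parts)

# Year-month label per download, then the number of downloads per month.
combined$date.short <- format(as.Date(combined$date), "%Y-%m")
monthly <- table(combined$date.short)
```

With the real data, the list passed to `lapply()` would hold one data frame per daily log file.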
Finally, the download stats as a plot:
library(sjPlot)
library(ggplot2)
mydf <- sjPlot.df %>% dplyr::count(date.short)
sjp.setTheme(theme = "539", axis.angle.x = 90)
ggplot(mydf, aes(x = date.short, y = n)) +
  geom_bar(stat = "identity", width = .5, alpha = .5, fill = "#3399cc") +
  scale_y_continuous(expand = c(0, 0), breaks = seq(250, 1500, 250)) +
  labs(x = sprintf("Monthly CRAN-downloads of sjPlot package since first release until 4th March (total download: %i)", sum(mydf$n)),
       y = NULL)
By the way, there’s already a shiny app for this…
