Channel: ggplot – Strenge Jacke!

CRAN download statistics for any package #rstats


Hadley Wickham announced on Twitter that RStudio now provides CRAN package download logs. I was curious about the download numbers of my own package, so I wrote some code to extract that information from the logs…

The first code snippet is taken from the log website itself:

# Here's an easy way to get all the URLs in R
start <- as.Date('2013-11-28')
today <- as.Date('2015-03-04')

all_days <- seq(start, today, by = 'day')

year <- as.POSIXlt(all_days)$year + 1900
urls <- paste0('http://cran-logs.rstudio.com/', year, '/', all_days, '.csv.gz')

Then I downloaded all files into a folder:

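# The target folder must exist before downloading
dir.create("~/Desktop/rstats", recursive = TRUE, showWarnings = FALSE)
# mode = "wb" keeps the gzip files intact on Windows
for (i in seq_along(urls)) {
  download.file(urls[i], sprintf("~/Desktop/rstats/temp%i.csv.gz", i), mode = "wb")
}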

Unzipping did not work with unzip() (it handles zip archives, not gzip files), so I just “opened” all files with the OS X unarchiver, which was quite convenient.
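As an aside, the manual decompression step may not be needed at all, because base R can read gzip-compressed text through a gzfile() connection. The following is only a minimal sketch on a made-up two-row file written to a temporary location; the file contents are an assumption for illustration, not real log data.

```r
# Write a small gzip-compressed CSV to a temporary file (made-up example rows)
tmp <- tempfile(fileext = ".csv.gz")
con <- gzfile(tmp, "w")
write.csv(data.frame(date = "2015-03-04", package = c("sjPlot", "ggplot2")),
          con, row.names = FALSE)
close(con)

# gzfile() decompresses on the fly, so read.csv() can read the .gz file directly
df.csv <- read.csv(gzfile(tmp), stringsAsFactors = FALSE)
print(df.csv)
```

Applied to the downloaded logs, the same idea would mean passing the temp%i.csv.gz paths through gzfile() instead of unpacking them first.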

Then I read all the CSV files, extracted the rows for my package, sjPlot, from each one, and merged everything into one data frame:

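library(dplyr)

# Collects the sjPlot rows from all daily logs
sjPlot.df <- data.frame()
pb <- txtProgressBar(min = 0, max = length(urls), style = 3)

for (i in seq_along(urls)) {
  # Read one day's log; keep strings as characters so bind_rows() need not coerce factors
  df.csv <- read.csv(sprintf("~/Desktop/rstats/temp%i.csv", i), stringsAsFactors = FALSE)
  # Match the package name case-insensitively
  pack <- tolower(as.character(df.csv$package))
  my.package <- which(pack == "sjplot")
  if (length(my.package) > 0) {
    dummy.df <- df.csv %>% dplyr::slice(my.package) %>% dplyr::select(date, package, version, country)
    sjPlot.df <- dplyr::bind_rows(sjPlot.df, dummy.df)
  }
  setTxtProgressBar(pb, i)
}
close(pb)
# Year-month label (e.g. "2014-05") used to group the downloads by month
sjPlot.df$date.short <- strftime(sjPlot.df$date, format = "%Y-%m")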
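The extraction step can also be sketched as a small function that is applied to each daily log, with the results combined once at the end instead of growing a data frame inside the loop. This is a base-R variant and not the code used for the post; extract_package() is a hypothetical helper, and the two "daily logs" below are made-up data that only mimic the columns used above.

```r
# Hypothetical helper: keep only the rows of one daily log that belong to `pkg`
extract_package <- function(df.csv, pkg) {
  hit <- which(tolower(as.character(df.csv$package)) == tolower(pkg))
  df.csv[hit, c("date", "package", "version", "country"), drop = FALSE]
}

# Two made-up daily logs standing in for the downloaded files
day1 <- data.frame(date = "2015-03-03",
                   package = c("sjPlot", "dplyr"),
                   version = c("1.6", "0.4.1"),
                   country = c("DE", "US"),
                   stringsAsFactors = FALSE)
day2 <- data.frame(date = "2015-03-04",
                   package = c("ggplot2", "sjPlot", "sjPlot"),
                   version = c("1.0.0", "1.6", "1.6"),
                   country = c("FR", "GB", "DE"),
                   stringsAsFactors = FALSE)

# Extract per log, then combine all pieces in a single step
parts <- lapply(list(day1, day2), extract_package, pkg = "sjPlot")
combined <- do.call(rbind, parts)
print(combined)
```

With real logs, the list of data frames would come from reading each downloaded file, for example inside the function passed to lapply().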

Finally, the download statistics as a plot:

library(sjPlot)
library(ggplot2)

mydf <- sjPlot.df %>% dplyr::count(date.short)

sjp.setTheme(theme = "539", axis.angle.x = 90)
ggplot(mydf, aes(x = date.short, y = n)) +
  geom_bar(stat = "identity", width = .5, alpha = .5, fill = "#3399cc") +
  scale_y_continuous(expand = c(0, 0), breaks = seq(250, 1500, 250)) +
  labs(x = sprintf("Monthly CRAN-downloads of sjPlot package since first release until 4th March (total download: %i)", sum(mydf$n)), y = NULL)

[Figure: monthly CRAN downloads of the sjPlot package]
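To see what the monthly counting step produces, it can be tried on a handful of made-up dates. This sketch uses base R's table() rather than dplyr::count(), so it is an equivalent illustration under that assumption, not the code behind the plot; the dates and the name monthly are invented for the example.

```r
# Three made-up download dates: two in January 2014, one in February 2014
dates <- c("2014-01-03", "2014-01-20", "2014-02-11")

# Same year-month label as date.short in the post
date.short <- format(as.Date(dates), "%Y-%m")

# Count the rows per month; the count column is named n, as dplyr::count() does
monthly <- as.data.frame(table(date.short = date.short),
                         responseName = "n", stringsAsFactors = FALSE)
print(monthly)
```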

By the way, there’s already a shiny app for this…


Tagged: data visualization, ggplot, R, rstats, sjPlot
