2 R
For this thesis, I used the programming language R. To me, R has three key benefits: cost, extensibility, and a welcoming community. In this chapter, I briefly describe the history of R, touch on its rising popularity, and provide context for the published software discussed in the main thesis chapters.
2.1 What is R?
R is a direct descendant of the statistical environment S, originally developed in 1976 by John Chambers and colleagues at Bell Telephone Laboratories – part of AT&T – for internal use (Roger D. Peng 2020). S was first implemented in Fortran before being rewritten in C in 1988 (Roger D. Peng 2020). Like many popular statistical environments, such as SPSS or Stata, S was only available with a commercial license. In the early 1990s, Ross Ihaka and Robert Gentleman from the University of Auckland developed R as a language that combines the strengths of S and Scheme (Ihaka and Gentleman 1996).
Technically, R is an interpreted, object-oriented language. Its syntax is similar to that of S, but its implementation mimics Scheme. R is primarily implemented in C, C++, and Fortran, which lends it computational efficiency (Ihaka and Gentleman 1996; The R Foundation, n.d.). At its core, R is “an integrated suite of software facilities for data manipulation, calculation and graphical display” (The R Foundation, n.d.). I believe it is much more: because of its extensibility, discussed in the next section, it should be classified as a general-purpose programming language. It has, after all, gained widespread popularity in both academia and industry (Carson and Basiliko 2016; Ashlee Vance 2009; RStudio 2018a).
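To make the two influences concrete, the following minimal sketch shows one Scheme-like feature (functions that remember the environment in which they were created) and one S-like feature (generic functions that dispatch on an object's class). The names make_counter and describe are hypothetical and chosen only for illustration; they are not part of R or of the software discussed in this thesis.

```r
# Scheme-like: functions are first-class values and are lexically scoped,
# so the inner function keeps access to `count` after make_counter() returns.
# (make_counter is a hypothetical name used only for this illustration.)
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # update `count` in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()
counter()  # a second call returns a larger value than the first

# S-like: an S3 generic picks a method based on the class of its argument.
# (describe is likewise a hypothetical generic.)
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "an R object"
describe.data.frame <- function(x, ...) {
  sprintf("a data frame with %d rows", nrow(x))
}

describe(data.frame(x = 1:3))  # dispatches to describe.data.frame
describe(1:3)                  # falls back to describe.default
```

Each call to make_counter() creates an independent counter, because every closure carries its own enclosing environment.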
2.2 Why is R becoming more popular?
One reason for R’s rising popularity is its easy extensibility with packages, the fundamental unit of shareable code (Hadley Wickham and Jenny Bryan, n.d.). Packages bundle code, data, documentation, and tests in a standard structure, which allows all R users to benefit from others’ work (Hadley Wickham and Jenny Bryan, n.d.). Packages can be shared internally (e.g. within a research lab) or publicly. The most common way to share code publicly is through the Comprehensive R Archive Network (CRAN) (“The Comprehensive R Archive Network,” n.d.). CRAN provides discoverability, facilitates easy installation, and lends a sense of credibility, since each package must meet certain standards (Hadley Wickham and Jenny Bryan, n.d.). At the time of writing, CRAN has 17,313 available packages (three of which are discussed in this thesis). Packages exist for everything from implementing hierarchical Bayesian models (Bürkner et al. 2021) to image manipulation (Ooms 2021) and console-based games (Xie et al. 2020).
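As a rough illustration of how packages are used in practice, the sketch below inspects the locally installed packages and defines a small helper for counting the packages a CRAN mirror currently lists. The helper name count_cran_packages, the mirror URL used as its default, and the package name "somePackage" are assumptions made for this example. The helper needs network access, and its result changes over time and may depend on the platform's default package type, so it will not necessarily match the figure quoted above.

```r
# Packages that ship with R, such as "stats", are already in the library.
installed <- rownames(installed.packages())
has_stats <- "stats" %in% installed

# Package metadata, such as the version, is read from the installed package.
stats_version <- packageVersion("stats")

# Installing from CRAN is a single call. It is commented out here because it
# needs network access and modifies the local library; "somePackage" is a
# placeholder, not a real package name.
# install.packages("somePackage")
# library(somePackage)

# Hypothetical helper: count the packages listed by a CRAN mirror.
# The default mirror URL is an assumption; any CRAN mirror should work.
count_cran_packages <- function(repo = "https://cloud.r-project.org") {
  nrow(available.packages(repos = repo))
}
```

Calling count_cran_packages() on a machine with internet access returns the number of packages available at that moment.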
R is open-source software, released under the Free Software Foundation’s GNU General Public License. Because it is free and has an actively maintained package ecosystem, R is firmly established as a top choice among programming languages.