5 {ghee}
5.1 Version Control
Programming is an important skill used in academia and industry alike. For example, an industry professional may need to develop code that combines demographic data and purchasing history in order to predict future profits for a new product. Similarly, a computational neuroscientist may need to develop code that combines neuroimaging data, such as fMRI, with behavioral data. Such a combination may help them understand the relationship between neural processes and political attitudes (e.g., Ahn et al. 2014). Complex models and programs like these are often developed piecemeal by multiple individuals. Thus, if one person makes a change or implements a feature, it’s essential that all collaborators have the latest version. It is equally important that any change be revertible in case it introduces a bug into the existing code.
These requirements are met with a version control system (VCS), which allows people to track incremental changes to code. A VCS may be thought of as a more robust version of Microsoft Word’s “Track Changes” feature. One widely used VCS is git, which is popular with scientists and programmers alike (Blischak, Davenport, and Wilson 2016). The next section discusses the basics of working with git and how to share your code with others using GitHub, an online platform that hosts code version-controlled with git.
5.2 git and GitHub
In order to version-control your code with git, you must navigate to the folder containing your code and initialize a repository.
A repository (or repo) is a folder containing all files to be version-controlled in their present and past forms.
As an example, this thesis is currently version-controlled in a repo called “senior-thesis,” which simply lives as a folder on my computer. This is the local repository. Whenever I make a change to a file, I stage those changes and commit them to my local repository.
A local repository is a folder of version-controlled files that lives on a personal computer.
To stage files is to mark them as modified and ready to be committed; typically, a message describing the changes accompanies them, and a snapshot of the files is committed to your local repo.
Once I have committed (a snapshot of) my modified files to my local repo, I push those changes to a remote repository.
A remote repository is the version of your local repository stored on a remote server such as GitHub.
To push files is to send commits from a local repository to its remote repository.
If someone wished to work on my thesis for me, they could clone the remote repository “senior-thesis”; stage, commit, and push changes; and then I could pull those changes into my local repository, ensuring that our versions are consistent.
To clone a repository is to make a copy of a remote repository on a personal computer; that is, to make a local repo from a remote one.
To pull is to retrieve any commits (modifications) that are on a remote repository and update your local repo.
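The vocabulary above maps onto a handful of git commands. The following is a minimal sketch rather than the actual setup of the thesis repository: it drives the git command line from R with system2(), assumes git is installed, and uses a temporary folder in which a bare repository stands in for GitHub. The helper git() and all file and folder names are made up for illustration.

```r
# Sandbox: everything lives in a temporary directory, so no real repo is touched.
sandbox <- tempfile("git-demo-")
dir.create(sandbox)
remote_repo <- file.path(sandbox, "remote.git")         # stands in for GitHub
local_repo  <- file.path(sandbox, "senior-thesis")      # my working copy
collab_repo <- file.path(sandbox, "collaborator-copy")  # a collaborator's clone

# Hypothetical helper: run one git command inside `dir`.
# The -c flags set a throwaway identity so commits work on an unconfigured machine.
git <- function(dir, ...) {
  args <- c("-C", shQuote(dir),
            "-c", "user.name=Demo",
            "-c", "user.email=demo@example.com",
            ...)
  invisible(system2("git", args, stdout = TRUE, stderr = TRUE))
}

# Create an empty "remote" repository (a bare repo holds history but no working files).
dir.create(remote_repo)
git(remote_repo, "init", "--bare")

# Initialize a local repository in the folder that holds the files.
dir.create(local_repo)
git(local_repo, "init")
writeLines("Chapter 1: Introduction", file.path(local_repo, "thesis.Rmd"))

# Stage the new file, then commit a snapshot with a descriptive message.
git(local_repo, "add", "thesis.Rmd")
git(local_repo, "commit", "-m", shQuote("Add first draft"))

# Link the local repo to the remote and push the commit to it.
git(local_repo, "remote", "add", "origin", shQuote(remote_repo))
git(local_repo, "push", "-u", "origin", "HEAD")

# A collaborator clones the remote, edits the file, and pushes a new commit.
git(sandbox, "clone", shQuote(remote_repo), shQuote(collab_repo))
cat("Chapter 2: Methods\n", file = file.path(collab_repo, "thesis.Rmd"), append = TRUE)
git(collab_repo, "add", "thesis.Rmd")
git(collab_repo, "commit", "-m", shQuote("Add methods chapter"))
git(collab_repo, "push", "origin", "HEAD")

# Pull the collaborator's commit so both copies are consistent again.
git(local_repo, "pull", "--ff-only")
thesis_lines <- readLines(file.path(local_repo, "thesis.Rmd"))
```

In everyday use the same commands (git add, git commit, git push, git clone, git pull) would be typed in a terminal or issued through RStudio's Git pane, with the remote hosted on GitHub instead of a local folder.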
With an understanding of some of the core features of git, it’s time to collaborate with others. This is often done using GitHub, which, again, is an online host for git repositories. When I push an updated version of my thesis, others can view that version (and previous ones) on GitHub. Because GitHub is a remote platform, it provides a version-controlled backup of my code (or thesis) in case something happens to my computer. On top of that, GitHub has wonderful features for collaboration and productivity. A few features relevant to my thesis include:
- Issues: A framework for collecting user feedback, reporting software bugs, tracking and prioritizing work, assigning responsibilities, and more.
- Pull requests: A method of proposing changes to a repository that requires approval by an owner before the changes are merged. This facilitates discussions of potential changes before they are integrated into the repository.
- Repository visibility: A way to restrict who has access to a repository; public repos are available to anyone on the internet; private repos are only available to the owner and any individuals to whom they explicitly grant access; organization repos are available to people in specific organizations (groups).
For my research, I found myself frequently utilizing these features of GitHub. I constantly left my integrated development environment (IDE), RStudio, opened a web browser, navigated to my repo-of-choice, and created or commented on issues, invited people to the repository, etc. For this reason, I developed a package, {ghee}, that allows common GitHub tasks to be achieved directly from R.
5.3 What is {ghee}?
{ghee} is a user-friendly wrapper for the {gh} package that provides client access to GitHub’s REST API for common tasks such as creating issues and inviting collaborators.
5.3.1 Technical Aspects
An Application Programming Interface (API) may be thought of as a channel provided by developers of proprietary software that allows individuals to access certain features. The R package {googledrive}, for example, allows you to read, write, and manipulate files in your Google Drive account directly from R. In the same vein, GitHub’s API allows you to perform hundreds of actions involving repositories, organizations, billing, and more. The official GitHub API clients, collectively called “Octokit,” are provided for Ruby, .NET, and JavaScript, but there are implementations in many other languages, from Clojure to Go, Haskell to Perl, and Python to R.
For R, the package {gh} provides extremely flexible API access to GitHub, though it requires the use of HTTP verbs such as HEAD, GET, and POST to perform actions. For R users who are familiar with web protocols, or those who don’t mind exploring the documentation for API requests, {gh} is a wonderful package.
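To illustrate what working with HTTP verbs looks like in practice, here is a minimal sketch of a raw {gh} request. It assumes {gh} is installed and that the machine has internet access; open_issue_titles() is a hypothetical helper written for this example and is not part of {gh} or {ghee}. The endpoint string pairs a verb (GET) with a path taken from GitHub's REST API documentation.

```r
# Hypothetical helper (not part of {gh} or {ghee}): titles of a repo's open issues.
# Nothing is sent to GitHub until the function is called.
open_issue_titles <- function(owner, repo) {
  issues <- gh::gh(
    "GET /repos/{owner}/{repo}/issues",  # HTTP verb followed by the endpoint
    owner = owner,                       # fills the {owner} placeholder
    repo = repo,                         # fills the {repo} placeholder
    state = "open"                       # sent as a query parameter
  )
  # Each element of the response is a list describing one issue.
  # Note that this endpoint also returns pull requests.
  vapply(issues, function(issue) issue$title, character(1))
}

# Example call (requires internet access):
# open_issue_titles("jdtrat", "shinysurveys")
```

The user has to know both the verb and the exact endpoint, which is the overhead a task-oriented wrapper aims to hide.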
{ghee} is not meant to replace {gh} or be an exhaustive API. It is designed for R users, such as myself, who are less comfortable with web protocols but regularly interact with GitHub. That is, it was designed with the goal of helping developers easily achieve common tasks with GitHub. As such, all functions begin with the prefix gh_, followed by categories of actions such as collab and issue. The next section gives an overview of the package.
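To make the design concrete, here is a hedged sketch of how a wrapper in this style could turn a single "owner/repository" path into a {gh} request. It is not {ghee}'s actual source code: split_repo_path() and sketch_issue_new() are hypothetical names invented for this example, and only the gh::gh() call and the REST endpoint come from {gh} and GitHub's documentation.

```r
# Hypothetical helper: split "owner/repository" into its two parts.
split_repo_path <- function(path) {
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  if (length(parts) != 2 || any(!nzchar(parts))) {
    stop("`path` must look like \"owner/repository\".", call. = FALSE)
  }
  list(owner = parts[1], repo = parts[2])
}

# Hypothetical wrapper: hides the HTTP verb and endpoint behind a task-named function.
# Assumes {gh} is installed and a GitHub token is available when it is called.
sketch_issue_new <- function(path, title, body) {
  target <- split_repo_path(path)
  gh::gh(
    "POST /repos/{owner}/{repo}/issues",
    owner = target$owner,
    repo = target$repo,
    title = title,
    body = body
  )
}

# The path helper can be tried without contacting GitHub:
split_repo_path("owner/repository")
```

Taking one path argument instead of separate owner and repo arguments mirrors how repositories are written on GitHub itself, which is presumably why the examples in this chapter use that form.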
5.4 {ghee} Highlights
5.4.1 Installation
The source code for {ghee} is on my GitHub repo. The released version can be installed from CRAN, or the development version from GitHub, and then loaded as follows:
# Install released version from CRAN
install.packages("ghee")
# Or, install the development version from GitHub
# install.packages("remotes")
remotes::install_github("jdtrat/ghee")
# Load package
library(ghee)
5.4.2 Use Cases
5.4.2.1 Collaboration
I have found it particularly helpful to use {ghee} when working with collaborators. Normally, to invite someone to a repo, I would have to navigate to its page on GitHub, go to settings, manage access, and send an invitation manually. With {ghee}, though, it’s as simple as specifying the repo and the invitee:
gh_collab_invite(path = "owner/repository",
                 invitee = "bestfriend")
If you wanted to invite multiple friends at once, you could do so in a functional programming style. Here’s an example with the {purrr} package, which simply calls gh_collab_invite() for each entry in the friends vector.
friends <- c("friend", "pal", "amigo")
purrr::walk(.x = friends,
            ~ gh_collab_invite(path = "owner/repository",
                               invitee = .x))
5.4.2.2 Repositories
{ghee} provides functions to create, edit, and delete repositories. As an example, I’ll create a repo called “ghee_test,” as seen below.
# Create a Private Repo
gh_repos_create(path = "jdtrat/ghee_test",
                private = TRUE,
                description = "A test repository.")

Note how the above picture shows an “Issues” tab. I don’t really want feedback on this repository, so I’m going to disable it with the gh_repos_mutate() function.
# Disable Issues
gh_repos_mutate(path = "jdtrat/ghee_test",
                has_issues = FALSE)

If you decide you don’t want a private repo anymore, no problem! That’s an easy change! In the picture below, you can see the private badge next to the title is gone.
# Change Privacy Settings
gh_repos_mutate(path = "jdtrat/ghee_test",
                private = FALSE)

Now, I know what you’re thinking. You don’t like the repo name “ghee_test.” I don’t blame you. Let’s change it! Voila! It is now “ghee_testing.”
# Change Repo Name
gh_repos_mutate(path = "jdtrat/ghee_test",
                name = "ghee_testing")

For more repository manipulation options, check out GitHub’s REST API documentation for repositories. You can also look at the documentation for gh_repos_mutate(), which expands upon the above examples.
{ghee} also has a function to delete repositories, though it should be used with caution, as deletion is permanent. Further, if you want to use it, you will need to create a special GitHub personal access token (PAT) with the appropriate permissions. This can be done with the {usethis} package as follows: usethis::create_github_token(scopes = "delete_repo").
5.4.2.3 Issues
Another big component of GitHub is Issues. {ghee} includes some helper functions for interacting with them. These include gh_issue_list(), gh_issue_new(), gh_issue_comment(), and gh_issue_assign(). The first function lists the GitHub issues for a specific repo. The next one allows you to create a new issue, and the other two allow you to comment on or assign existing ones. For example, if I wanted to create an issue for my {shinysurveys} package discussed elsewhere in this thesis, I could do so as follows:
gh_issue_new(path = "jdtrat/shinysurveys",
             title = "My Issue Title",
             body = "Just wanted to say I love your package!")
To assign that issue to myself, or respond to it, I would use the issue number (which I could get with gh_issue_list()) and do something like this:
gh_issue_assign(path = "jdtrat/shinysurveys",
                issue_number = 5,
                assignees = "jdtrat")
gh_issue_comment(path = "jdtrat/shinysurveys",
                 issue_number = 5,
                 body = "Thanks, @jdtrat!")
5.5 Conclusion
As mentioned in the vignette, {ghee} was not developed to be an exhaustive API. It was designed to provide a curated set of functions to improve R users’ workflows for common tasks with GitHub. My productivity has benefited from {ghee}, and (excitingly) it has been downloaded over 750 times as of May 11, 2021.
Even so, discussions within the R community have indicated a need for automating certain tasks not currently supported, such as relabeling GitHub Issues. While there are alternatives for this, chiefly {usethis}, I believe the R community could benefit from a common interface. Future work on {ghee} will focus on implementing additional features to improve users’ interactions with GitHub.