Version control your writing

If you’ve spent much time in reproducible science circles, you’ve probably heard folks talking about how valuable it is to version control your code.

If you’re new to the idea of version control, start with our introduction to version control using git and then come back here to read about applying it to your writing. There’s also a great guide to getting started with git on GitHub, including tons of interactive lessons to let you practice what you learn.

If you already have some git experience but need a refresher, check out the git cheatsheet (available in many languages).

In my experience, though, the more surprising superpower of version control for researchers is its application to writing! You can apply version control to any plain text file: .txt, .md, .r, .rmd, .csv, and many more.

Version control saves the day!

Allow me to walk you through a few common writing headaches, and show how version control comes to the rescue.

I need to re-work this presentation for a new audience

If you write your presentations as markdown slide decks (which are plain text, and therefore version control-able), then you can create a branch for each new version of the talk you want to create!

For example, maybe you have a great set of slides that you wrote for clinicians and medical researchers explaining how to get started with R. You’re invited to lead an intro to R workshop at a conference, but you know the audience will be much more general. Most of the presentation will work as-is, but you know there are a few slides you’ll want to edit to make the content more relevant. Start a new branch in your repo, and edit the presentation file to your heart’s content — your original version will still be waiting for you, untouched, on the main branch when you come back to it. You can have as many different versions as you like, and still only have one file showing up in the folder, so it never gets cluttered.
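If you like to see things on the command line, here is a rough sketch of that workflow. The file name (slides.md), the branch name (general-audience), and the author details are all made up for illustration, and everything happens in a throwaway folder so nothing real is touched.

```shell
# Sketch only: file, branch, and author names are placeholders.
repo="$(mktemp -d)"                 # throwaway folder for the demo
cd "$repo"
git init -q -b main                 # the -b flag needs Git 2.28 or newer
git config user.name "Example Author"
git config user.email "author@example.com"

# The original talk, written for clinicians, lives on main.
printf '# Getting started with R\n\nExamples from clinical trials.\n' > slides.md
git add slides.md
git commit -q -m "Add intro to R slides for clinicians"

# Start a branch for the conference workshop and edit freely.
git switch -q -c general-audience
printf '# Getting started with R\n\nExamples from everyday data.\n' > slides.md
git commit -q -am "Swap clinical examples for general ones"

# Back on main, the original is untouched, and the folder still holds one file.
git switch -q main
cat slides.md                       # shows the clinician version
ls
```

Switching back with `git switch general-audience` brings the workshop version of slides.md into the folder again.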

If someone else wants to adapt your material for their own presentation, they can fork your repo! Organizing your material in a repository may also help you remember to include things like a license, citation, and README for your project, all of which make sharing (and getting credit for!) your work much easier.

I’ve spent the last month revising a draft according to my advisor’s feedback, and now she’s saying we should undo all of that and go back to the original framing!

Unfortunately, version control won’t help you navigate the social minefield of incorporating (or dismissing) feedback on your writing. But at least it can take some of the pain out of the logistics!

With version control, you’ll take snapshots (called “commits”) of the work as you make each edit. These can capture changes as fine or coarse as you like, but generally with writing I recommend committing whenever you’re done (for now) editing a particular section and before you move to the next — you’ll end up with commit messages that hopefully make sense to you later, like “add Oliviera 2016 cites to intro” or “take out second set of descriptives in methods”.

In this scenario, all you do is open the file’s history (on GitHub or in your git client) and scroll back until you find the last commit before the edits your advisor now wants to roll back. You can then pick up your work from that earlier point, and you won’t lose the later edits either: if she changes her mind again, you can still recover all that work.
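The same rollback can be done from the command line. This is a minimal sketch, assuming a single file called draft.md and placeholder commit messages; it uses `git restore` (Git 2.23 or newer) to bring back the old version and then records that as a new commit, which is one of several reasonable ways to do it.

```shell
# Sketch only: file name, text, and author details are placeholders.
repo="$(mktemp -d)"
cd "$repo"
git init -q -b main
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'Framing A: original argument.\n' > draft.md
git add draft.md
git commit -q -m "Finish first full draft"
# In real life you would copy this ID from the log output further down.
original="$(git rev-parse --short HEAD)"

# A month of revisions, one commit per task.
printf 'Framing B: revised argument.\n' > draft.md
git commit -q -am "Reframe intro per advisor feedback"
printf 'Framing B: revised argument, with new cites.\n' > draft.md
git commit -q -am "Add Oliviera 2016 cites to intro"

# Review the history of the file to find the point to return to.
git log --oneline -- draft.md

# Bring back the old version of the file and record that as a new commit.
git restore --source="$original" -- draft.md
git commit -q -am "Return to original framing"

cat draft.md                        # the original framing is back
git log --oneline -- draft.md       # the revision commits are still listed
```

Because the rollback is itself a commit, the month of revisions stays in the history and can be restored the same way.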

I spilled orange juice on my laptop and lost all my files

Version control can’t help you here unless you’re also backing up your writing files somewhere other than your computer, but one of the great things about git is that it makes it very easy to back everything up to GitHub. Once your project is set up as a git repo, it’s just one small step to connect it to an online GitHub repo. From then on, every push gives you a back-up that preserves not just the current version of each file (as you would get from something like Dropbox), but its entire edit history.

If you need to switch machines, or even if you just have multiple computers you like to work on, all you need is a quick pull from the GitHub repo and all your local files will be up to date.
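Here is a sketch of that push and pull cycle. So that it runs without a GitHub account, it uses a local bare repository as a stand-in for the GitHub repo; with the real thing, the remote would be the URL GitHub shows you when you create the repository. The folder names (desktop, laptop) and the file name are invented for the example.

```shell
# Sketch only: a local bare repo plays the role of GitHub here.
base="$(mktemp -d)"
git init -q --bare -b main "$base/remote.git"

# First computer: create the project and make a commit.
git init -q -b main "$base/desktop"
cd "$base/desktop"
git config user.name "Example Author"
git config user.email "author@example.com"
printf 'Chapter one.\n' > thesis.md
git add thesis.md
git commit -q -m "Start chapter one"

# The one small step: connect the online repo, then push.
git remote add origin "$base/remote.git"
git push -q -u origin main

# Second computer: get a full copy, history included.
git clone -q "$base/remote.git" "$base/laptop"

# More writing on the first computer, backed up with a push.
printf 'Chapter one.\nChapter two.\n' > thesis.md
git commit -q -am "Draft chapter two"
git push -q

# Back on the second computer, a quick pull brings everything up to date.
cd "$base/laptop"
git pull -q --ff-only
cat thesis.md
git log --oneline
```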

Okay, I’m convinced! What should I keep in mind when I version control my writing?

Stick to text

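The biggest change for most people when they decide to use git for their writing is that they need to write in plain text files only. Git can store other kinds of files, such as MS Office documents (Word, PowerPoint, etc.), PDFs, images, and videos, but it can’t show you meaningful line-by-line changes for them, so most of the benefit of version control is lost.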

However, I think you’ll be pleasantly surprised to learn just how much you can create with plain text files! Markdown (and, by extension, R Markdown) is a plain text format you can use to generate an enormous variety of documents, including properly formatted journal articles, slide decks (including PowerPoint), Word documents, PDFs, websites, entire books, and almost anything else you can think of.

For more details on what you can do with Markdown, check out our post on getting more from R-Markdown.

When you’re using git for your writing, the main document(s) you edit will be plain text, like Markdown, and you’ll periodically render them to final output documents in whatever format(s) you need when you’re ready to share with a colleague or if you just want to review the final layout yourself. You don’t bother version controlling that output document, though, and you don’t make edits there — everything of importance happens in your plain text files where you can record the commit history.
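One common way to keep rendered output out of your history is a .gitignore file, which tells git which files to leave alone. The sketch below assumes the output formats are PDF, Word, and HTML; adjust the patterns to whatever you actually render, and note that paper.pdf here is just a placeholder file, not a real render.

```shell
# Sketch only: file names and ignore patterns are assumptions.
repo="$(mktemp -d)"
cd "$repo"
git init -q -b main
git config user.name "Example Author"
git config user.email "author@example.com"

printf '# My paper\n\nPlain text source.\n' > paper.md
printf 'stand-in for a rendered file\n' > paper.pdf

# List the output formats git should never track.
printf '*.pdf\n*.docx\n*.html\n' > .gitignore

git add .
git commit -q -m "Add paper source and ignore rendered output"

git ls-files                        # tracked files: the source and .gitignore
git status --short --ignored       # ignored files are marked with !!
```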

Break up big documents

Because you’re already adding a step at the end to turn your plain text into your final output format, many people like to have that last step not only apply the formatting but also combine several smaller files into one document. This is a trivial addition to the rendering step, and it can make your commit history a lot easier to navigate.

Let’s say you’re writing a journal article. You could write it all as one Markdown document, or you could save several files, one for each section of the paper (e.g., intro.md, methods.md, results.md, conclusions.md). Because they’re separate files, you can pull up the history of each one on its own, which makes it easier to browse back through the changes.
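As a rough sketch of how this plays out, the example below creates the four section files, makes one edit to the methods, and then shows the history for a single file. The section text and the name of the combined file (paper-full.md) are invented. It stitches the sections together with plain `cat`; pandoc also accepts several input files in order, so in practice the combining can happen inside the render command itself.

```shell
# Sketch only: section text and the combined file name are placeholders.
repo="$(mktemp -d)"
cd "$repo"
git init -q -b main
git config user.name "Example Author"
git config user.email "author@example.com"

for section in intro methods results conclusions; do
  printf '## %s\n\nText for the %s section.\n\n' "$section" "$section" > "$section.md"
done
git add intro.md methods.md results.md conclusions.md
git commit -q -m "Add section skeletons"

printf 'We recruited 40 participants.\n\n' >> methods.md
git commit -q -am "Describe recruitment in methods"

# History for one section at a time.
git log --oneline -- methods.md     # both commits touched this file
git log --oneline -- intro.md       # only the first commit did

# Combine the sections, in order, at render time.
cat intro.md methods.md results.md conclusions.md > paper-full.md
```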

If you want a deeper dive into how to think about R-Markdown documents in a project, check out Emily Riederer’s blog post on R-Markdown driven development. Note that the focus of that post is on code-heavy R-Markdown, so if you’re thinking about writing mostly text with little or no code, you may find Emily’s post less relevant.

Keep in mind, though, that your primary audience for your commit history is likely Future You — if you prefer to look at a single history of commits to one big file rather than separate histories for each section, then you should probably not break it up! Do what is most useful for you.

Build good commit habits

Your version control history will only be as valuable to you as your commits, so it’s worth putting a little thought into how to gift Future You with a lovely, informative, easy-to-navigate commit history.

If you haven’t already, read How to Write a Good Commit Message, a short blog post that has become a standard reference in the field.

In addition to the excellent advice in that post, you can do yourself a favor by finishing each writing task and committing before moving on to the next (that’s probably a good habit to build anyway). So, for example, if you get a set of a dozen suggested revisions back from a reviewer, maybe consider each revision one commit — make just the changes for that revision, commit it with an informative message, then move on to the next. This helps you avoid commit messages like “Add cites to intro, move descriptives to front of methods, rework opening para of conclusion” or worse “Edits and tweaks throughout.”
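If you forget and make two revisions in one sitting, you can often still split them into separate commits by staging one file at a time. Here is a small sketch, assuming the two revisions landed in different files (intro.md and methods.md, both placeholders).

```shell
# Sketch only: file names and text are placeholders.
repo="$(mktemp -d)"
cd "$repo"
git init -q -b main
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'Intro text.\n' > intro.md
printf 'Methods text.\n' > methods.md
git add intro.md methods.md
git commit -q -m "Add first draft of intro and methods"

# Two reviewer revisions, made before remembering to commit.
printf 'Intro text (Oliviera 2016).\n' > intro.md
printf 'Descriptives first.\nMethods text.\n' > methods.md

# Stage and commit each revision on its own.
git add intro.md
git commit -q -m "Add Oliviera 2016 cites to intro"
git add methods.md
git commit -q -m "Move descriptives to front of methods"

git log --oneline                   # one line per revision, newest first
```

When two revisions touch the same file, `git add -p` lets you choose which changes to stage interactively.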

Take advantage of shortcuts

For most writers, the biggest shortcut that comes with switching to Markdown is that there are tools to handle citations for you. You can finally say goodbye to writing out your own works cited lists! You just use pandoc citation syntax for your citations throughout your writing, and when you render the final output document, pandoc formats all of those inline citations according to whatever style you need and generates a complete works cited list for you at the end.
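To give a flavor of what that looks like, the sketch below writes a tiny bibliography file and a Markdown file that cites it, then renders them if pandoc is installed. The reference is a made-up placeholder (not a real article), as are the file names. The `--citeproc` flag assumes pandoc 2.11 or newer.

```shell
# Sketch only: the bibliography entry is a made-up placeholder.
work="$(mktemp -d)"
cd "$work"

cat > refs.bib <<'EOF'
@article{example2016,
  author  = {Example, Ada},
  title   = {A placeholder article},
  journal = {Journal of Placeholder Studies},
  year    = {2016},
  volume  = {1},
  pages   = {1--10}
}
EOF

cat > paper.md <<'EOF'
---
title: "Citation demo"
bibliography: refs.bib
---

Earlier work found the same pattern [@example2016].
As @example2016 argues, the effect is robust [see also @example2016, p. 4].

# References
EOF

# Render only if pandoc is available on this machine.
if command -v pandoc >/dev/null 2>&1; then
  pandoc paper.md --citeproc -o paper.html
fi
```

To switch citation styles, add `--csl` followed by the path to a CSL style file in the pandoc command; the Markdown source stays the same.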

Learn more about how to use citations in Markdown and much more in our post on getting more from R-Markdown.

Another powerful advantage of writing in plain text files is that it lets you combine writing and analysis code together in a single file. That means you can have plots, tables, and statistical summaries generated automatically from your analysis right at the point where you want them in your writing — no more copy-pasting output! This is called literate statistical programming, and it’s a game changer. To get started, check out our post on why to use literate statistical programming.