Why Use Literate Statistical Programming?

Literate statistical programming is a tool in the reproducible research toolkit. It lets you share not only your findings but also your methods (and, if permissible, even your data) in a way that makes it more likely that others, and future you, can understand and quickly replicate what you’ve shown. It can help move science forward as well as save you time! Let’s look at a few reasons to add literate statistical programming to your toolkit.

Not sure what we mean by “Literate Statistical Programming”? Check out our article that introduces the concept. Want to see it in action? Do a quick (10 minutes or so) lab to create your first literate statistical script.

1. Literate Statistical Programming is a gift to your future self.

Maybe this doesn’t happen to you, but it does to me – I write some code, set it aside for a few weeks, and it’s suddenly obscure. I know there was a good reason I did things this way, and I was certain I’d remember, but I have no idea at this point what I was trying to accomplish with that cryptic line.

Yes, you could just comment your code and stop short of full literate statistical programming. But writing full sentences, as if you were explaining to a stranger why you’re doing certain things, will also be a boon to future-you, who may find present-you’s shorthand comments just as cryptic as the code.

2. Literate Statistical Programming allows you to delegate.

You’re busy: seeing patients, supervising postdocs, writing grant proposals. You’d rather not also have to be the person who processes every bit of data that comes in! If you develop a literate script that spells things out well, you can hand off some or most of this work to more junior staff, who can handle data preparation, cleaning, and initial analysis more cost-effectively. Creating the detailed data recipe takes a while, but you only do it once, and it saves you time and keeps you out of the kitchen for weeks or months at a time.

3. Literate Statistical Programming can act as your publishing platform.

Tired of copying tables and figures into a Word document? Packages like rticles allow you to write your entire manuscript in R Markdown, following the formatting guidelines of specific journals. Want to onboard or educate your staff? You can do that with a literate document. Want to write an entire book? Ditto. Want to create a slide show that includes code snippets, visualizations, and text? That’s a snap in RStudio. No more keeping one program open for word processing, another for statistical analysis, and a third for visualizations.

4. Literate Statistical Programming provides a single flow of thought.

When code is relegated to an appendix, the reader has to flip back and forth to work out what each section of code is doing and how it relates to, stems from, or leads to the text of the manuscript. With literate statistical programming, you have a single flow of thought that takes you from a description of what you want to do, to the code that does it, to the output that leads you to make other decisions. It’s an interweaving of scientific logic and code. This can make your data analysis pipeline both easier to construct and easier for a colleague to understand.
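
As a rough illustration of that single flow, here is a minimal sketch of an R Markdown file. It assumes R with the rmarkdown package and uses R’s built-in mtcars dataset purely as a stand-in; the title, the question, and the chunk names are hypothetical and not taken from any real analysis.

````markdown
---
title: "Does heavier mean thirstier?"
output: html_document
---

We want to know whether heavier cars get fewer miles per gallon.
Before modeling anything, we look at the range of the outcome.

```{r mpg-summary}
# Numeric summary (min, quartiles, mean, max) of miles per gallon
summary(mtcars$mpg)
```

The summary tells us what scale we are working on, so next we plot
fuel economy against weight to see whether a straight line is plausible.

```{r mpg-by-weight}
# Scatterplot: weight on the x-axis, miles per gallon on the y-axis
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```

If the plot looks roughly linear, a simple linear model is a
reasonable first step.

```{r mpg-model}
# Fit mpg as a linear function of weight and print the model summary
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)
```
````

When the file is knitted, each chunk’s output appears directly beneath the code that produced it, so the reasoning, the code, and the results can be read in one pass.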