Excel, if you must…

Data professionals can be a contentious lot, and this is never more true than when an Excel spreadsheet is presented as a research method. Full disclosure – I think Excel is more than adequate for high school and even undergrad research, but have serious doubts about its use in science beyond that point.

Here are some reasons why:

Excel can change your data and rename genes (see, e.g. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-5-80, https://nsaunders.wordpress.com/2012/10/22/gene-name-errors-and-excel-lessons-not-learned/, https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7)
Cut and paste errors are easy to make and can have costly consequences (see, e.g., https://www.theregister.co.uk/2003/06/19/excel_snafu_costs_firm_24m/, https://nsaunders.wordpress.com/2014/09/09/new-ways-to-butcher-biological-data-using-excel/)
Making the wrong cell selection can reverse your effect (see, e.g. http://www.slate.com/blogs/moneybox/2013/04/16/reinhart_rogoff_coding_error_austerity_policies_founded_on_bad_coding.html)
Excel “features” like cell-hiding can lead to costly outcomes (see, e.g. http://www.businessinsider.com/2008/10/barclays-excel-error-results-in-lehman-chaos)
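Dates in Excel are an unmitigated nightmare (they are stored as serial numbers counted from one of two different starting points, 1900 or 1904, depending on the workbook)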
Excel has a relatively low (for some researchers) ceiling on the number of rows available (1,048,576 per worksheet in current versions)
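To make the gene-renaming problem concrete, here is a minimal Python sketch that flags identifiers a spreadsheet might silently reinterpret before you ever open the file in Excel. The function name `risky_identifiers` and the two patterns are my own illustrative assumptions. They are a rough heuristic based on the kinds of names discussed in the articles linked above (month-like symbols such as SEPT2, and identifiers that look like scientific notation), not a reproduction of Excel's actual parsing rules.

```python
import re

# Month-like prefixes that spreadsheet auto-formatting may read as dates.
# This list is an illustrative assumption, not Excel's real rule set.
_MONTH_PREFIXES = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "APRIL", "MAY", "JUN", "JUNE",
    "JUL", "JULY", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# A month-like prefix followed by one or two digits, e.g. "SEPT2" or "DEC1".
_DATE_LIKE = re.compile(
    r"^(?:%s)[-/ ]?\d{1,2}$" % "|".join(_MONTH_PREFIXES), re.IGNORECASE
)

# Digits, an "E", then more digits, e.g. "2310009E13", which can be read
# as a number in scientific notation.
_SCI_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def risky_identifiers(names):
    """Return the names that look like dates or scientific notation."""
    flagged = []
    for name in names:
        candidate = name.strip()
        if _DATE_LIKE.match(candidate) or _SCI_LIKE.match(candidate):
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    genes = ["SEPT2", "MARCH1", "TP53", "2310009E13", "DEC1", "BRCA1"]
    for gene in risky_identifiers(genes):
        print("check this one before opening in a spreadsheet:", gene)
```

A check like this will not catch everything, but running it on an identifier column gives you a short list of values to compare against the file after a round trip through Excel.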

However, despite my misgivings, I do appreciate a recent publication by Karl Broman that details how to make your use of Excel as scientifically rigorous as possible and avoid embarrassing mistakes like Excel changing your data by renaming genes. If you do choose to use Excel, I encourage you to read Broman’s piece. He describes ways to reduce errors when storing data in Excel, without shying away from some of the pitfalls of the popular program. In his abstract, Broman outlines these basic principles:

be consistent
write dates like YYYY-MM-DD
don’t leave any cells empty
put just one thing in a cell
organize the data as a single rectangle (with subjects as rows and variables as columns, and with a single header row)
create a data dictionary
don’t include calculations in the raw data files
don’t use font color or highlighting as data
choose good names for things
make backups
use data validation to avoid data entry errors
save the data in plain text files
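Several of these principles can be checked mechanically once the data is saved as plain text. The following Python sketch is one way to do that, under a few assumptions of mine: the data is a CSV file with a single header row, the function name `check_table` and its `date_columns` parameter are hypothetical, and the date check only tests the YYYY-MM-DD shape rather than whether the date is a real calendar date. It looks for three things from the list above: a single rectangle, no empty cells, and consistently written dates.

```python
import csv
import io
import re

# Matches the YYYY-MM-DD shape only; it does not verify that the
# month and day are valid calendar values.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_table(text, date_columns=()):
    """Return a list of human-readable problems found in CSV text.

    Line numbers assume one record per line (no quoted line breaks).
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    header = rows[0]
    problems = []
    for line_no, row in enumerate(rows[1:], start=2):
        # "Organize the data as a single rectangle."
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} cells, found {len(row)}"
            )
            continue
        for name, cell in zip(header, row):
            value = cell.strip()
            # "Don't leave any cells empty."
            if value == "":
                problems.append(f"line {line_no}: empty cell in column '{name}'")
            # "Write dates like YYYY-MM-DD."
            elif name in date_columns and not ISO_DATE.match(value):
                problems.append(
                    f"line {line_no}: '{cell}' in column '{name}' is not YYYY-MM-DD"
                )
    return problems


if __name__ == "__main__":
    sample = (
        "id,visit_date,weight\n"
        "1,2017-03-04,12.5\n"
        "2,3/5/2017,\n"
        "3,2017-03-06\n"
    )
    for problem in check_table(sample, date_columns=("visit_date",)):
        print(problem)
```

This does not replace Excel's own data validation, which Broman recommends for catching errors at entry time. It is a second pass you can run on the exported file, where a ragged row or a stray date format is easy to spot.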

Give Broman’s article a read, and consider whether Excel can be part of your research toolkit, and how to use it effectively.
