Excel, if you must…
Data professionals can be a contentious lot, and this is never more true than when an Excel spreadsheet is presented as a research method. Full disclosure – I think Excel is more than adequate for high school and even undergrad research, but have serious doubts about its use in science beyond that point.
Here are some reasons why:
- Excel can change your data and rename genes (see, e.g. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-5-80, https://nsaunders.wordpress.com/2012/10/22/gene-name-errors-and-excel-lessons-not-learned/, https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7)
- Cut and paste errors are easy to make and can have costly consequences (see, e.g. https://www.theregister.co.uk/2003/06/19/excel_snafu_costs_firm_24m/, https://nsaunders.wordpress.com/2014/09/09/new-ways-to-butcher-biological-data-using-excel/)
- Making the wrong cell selection can reverse your effect (see, e.g. http://www.slate.com/blogs/moneybox/2013/04/16/reinhart_rogoff_coding_error_austerity_policies_founded_on_bad_coding.html)
- Excel “features” like cell-hiding can lead to costly outcomes (see, e.g. http://www.businessinsider.com/2008/10/barclays-excel-error-results-in-lehman-chaos)
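- Dates in Excel are an unmitigated nightmare: Excel silently converts date-like text into dates and stores them as serial numbers whose meaning depends on the workbook’s date system (1900 or 1904)
- Excel has a relatively low (for some researchers) ceiling on the number of rows available (1,048,576 per worksheet in current versions)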
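To make the gene-renaming problem concrete, here is a minimal Python sketch that flags identifiers likely to be reinterpreted when a file is opened in Excel. The two patterns are my own rough heuristics, not Excel's actual parsing rules, and the function name `risky_identifiers` is hypothetical. The sample symbols are of the kind reported in the articles linked above (month-like names such as SEPT2 and MARCH1, and identifiers such as 2310009E13 that look like scientific notation).

```python
import re

# Heuristic patterns only (an assumption): Excel's real conversion rules
# are broader and vary with locale and version.
MONTH_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?\d{1,2}$",
    re.IGNORECASE,
)
SCI_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def risky_identifiers(names):
    """Return the names that look like dates or scientific notation."""
    flagged = []
    for name in names:
        cleaned = name.strip()
        if MONTH_LIKE.match(cleaned) or SCI_LIKE.match(cleaned):
            flagged.append(name)
    return flagged


genes = ["SEPT2", "MARCH1", "TP53", "2310009E13", "BRCA1", "DEC1"]
print(risky_identifiers(genes))
```

A check like this can be run on an identifier column before the file ever goes near a spreadsheet, so that risky values are caught while the original text is still intact.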
However, despite my misgivings, I do appreciate a recent publication by Karl Broman that details how to make your use of Excel as scientifically rigorous as possible and avoid embarrassing mistakes like Excel changing your data by renaming genes. If you do choose to use Excel, I encourage you to read Broman’s piece. He describes how to reduce errors when storing data in Excel, without shying away from the pitfalls of the popular program. In his abstract, Broman outlines these basic principles:
- be consistent
- write dates like YYYY-MM-DD
- don’t leave any cells empty
- put just one thing in a cell
- organize the data as a single rectangle (with subjects as rows and variables as columns, and with a single header row)
- create a data dictionary
- don’t include calculations in the raw data files
- don’t use font color or highlighting as data
- choose good names for things
- make backups
- use data validation to avoid data entry errors
- save the data in plain text files
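Several of these principles can be checked automatically once the data are saved as plain text. The following is a minimal sketch, not anything taken from Broman's article: it assumes a comma-separated file with a single header row, and the function name `check_table` and its `date_columns` parameter are hypothetical. It checks that the data form a rectangle, that no cells are empty, and that dates are written as YYYY-MM-DD.

```python
import csv
import io
import re
from datetime import datetime

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_table(text, date_columns=()):
    """Return a list of problems found in CSV text (empty if none found)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = []
    if "" in header or len(set(header)) != len(header):
        problems.append("header has blank or duplicate names")
    # Row 1 is the header, so data rows are numbered from 2.
    for row_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_no}: expected {len(header)} cells, found {len(row)}"
            )
            continue
        for name, value in zip(header, row):
            if value.strip() == "":
                problems.append(f"row {row_no}: empty cell in '{name}'")
            elif name in date_columns and not is_iso_date(value):
                problems.append(
                    f"row {row_no}: '{value}' in '{name}' is not YYYY-MM-DD"
                )
    return problems


def is_iso_date(value):
    """True only for zero-padded YYYY-MM-DD strings naming a real date."""
    if not ISO_DATE.match(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


sample = (
    "subject,visit_date,weight_kg\n"
    "S01,2017-03-05,71.2\n"
    "S02,3/7/2017,\n"
    "S03,2017-03-09,NA\n"
)
for problem in check_table(sample, date_columns=("visit_date",)):
    print(problem)
```

In the sample, the third row is flagged twice, once for the date format and once for the empty cell, while the explicit `NA` in the last row passes because a missing value has been written out rather than left blank.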
Give Broman’s article a read, and consider whether Excel can be part of your research toolkit, and how to use it effectively.