Data professionals can be a contentious lot, and this is never more true than when an Excel spreadsheet is presented as a research method. Full disclosure – I think Excel is more than adequate for high school and even undergrad research, but have serious doubts about its use in science beyond that point.

Here are some reasons why:

However, despite my misgivings, I do appreciate a recent publication by Karl Broman that details how to make your use of Excel as scientifically rigorous as possible and avoid embarrassing mistakes, like Excel changing your data by renaming genes. If you do choose to use Excel, I encourage you to read Broman’s piece. He details ways to reduce errors when storing data in Excel, without shying away from some of the pitfalls of the popular program. In his abstract, Broman outlines these basic principles:

  • be consistent
  • write dates like YYYY-MM-DD
  • don’t leave any cells empty
  • put just one thing in a cell
  • organize the data as a single rectangle (with subjects as rows and variables as columns, and with a single header row)
  • create a data dictionary
  • don’t include calculations in the raw data files
  • don’t use font color or highlighting as data
  • choose good names for things
  • make backups
  • use data validation to avoid data entry errors
  • save the data in plain text files
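
A few of these principles are easy to check automatically once the data is saved as plain text. The sketch below is my own illustration in Python, not something taken from Broman’s article. It reads a small CSV and flags empty cells, rows that break the single rectangle, and dates that are not written as YYYY-MM-DD. The column names, the sample values, and the convention that date columns end in `_date` are all assumptions made up for the example.

```python
import csv
import io
import re

# Hypothetical sample data: one header row, subjects as rows, variables as columns.
SAMPLE = (
    "subject_id,visit_date,glucose_mg_dl\n"
    "1,2017-03-04,98\n"
    "2,3/5/2017,\n"
    "3,2017-03-06,105\n"
)

# Strict YYYY-MM-DD pattern (checks the shape only, not calendar validity).
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_table(rows):
    """Return a list of human-readable problems found in header + data rows."""
    problems = []
    header, *data = rows
    for line_no, row in enumerate(data, start=2):  # line 1 is the header
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} cells, got {len(row)}"
            )
            continue
        for name, cell in zip(header, row):
            if cell.strip() == "":
                problems.append(f"line {line_no}: empty cell in '{name}'")
            # Assumed convention: any column named *_date holds a date.
            elif name.endswith("_date") and not ISO_DATE.fullmatch(cell):
                problems.append(
                    f"line {line_no}: '{cell}' in '{name}' is not YYYY-MM-DD"
                )
    return problems


rows = list(csv.reader(io.StringIO(SAMPLE)))
problems = check_table(rows)
for problem in problems:
    print(problem)
```

On the sample above, this should flag the second subject twice: once for the date written as 3/5/2017 and once for the missing glucose value. A check like this does not replace Excel’s own data validation, but it is a cheap safety net to run on the exported text file before analysis.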

Give Broman’s article a read, and consider whether Excel can be part of your research toolkit, and how to use it effectively.