Joy Payton


Data Sharing and Privacy: A Very Cursory Overview

There are lots of factors that could affect your ability to use a given dataset that is shared with you:

  • Intellectual property concerns – the data was gathered at a pharmaceutical firm and is owned by them
  • Accuracy issues – the data doesn’t seem right or consistent
  • Professional courtesy – you want to make sure your collaborator gets her publication completed before you use the same data for a different purpose
  • Ethical concerns – the data was gathered in an experiment that was carried out in a way you think goes against responsible research conduct
  • Privacy concerns – you cannot accomplish your work using de-identified or coded data, and you do not (presently) have permission to view it in an identified format because the data comes from a research protocol you’re not included in

In this article, we’re going to concentrate on that last category of factors (privacy issues). Our focus here is on using data that you did not collect and for which you were not an investigator. This matters because the determination of whether something is considered Human Subjects Research can depend on whether the investigator collected the data themselves.

Privacy laws across the globe derive from a core set of principles, codified by the Organisation for Economic Co-operation and Development (OECD) in 1980. These privacy principles are frequently referred to as the Fair Information Practices, or FIPs.

Patient and human research subject privacy are topics that invite both ethical and regulatory consideration.

Privacy principles are non-regulatory ethical guidelines that offer methods of using data while preserving the privacy, autonomy, and dignity of our patients and research subjects. Just because something is not illegal doesn’t mean it’s right, and privacy principles give us ideals to strive for in the way we treat our patients and subjects.

Privacy regulations assist us by setting strict boundaries that support privacy principles. At CHOP, two important regulations are HIPAA (which governs health information collected by certain entities) and the Federal Common Rule (which governs human subjects research). Other regulations may also come into play, such as state laws (which may affect residents of certain states, or information collected in certain states) or the European Union’s General Data Protection Regulation (GDPR), which may affect data collected on European Union patients or subjects. HIPAA and the Common Rule both take privacy principles and set requirements in each category, such as requiring specific security measures or requiring privacy notices and authorizations.

Let’s examine the “Use” principle. This principle suggests that access to data (especially when that data is sensitive, like health information) be limited to the smallest subset of users possible. To protect data subject privacy while enabling beneficial use of information, laws like HIPAA recognize what’s called a Spectrum of Identifiability. When data is not identifiable, many people can use it. When data is identifiable, only a very small number of people can use it.

The simplest version of the Spectrum of Identifiability looks like this:

De-identified
  • The data has been stripped of its identifying information and cannot reasonably be re-identified.
  • This data can be used by any investigator. It is not governed by HIPAA and may not necessarily need IRB oversight.

Coded
  • The data has been stripped of its identifying information but has been assigned a code (e.g. by an Honest Broker) for possible re-identification.
  • This data can be used in partnership with the Honest Broker.

Identified
  • The data retains identifiers like names, addresses, and ZIP codes.
  • This data can be used by those named on the protocol, or by those doing work preparatory to research.
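
For readers who think in code, the spectrum described above can be sketched as a small lookup. This is only an illustration: the names `Identifiability`, `WHO_CAN_USE`, and `who_can_use` are hypothetical, and the strings simply restate the descriptions in this article rather than any official policy wording.

```python
from enum import Enum


class Identifiability(Enum):
    """The three points on the simplified Spectrum of Identifiability."""
    DE_IDENTIFIED = "de-identified"
    CODED = "coded"
    IDENTIFIED = "identified"


# Who may use each category, restating the spectrum described in this article.
WHO_CAN_USE = {
    Identifiability.DE_IDENTIFIED: "any investigator",
    Identifiability.CODED: "investigators working in partnership with the Honest Broker",
    Identifiability.IDENTIFIED: (
        "those named on the protocol, or those doing work preparatory to research"
    ),
}


def who_can_use(level: Identifiability) -> str:
    """Look up the audience for a given identifiability level."""
    return WHO_CAN_USE[level]


print(who_can_use(Identifiability.CODED))
```

The point of the lookup is the shape of the idea: the more identifiable the data, the narrower the audience.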

Let’s get to know the Spectrum of Identifiability by looking at the question “How can I look at this data to see if it’s useful to my research?”

Can I Look at This Data?

Evaluating a dataset to see if it could be useful in research is considered “preparatory to research”. For example, let’s say you find an interesting dataset in the Arcus data catalog, and you want to take a peek at the identified data to see if the measurements collected would allow you to start new research using that data. Can you do that?

Answer: From a privacy standpoint, yes, if you are just “peeking” at the data to see if it is useful. You will have to attest that your use is in accordance with this purpose.

By the way, Arcus will also make it simpler for you to examine coded or de-identified data for evaluation before commencing your research.

Can I Use This Data?

What about actually using data in research: analyzing it, testing hypotheses, creating new variables by combining data, publishing findings, and so on? When can you use a dataset that you discover (whether that’s in the Arcus catalog, by chatting with a colleague at a conference, or through another means of discovery)?

This is a bit more complicated, and it depends on a few factors, including:

  • Will you use only this data, or will you be collecting new data alongside this dataset?
  • Can you accomplish your research using only de-identified or coded data? If so, you’re ready to go!
  • If not, and you need identifying data, does your research question fall under the original aims of the study that collected the data?

How Can I Examine or Use This Data I Discovered?

You have several paths that allow you to work with a dataset that you weren’t part of creating.

You can evaluate it as part of work preparatory to research (e.g. feasibility analysis, grant-writing, checking to see if the data meets your needs) and/or use it to conduct research, depending on the outcome of a few simple questions.

Here we assume you are interested in data that includes living people and that the data under discussion are not data generated in a study in which you were an investigator.

Does the data contain any PHI?

By PHI (protected health information), we mean anything that can identify the patient or subject, whether directly (name) or indirectly (date of admission, MRN).

If you're not sure, consult this list of common PHI fields and/or reach out to Dianna Reuter.
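
As a rough first pass before consulting the list or a privacy expert, you could screen a dataset's column names for obvious identifiers. The sketch below is an assumption-laden illustration: the hint words come only from identifiers this article happens to mention (names, addresses, ZIP codes, dates of admission, MRNs), the helper name `flag_possible_phi` is hypothetical, and a clean result does not mean the data is free of PHI.

```python
import re

# Illustrative hints only, drawn from identifiers mentioned in this article.
# This is NOT the official list of PHI fields.
PHI_HINTS = ("name", "address", "zip", "admission_date", "mrn")


def flag_possible_phi(columns):
    """Return the column names that contain any of the hint substrings."""
    flagged = []
    for col in columns:
        # Normalize "Admission Date" -> "admission_date" before matching.
        normalized = re.sub(r"[^a-z0-9]+", "_", col.lower())
        if any(hint in normalized for hint in PHI_HINTS):
            flagged.append(col)
    return flagged


columns = ["MRN", "Admission Date", "hemoglobin", "patient_name"]
print(flag_possible_phi(columns))  # ['MRN', 'Admission Date', 'patient_name']
```

Substring matching like this will produce false positives (for example, a column called `filename`) and will miss identifiers with unexpected names, so treat it as a prompt for questions rather than an answer.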

If the data has no PHI, or if you will request a version of the data that has no PHI:

Yes, you may look at and use this data. Use of de-identified datasets or biospecimens is not human subjects research. See the CHOP IRB FAQ for more details.

Note: if this is a coded dataset (someone can connect this data to PHI via a secret code), you must not request PHI if you wish to stay in this category.

If the data includes PHI and you need to use PHI to do your research:

You can choose to just look at the data first, to see if it will be suitable (work preparatory to research) and if so, then move to the requirements for using the data. Or, if you already know it's suitable, go directly to using it in research.

Look (Preparatory to Research)

Use in Research

To look at the data first, you can do work preparatory to research on data that includes PHI by completing three attestations.

You may attest to the following by using the eIRB application to submit a new eIRB Study, choosing the "HIPAA Attestation (Use of PHI Preparatory to Research)" submission:
  1. I am reviewing this data for preparation to research purposes only.
  2. I will not transmit identifying information outside of CHOP or download / save it for later sharing or use outside of CHOP.
  3. The identifying information is data I need in order to conduct preparation for research.

To use the data in research, first determine which kind of PHI-containing dataset you have, because the rules differ by kind. Data containing PHI may be a Limited Data Set, which contains some PHI, limited to dates, ZIP codes, and/or census tracts with no other PHI. Or it may be identified data, which includes other PHI. Which kind of PHI-containing dataset is this?

Limited Data Set

Identified Data

Research conducted on a limited data set that is created by an independent individual (like an honest broker) and shared with you is not human subjects research, as long as the CHOP subjects are not readily identifiable. You must complete a Data Use Agreement, and may then proceed. Data Use Agreements do not need to be reviewed by the IRB.

Research on identified data that is shared with you requires IRB review and an IRB protocol. Visit the eIRB application and choose "Research Study Involving Human Subjects (Exempt, Expedited, Full Board Review) oversight by CHOP IRB".
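
The questions in this section can be summarized as a small decision function. This is a minimal sketch of the flow as described in this article, not an official tool: the function name `data_access_path` and its parameters are hypothetical, and real determinations belong to the IRB and privacy staff.

```python
def data_access_path(contains_phi: bool,
                     needs_phi: bool = False,
                     purpose: str = "use",
                     limited_data_set: bool = False) -> str:
    """Suggest the next step for data you did not collect yourself.

    purpose is "look" (work preparatory to research) or "use" (research).
    """
    if purpose not in ("look", "use"):
        raise ValueError("purpose must be 'look' or 'use'")

    # No PHI present, or a PHI-free version will be requested instead.
    if not contains_phi or not needs_phi:
        return "No PHI: you may look at and use the data"

    if purpose == "look":
        return "Complete the three attestations (HIPAA Attestation in eIRB)"

    # Using PHI in research: the path depends on the kind of dataset.
    # The limited data set path assumes the set was created by an
    # independent individual and subjects are not readily identifiable.
    if limited_data_set:
        return "Complete a Data Use Agreement"
    return "Submit an IRB protocol for review"


print(data_access_path(contains_phi=True, needs_phi=True, purpose="look"))
```

Each branch corresponds to one of the choices above: no PHI, looking at PHI, using a limited data set, or using identified data.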

It’s important to realize that this is a very broad overview of privacy protections. Our goal is to enable nimble, innovative research that is also ethical and respectful of our patients and research subjects. For more information about data privacy and Arcus, feel free to contact Dianna Reuter.