Let’s say you need to define a clinical cohort, a list of CHOP patients who meet certain criteria, in order to:

  • Determine whether a study is feasible
  • Write a grant demonstrating that you have a good source of potential subjects
  • Recruit subjects that meet certain criteria (this is generally handled through the Recruitment Enhancement Core)
  • Conduct research that is governed by a research protocol

Some cohorts are fairly simple to define and gather data on. However, sometimes cohort definition can be complex and characterized by lots of back-and-forth conversations. In the case of a complex cohort with multiple factors (such as number of days of a certain treatment regimen, specific combinations of lab results, readmission requirements within a given time frame, numerous exclusion criteria, etc.), the Clinical Reporting Unit (CRU) will ask you to describe the cohort and will then come back with clarifying questions, describe any limitations of the data, and work to extract the data. There are often several rounds of clarification as you narrow or broaden your cohort, make your requirements more explicit, or ask questions about the data you received from the CRU.

This process is necessary for many requests because clinical data can be messy: people are messy, so the data recorded about them is messy too. Clinical data is also not charted the way data is collected under a strict research protocol, where a handful of data recorders agree to record everything the same way, with the same rigor and the same definitions. As a result, cohort definition takes time and some trial and error. If you ask for counts of a certain diagnosis, you might find an unusually high count because some practices or physicians over-diagnose, or use a certain code as a placeholder while they work out what is really going on. If you search a notes field for specific words or phrases, negation can throw you off: a search for the phrase “drug abuse” may well return notes that say “no evidence of drug abuse”. Coding systems add another layer of complexity. If your cohort spans a period in which both ICD-9 and ICD-10 codes may be in use, you need to account for the differences between the two. While many codes in ICD-9-CM map directly to codes in ICD-10-CM, in some cases a clinical analysis may be required to determine which code or codes to select for your mapping.
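To make the negation pitfall concrete, here is a minimal Python sketch that contrasts a plain phrase search with a simplified, NegEx-style rule: a match is skipped when a negation cue appears among the few words before it in the same sentence. The cue list, the five-word window, and the function names are illustrative assumptions. This does not describe how the CRU or Arcus actually searches notes, and production negation detection uses far richer rules.

```python
import re

# Illustrative negation cues (an assumption, not a validated list).
# Real tools such as NegEx use larger curated trigger lists and scope rules.
NEGATION_CUES = ("no evidence of", "no history of", "negative for", "denies", "no")


def naive_match(note: str, phrase: str) -> bool:
    """Case-insensitive substring search; negated mentions count as hits."""
    return phrase.lower() in note.lower()


def affirmed_match(note: str, phrase: str, window: int = 5) -> bool:
    """Return True if `phrase` occurs at least once without a negation cue
    among the `window` words before it in the same sentence."""
    text = note.lower()
    for m in re.finditer(re.escape(phrase.lower()), text):
        # Start of the current sentence (crudely split on . ; and newlines).
        sentence_start = max(text.rfind(p, 0, m.start()) for p in ".;\n") + 1
        preceding = " ".join(text[sentence_start:m.start()].split()[-window:])
        negated = any(
            re.search(r"\b" + re.escape(cue) + r"\b", preceding)
            for cue in NEGATION_CUES
        )
        if not negated:
            return True
    return False


notes = [
    "No evidence of drug abuse.",
    "History of drug abuse, currently in recovery.",
]
print([naive_match(n, "drug abuse") for n in notes])     # [True, True]
print([affirmed_match(n, "drug abuse") for n in notes])  # [False, True]
```

The naive search counts both notes; the negation-aware version keeps only the affirmed mention. Even this rule is easy to fool: a note saying “Patient's mother has a history of drug abuse” would still count as a hit for the patient, which is one reason phrase-based criteria usually need several rounds of review.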

Typically, what happens is that you define a cohort using a rough definition in your own words, then refine it over time along with the CRU as you get closer and closer to the actual data you care about. This iterative process is helpful, but hard to track and remember.

Arcus streamlines cohort definition. You start off knowing more about commonly used fields, so you can make a more informed and precise request, and you can track every change to your cohort definition as you work with the Arcus team. A data dictionary describes those commonly used fields, which helps you get to the right cohort faster. Check out this sneak preview of a handful of fields in the data dictionary:

Field Name          Description                                              Data Type
----------          -----------                                              ---------
language_c          Language                                                 Integer
intrptr_needed_yn   Patient needs an interpreter                             String
txp_pat_yn          Indicates if the patient is a transplant patient (Y/N)   String
ped_multi_birth_yn  One of a multiple birth                                  String
icd9_cd             ICD9 code                                                String
icd9_name           Name of the ICD9 diagnosis                               String
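Because a data dictionary gives exact field names and types, a request can be stated precisely rather than only in your own words. As a hypothetical illustration (this is not an Arcus interface), the Python sketch below expresses a cohort definition as data: a mapping from data-dictionary field names to required values. The `pat_id` field, the patient rows, and the assumption that the `_yn` fields hold the strings "Y" or "N" are all invented for the example.

```python
import json

# Hypothetical rows: only the *_yn field names come from the data dictionary.
# "pat_id" and all values are invented for illustration.
patients = [
    {"pat_id": 1, "txp_pat_yn": "Y", "intrptr_needed_yn": "Y", "ped_multi_birth_yn": "N"},
    {"pat_id": 2, "txp_pat_yn": "Y", "intrptr_needed_yn": "N", "ped_multi_birth_yn": "N"},
    {"pat_id": 3, "txp_pat_yn": "N", "intrptr_needed_yn": "Y", "ped_multi_birth_yn": "Y"},
    {"pat_id": 4, "txp_pat_yn": "Y", "intrptr_needed_yn": "Y", "ped_multi_birth_yn": "Y"},
]

# A cohort definition expressed as data: field name -> required value.
# Assumes the *_yn fields hold the strings "Y" or "N".
transplant_needing_interpreter = {
    "name": "Transplant patients who need an interpreter",
    "criteria": {"txp_pat_yn": "Y", "intrptr_needed_yn": "Y"},
}


def meets(row: dict, criteria: dict) -> bool:
    """True if every criterion field in `row` equals its required value."""
    return all(row.get(name) == value for name, value in criteria.items())


def select_cohort(rows: list[dict], definition: dict) -> list[int]:
    """Return the pat_id of every row that satisfies the definition."""
    return [r["pat_id"] for r in rows if meets(r, definition["criteria"])]


print(select_cohort(patients, transplant_needing_interpreter))  # [1, 4]

# Narrowing the cohort produces a new version of the definition, which can
# be serialized (here as JSON) and compared with the earlier version.
narrowed = {
    "name": transplant_needing_interpreter["name"] + ", one of a multiple birth",
    "criteria": {**transplant_needing_interpreter["criteria"], "ped_multi_birth_yn": "Y"},
}
print(select_cohort(patients, narrowed))  # [4]
print(json.dumps(narrowed, indent=2))
```

Keeping the definition as data means each refinement is just an added or changed criterion, so successive versions can be stored, compared side by side, and reused as a starting point for similar requests.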

Once you have a precise, useful cohort, we save that cohort definition so others with the same or similar requests can use your definition as a starting point, saving everyone time later on. Over time, we will build a catalog of cohort definitions to make research easier. In this catalog, you’ll be able to see the clinical cohorts that have already been defined. For example, you might see something like “girls with multiple ER visits related to asthma not well controlled by albuterol” or “13+ year olds with femur fracture requiring surgery”. Arcus makes finding de-identified data easy, and a simple method for requesting identifying data, where appropriate, is coming soon.

Creating a simpler method to access pertinent clinical data is one more way Arcus saves you time and improves your research experience.