The UMLS Metathesaurus

The Unified Medical Language System (UMLS) is a set of tools and documentation maintained and distributed by the National Library of Medicine, intended to “enable interoperability between computer systems”. This overview focuses on one main tool, the Metathesaurus, but the UMLS includes additional tools, some of them for natural language processing.

The Metathesaurus is a key part of the UMLS. It provides access to over 200 standardized, domain-specific vocabularies used in biomedical and clinical research. Most of these vocabularies also model relationships between terms and classes, which makes them ontologies [1] and especially useful for computational analysis, but for the purposes of this post I’ll refer to all of them as vocabularies. Examples include ICD-10, LOINC, RxNorm, the Human Phenotype Ontology (HPO), and many more.

Maintaining these vocabularies in a uniform structure and a centralized location is of great benefit to researchers who need more than one vocabulary in the course of their work. All vocabularies can be accessed via the same API, and pieces of any vocabulary can be referenced in the same manner. For example, if a researcher has built a script that uses HPO terms, that script can be modified to access SNOMED terms instead with minimal editing.
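To make the "minimal editing" point concrete, here is a small sketch that builds a search request for the UMLS REST API with the source vocabulary as a single argument. It only constructs URLs and does not call the service. The base URL, the `/search/current` path, and the `string`, `sabs`, and `apiKey` parameter names reflect my understanding of the API and should be checked against the NLM documentation; `build_search_url` and the placeholder key are hypothetical names made up for this example.

```python
from urllib.parse import urlencode

# Assumed base URL of the UMLS REST API; verify against the NLM documentation.
UTS_BASE = "https://uts-ws.nlm.nih.gov/rest"


def build_search_url(term, vocabulary, api_key, version="current"):
    """Build a search URL restricted to one source vocabulary.

    `vocabulary` is a UMLS source abbreviation such as "HPO" or "SNOMEDCT_US".
    """
    query = urlencode({"string": term, "sabs": vocabulary, "apiKey": api_key})
    return f"{UTS_BASE}/search/{version}?{query}"


# Switching vocabularies only changes one argument.
hpo_url = build_search_url("headache", "HPO", "YOUR_API_KEY")
snomed_url = build_search_url("headache", "SNOMEDCT_US", "YOUR_API_KEY")
print(hpo_url)
print(snomed_url)
```

The rest of a script built on such a helper would stay the same regardless of which vocabulary it targets.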

In addition to maintaining these vocabularies and their documentation in one place, the Metathesaurus adds significant value by establishing and publishing an overarching network of inter-vocabulary concept mappings. The image below illustrates how terms from different vocabularies that share the same conceptual meaning are “linked” via a shared UMLS concept identifier (CUI): “C0018681 Headache”.

Image: Headache terms including "headache", "cranial pain", and "cephalgia head pain" [2]

The UMLS Metathesaurus concept mapping system creates opportunities for interesting new areas of research. As one example, it allows for analyzing datasets and clinical observation documentation that would otherwise not be comparable without significant manual intervention. Imagine that a researcher has a set of clinical observations coded with ICD-10 diagnoses and would like to compare these to another dataset that is structured around HPO terms. Rather than creating a time-consuming manual mapping, the researcher can translate the ICD-10 terms into the applicable HPO terms (or vice versa) using the UMLS Metathesaurus concepts, and can then proceed with both datasets “in the same language”. In addition to being faster, this keeps the researcher’s work consistent with that of other researchers who may also be doing ICD-HPO mapping. The research community doesn’t have to worry about differences in mapping strategies across research efforts.
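The translation step can be sketched in a few lines, assuming the relevant Metathesaurus entries have already been loaded as (CUI, vocabulary, code, term) rows. The rows below are hand-written for illustration and should be checked against an actual UMLS release; `build_crosswalk` is a hypothetical helper, not part of any UMLS tool.

```python
from collections import defaultdict

# Illustrative rows of (CUI, source vocabulary, code, term). They mimic the kind of
# information in a Metathesaurus release but are hand-written for this example;
# check real codes against an actual UMLS download.
SAMPLE_ATOMS = [
    ("C0018681", "ICD10", "R51", "Headache"),
    ("C0018681", "HPO", "HP:0002315", "Headache"),
    ("C0018681", "SNOMEDCT_US", "25064002", "Headache"),
]


def build_crosswalk(atoms, source, target):
    """Map each code in `source` to the set of `target` codes sharing a CUI."""
    # First group codes by concept, then by vocabulary within each concept.
    codes_by_cui = defaultdict(lambda: defaultdict(set))
    for cui, vocab, code, _term in atoms:
        codes_by_cui[cui][vocab].add(code)

    # Then pair up source and target codes that sit under the same concept.
    crosswalk = defaultdict(set)
    for vocab_codes in codes_by_cui.values():
        for source_code in vocab_codes.get(source, ()):
            crosswalk[source_code] |= vocab_codes.get(target, set())
    return dict(crosswalk)


icd10_to_hpo = build_crosswalk(SAMPLE_ATOMS, source="ICD10", target="HPO")
print(icd10_to_hpo)
```

The values are sets because a single concept can carry more than one code from the same vocabulary, so a mapping built this way is many-to-many rather than one-to-one.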

Concept mapping between vocabularies is also a useful tool for natural language processing. By traversing the mapped network of vocabulary terms (and their synonyms), one can build a “master” library of synonyms to use in text processing. This approach can be seen in the Mayo Clinic cTAKES system [3]. At Arcus, we use cTAKES to discover medical vocabulary in clinical notes, applying NLP to derive a set of discrete fields from free-text fields that may express the same medical term in a variety of ways.
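As a rough illustration of the synonym-library idea, the sketch below indexes the headache terms from the example above by CUI and looks for them in a sentence. It is a plain dictionary lookup with whole-phrase matching and is not how cTAKES itself works; the function names and the sample note are made up for this example.

```python
import re
from collections import defaultdict

# Terms taken from the headache example above; which source vocabulary supplies
# each string is omitted here, and the list is illustrative, not exhaustive.
SAMPLE_TERMS = [
    ("C0018681", "headache"),
    ("C0018681", "cranial pain"),
    ("C0018681", "cephalgia head pain"),
]


def build_synonym_index(terms):
    """Map each lowercased term string to the set of CUIs it can denote."""
    index = defaultdict(set)
    for cui, term in terms:
        index[term.lower()].add(cui)
    return dict(index)


def find_concepts(text, index):
    """Return the CUIs whose synonyms appear as whole words or phrases in `text`."""
    found = set()
    lowered = text.lower()
    for term, cuis in index.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found |= cuis
    return found


index = build_synonym_index(SAMPLE_TERMS)
note = "Patient reports cranial pain since Tuesday."
print(find_concepts(note, index))
```

Because every synonym resolves to the same CUI, a note that says "cranial pain" and one that says "headache" end up in the same discrete field.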

The UMLS Metathesaurus can be accessed via a browser application, downloaded in its entirety, or queried programmatically via an API. Learn more about the UMLS Metathesaurus, how it can be used, and what vocabularies are available via this service at

  1. Robert Stevens et al. “What is an ontology?” Ontogenesis (2010).

  2. “CUI Map.” National Library of Medicine. 

  3. Savova, Guergana K. et al. “Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications.” Journal of the American Medical Informatics Association (JAMIA), vol. 17, no. 5 (2010): 507-13. doi:10.1136/jamia.2009.001560