Using Phenotype Ontologies in Human GVF

From SO Wiki

This page describes a set of best practices for using phenotype ontologies in Human GVF. The goal is to support linkage of genomic features to computationally operational phenotype annotations.


Using IDs vs Term Names vs Comments

GVF allows you to indicate the phenotype with either an ID or a label string; always use the ID. The ID format for each ontology is given below. If no sufficiently specific ontology term exists, choose a more general (higher-level) ontology term ID for the ID field and use the comment field to enter a free-text description. Then request that the missing term be added via the respective ontology's tracker (see below for specifics).
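The rule above (always give an ID, and fall back to a broader term plus a free-text comment) can be sketched in Python. This is a minimal sketch that assumes the pragma's tag-value pairs are `Term`, `Ontology` and, for the free-text description, a `Comment` tag; the function name `phenotype_pragma` and the `<ontology-url>` placeholder are hypothetical, not part of the GVF specification.

```python
import re
from typing import Optional

# CURIE-style ID such as HP:0000957 or DOID:14330 (prefix, colon, local ID).
CURIE_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_]*:\S+$")


def phenotype_pragma(term_id: str, ontology_url: str,
                     comment: Optional[str] = None) -> str:
    """Build a ##phenotype-description line that always carries an ontology ID.

    If no sufficiently specific term exists, pass a broader term ID plus a
    free-text comment (assumed Comment tag) describing the actual phenotype.
    """
    if not CURIE_RE.match(term_id):
        raise ValueError(f"expected an ontology ID, not a label: {term_id!r}")
    parts = [f"Term={term_id}", f"Ontology={ontology_url}"]
    if comment:
        # ';' separates tag-value pairs and '=' separates tag from value,
        # so reject them rather than emit an ambiguous line.
        if ";" in comment or "=" in comment:
            raise ValueError("comment must not contain ';' or '='")
        parts.append(f"Comment={comment}")
    return "##phenotype-description " + ";".join(parts)


print(phenotype_pragma("HP:0000957", "<ontology-url>"))
# prints: ##phenotype-description Term=HP:0000957;Ontology=<ontology-url>
```

Passing a label such as "Cafe-au-lait spot" as `term_id` raises `ValueError`, which enforces the "always use the ID" rule at the point where the line is written.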

Which Phenotype Ontology to use?

If "phenotype" is taken to include traits, diseases, pathological features, etc., then there are a number of choices. The main ontologies are listed below, together with examples.

Human Phenotype Ontology (HP)

In OBO Library? YES

Scope: Any human phenotype. Currently focuses on morphological abnormalities, but is being extended to cover other domains and is under active development.



An individual with café-au-lait spots would be annotated with this ontology class, and the term would be recorded in the GVF file as HP:0000957.

##phenotype-description Term=HP:0000957;Ontology=
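HPO IDs consist of the `HP` prefix followed by exactly seven zero-padded digits, so a bare numeric identifier has to be padded before it goes into the `Term` tag. A minimal sketch; the helper names `hp_id` and `is_hp_id` are hypothetical, and the check covers only the shape of the ID, not whether the term exists.

```python
import re

HP_ID_RE = re.compile(r"^HP:\d{7}$")


def hp_id(number: int) -> str:
    """Format a numeric HPO identifier as a seven-digit, zero-padded HP ID."""
    if not 0 <= number <= 9_999_999:
        raise ValueError(f"HPO numbers have at most seven digits: {number}")
    return f"HP:{number:07d}"


def is_hp_id(value: str) -> bool:
    """True if value has the shape of an HPO ID (not a lookup)."""
    return bool(HP_ID_RE.match(value))


print(hp_id(957))                # HP:0000957
print(is_hp_id("HP:957"))        # False: the local ID must be seven digits
print(is_hp_id("HPO:0000957"))   # False: the prefix is HP, not HPO
```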

Disease Ontology (DO)

In OBO Library? YES

Scope: All human diseases.


For an individual with Parkinson’s disease, the DO ID should be recorded in the GVF as DOID:14330.

##phenotype-description Term=DOID:14330;Ontology=

Online Mendelian Inheritance in Man (OMIM)

OMIM may be better suited if the individual has a particular genetic disorder. A description of the OMIM ID schema can be found here:

An individual with GYRATE ATROPHY OF CHOROID AND RETINA would be recorded as OMIM:258870.
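MIM numbers are six digits, and their leading digit encodes the entry's category. The sketch below checks the shape of an `OMIM:` ID and reports that category. The category table reflects OMIM's numbering scheme as we understand it and should be confirmed against OMIM's own documentation; the function name `omim_category` is hypothetical.

```python
import re

OMIM_RE = re.compile(r"^OMIM:(\d{6})$")

# Leading digit of a MIM number -> entry category (assumed OMIM numbering scheme).
MIM_CATEGORIES = {
    "1": "autosomal (entry created before May 1994)",
    "2": "autosomal (entry created before May 1994)",
    "3": "X-linked",
    "4": "Y-linked",
    "5": "mitochondrial",
    "6": "autosomal (entry created since May 1994)",
}


def omim_category(term_id: str) -> str:
    """Validate an OMIM:NNNNNN ID and return the category of its MIM number."""
    match = OMIM_RE.match(term_id)
    if match is None:
        raise ValueError(f"expected OMIM: followed by six digits, got {term_id!r}")
    mim = match.group(1)
    try:
        return MIM_CATEGORIES[mim[0]]
    except KeyError:
        raise ValueError(f"MIM number {mim} has no known category") from None


print(omim_category("OMIM:258870"))  # autosomal (entry created before May 1994)
```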

SNOMED-CT

In OBO Library? NO

SNOMED-CT has "findings" and "disorder" sub-hierarchies, as well as disease concepts, which can be used to indicate the phenotype. Many electronic health record systems use SNOMED-CT, so phenotype records for individuals may already be available in this vocabulary. Note that SNOMED-CT is not open; there are ongoing discussions about SNOMED transitioning to an open system. For now, bear in mind that using SNOMED may restrict the ability of some people to perform analyses that make use of the ontology structure (though mappings to HPO are available from the HPO team). If you use SNOMED, use the SCTID ID space.
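SCTIDs are 6 to 18 digits long, have no leading zero, and end in a Verhoeff check digit, so a mistyped SCTID can usually be caught before it reaches a GVF file. A sketch of that check follows; the `SCTID:` prefix follows the ID-space advice above, and the function names are hypothetical. It validates the form of the identifier only and cannot tell you whether the concept exists.

```python
import re

# 6-18 digits, no leading zero (SCTID shape).
SCTID_RE = re.compile(r"^SCTID:([1-9]\d{5,17})$")

# Verhoeff tables: multiplication in the dihedral group D5 ...
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
# ... and the position-dependent permutation (period 8).
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def verhoeff_valid(digits: str) -> bool:
    """True if the last digit of `digits` is a correct Verhoeff check digit."""
    check = 0
    for i, ch in enumerate(reversed(digits)):
        check = _D[check][_P[i % 8][int(ch)]]
    return check == 0


def is_valid_sctid(term_id: str) -> bool:
    """Shape and check-digit test for an SCTID:... term ID."""
    match = SCTID_RE.match(term_id)
    return match is not None and verhoeff_valid(match.group(1))


print(is_valid_sctid("SCTID:22298006"))  # True
print(is_valid_sctid("SCTID:22298007"))  # False: check digit does not match
```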

Example: An individual with male hypogonadism would be recorded as:


Information about SNOMED-CT is available here:

ICD-9

In OBO Library? NO

Like SNOMED, ICD-9 is frequently used in electronic health records (EHRs) to record diagnoses and billing codes.

For describing human phenotypes, HPO may be more suitable, but if you have access only to ICD-9 encoded phenotype data, please use the following ID format.


For a patient with cleft lip, record the ID as follows:


Note that ICD-9 is being replaced by the newer versions ICD-10 and ICD-11, as described here:
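When only ICD-9 data is available, the code part of each ID can be shape-checked before it is recorded. This sketch assumes ICD-9-CM diagnosis codes (numeric, V and E codes); it does not check the ID prefix, does not confirm that a code exists, and the function name `looks_like_icd9cm` is hypothetical.

```python
import re

# Assumed ICD-9-CM diagnosis-code shapes: three digits or V plus two digits
# (each with up to two decimal places), or E plus three digits (up to one).
ICD9CM_RE = re.compile(
    r"^(?:\d{3}(?:\.\d{1,2})?|V\d{2}(?:\.\d{1,2})?|E\d{3}(?:\.\d)?)$"
)


def looks_like_icd9cm(code: str) -> bool:
    """Shape check only: True does not mean the code exists in ICD-9-CM."""
    return bool(ICD9CM_RE.match(code.strip().upper()))


print(looks_like_icd9cm("749.10"))  # True
print(looks_like_icd9cm("v10.3"))   # True (normalised to upper case)
print(looks_like_icd9cm("7491"))    # False: decimal point missing
```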

Mammalian Pathology Ontology (MPATH)

In OBO Library? YES


Scope: Pathological physical entities and processes. MPATH focuses on actual pathological entities and may be most suitable for genotyping pathological tissue samples.


An individual with a truncoconal septal defect would be recorded as MPATH:619 in the GVF file.

##phenotype-description Term=MPATH:619;Ontology=
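The ID shapes used in the examples on this page can be collected into one table and checked before a GVF file is written. A sketch under stated assumptions: the patterns encode only the shapes shown here (for MeSH, tree numbers such as C11.510.103), and the prefix table and the function name `check_term_id` are hypothetical and should be adapted to the ontologies you actually use.

```python
import re
from typing import Optional

# Local-ID patterns for the prefixes used on this page (assumed shapes).
LOCAL_ID_PATTERNS = {
    "HP": re.compile(r"\d{7}"),
    "DOID": re.compile(r"\d+"),
    "OMIM": re.compile(r"\d{6}"),
    "SCTID": re.compile(r"[1-9]\d{5,17}"),
    "MPATH": re.compile(r"\d+"),
    "MESH": re.compile(r"[A-Z]\d{2}(?:\.\d{3})*"),
}


def check_term_id(term_id: str) -> Optional[str]:
    """Return None if term_id is acceptable, else a short reason."""
    prefix, sep, local = term_id.partition(":")
    if not sep:
        return "not a prefixed ID"
    pattern = LOCAL_ID_PATTERNS.get(prefix)
    if pattern is None:
        return f"unrecognised prefix {prefix!r}"
    if not pattern.fullmatch(local):
        return f"local ID {local!r} does not fit the {prefix} pattern"
    return None


for example in ["HP:0000957", "DOID:14330", "OMIM:258870", "MPATH:619",
                "MESH:C11.510.103", "DO:14330"]:
    print(example, check_term_id(example) or "ok")
```

The last example shows why a table like this is useful: `DO:14330` is flagged because the Disease Ontology ID space is `DOID`.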

Mapping between phenotype terminology terms

MedGen provides a slice of the UMLS for annotating data in the context of ClinVar, and therefore contains mappings to other resources such as MeSH and HPO. Because MedGen is itself a mapping, we recommend annotating GVF files with IDs from one of the source ontologies rather than with MedGen IDs.
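One way to apply this recommendation when starting from MedGen cross-references is to pick the reference whose prefix comes first in a preferred list of source ontologies. A minimal sketch: the preference order, the function name `pick_source_id`, and the input IDs are all hypothetical (the IDs are reused from this page, not taken from a real MedGen record).

```python
from typing import Iterable, Optional, Sequence

# Hypothetical preference order over source ontologies (MedGen itself excluded).
DEFAULT_PREFERENCE = ("HP", "DOID", "OMIM", "MPATH", "MESH", "SCTID")


def pick_source_id(xrefs: Iterable[str],
                   preference: Sequence[str] = DEFAULT_PREFERENCE) -> Optional[str]:
    """Return the cross-reference from the most-preferred source ontology.

    Prefixes not in `preference` (e.g. MedGen's own IDs) are skipped, so the
    GVF ends up citing a source ontology directly.
    """
    rank = {prefix: i for i, prefix in enumerate(preference)}
    candidates = [x for x in xrefs if x.partition(":")[0] in rank]
    if not candidates:
        return None
    return min(candidates, key=lambda x: rank[x.partition(":")[0]])


# Placeholder cross-references, not a real MedGen record.
print(pick_source_id(["MEDGEN:example", "MESH:C11.510.103", "HP:0000957"]))  # HP:0000957
```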

Medical Subject Headings

MeSH (Medical Subject Headings) is the NLM controlled vocabulary thesaurus used for indexing PubMed articles. Homepage for MeSH: and a browser here: In particular, consider annotating with the Diseases (C) tree. For example, aphakia would be recorded as MESH:C11.510.103.
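Because MeSH tree numbers are hierarchical, the broader headings of a term can be read off by dropping trailing segments, and membership in the Diseases tree is a leading `C`. A sketch with hypothetical function names; note that tree numbers can change between MeSH releases, so the result is only meaningful for the release the ID was taken from.

```python
import re
from typing import List

MESH_TREE_RE = re.compile(r"^MESH:([A-Z]\d{2}(?:\.\d{3})*)$")


def mesh_tree_number(term_id: str) -> str:
    """Extract the tree number from a MESH:... term ID."""
    match = MESH_TREE_RE.match(term_id)
    if match is None:
        raise ValueError(f"expected MESH: followed by a tree number, got {term_id!r}")
    return match.group(1)


def mesh_ancestors(term_id: str) -> List[str]:
    """Tree numbers of the broader headings, from the top of the tree down."""
    segments = mesh_tree_number(term_id).split(".")
    return [".".join(segments[:i]) for i in range(1, len(segments))]


def in_disease_tree(term_id: str) -> bool:
    """True for tree numbers in the MeSH Diseases (C) category."""
    return mesh_tree_number(term_id).startswith("C")


print(mesh_ancestors("MESH:C11.510.103"))   # ['C11', 'C11.510']
print(in_disease_tree("MESH:C11.510.103"))  # True
```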
