Introduction to the Sequence Ontology

The two major aspects of the complete Sequence Ontology are:

  • Sequence Features: for objects that can be located on a sequence by coordinates
  • Sequence Attributes: for describing the properties of features

A "lite" version of SO, called SOFA (Sequence Ontology Feature Annotation), is also available. It includes only locatable sequence features and is designed for use in outputs such as GFF3. For this reason, all SO terms are "unix friendly": they contain no white space, never begin with an integer, and do not include characters such as "'" or hyphens. SOFA is a subset of SO, and every SOFA term is marked in SO with the category tag "SOFA". SOFA is expected to be more stable than the full version of SO.
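The naming rule above can be checked mechanically. The following is a minimal sketch in Python that tests only the constraints stated here (no white space, no leading digit, no "'" or hyphen). The function name is_unix_friendly is hypothetical and is not part of any SO tooling, and the rule may exclude further characters that this sketch does not test for.

```python
def is_unix_friendly(term_name: str) -> bool:
    """Check a term name against the naming constraints described above.

    Hypothetical helper: only the constraints stated in the text are tested.
    """
    if not term_name:
        return False
    # A term name never begins with an integer.
    if term_name[0].isdigit():
        return False
    # A term name contains no white space.
    if any(ch.isspace() for ch in term_name):
        return False
    # A term name contains neither an apostrophe nor a hyphen.
    return not any(ch in "'-" for ch in term_name)


if __name__ == "__main__":
    for candidate in ["five_prime_UTR", "5_prime_UTR", "five prime UTR", "five-prime_UTR"]:
        print(candidate, is_unix_friendly(candidate))
```

A check like this could be run over a feature-type column before writing GFF3, so that a malformed name is caught early.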

Currently three versions of the Sequence Ontology exist:

  • The full version - called SO, contains several hundred terms and is intended to be used in concert with a heavily curated genome annotation project.
  • The SOFA version - which stands for 'Sequence Ontology Feature Annotation', has terms that can be directly located on biological sequence. These terms are most likely to be used in concert with a partially or fully automated annotation effort.
  • The cross-product version - which contains all of the cross-product terms.
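Because SOFA terms are tagged inside the full ontology, the SOFA subset can in principle be recovered from the full file. The sketch below assumes the ontology is distributed in OBO flat-file format and that the category tag appears as a "subset: SOFA" line inside a [Term] stanza. Both are assumptions that should be checked against the released file. The sample stanzas, their identifiers, and the function name sofa_term_names are made up for illustration.

```python
from typing import List

# Illustrative sample data only: these stanzas are hypothetical, not copied from SO.
SAMPLE_OBO = """\
[Term]
id: EX:0000001
name: example_located_feature
subset: SOFA

[Term]
id: EX:0000002
name: example_attribute
"""


def sofa_term_names(obo_text: str) -> List[str]:
    """Return the names of [Term] stanzas tagged 'subset: SOFA' (assumed tag)."""
    names = []
    # Stanzas are assumed to be separated by blank lines.
    for stanza in obo_text.strip().split("\n\n"):
        lines = [line.strip() for line in stanza.splitlines() if line.strip()]
        if not lines or lines[0] != "[Term]":
            continue
        name = None
        in_sofa = False
        for line in lines[1:]:
            # Split on the first colon only, so values such as "EX:0000001" stay whole.
            tag, _, value = line.partition(":")
            value = value.strip()
            if tag == "name":
                name = value
            elif tag == "subset" and value == "SOFA":
                in_sofa = True
        if in_sofa and name is not None:
            names.append(name)
    return names


if __name__ == "__main__":
    print(sofa_term_names(SAMPLE_OBO))
```

For real work a dedicated OBO parser is likely a better choice than this line-based sketch, which ignores comments, escapes, and trailing modifiers.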

The latest versions of the ontologies are available on the ontology page on GitHub.

There are undoubtedly errors and omissions. Feedback is not only welcome but (almost) demanded.


The ontology is best viewed using an ontology editor, such as OBO-Edit or the miSO browser.