Welcome to the Sequence Ontology

This is the home page of the Sequence Ontology Project (SO), a joint effort by genome annotation centres and other groups that use sequence annotation data, including WormBase, FlyBase, the Mouse Genome Informatics group, and the Sanger Institute. We are part of the Gene Ontology Project and the Open Biomedical Ontologies (OBO). Our aim is to develop an ontology suitable for describing biological sequences. For questions, please send mail to the SO developers mailing list.

Introduction

The Sequence Ontology is a set of terms and relationships used to describe the features and attributes of biological sequence. SO includes different kinds of features that can be located on the sequence. Biological features are those defined by their disposition to be involved in a biological process; examples are binding_site and exon. Biomaterial features are those intended for use in an experiment, such as aptamer and PCR_product. There are also experimental features, which are the result of an experiment. SO also provides a rich set of attributes to describe these features, such as "polycistronic" and "maternally imprinted".

The Sequence Ontologies are provided as a resource to the biological community. They have the following obvious uses:

  • To provide a structured controlled vocabulary for describing primary annotations of nucleic acid sequence, e.g. the annotations shared by a DAS server (BioDAS, Biosapiens DAS) or annotations encoded in GFF3.
  • To provide a structured representation of these annotations within databases. If genes in model organism databases were annotated with these terms, it would be possible to query all of those databases for, say, all genes whose transcripts are edited, trans-spliced, or bound by a particular protein. One such genomic database is Chado.
  • To provide a structured controlled vocabulary for describing mutations, both at the sequence level and at a grosser level, in the context of genomic databases.
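As a rough illustration of the first use, the sketch below checks whether the type column of a GFF3 feature line names a known SO term. The term set, the function names, and the example line are assumptions made for illustration; a real check would load the term names from so.obo or sofa.obo rather than hard-code them.

```python
# Hypothetical subset of SO feature term names, hard-coded for illustration.
# A real check would read the names from so.obo or sofa.obo.
KNOWN_SO_TYPES = {"gene", "mRNA", "exon", "CDS", "binding_site"}

# The nine tab-separated columns of a GFF3 feature line.
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    return record


def has_known_so_type(record, known_types=KNOWN_SO_TYPES):
    """Return True if the feature's type column is one of the known SO names."""
    return record["type"] in known_types


# Made-up example feature line.
example = "ctg123\t.\texon\t1300\t1500\t.\t+\t.\tID=exon00001;Parent=mRNA00001"
record = parse_gff3_line(example)
print(record["type"], has_known_so_type(record))
```

The check only compares term names; matching on SO accessions would need the same lookup keyed by identifier.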

The Sequence Ontology is part of OBO. It has close links to other ontology projects, such as the RNAO consortium and the Biosapiens polypeptide features.

Current SO Ontology files

README

Ontology                 Latest CVS Revision  Current and Archived Releases  Description
SO                       so.obo               so.obo                         SO Summary
SO with Composite Terms  so-xp.obo            so-xp.obo                      Composite Terms Summary
SOFA                     sofa.obo             sofa.obo                       SO Summary

File formats

The Sequence Ontologies use the OBO flat file format specification version 1.2, developed by the Gene Ontology Consortium.
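To give a feel for the flat file format, here is a minimal sketch of reading [Term] stanzas from OBO text. The sample stanza is illustrative and not copied from a release, so the identifiers should be checked against the current so.obo. The parser is a simplification under stated assumptions: it ignores header tags and non-Term stanzas, and it does not handle escaped "!" characters, trailing modifiers, or the other details of the full specification.

```python
# Illustrative OBO text, not copied from a release; check identifiers against so.obo.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: SO:0000147
name: exon
is_a: SO:0000833 ! transcript_region

[Typedef]
id: part_of
name: part_of
"""


def parse_obo_terms(text):
    """Return {term id: {tag: [values]}} for every [Term] stanza in the text."""
    stanzas = []
    current = None  # None while in the header or in a non-Term stanza
    for raw in text.splitlines():
        # Drop a trailing "! comment" (simplified: escaped "\!" is not handled).
        line = raw.split("!", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = {} if line == "[Term]" else None
            if current is not None:
                stanzas.append(current)
            continue
        if current is None:
            continue
        # Split on the first colon only, so "SO:0000147" stays intact in the value.
        tag, _, value = line.partition(":")
        current.setdefault(tag.strip(), []).append(value.strip())
    return {s["id"][0]: s for s in stanzas if "id" in s}


terms = parse_obo_terms(SAMPLE_OBO)
for term_id, tags in terms.items():
    print(term_id, tags["name"][0], tags.get("is_a", []))
```

Values are kept as lists because tags such as is_a may appear more than once in a stanza.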

The ontology is also available in OWL from Open Biomedical Ontologies. The OWL file is updated nightly and may be slightly out of sync with the current OBO file. It is generated from the OBO file using go-perl. The resolvable URI for the current version of SO is http://purl.org/obo/owl/SO. As of Jan 25 2007, the old lossy transform has been replaced by the new non-lossy mapping.

Sequence Ontology Publications

A standard variation file format for human genome sequences. Reese MG, Moore B, Batchelor C, Salas F, Cunningham F, Marth GT, Stein L, Flicek P, Yandell M, Eilbeck K. Genome Biology 2010, 11:R88
SOBA: sequence ontology bioinformatics analysis. Moore B, Fan G, Eilbeck K. Nucl Acids Res 2010, 38(suppl 2)
Evolution of the Sequence Ontology terms and relationships. Mungall CJ, Batchelor C, Eilbeck K. J Biomed Inform. 2010 Mar 10
Quantitative Measures for the Management and Comparison of Annotated Genomes. Eilbeck K, Moore B, Holt C, Yandell M. BMC Bioinformatics 2009, 10:67
The Protein Feature Ontology: a tool for the unification of protein feature annotations. Reeves GA, Eilbeck K, Magrane M, O'Donovan C, Montecchi-Palazzi L, Harris MA, Orchard S, Jimenez RC, Prlic A, Hubbard TJ, Hermjakob H, Thornton JM. Bioinformatics 2008 Dec 1; 24(23):2767-72
A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Mungall CJ, Emmert DB, The FlyBase Consortium. Bioinformatics 2007, 23(13):i337-i346
The Sequence Ontology: A tool for the unification of genome annotations. Eilbeck K, Lewis S, Mungall CJ, Yandell M, Stein L, Durbin R, Ashburner M. Genome Biology 2005, 6:R44
Sequence Ontology Annotation Guide. Eilbeck K, Lewis S. Comparative and Functional Genomics 2004, 5:642-647

Interested in participating? Join the mailing list, check out the CVS archive, and contact the developers.

To become an SO developer, set up an account on SourceForge and drop a note to Karen Eilbeck.