GFF3 best practices

From SO Wiki

Summary

A community page for describing how to annotate tricky situations using GFF3.

  • Some of these examples are drawn from the pathological example given in the file specification.
  • To resolve a best practice, email the song-devel@lists.sourceforge.net list before posting the resolution here.


An operon

A classic operon is defined as a set of genes in a polycistronic transcript that are co-regulated by one or more cis-regulatory elements:

  regulatory element
  *  ================================================> operon
     ----->XXXXXXX*-->BBBBBB*--->ZZZZ*-->AAAAAA*-----

This biology can be indicated in GFF3 in the following way:

ChrX  . operon   XXXX YYYY  .  +  . ID=operon01;name=my_operon
ChrX  . promoter XXXX YYYY  .  +  . Parent=operon01
ChrX  . gene     XXXX YYYY  .  +  . ID=gene01;Parent=operon01;name=resA
ChrX  . gene     XXXX YYYY  .  +  . ID=gene02;Parent=operon01;name=resB
ChrX  . gene     XXXX YYYY  .  +  . ID=gene03;Parent=operon01;name=resX
ChrX  . gene     XXXX YYYY  .  +  . ID=gene04;Parent=operon01;name=resZ
ChrX  . mRNA     XXXX YYYY  .  +  . ID=tran01;Parent=gene01,gene02,gene03,gene04
ChrX  . exon     XXXX YYYY  .  +  . ID=exon00001;Parent=tran01
ChrX  . CDS      XXXX YYYY  .  +  . Parent=tran01;Derives_from=gene01
ChrX  . CDS      XXXX YYYY  .  +  . Parent=tran01;Derives_from=gene02
ChrX  . CDS      XXXX YYYY  .  +  . Parent=tran01;Derives_from=gene03
ChrX  . CDS      XXXX YYYY  .  +  . Parent=tran01;Derives_from=gene04

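The regulatory element ("promoter" in this example) is part of the operon via the Parent tag. The four genes are also part of the operon via their Parent tags, and the resulting mRNA is multiply parented by the four genes, as in the polycistronic transcript example in the file specification. Each CDS is a child of the mRNA and uses Derives_from to indicate which gene it comes from.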
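The Parent relationships described above can be traced programmatically. The following is a minimal Python sketch, not a full GFF3 parser: it assumes tab-separated columns, does no URL-decoding of attribute values, and uses made-up coordinates and a shortened record set (two genes instead of four) in place of the XXXX/YYYY placeholders. The function names parse_attributes and build_children are hypothetical.

```python
from collections import defaultdict

# Illustrative records modeled on the operon example. The coordinates are
# invented placeholders; the original example uses XXXX/YYYY.
GFF3_TEXT = """\
ChrX\t.\toperon\t100\t5000\t.\t+\t.\tID=operon01;name=my_operon
ChrX\t.\tpromoter\t100\t150\t.\t+\t.\tParent=operon01
ChrX\t.\tgene\t200\t1200\t.\t+\t.\tID=gene01;Parent=operon01;name=resA
ChrX\t.\tgene\t1300\t2300\t.\t+\t.\tID=gene02;Parent=operon01;name=resB
ChrX\t.\tmRNA\t200\t2300\t.\t+\t.\tID=tran01;Parent=gene01,gene02
"""

def parse_attributes(column9):
    """Split column 9 into a dict of tag -> list of values.

    Multiple values for one tag (e.g. several parents) are comma-separated.
    """
    attrs = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attrs[key] = value.split(",")
    return attrs

def build_children(gff3_text):
    """Map each parent ID to the features that name it in a Parent tag."""
    children = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.split("\t")
        attrs = parse_attributes(cols[8])
        # Features without an ID (the promoter here) are labeled by type.
        label = attrs.get("ID", [cols[2]])[0]
        for parent in attrs.get("Parent", []):
            children[parent].append(label)
    return dict(children)

children = build_children(GFF3_TEXT)
for parent, kids in children.items():
    print(parent, "->", ", ".join(kids))
```

Because the mRNA lists several genes in a single Parent tag, tran01 appears as a child of each gene, which is how the multiple parentage shows up in the resulting map.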

A mate pair or a paired end

TBD.

A SNP with an associated flanking sequence

Please see the Genome Variation Format for the most robust, GFF3-compatible way to represent sequence alterations.

However, some groups will want to represent a subset of sequence alterations directly in GFF3. The following describes one approach to doing this for SNPs.

Although a SNP (SO:0000694) represents a single base, SNPs are often probed using their flanking sequences. For this reason, it often makes sense to record not only the SNP position itself but also the flanking sequence that was used in the experiment to probe that position (for example, an OPA).


chr01	ssaha run id 99	match	62066	62167	94	+	.	ID=snp_seq_10493;Target=snp_seq_10493 1 101;ident=99.01;length=101
chr01	x	SNP	62117	62117	.	.	.	Parent=snp_seq_10493;vcf=...
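
In the records above, the flanking sequence is a match feature with an ID, and the SNP is a single-base feature whose Parent is that match. The following is a minimal Python sketch of generating such a pair; the function name snp_with_flank and its parameters are hypothetical, the source column is set to "." as an assumption, and the score, ident, length, and vcf attributes are omitted.

```python
def snp_with_flank(seqid, flank_id, flank_start, flank_end, flank_len, snp_pos, strand="+"):
    """Return two GFF3 lines: a flanking-sequence match and its child SNP."""
    # The SNP must fall inside the aligned flanking sequence.
    if not flank_start <= snp_pos <= flank_end:
        raise ValueError("SNP position lies outside the flanking match")

    # Parent feature: the alignment of the flanking (probe) sequence.
    match_attrs = f"ID={flank_id};Target={flank_id} 1 {flank_len}"
    match = [seqid, ".", "match", str(flank_start), str(flank_end),
             ".", strand, ".", match_attrs]

    # Child feature: the single-base SNP, linked to the match via Parent.
    snp = [seqid, ".", "SNP", str(snp_pos), str(snp_pos),
           ".", ".", ".", f"Parent={flank_id}"]

    return ["\t".join(match), "\t".join(snp)]

lines = snp_with_flank("chr01", "snp_seq_10493", 62066, 62167, 101, 62117)
for line in lines:
    print(line)
```

The range check is a design choice of this sketch: it rejects a SNP that falls outside its flanking sequence before any record is written.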