GFF3 best practices
Summary
A community page for describing how to annotate tricky situations using GFF3.
- Some of these examples are drawn from the pathological example given in the file specification.
- To resolve a best practice, email the song-devel@lists.sourceforge.net list before posting the resolution here.
An operon
A classic operon is defined as a set of genes in a polycistronic transcript that are co-regulated by cis-regulatory element(s):
regulatory element
* ================================================> operon
----->XXXXXXX*-->BBBBBB*--->ZZZZ*-->AAAAAA*-----
This biology can be indicated in GFF3 in the following way:
ChrX . operon XXXX YYYY . + . ID=operon01;name=my_operon
ChrX . promoter XXXX YYYY . + . Parent=operon01
ChrX . gene XXXX YYYY . + . ID=gene01;Parent=operon01;name=resA
ChrX . gene XXXX YYYY . + . ID=gene02;Parent=operon01;name=resB
ChrX . gene XXXX YYYY . + . ID=gene03;Parent=operon01;name=resX
ChrX . gene XXXX YYYY . + . ID=gene04;Parent=operon01;name=resZ
ChrX . mRNA XXXX YYYY . + . ID=tran01;Parent=gene01,gene02,gene03,gene04
ChrX . exon XXXX YYYY . + . ID=exon00001;Parent=tran01
ChrX . CDS XXXX YYYY . + . Parent=tran01;Derives_from=gene01
ChrX . CDS XXXX YYYY . + . Parent=tran01;Derives_from=gene02
ChrX . CDS XXXX YYYY . + . Parent=tran01;Derives_from=gene03
ChrX . CDS XXXX YYYY . + . Parent=tran01;Derives_from=gene04
The regulatory element ("promoter" in this example) is part of the operon via the Parent tag. The four genes are part of the operon, and the resulting mRNA is multiply-parented by the four genes, as in the earlier example.
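The Parent relationships described above can be checked programmatically. The following Python is a minimal sketch, not a full GFF3 parser: it assumes tab-separated columns, ignores URL-escaping of attribute values, and uses made-up coordinates in place of the XXXX/YYYY placeholders. The helper names (`row`, `parse_attributes`, `children_by_parent`) are hypothetical and are not part of any GFF3 library.

```python
from collections import defaultdict

def row(ftype, start, end, attrs):
    """Build one tab-separated GFF3 line on ChrX, + strand (hypothetical helper)."""
    return "\t".join(["ChrX", ".", ftype, str(start), str(end), ".", "+", ".", attrs])

def parse_attributes(column9):
    """Split the attributes column into a dict of tag -> list of values."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if pair:
            tag, _, value = pair.partition("=")
            attrs[tag] = value.split(",")  # Parent may list several IDs
    return attrs

def children_by_parent(lines):
    """Map each parent ID to the (type, ID or None) features that name it."""
    children = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        attrs = parse_attributes(cols[8])
        feature_id = attrs.get("ID", [None])[0]
        for parent in attrs.get("Parent", []):
            children[parent].append((cols[2], feature_id))
    return children

# Coordinates below are invented for illustration only.
OPERON_GFF3 = [
    row("operon", 1, 4000, "ID=operon01;name=my_operon"),
    row("promoter", 1, 100, "Parent=operon01"),
    row("gene", 201, 1000, "ID=gene01;Parent=operon01;name=resA"),
    row("gene", 1101, 1900, "ID=gene02;Parent=operon01;name=resB"),
    row("gene", 2001, 2800, "ID=gene03;Parent=operon01;name=resX"),
    row("gene", 2901, 3700, "ID=gene04;Parent=operon01;name=resZ"),
    row("mRNA", 150, 3900, "ID=tran01;Parent=gene01,gene02,gene03,gene04"),
    row("exon", 150, 3900, "ID=exon00001;Parent=tran01"),
    row("CDS", 201, 1000, "Parent=tran01;Derives_from=gene01"),
    row("CDS", 1101, 1900, "Parent=tran01;Derives_from=gene02"),
    row("CDS", 2001, 2800, "Parent=tran01;Derives_from=gene03"),
    row("CDS", 2901, 3700, "Parent=tran01;Derives_from=gene04"),
]

tree = children_by_parent(OPERON_GFF3)
for parent_id in ("operon01", "gene01", "tran01"):
    print(parent_id, "->", tree[parent_id])
```

Here `tree["operon01"]` lists the promoter and the four genes, while each gene lists the single shared mRNA `tran01`, which is what the multiple-parent encoding is meant to express.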
A mate pair or a paired end
TBD.
A SNP with an associated flanking sequence
Please see the Genome Variation Format for the most robust, GFF3-compatible way to represent sequence alterations.
However, some groups will want to represent a subset of sequence alterations directly in GFF3. The following is a description of one approach to doing this for SNPs.
Although a SNP (SO:0000694) occupies a single base, SNPs are often probed using their flanking sequences. For this reason, it often makes sense to record not only the SNP position itself but also the flanking sequence that was used in the experiment to probe that position (for example, an OPA).
chr01 ssaha run id 99 match 62066 62167 94 + . ID=snp_seq_10493;Target=snp_seq_10493 1 101;ident=99.01;length=101
chr01 x SNP 62117 62117 . . . Parent=snp_seq_10493;vcf=...
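As a rough consistency check on this approach, the Python sketch below parses the two records and confirms that the SNP names the match as its Parent and lies within the match's reference span. It assumes tab-separated columns (so the source value "ssaha run id 99" is a single field) and keeps the elided `vcf=...` value exactly as written. The function names `parse_feature` and `snp_within_flank` are hypothetical.

```python
def parse_feature(line):
    """Parse one tab-separated GFF3 line into a small dict (no URL-unescaping)."""
    cols = line.split("\t")
    attrs = dict(pair.split("=", 1) for pair in cols[8].split(";") if pair)
    return {
        "seqid": cols[0],
        "type": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "attrs": attrs,
    }

def snp_within_flank(snp, match):
    """True if the SNP is a child of the match and sits inside its reference span."""
    return (
        snp["attrs"].get("Parent") == match["attrs"].get("ID")
        and snp["seqid"] == match["seqid"]
        and match["start"] <= snp["start"] <= snp["end"] <= match["end"]
    )

match = parse_feature(
    "chr01\tssaha run id 99\tmatch\t62066\t62167\t94\t+\t.\t"
    "ID=snp_seq_10493;Target=snp_seq_10493 1 101;ident=99.01;length=101"
)
snp = parse_feature("chr01\tx\tSNP\t62117\t62117\t.\t.\t.\tParent=snp_seq_10493;vcf=...")

# Number of reference bases between the start of the match and the SNP.
offset = snp["start"] - match["start"]
print(snp_within_flank(snp, match), offset)
```

Note that the offset is measured in reference coordinates. In this example the match spans 102 reference bases while the Target spans 101, so converting the offset to a position within the probe sequence would require the alignment itself, which these two records do not carry.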