GFF3 Development

From SO Wiki

Suggestions for changes to GFF3 should be considered in light of the following:

  1. GFF3 is actively used by dozens of projects to annotate hundreds of genomes. Backward compatibility with the community-supported tools used by these groups, and with the datasets they have generated, is a critical consideration.
  2. A big part of GFF3's success is its simplicity. GFF3 is simple enough to manipulate by hand with command-line tools, yet structured enough to support robust, validating parsers. Maintaining this balance between simplicity and structure is critical to its future success.
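To illustrate the balance described in point 2, here is a minimal Python sketch of parsing a single feature line. It relies only on the nine tab-separated columns and the tag=value;tag=value attribute syntax of GFF3 (multiple values separated by commas, reserved characters percent-encoded); parse_feature_line is a hypothetical name, and its checks are a small subset of what a real validator does.

```python
from urllib.parse import unquote

# Column names of a GFF3 feature line, in order.
COLUMNS = ("seqid", "source", "type", "start", "end",
           "score", "strand", "phase", "attributes")


def parse_feature_line(line: str) -> dict:
    """Parse one GFF3 feature line (hypothetical helper, not a full validator)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(COLUMNS, fields))

    # Coordinates are integers, and start must not exceed end.
    start, end = int(record["start"]), int(record["end"])
    if start > end:
        raise ValueError("start must be <= end")
    record["start"], record["end"] = start, end

    if record["strand"] not in ("+", "-", ".", "?"):
        raise ValueError(f"invalid strand: {record['strand']!r}")

    # Column 9: tag=value pairs separated by ';', multiple values by ','.
    attrs = {}
    if record["attributes"] != ".":
        for pair in record["attributes"].split(";"):
            if not pair:
                continue  # tolerate a trailing semicolon
            tag, sep, value = pair.partition("=")
            if not sep:
                raise ValueError(f"attribute without '=': {pair!r}")
            # Percent-encoded escapes (e.g. %3B for ';') are decoded here.
            attrs[unquote(tag)] = [unquote(v) for v in value.split(",")]
    record["attributes"] = attrs
    return record
```

For example, parsing a gene line with `ID=gene00001;Name=EDEN` in column 9 yields `{"ID": ["gene00001"], "Name": ["EDEN"]}` as its attributes. A line this regular can still be grepped or cut with command-line tools, which is the trade-off point 2 describes.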

Unresolved GFF3 Issues

  1. Discontinuous features
    1. Can discontinuous features span landmark features (e.g., a gene split across contigs)?
    2. Can discontinuous features cross origin of circular genomes?
  2. Which validator should be used?
  3. Clarification of the Gap attribute
  4. Landmark features
    1. How are landmark features identified?
    2. How does a landmark feature relate to the ##sequence-region directive?
    3. Is a landmark feature required for each implied landmark as specified in column 1?
    4. Is the SO type of a landmark feature constrained?
  5. Negative start and end coordinates
  6. Is_circular attribute
    1. Which features are allowed or disallowed to have this attribute?
    2. Should features that cross the origin be split, or should their END be mapped forward so that it exceeds the sequence length?
    3. When a child lies entirely past the origin (e.g., an exon) but its parent spans the origin (e.g., a transcript), how are the exon START and END calculated? Are they given in the same coordinates the parent uses, or in the actual coordinates?
  7. Can the ##feature-ontology directive be used to extend the terms allowed in column 3?
  8. Does GFF3/SO support annotating RNA and protein sequences directly?
  9. Additional terms in SO are needed
    1. All EMBL/GenBank/DDBJ feature table terms should be supported/mapped
    2. Support/map terms used by EMBOSS
  10. The FT_SO.txt mapping file needs to be updated
  11. Should GFF3 files declare a full version number, e.g. ##gff-version 3.1.21?
  12. Best Practices Pages
    1. Trans-spliced genes
    2. Partial features
    3. Discontinuous features
      1. Discontinuous features across contigs
      2. Discontinuous features across origin of circular genomes
    4. Link to Best Practices from the spec
  13. Set up wiki accounts so GFF3 discussion participants can edit
  14. What is the best way to convert GenBank to GFF3?
  15. Dbxref file updates
  16. ID characters: which characters are allowed or disallowed in ID?
  17. New attributes
    1. Relationship
    2. Has_attribute?
    3. Part/Order?
    4. Start_range, End_range
  18. Set up a GFF3 FAQ
  19. Provide links to the GFF, GFF2, GTF, UCSC GTF, GVF, etc. spec pages
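The circular-genome questions under item 6 are still open, so the following Python sketch does not answer them; it only makes the alternatives in items 6.2 and 6.3 concrete. It assumes 1-based, inclusive coordinates and a known sequence length, and all three function names are hypothetical; none of them reflects an agreed GFF3 convention.

```python
def split_at_origin(start: int, end: int, seq_len: int) -> list[tuple[int, int]]:
    """Option A (hypothetical): split an origin-crossing feature into two parts.

    `end` is given in the extended convention, so end > seq_len means the
    feature wraps past the origin.
    """
    if end <= seq_len:
        return [(start, end)]  # does not cross the origin
    return [(start, seq_len), (1, end - seq_len)]


def extend_past_origin(start: int, wrapped_end: int, seq_len: int) -> tuple[int, int]:
    """Option B (hypothetical): keep one feature and map END forward.

    `wrapped_end` is the actual position on the circle; a value smaller than
    `start` means the feature crosses the origin.
    """
    if wrapped_end >= start:
        return (start, wrapped_end)  # does not cross the origin
    return (start, wrapped_end + seq_len)


def child_in_parent_frame(child_start: int, child_end: int,
                          parent_start: int, seq_len: int) -> tuple[int, int]:
    """Item 6.3 (hypothetical): re-express a child of an origin-spanning parent
    in the parent's extended coordinates. A child that lies entirely past the
    origin (actual start before the parent's start) is shifted by seq_len."""
    if child_start < parent_start:
        return (child_start + seq_len, child_end + seq_len)
    return (child_start, child_end)
```

On a 5,000 bp circular sequence, a transcript running from 4800 across the origin to 200 would be written as 4800..5200 under option B, or as 4800..5000 plus 1..200 under option A. An exon at actual positions 50..150 would become 5050..5150 if it used the parent's coordinates.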

Resolved GFF3 Issues

  1. gff-version directive: Is the ##gff-version directive allowed only once per file?
  2. Attribute value quoting: Should double-quoting of attribute values be allowed or disallowed?
  3. GFF3 Fasta Sections
    1. Are FASTA sequences required to be all together at the end of the file?
    2. Are FASTA sequences allowed to be interspersed with feature lines?
  4. GFF3 character encoding: Does GFF3 specify a character encoding, and if so, which one?
  5. Hex code escapes (see GFF3 character encoding)
    1. Refer to RC??? instead of "URL encoding"
    2. Clean up wording on explicitly required escapes
    3. Clean up wording on explicitly disallowed escapes
    4. Be sure MAKER is in line with the allowed/disallowed escapes
    5. Required escapes, all columns: tab, newline, carriage return, percent sign, control characters
    6. Required escapes, column 9: semicolon, equals sign, ampersand (why is this escaped?), comma
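The required escapes listed under item 5 can be sketched as a small Python encoder. This assumes the character sets exactly as listed above, with "control characters" taken to mean U+0000 to U+001F plus U+007F, and %XX encoding with uppercase hex digits. The name escape_gff3 and its column9 flag are hypothetical, and the sketch escapes ampersand only because the list includes it, even though item 5.6 asks why that is needed.

```python
# Additional reserved characters in column 9 (attributes), per the list above.
COLUMN9_RESERVED = set(";=&,")


def _always_escaped(ch: str) -> bool:
    """Characters escaped in every column: '%' and control characters.

    Control characters are assumed to be U+0000..U+001F and U+007F (DEL);
    that range already covers tab, newline and carriage return.
    """
    return ch == "%" or ord(ch) < 0x20 or ord(ch) == 0x7F


def escape_gff3(text: str, column9: bool = False) -> str:
    """Percent-encode required characters as %XX (hypothetical helper)."""
    out = []
    for ch in text:
        if _always_escaped(ch) or (column9 and ch in COLUMN9_RESERVED):
            out.append(f"%{ord(ch):02X}")
        else:
            out.append(ch)
    return "".join(out)
```

In column 9 the helper would be applied to each tag and each value separately, before they are joined with "=", "," and ";". For example, `escape_gff3("a;b", column9=True)` gives `"a%3Bb"`, while the same call without `column9=True` leaves the semicolon alone.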