GFF3 character encoding
I clarified the character encoding and escaping issues as follows:
Description of the Format
GFF3 files are 9-column, tab-delimited plain text files. Literal tabs, newlines, carriage returns, percent signs, and control characters must be encoded using RFC 3986 Percent-Encoding as described below; no other characters may be encoded. The file contents may include any character in the set supported by the operating environment, although for portability with other systems, use of Latin-1 or Unicode is recommended.
The format consists of 9 columns, separated by tabs (NOT spaces). The
following characters must be escaped using URL escaping conventions
(%XX hex codes):
- tab (%09)
- newline (%0A)
- carriage return (%0D)
- % percent (%25)
- control characters (%00 through %1F, %7F)
In addition, the following characters have reserved meanings in column 9 and must be escaped when used in other contexts:
- ; semicolon (%3B)
- = equals (%3D)
- & ampersand (%26)
- , comma (%2C)
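As an illustration only, the two escaping rules listed above could be sketched in Python as follows. The function names `escape_field` and `escape_attribute_value` are hypothetical and not part of the spec; the sketch assumes the control-character range is exactly %00 through %1F plus %7F, as listed, and that the column-9 rule is applied to a single tag or value rather than to the whole column.

```python
import re

# Characters that must be escaped in every column: the control characters
# 0x00-0x1F and 0x7F (this range already covers tab, newline, and carriage
# return) plus the percent sign.
_ALWAYS_ESCAPED = re.compile(r"[\x00-\x1f\x7f%]")

# Inside a column-9 tag or value, the reserved characters ; = & , are
# escaped as well.
_COLUMN9_ESCAPED = re.compile(r"[\x00-\x1f\x7f%;=&,]")


def _percent_encode(match):
    """Return the %XX form (uppercase hex) of the matched character."""
    return "%{:02X}".format(ord(match.group(0)))


def escape_field(value):
    """Escape text destined for any GFF3 column (hypothetical helper)."""
    return _ALWAYS_ESCAPED.sub(_percent_encode, value)


def escape_attribute_value(value):
    """Escape a single column-9 tag or value (hypothetical helper)."""
    return _COLUMN9_ESCAPED.sub(_percent_encode, value)


print(escape_field("50% identity\there"))
print(escape_attribute_value("kinase; ATP-binding, putative"))
```

Spaces pass through both helpers unchanged, which matches the rule that unescaped spaces are allowed within fields.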
...
Note that unescaped spaces are allowed within fields, meaning that parsers must split on tabs, not spaces. Using the "+" (plus) character to encode spaces, as in early versions of the spec, is deprecated and no longer supported.
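A minimal reader-side sketch of these rules, in Python, might look like the following. The helper names `split_gff3_line` and `parse_attributes` are hypothetical, and the sketch assumes that column 9 holds `tag=value` pairs separated by semicolons, with commas separating multiple values. It decodes with `urllib.parse.unquote` rather than `unquote_plus`, so a literal "+" is left as a plus sign and is not turned into a space.

```python
from urllib.parse import unquote


def split_gff3_line(line):
    """Split one feature line into its 9 columns on tabs only."""
    columns = line.rstrip("\r\n").split("\t")
    if len(columns) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(columns))
    return columns


def parse_attributes(column9):
    """Parse column 9 into a dict mapping each tag to a list of values.

    Splitting happens before decoding, so escaped ; = , characters inside
    a value are not mistaken for separators.
    """
    attributes = {}
    for pair in column9.split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, _, value = pair.partition("=")
        attributes[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attributes


line = "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=gene1;Note=a b%3Bc+d,second\n"
columns = split_gff3_line(line)
print(columns[6])
print(parse_attributes(columns[8]))
```

Decoding only after splitting is the key design choice here: decoding the whole column first would turn `%3B` back into a semicolon and break the pair boundaries.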