GFF3 and the SO

I like the GFF file format: plain old ASCII text, human- and machine-readable, simple yet full-featured, and well supported by BioPerl.

I’m planning to use it in a project and, while browsing the GFF3 specification, I discovered that column 3 (“type”) should be a SOFA term, SOFA being part of the Sequence Ontology (SO) Project. All well and good, except:

  • The features that I would like to annotate are transmembrane helices (and the loops between them), as predicted using TMHMM. There does not seem to be a “TM helix” feature in the SO, which I think is a rather major omission.
  • It’s not clear to me if the SO project is still actively maintained. A lot of their SourceForge pages seem to be slow, broken or not updated for some time. The Term Tracker mechanism for suggesting new terms seems to be non-functional.

If anyone knows the current status of the SO, I’d like to hear about it. I’d also suggest that terms to describe both transmembrane helices and the extramembrane regions connected to them would be rather useful.
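
To make the problem concrete, here is a minimal Python sketch of how predicted helix coordinates might be written out as GFF3 lines. The function name `helices_to_gff3`, the protein ID `prot1` and the coordinates are all made up for illustration, and the input is assumed to be already-parsed (start, end) pairs rather than raw TMHMM output. Column 3 uses the generic `region` term purely as a stand-in, since a dedicated TM helix term is exactly what seems to be missing; the description goes in a Note attribute instead.

```python
def helices_to_gff3(seqid, helices, feature_type="region", source="TMHMM"):
    """Return GFF3 text for a list of (start, end) helix coordinates.

    Coordinates are assumed to be 1-based and inclusive, as GFF3 expects.
    feature_type defaults to the generic "region" term, a placeholder for
    a more specific transmembrane-helix term.
    """
    lines = ["##gff-version 3"]
    for n, (start, end) in enumerate(helices, start=1):
        attributes = f"ID={seqid}_tm{n};Note=transmembrane helix"
        # Nine tab-separated columns: seqid, source, type, start, end,
        # score, strand, phase, attributes. Score, strand and phase are
        # left undefined (".") for these protein features.
        columns = [seqid, source, feature_type, str(start), str(end),
                   ".", ".", ".", attributes]
        lines.append("\t".join(columns))
    return "\n".join(lines) + "\n"


# Made-up coordinates for a hypothetical protein "prot1".
gff3_text = helices_to_gff3("prot1", [(7, 29), (44, 66)])
print(gff3_text, end="")
```

If a suitable term does turn up in the SO, only the `feature_type` argument would need to change.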

One thought on “GFF3 and the SO”

  1. Chris Fields

    I have had similar issues with RNA structural data (secondary structure, modified nucleotides, etc.). Even though there is an RNA Ontology Consortium that apparently works with GO/SO, I have yet to see any relevant terms added.
