So, I read the title:
Mining locus tags in PubMed Central to improve microbial gene annotation
and skimmed the abstract:
The scientific literature contains millions of microbial gene identifiers within the full text and tables, but these annotations rarely get incorporated into public sequence databases.
and thought, well OK, but wouldn’t it be better to incorporate annotations in the first place – when submitting to the public databases – rather than by this indirect method?
The point, of course, is to incorporate new findings from the literature into existing records, rather than to use the tool as a primary method of annotation. I do believe that public databases could do more to enforce data quality standards at deposition time, but that’s an entirely separate issue.
Big thanks to Michael Hoffman for a spirited Twitter discussion that put me straight.
Everyone likes constructive feedback. I received a couple of great comments on my previous post, which warrant a brief discussion.
@vlandham points out that when the main BioRuby repository updates, you’ll want to update your local repository. Using git, you do that by adding a remote which points to the original repository, from which you can fetch updates and merge with your local version:
git remote add upstream https://github.com/bioruby/bioruby.git
# fetch/merge only when main repo updates
git fetch upstream
git merge upstream/master
This is described at the GitHub help page Fork A Repo.
Michael points to an article titled A successful Git branching model. It suggests that when developing new features you create a feature branch (also called topic branch). This can help with the management of new features and creates a more complete commit history if/when the new feature is merged back into your development repository. The article also suggests a main branch for development named develop, rather than the default master.
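As far as I understand it, the feature branch workflow looks something like the sketch below. It runs in a throwaway repository so nothing real gets touched. The branch name my-feature, the file feature.txt and the commit messages are placeholders of my own, not taken from the article; the --no-ff flag asks git to record a merge commit even when a fast-forward is possible, which is what keeps the feature's history visible.

```shell
# Sketch only: work in a temporary repository (all names below are hypothetical)
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Your Name"
git commit -q --allow-empty -m "Initial commit"

# Main development branch, named develop rather than master
git checkout -q -b develop

# Start a feature (topic) branch from develop
git checkout -q -b my-feature develop
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "Add new feature"

# Merge the feature back into develop, forcing a merge commit,
# then delete the finished feature branch
git checkout -q develop
git merge -q --no-ff -m "Merge branch 'my-feature' into develop" my-feature
git branch -q -d my-feature
```

Running git log --graph --oneline afterwards should show the feature commit on its own line of history, joined to develop by the merge commit.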
I haven’t quite got my head around all the ins-and-outs of the article yet, but it’s well worth a read.