Filling sequence databases with junk…

…is how I sometimes unkindly describe environmental sequencing projects. So I’m delighted to see a critical analysis of sequence data from the Sargasso Sea in the latest BMC Bioinformatics. It’s not a great paper in terms of analytical methods, but it points out that “the Sargasso Sea sequences seem to introduce a bias that decreases the potential of current methods to propose structure and function for new proteins”.

Perhaps the NCBI should consider keeping such data separate from the bulk of the nr database.