Why, it seems like only 12 years since we read “Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics”.
And can it really be 4 years since we reviewed the topic of gene name corruption in “Gene name errors and Excel: lessons not learned”?
Well, here we are again in 2016 with “Gene name errors are widespread in the scientific literature”. This study examined 35 175 supplementary Excel data files from 3 597 published articles. Simple yet clever, isn’t it? I bet you wish you’d thought of doing that. I do. The conclusion: about 20% of the articles have associated data files in which gene names have been corrupted by Excel.
What if there is no tomorrow? There wasn’t one today.
We tell you not to use Excel. You counter with a host of reasons why you have to use Excel. None of them are good reasons. I don’t know what else to say. Except to reiterate that probably 80% or more of the data analyst’s time is spent on data cleaning and a good proportion of the dirt arises from avoidable errors.
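For a sense of what such a scan involves, here is a minimal sketch that flags identifiers which look like Excel date conversions, such as SEPT2 becoming 2-Sep or MARCH1 becoming 1-Mar. It is not the method used in the study; the function name `flag_date_like` and the one-identifier-per-line input format are assumptions made for illustration.

```shell
# Print lines that look like Excel date conversions of gene symbols,
# e.g. "2-Sep" (from SEPT2) or "1-Mar" (from MARCH1).
# Assumption: one identifier per line on standard input.
flag_date_like() {
  grep -E -i '^[0-9]{1,2}-(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$'
}

# Example: two intact symbols and two corrupted ones.
printf 'SEPT2\n2-Sep\nTP53\n1-Mar\n' | flag_date_like
```

Only the two date-like entries are printed. A real check would also need to cover other conversions, such as identifiers turned into floating-point numbers.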
My machines upgraded from R version 3.1.3 to version 3.2.0 last week, which means that existing code suddenly cannot find packages and so fails. Some notes to myself, possibly useful to others, for what to do when this happens. Relevant to Ubuntu-based systems (I use Linux Mint).
1. Update packages
cp -r ~/R/x86_64-pc-linux-gnu-library/3.1 ~/R/x86_64-pc-linux-gnu-library/3.2
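Copying the library only moves files; packages built under 3.1 may still need rebuilding under 3.2. The R command for that step is not shown here, so the following is a sketch that assumes the intended follow-up is `update.packages()` with `checkBuilt = TRUE`. The function name `rebuild_copied_packages` and the choice of CRAN mirror are illustrative assumptions.

```shell
# Rebuild packages that were built under an older R version.
# Assumptions: Rscript is on the PATH; the mirror URL is just one choice.
rebuild_copied_packages() {
  Rscript -e 'update.packages(checkBuilt = TRUE, ask = FALSE, repos = "https://cloud.r-project.org")'
}

# Usage: run once after copying the library.
# rebuild_copied_packages
```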
1.1. rJava issues
My rJava installation failed because code was trying to compile against jni.h which was not present on my system. Solution:
sudo apt-get install openjdk-7-jdk
sudo R CMD javareconf
and then in R:
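The R command for this step is not shown. Assuming it is simply a reinstall of rJava now that Java has been reconfigured, it could be run from the shell roughly like this (the function name `reinstall_rjava` and the mirror URL are assumptions):

```shell
# Reinstall rJava after "R CMD javareconf" has picked up the JDK.
# Assumption: Rscript is on the PATH; equivalent to calling
# install.packages("rJava") inside an R session.
reinstall_rjava() {
  Rscript -e 'install.packages("rJava", repos = "https://cloud.r-project.org")'
}

# Usage:
# reinstall_rjava
```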
2. Update Bioconductor
Bioconductor is also upgraded so requires more than a package update. Probably need a new R session for this one.
My Bioconductor ChemmineR update failed because package gridExtra was absent, so gridExtra has to be installed first.
3. General issues
When R is installed on Linux Mint, some packages are installed by default in /usr/lib/R/library. When performing updates as a non-root user, you’ll see messages telling you that this location is not writable and asking if you want to use your own library location. If you reply “yes”, you’ll have packages in both system and user locations. It’s probably better to say “no” and let the Ubuntu package management system handle the package upgrades…although when I tried that, the entire upgrade process halted…
And now we are all done so (careful!):
rm -rf ~/R/x86_64-pc-linux-gnu-library/3.1
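Since `rm -rf` is unforgiving, one option is a small guard that refuses to delete the old library unless the new one exists and has something in it. This is only a sketch; the function name `remove_old_library` is hypothetical, and the check is deliberately crude (it does not verify that every package was copied).

```shell
# Remove the old library only if the new one exists and is non-empty.
# Both paths are passed in as arguments; nothing is hard-coded.
remove_old_library() {
  old_lib="$1"
  new_lib="$2"
  if [ -d "$new_lib" ] && [ -n "$(ls -A "$new_lib" 2>/dev/null)" ]; then
    rm -rf -- "$old_lib"
  else
    echo "Refusing to remove $old_lib: $new_lib is missing or empty" >&2
    return 1
  fi
}

# Usage (paths as above):
# remove_old_library ~/R/x86_64-pc-linux-gnu-library/3.1 ~/R/x86_64-pc-linux-gnu-library/3.2
```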
I’m currently rather sleep-deprived and prone to doing stupid things. Like this, for example:
rsync -av ~/Dropbox /path/to/backup/directory/
where the directory /path/to/backup/directory already contains a much-older Dropbox directory. So when I set up a new machine, install Dropbox and copy the Dropbox directory back to its default location – hey! What happened to all my space? What are all these old files? Oh wait…I forgot to delete:
rsync -av --delete ~/Dropbox /path/to/backup/directory/
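A habit that might have helped here is previewing the sync first. rsync’s `--dry-run` (`-n`) option reports what would be transferred and deleted without touching anything. A minimal sketch, where the wrapper name `preview_sync` is just an illustration:

```shell
# Preview what rsync would copy and delete, without changing anything.
# --dry-run (-n) makes rsync report its actions only.
preview_sync() {
  rsync -av --dry-run --delete "$1" "$2"
}

# Usage (paths as above):
# preview_sync ~/Dropbox /path/to/backup/directory/
```

If the listing looks right, run the same command again without `--dry-run`.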
Now, files can be restored of course, but not when there are thousands of them and I don’t even know what’s old and new. What I want to do is restore the directories under ~/Dropbox to the state that they were in yesterday, before I stuffed up.
Luckily Chris Clark wrote dropbox-restore. It does exactly what it says on the tin. For example:
python restore.py /Camera\ Uploads 2014-07-22
Over the years, I’ve written a lot of small “utility scripts”. You know the kind of thing. Little code snippets that facilitate research, rather than generate research results. For example: just what are the fields that you can use to qualify Entrez database searches?
Typically, they end up languishing in long-forgotten Dropbox directories. Sometimes, the output gets shared as a public link. No longer! As of today, “little code snippets that do (hopefully) useful things” have a new home at GitHub.
Also as of today: there’s not much there right now, just the aforementioned Entrez database code and output. I’m not out to change the world here, just to do a little better.
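The Entrez script itself is not reproduced here. As a rough idea of how the question could be answered, NCBI’s EInfo utility returns XML describing a database, including its searchable fields. The sketch below assumes `curl` is available and that each field carries a `<FullName>` element; the function names are hypothetical and this is not the code from the repository.

```shell
# Pull the text of every <FullName> element out of EInfo XML on stdin.
extract_full_names() {
  grep -o '<FullName>[^<]*</FullName>' | sed -e 's/<[^>]*>//g'
}

# Fetch EInfo for one Entrez database (default: pubmed) and list its fields.
# Assumption: network access and curl on the PATH.
list_entrez_fields() {
  db="${1:-pubmed}"
  curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=${db}" | extract_full_names
}

# Usage:
# list_entrez_fields pubmed
```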
A “quilt plot”
Quilt plots. Sounds interesting. The link points to a short article in PLoS ONE, containing a table and a figure. Here is Figure 1.
If you looked at that and thought “Hey, that’s a heat map!”, you are correct. That is a heat map. Let’s be quite clear about that. It’s a heat map.
So, how do the authors justify publishing a method for drawing heat maps and then calling them “quilt plots”?