A brief message for anyone who uses my PubMed retractions report. It’s no longer available at RPubs; instead, you will find it here at Github. Github pages hosting is great, once you figure out that docs/ corresponds to your web root :)
Now I really must update the code and try to make it more interesting than a bunch of bar charts.
If you still follow my Twitter feed, I pity you: it’s been rather boring of late, consisting largely of Github commit messages, many including the words “knit to github document”.
Here’s why. RPubs, an early offering from RStudio, has been a great platform for easy and free publishing of HTML documents generated from RMarkdown and written in RStudio. That said, it’s always been very basic (e.g. there’s no way to organise documents by content or tags). There’s been no real development of the platform for several years and, of late, I’ve noticed that it’s become less reliable. For example, I’ve hit bugs such as one document overwriting another when published from RStudio.
I think it’s unlikely that these issues will be addressed, given that RStudio are now focused on RStudio Connect. So I’ve removed as many documents as I can and rewritten them as Github documents. Github renders these as HTML once they’re pushed, producing attractive reports. Here’s an example.
I’ve done my best to update all blog posts here with links to the new reports. If you do come across old broken links to RPubs reports, just remember that the content is probably now at Github.
After my previous post on extracting virus hosts from NCBI Taxonomy web pages, Pierre wrote:
An excellent idea and here’s my first attempt.
Here’s a count of hosts. By the way, NCBI, it’s “environment”.
cut -f4 virus_host.tsv | sort | uniq -c
1 fungi| plants| invertebrates
181 invertebrates| plants
7 invertebrates| vertebrates
115052 vertebrates| human
43 vertebrates| human stool
225 vertebrates| invertebrates
656 vertebrates| invertebrates| human
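The same tally can be done in Python, which is handy if you want to go on and do something with the combined host strings. This is a minimal sketch, assuming virus_host.tsv is tab-separated with the host string in the fourth column, as cut -f4 implies; the function name count_hosts is hypothetical.

```python
from collections import Counter

def count_hosts(path, column=4):
    """Tally values in a 1-based tab-separated column, like cut -f4 | sort | uniq -c."""
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Unlike cut, skip lines that do not have the requested column
            if len(fields) >= column:
                counts[fields[column - 1]] += 1
    return counts
```

Calling sorted(count_hosts("virus_host.tsv").items()) gives (host, count) pairs ordered by host string, close to the sort | uniq -c output, although Python sorts by code point rather than by locale collation.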
I’m currently rather sleep-deprived and prone to doing stupid things. Like this, for example:
rsync -av ~/Dropbox /path/to/backup/directory/
where the directory /path/to/backup/directory already contains a much-older Dropbox directory. So when I set up a new machine, install Dropbox and copy the Dropbox directory back to its default location – hey! What happened to all my space? What are all these old files? Oh wait… I forgot the --delete option:
rsync -av --delete ~/Dropbox /path/to/backup/directory/
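Because the source ~/Dropbox has no trailing slash, rsync copies the directory itself, so the backup lives in /path/to/backup/directory/Dropbox. Adding -n (--dry-run) to the command above previews what --delete would remove. The same question can be answered independently with a short Python sketch; extraneous_files is a hypothetical helper, and it compares regular files only, whereas rsync --delete also removes stale directories, symlinks and so on.

```python
from pathlib import Path

def extraneous_files(source, dest):
    """Return relative paths of files under dest with no counterpart under source.

    These are, roughly, the files that rsync --delete would remove from dest.
    """
    source, dest = Path(source), Path(dest)
    src_files = {p.relative_to(source) for p in source.rglob("*") if p.is_file()}
    dest_files = {p.relative_to(dest) for p in dest.rglob("*") if p.is_file()}
    return sorted(dest_files - src_files)
```

For example, extraneous_files(Path.home() / "Dropbox", "/path/to/backup/directory/Dropbox") lists the stale files in the backup copy.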
Now, files can be restored, of course, but not easily when there are thousands of them and I don’t even know which are old and which are new. What I want to do is restore the directories under ~/Dropbox to the state that they were in yesterday, before I stuffed up.
Luckily Chris Clark wrote dropbox-restore. It does exactly what it says on the tin. For example:
python restore.py /Camera\ Uploads 2014-07-22
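The core idea of restoring to a date is to pick, for each file, the newest revision that is no later than the cutoff. The following is a hypothetical sketch of that selection step only, not dropbox-restore’s actual code: the Revision record is invented for illustration (it is not a Dropbox API type), and it assumes the date argument is treated as a cutoff timestamp such as midnight at the start of that day.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Revision:
    # Hypothetical record for one stored version of a file (not a Dropbox API type)
    path: str
    rev: str
    modified: datetime

def pick_revisions(revisions, cutoff):
    """For each path, choose the newest revision modified at or before cutoff.

    Paths whose every revision is newer than the cutoff map to None: the file
    did not exist yet, and what to do with it is a separate policy decision.
    """
    best = {}
    for r in revisions:
        best.setdefault(r.path, None)
        if r.modified <= cutoff:
            current = best[r.path]
            if current is None or r.modified > current.modified:
                best[r.path] = r
    return best
```

Restoring then amounts to writing back each chosen revision, which is the part that needs the real Dropbox API.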
This post is an apology and an attempt to make amends for contributing to the decay of online bioinformatics resources. It’s also, I think, a nice example of why reproducible research can be difficult.
Come back in time with me 10 years, to 2004.
Over the years, I’ve written a lot of small “utility scripts”. You know the kind of thing. Little code snippets that facilitate research, rather than generate research results. For example: just what are the fields that you can use to qualify Entrez database searches?
Typically, they end up languishing in long-forgotten Dropbox directories. Sometimes, the output gets shared as a public link. No longer! As of today, “little code snippets that do (hopefully) useful things” have a new home at Github.
As of today, though, there’s not much there: just the aforementioned Entrez database code and output. I’m not out to change the world here, just to do a little better.
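For the Entrez question, one source of searchable fields is the NCBI E-utilities EInfo endpoint, which returns XML with a FieldList of Field elements for a given database. This is a minimal sketch, not the code in the repository: the function names are hypothetical, fetch_fields needs network access, and only the parsing step is exercised offline.

```python
import urllib.request
import xml.etree.ElementTree as ET

EINFO_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db={db}"

def parse_fields(xml_text):
    """Return (Name, FullName) pairs for every Field in an EInfo response."""
    root = ET.fromstring(xml_text)
    return [
        (field.findtext("Name"), field.findtext("FullName"))
        for field in root.iter("Field")
    ]

def fetch_fields(db="pubmed"):
    """Fetch the searchable fields for one Entrez database (requires network)."""
    with urllib.request.urlopen(EINFO_URL.format(db=db)) as response:
        return parse_fields(response.read())
```

The short Name values are the tags used to qualify a search term, as in cancer[TIAB].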
I’m the “biologist-turned-programmer” type of bioinformatician, which makes me a hacker, not a developer. Most of the day-to-day coding that I do goes something like this:
Colleague: Hey Neil, can you write me a script to read data from file X, do Y to it and output a table in file Z?
Me: Sure… (clickety-click, hackety-hack…) …there you go.
Colleague: Great! Thanks.
I’m a big fan of the Bio* projects and have used them for many years, beginning with BioPerl and, more recently, BioRuby. I’ve always wanted to contribute some code to them, but have never got around to doing so. This week, two thoughts popped into my head:
- How hard can it be?
- There isn’t much introductory documentation for would-be Bio* developers
The answer to the first question is: given some programming experience, not very hard at all. This blog post is my attempt to address the second thought, by writing a step-by-step guide to developing a simple class for the BioRuby library. When I say “beginner’s guide”, I’m referring to myself as much as anyone else.