May 11, 2012

My day out at #osddmalaria

Finally, I get around to telling you that…
…on Friday 24th February, I took a day out from my regular job to attend a meeting on Open Source Drug Discovery for Malaria. I should state straight away that whilst drug discovery and chem(o)informatics are topics that I find very interesting, I have no professional experience or connections in either area. However, it was an opportunity to learn more, listen to some great speakers, think about what bioinformaticians might be able to bring to the table and of course, finally meet Mat Todd in person. Mat, if you don’t know, is one of the few people on the planet who really does science online, as opposed to talking about science online.

Here’s what I learned – with just a little analysis using R later in the post, hence the statistics/R category.
Read the rest…

April 24, 2012

Redmine + Gitolite integration

I’m a big fan of both Redmine, the project management web application, and Git, the distributed version control system.

Recently, I learned that it’s possible to integrate Git into Redmine so that Git repositories for a project can be created via the Redmine web interface. This is done using plugins which connect Redmine with Git hosting software: either gitosis or, more recently, gitolite.

Unfortunately, this is a deeply confusing process for novices like myself. There are multiple forks of the plugins, long threads in the Redmine forums that discuss various hacks/tweaks to make things work, and no single authoritative source of documentation. After much experimentation, this is what worked for me. I can’t guarantee success for you.

Read the rest…

April 12, 2012

Draft post cull

Work and life are currently impacting the frequency of my blogging. I’m falling back on the old trick of clearing out draft posts and explaining briefly why they never saw the light of day.

1. How to launch a website (January 18, 2012)
Anyone can publish a website. Is this a good thing? In one sense yes, of course – empowerment and democratization of information are important. However, I’m increasingly of the opinion that scientific websites should meet some minimum standards and that those who create them should have an appropriate level of programming competency. Having an idea is great but if it’s poorly implemented, tools will break, wither and die, creating a Web of broken links and non-functional resources.

So the title of this post was supposed to be ironic and it went on to list a series of steps which described (wittily I hoped) how not to launch a website. Unfortunately, when I read over the post, it was not at all witty. In fact, it was just a thinly-veiled attack on a recently launched aggregator.

I think some of the points in the post do need to be made, but not in their current form and perhaps not by me.

2. Google Plus: what’s in it for the online scientist? (July 1, 2011)
I started to write this one when Google+ first launched. There were more than enough posts which reviewed its features so, as the title suggests, I was trying to come up with a different angle: how useful might G+ be for scientists?

In the end, I simply could not come up with anything interesting to say. I know that other scientists use G+ and find it very useful. I’m not one of them – in fact, I don’t use it at all. What’s more, I can’t even explain why. It’s not that I have strong feelings one way or the other about G+; for some reason, it just doesn’t register on my “must investigate” radar.

Still to come (hopefully)

1. Sequencing for relics from the Sanger era part 2: FASTQ manipulation and sequence quality
There are supposed to be 4 posts in my NGS series and this one looks at tools for manipulation and QC of FASTQ sequence files. My NGS self-education is a side-project and “real work” takes precedence just now, so no timeline promises for this one. Soon, I hope.

2. My day out at #osddmalaria
I’ve been meaning to write a summary of my day at the Open Source Drug Discovery Malaria meeting, back in February. Here’s a brief one: it was really interesting and I enjoyed it. More than that to follow at some point.

March 16, 2012

R gotcha for the week

I use the biomaRt package from Bioconductor in almost every R session. So I thought I’d load the library and set up a mart instance in my ~/.Rprofile:

library(biomaRt)
mart.hs <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

On starting R, I was somewhat perplexed to see this error message:

Error in bmVersion(mart, verbose = verbose) : 
  could not find function "read.table"

Twitter to the rescue. @hadleywickham told me to load utils first and @vsbuffalo explained that normally, .Rprofile is read before the utils package is loaded. Seems rather odd to me; I’d have thought that biomaRt should load utils if required, but there you go.

So this works in ~/.Rprofile:

library(utils)
library(biomaRt)
mart.hs <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
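
A slightly more defensive variant, offered only as a sketch: it keeps the same fix (attach utils before biomaRt) but adds guards that are not part of the original snippet. The assumptions are that you only want the mart in interactive sessions, and that R should still start cleanly if biomaRt is not installed or the Ensembl server cannot be reached.

```r
# ~/.Rprofile (sketch; the guards below are an added assumption)

# utils is not yet attached when .Rprofile is read, so attach it first
library(utils)

# only set up the mart for interactive sessions where biomaRt is installed
if (interactive() && requireNamespace("biomaRt", quietly = TRUE)) {
  library(biomaRt)
  # useMart() contacts the Ensembl server; on failure, report it and carry on
  mart.hs <- tryCatch(
    useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl"),
    error = function(e) {
      message("biomaRt setup skipped: ", conditionMessage(e))
      NULL
    }
  )
}
```

With this in place, mart.hs is either a mart object or NULL, so later code can check is.null(mart.hs) before querying.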

March 14, 2012

Simple plots reveal interesting artifacts

I’ve recently been working with methylation data; specifically, from the Illumina Infinium HumanMethylation450 bead chip. It’s a rather complex array which uses two types of probes to determine the methylation state of DNA at ~ 485 000 sites in the genome.

The Bioconductor project has risen to the challenge with a (somewhat bewildering) variety of packages to analyse data from this chip. I’ve used lumi, methylumi and minfi, but will focus here on just the last of these.

Update: a related post from Oliver Hofmann (@fiamh).
Read the rest…

February 28, 2012

How-to: code snippets in your LaTeX documents

Formatted Ruby code in PDF generated from LaTeX

There are numerous ways to include formatted code in a LaTeX document. Here’s my favourite. Nothing original or clever here, just brief notes for my (and perhaps, your) benefit. Based largely on this how-to.
Read the rest…

February 21, 2012

When even your own publication list makes no sense

A few years ago, the head of my research group asked if I’d like to help write a chapter for a book. I weighed up the pros: it was an updated version of a previous book (so not too much work), it was invited (so not too many battles with reviewers) and it’s another item to go on the CV. The cons: typically, this kind of article appears in an obscure, closed publication that no-one ever reads or cites. So I said “sure, why not” and we wrote it.

It’s listed on my publications page at this blog as:

Saunders, N.F.W., Brinkworth, R.I., Kemp, B.E. and Kobe, B. (2010). Substrates of Cyclic Nucleotide-Dependent Protein Kinases. In: Handbook of Cell Signalling (Bradshaw, R.A., Dennis, E., eds.). Academic Press San Diego, 182:1489-1495. [DOI]

and sure enough, if you visit that DOI (and have a ScienceDirect subscription), you’ll find chapter 182 in the Handbook of Cell Signalling.

I thought no more about it, until I updated my Google Scholar citations page, where I found this:

Substrates of Cyclic Nucleotide-Dependent Protein Kinases
Neil FW Saunders, Ross I Brinkworth, Bruce E Kemp, Bostjan Kobe
2011/4/12
Transduction Mechanisms in Cellular Signaling: Cell Signaling Collection 399
Academic Press

And here’s the link at Google Books. Same article, same editors – but in chapter 41 of a different book: Transduction Mechanisms in Cellular Signaling: Cell Signaling Collection, on pages 399-405.

So apparently, my chapter has been “repurposed” for a completely different publication. Perhaps this transpired in consultation with the research group after I left. Perhaps there’s a long-forgotten email trail in which I agreed to this. Or perhaps we have so little control over our own work that strange things like this can just happen.

February 13, 2012

10 years on, same old same old

September 2, 2002

So what new skills will postdocs need to ensure that they don’t become science relics? The answer is math, statistics, and knowledge of a scripting language for computers.

– The Scientist, “Bioinformatics Knowledge Vital to Careers.” 16(17): 53.

February 8, 2012

But two other skills are increasingly necessary: expertise in computer-programming languages designed to aid manipulation of large data sets, such as R, Perl or Python, and the ability to use these languages to analyse large amounts of data quickly.

– Nature, “Biostatistics: Revealing analysis.” 482: 263–265.

February 2, 2012

Proteins in the PDB that differ by one amino acid

A question at BioStar: how to “return all pdb ids to a given one that differ only by one amino acid”?

My answer began: “I think it is not too much work to craft a solution using a few tools”, followed by some incomplete ideas. Let’s see if I was right.
Read the rest…

January 27, 2012

Reproducible research: three links that made me think

I’m constantly amazed, bemused and troubled by how little published scientific research is genuinely reproducible, in that you or I (or even the original authors) could go back and check the results. Three examples from around the Web converged in my mind this week.
Read the rest…
