Archive for ‘research diary’

February 26, 2013

R/ggplot2 tip: aes_string

I’m a big fan of ggplot2. Recently, I ran into a situation which called for a useful feature that I had not used previously: aes_string.
Read the rest…

February 13, 2013

Basic R: rows that contain the maximum value of a variable

File under “I keep forgetting how to do this basic, frequently-required task, so I’m writing it down here.”

Let’s create a data frame which contains five variables, vars, named A – E, each of which appears twice, along with some measurements:

df.orig <- data.frame(vars = rep(LETTERS[1:5], 2), obs1 = c(1:10), obs2 = c(11:20))
df.orig
#    vars obs1 obs2
# 1     A    1   11
# 2     B    2   12
# 3     C    3   13
# 4     D    4   14
# 5     E    5   15
# 6     A    6   16
# 7     B    7   17
# 8     C    8   18
# 9     D    9   19
# 10    E   10   20

Now, let’s say we want only the rows that contain the maximum values of obs1 for A – E. In bioinformatics, for example, we might be interested in selecting the microarray probeset with the highest sample variance from multiple probesets per gene. The answer is obvious in this trivial example (6 – 10), but one procedure looks like this:
Read the rest…

February 12, 2013

Genes x Samples: please explain

One of my bioinformatics pet peeves involves statements like this one, from the CNAmet user guide:

Inputs to CNAmet are three m x n matrices, where m is the number of genes and n the number samples

What we’re looking at here is the hot, but poorly-defined topic of data integration, in which biological measurements from two or more different platforms are somehow combined in a way that provides more information than each platform separately. Read any paper on this topic, download the software and you’ll find example datasets containing two or more matched matrices, with rows where measurements have been summarized to a “gene”. What you won’t find, typically, is a detailed explanation of the summarization procedure that you could implement yourself.

Read the rest…

October 22, 2012

Gene name errors and Excel: lessons not learned

June 23, 2004. BMC Bioinformatics publishes “Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics”. We roll our eyes. Do people really do that? Is it really worthy of publication? However, we admit that if it happens then it’s good that people know about it.

October 17, 2012. A colleague on our internal Yammer network writes:
Read the rest…

August 28, 2012

Addendum to yesterday’s post on custom CSS and R Markdown


Updates from RStudio support:
(1) “Thanks for reporting and I was able to reproduce this as well. I’ve filed a bug and we’ll take a look.”
(2) Taking a further look, this is actually a bug in the Markdown package and we’ve asked the maintainer (Jeffrey Horner) to look into it.

As juejung points out in a comment on my previous post, applying custom CSS to R Markdown by sourcing the custom rendering function breaks the rendering of inline equations.

I’ve opened an issue with RStudio support and will update here with their response. In the meantime, one solution to this problem is:

  1. Do not create the files custom.css or style.R, as described yesterday
  2. Instead, just put the custom CSS at the top of your R Markdown file using style tags, as shown below
<style type="text/css">
table {
   max-width: 95%;
   border: 1px solid #ccc;
}

th {
  background-color: #000000;
  color: #ffffff;
}

td {
  background-color: #dcdcdc;
}
</style>
Tags: ,
August 27, 2012

Custom CSS for HTML generated using RStudio

People have been telling me for a while that the latest version of RStudio, the IDE for R, is a great way to generate reports. I finally got around to trying it out and for once, the hype is justified. Start with this excellent tutorial from Jeremy Anglim.

Briefly: the process is not so different to Sweave, except that (1) instead of embedding R code in LaTeX, we embed R code in a document written using R Markdown; (2) instead of Sweave, we use the knitr package; (3) the focus is on generating HTML documents for publishing to the Web (see e.g. RPubs), although knitr can also generate PDF documents, just like Sweave.

It took me a little while to figure out a couple of things. First, how best to generate HTML tables, ideally using the xtable package. Second, how to override the default RStudio/R Markdown style. I’ve documented those tasks in this post.
Read the rest…

Tags: , , ,
April 24, 2012

Redmine + Gitolite integration

I’m a big fan of both Redmine, the project management web application and Git, the distributed version control system.

Recently, I learned that it’s possible to integrate Git into Redmine so that git repositories for a project can be created via the Redmine web interface. This is done using plugins which connect Redmine with git hosting software: either gitosis or more recently, gitolite.

Unfortunately, this is a deeply-confusing process for novices like myself. There are multiple forks of the plugins, long threads in the Redmine forums that discuss various hacks/tweaks to make things work and no one authoritative source of documentation. After much experimentation, this is what worked for me. I can’t guarantee success for you.

Read the rest…

March 16, 2012

R gotcha for the week

I use the biomaRt package from Bioconductor in almost every R session. So I thought I’d load the library and set up a mart instance in my ~/.Rprofile:

library(biomaRt)
mart.hs <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

On starting R, I was somewhat perplexed to see this error message:

Error in bmVersion(mart, verbose = verbose) : 
  could not find function "read.table"

Twitter to the rescue. @hadleywickham told me to load utils first and @vsbuffalo explained that normally, .Rprofile is read before the utils package is loaded. Seems rather odd to me; I’d have thought that biomaRt should load utils if required, but there you go.

So this works in ~/.Rprofile:

library(utils)
library(biomaRt)
mart.hs <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
March 14, 2012

Simple plots reveal interesting artifacts

I’ve recently been working with methylation data; specifically, from the Illumina Infinium HumanMethylation450 bead chip. It’s a rather complex array which uses two types of probes to determine the methylation state of DNA at ~ 485 000 sites in the genome.

The Bioconductor project has risen to the challenge with a (somewhat bewildering) variety of packages to analyse data from this chip. I’ve used lumi, methylumi and minfi, but will focus here on just the latter.

Update: a related post from Oliver Hofmann (@fiamh).
Read the rest…

February 28, 2012

How-to: code snippets in your LaTeX documents

ruby

Formatted Ruby code in PDF generated from LaTeX

There are numerous ways to include formatted code in a LaTeX document. Here’s my favourite. Nothing original or clever here, just brief notes for my (and perhaps, your) benefit. Based largely on this how-to.
Read the rest…

Follow

Get every new post delivered to your Inbox.

Join 2,204 other followers