Archive for April, 2011

April 21, 2011

Lists of URLs are so 1990s

Subtitle: “Why some projects are not worth your valuable time and skills.”

Let’s wrap up this exploration of how to extract URLs associated with NAR Database articles. I’m tempted to start with the summary: don’t bother – just Google it. If you want that, skip to the end.

April 20, 2011

Factoids (and using R as a simple calculator)

Wikipedia defines factoid as “a questionable or spurious—unverified, incorrect, or fabricated—statement presented as a fact, but with no veracity.”

Last night I was enjoying a TV documentary series, The Story of Science, when I heard a startling factoid, namely:

If the “empty space” inside the atoms that make up people were removed, the entire human population would fit inside a sugar cube.

What the? Can we improve the veracity of this factoid?

April 19, 2011

Why can’t PubMed or academic journals get the basics right?

A recent question at BioStar asked “Is the NAR database list available in a computer readable format?” The short answer is “no” and Pierre has done some excellent preliminary work to address the issue.

I’ve been working on a database and web application to check the associated URLs but, quite frankly, this is tedious, a waste of everyone’s time, and entirely avoidable if the publishing industry did a better job. All that’s required is that either NAR or PubMed provide structured data (XML, Medline format, I don’t care what) containing a field that looks something like this:

URL    http://a.valid.url.goes.here

That way, we could all avoid writing regular expressions to detect URLs in abstracts. No wait – to detect broken URLs in abstracts. You would not believe how many of them look like this:

URL    http://www.amaze.ulb. ac.be/
                            ^

Someone helpfully informed me via Twitter that this is “often a result of typesetting.” Thanks for that.
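
As a minimal sketch of the kind of check described above, assuming the abstracts are available as plain text, the shell function below flags lines where a URL ends in a dot and is followed by whitespace and then a lowercase letter or digit. The name find_broken_urls is hypothetical and the pattern is only a heuristic: it will miss other kinds of breakage and may occasionally flag a URL that simply ends a sentence.

```shell
# find_broken_urls is a hypothetical helper name, not an existing tool.
# Heuristic: a URL that ends in "." followed by whitespace and then a
# lowercase letter or digit (e.g. "ulb. ac.be/") was probably split.
# Reads the files given as arguments, or standard input if there are none.
find_broken_urls() {
  grep -nE 'https?://[^[:space:]]*\.[[:space:]]+[a-z0-9]' "$@"
}

# Try it on the two example lines from this post.
printf '%s\n' \
  'URL    http://a.valid.url.goes.here' \
  'URL    http://www.amaze.ulb. ac.be/' | find_broken_urls
```

Only the second line should be reported, prefixed with its line number. None of this would be needed if the URL arrived in its own structured field.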

April 15, 2011

R 2.12 to 2.13 package upgrade

If you:

  • use Linux
  • have just upgraded your R installation from 2.12 to 2.13
  • installed some/all of your packages in your home area (e.g. ~/R/i486-pc-linux-gnu-library/2.12) and…
  • …are wondering why R can’t see them any more

just do this:

# at a shell prompt
cp -r ~/R/i486-pc-linux-gnu-library/2.12 ~/R/i486-pc-linux-gnu-library/2.13
# in R console
update.packages(checkBuilt=TRUE, ask=FALSE)
# back to the shell
rm -rf ~/R/i486-pc-linux-gnu-library/2.12

Update: corrected a typo; of course you need “cp -r”
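
One thing to watch with the copy step: if the 2.13 directory already exists, “cp -r” puts a 2.12 directory inside it instead of copying the packages across. A slightly more defensive sketch of that step follows. The function name migrate_r_library is hypothetical, and the paths in the example call are simply the ones used above.

```shell
# migrate_r_library is a hypothetical helper, not part of R.
# Usage: migrate_r_library OLD_DIR NEW_DIR
migrate_r_library() {
  old=$1
  new=$2
  if [ ! -d "$old" ]; then
    echo "no such library directory: $old" >&2
    return 1
  fi
  mkdir -p "$new" || return 1
  # "$old"/. copies the contents of OLD_DIR into NEW_DIR, so nothing is
  # nested one level down when NEW_DIR already exists.
  cp -R "$old"/. "$new"/
}

# Example call with the paths from this post; commented out so that
# pasting the block does nothing until you decide to run it.
# migrate_r_library ~/R/i486-pc-linux-gnu-library/2.12 ~/R/i486-pc-linux-gnu-library/2.13
```

You still need update.packages(checkBuilt=TRUE, ask=FALSE) in the R console afterwards, and it is worth checking that R can see the packages before deleting the old directory.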

April 8, 2011

Fixing aberrant files using R and the shell: a case study

Once in a while, you embark on what looks like a simple computational procedure only to encounter frustration very early on. “I can’t even read my file into R!” you cry.

Step back, take a deep breath and take note of what the software is trying to tell you. Most times, you’ve just missed something very straightforward. Here’s an example.

Update: this post is not about how best to perform the task; it’s about how to cope with frustration. Please stop sending me your solutions :-)
