Archive for ‘computing’

February 9, 2011

Algorithms running day and night

Warning: contains murky, somewhat unstructured thoughts on large-scale biological data analysis

Picture this. It’s based on a true story: names and details altered.

Alice, a biomedical researcher, performs an experiment to determine how gene expression in cells from a particular tissue is altered when the cells are exposed to an organic compound, substance Y. She collates a list of the most differentially-expressed genes and notes, in passing, that the expression of Gene X is much lower in the presence of substance Y.

Bob, a bioinformatician in the same organisation but in a different city to Alice, is analysing a public dataset. This experiment looks at gene expression in the same tissue but under different conditions: normal compared with a disease state, Z Syndrome. He also notes that Gene X appears in his list – its expression is much higher in the diseased tissue.

Alice and Bob attend the annual meeting of their organisation, where they compare notes and realise the potential significance of substance Y in suppressing the expression of Gene X and so perhaps relieving the symptoms of Z Syndrome. On hearing this, the head of the organisation, Charlie, marvels at the serendipitous nature of the discovery. Surely, he muses, given the amount of publicly-available experimental data, there must be a way to automate this kind of discovery by somehow “cross-correlating” everything with everything else until patterns emerge. What we need, states Charlie, is:

Algorithms running day and night, crunching all of that data

What’s Charlie missing?
Read the rest…

January 27, 2011

APIs have let me down part 2/2: FriendFeed

In part 1, I described some frustrations arising out of a work project, using the ArrayExpress API. I find that one way to deal mentally with these situations is to spend some time on a fun project, using similar programming techniques. A potential downside of this approach is that if your fun project goes bad, you’re really frustrated. That’s when it’s time to abandon the digital world, go outside and enjoy nature.

Here then, is why I decided to build another small project around FriendFeed, how its failure has led me to question the value of FriendFeed for the first time and why my time as a FriendFeed user might be up.
Read the rest…

January 27, 2011

APIs have let me down part 1/2: ArrayExpress

The API – Application Programming Interface – is, in principle, a wonderful thing. You make a request to a server using a URL and back come lovely, structured data, ready to parse and analyse. We’ve begun to demand that all online data sources offer an API and lament the fact that so few online biological databases do so.

Is it better, though, to have no API at all than one which is poorly implemented and leads to frustration? I’m beginning to think so, after recent experiences on both a work project and one of my “fun side projects”. Let’s start with the work project, an attempt to mine a subset of the ArrayExpress microarray database.
Read the rest…

January 27, 2011

Does your LinkedIn Map say anything useful?

LinkedIn, the “professional” career-oriented social network, is one of those places on the Web where I maintain a profile for visibility. I’m yet to gain any practical value whatsoever from it. That said, I know plenty of people who do find it useful – mostly, it seems, those living near the north-east or west coast of the USA.

Figure: My LinkedIn Network (InMap)

LinkedIn have something of a reputation for innovation – see LinkedIn Labs, their small demonstration products, for example. The latest of these is named InMaps. It’s been popping up on blogs and Twitter for several days. Essentially, it creates a graph of your LinkedIn network, applies some community detection algorithm to cluster the members and displays the results as a pretty, interactive graphic that you can share.

What seems to have captured the imagination is that the graphs indicate communities that are instantly recognisable to the user. Mine is shown above. It’s not a large, complex or especially interesting network but when I “eyeballed” it, I was immediately able to classify the three sub-graphs:

  • Orange – mostly people with whom I have worked or currently work, plus a few “random” contacts: note that this group is hardly interconnected at all
  • Green – people who work in bioinformatics or computational biology, particularly genomics: two major hubs connect me with this group
  • Blue – the largest, densest network is composed largely of what I’d call the “BioGang”: people that I interact with on Twitter and FriendFeed, many of whom I haven’t met in person

This confirms what I’ve long suspected: I prefer to network with smart strangers rather than with my immediate peers and colleagues. Or as Bill Joy said, “no matter who you are, most of the smartest people work for someone else.” I’ve seen this misquoted as “where you are”, which makes more sense to me.

December 13, 2010

A timely reminder to use strong passwords

You may have read about a security breach at Gawker Media, the company behind several websites including Lifehacker.

The server files have been posted at various locations around the web, so I thought I’d take a look. Finding your own email address and decrypted password in a file obtained online is a sobering experience, I can tell you. Fortunately, it was not a password that I use elsewhere, so no damage done. It was, however, a ridiculously “soft” password (all digits, if you must know).

Of course, my thoughts soon turned to data analysis. A quick and dirty bash one-liner reveals the top 10 passwords…
Read the rest…

December 13, 2010

Can a journal make a difference? Let’s find out.

Academic journals. Frankly, I’m not a big fan of any of them. There are too many. They cost too much. Much of what they publish is inconsequential, read by practically no-one or just downright incorrect. Much of the rest is badly-written and boring. The people who publish them have an over-inflated sense of their own importance. They’re hidden behind paywalls. And governed by ludicrous metrics. The system by which articles are accepted or rejected is arcane and ridiculous. I mean, I could go on…

No, what really troubles me about journals is that they only tell a very small part of the story – the flashy, attention-grabbing part called “results”. We learn from high school onwards that a methods section should be sufficient for anyone to reproduce the results. This is one of the great lies of science. Go read any journal in your field and give it a try. It’s even the case in computation, an area which you might think less prone to the problems in reproducing wet-lab science (“the Milli-Q must have been off”).

We have this wonderful thing called the Web now. The Web doesn’t have a page limit, so you can describe things in as much detail as you wish. Better still, you can just post your methods and data there in full, for all to see, download and reproduce to their heart’s content. You’d like some credit for doing that though, right?

So if you do research – any kind of research – that involves computation, your code is open-source, reusable, well-documented and robust (think: tests) and you want to share it with the world, head over to a new journal called BMC Open Research Computation, which is now open for submissions. Your friendly team of enlightened editors awaits.

More information at Science in the Open and Saaien Tist. Full disclosure: I’m on the editorial board of this journal and was invited to write a launch post.

November 16, 2010

Dropbox tip continued: convert a file tree to HTML

A couple of posts ago, I outlined a small bash script to generate an index.html file, containing links to other files in a directory. This was for generating links to files in a Dropbox public directory.

I had completely forgotten about the very useful UNIX/Linux command named tree. If not installed, it should be in your distribution repository (e.g. sudo apt-get install tree for Ubuntu/Debian). Then simply:

cd Dropbox/Public/mydirectory
tree -H . > index.html
Next, navigate to index.html at the Dropbox website and you should see something like the tree on the right. It’s a little ugly and, obviously, not as convenient as something like GitHub, but it can be a good quick and dirty fix if you need to share a hierarchy of directories and files.
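
One small wrinkle with redirecting into index.html: the shell creates the file before tree runs, so the listing will normally include index.html itself. A minimal sketch of one way around that, assuming tree is installed: -I excludes names matching a pattern, -T sets the page title and -o writes straight to a file. The function name make_tree_index and its arguments are just illustrative, not part of tree.

```shell
# Sketch: regenerate index.html for a directory using tree(1).
# Assumptions: tree is installed; the function name and arguments are hypothetical.
make_tree_index() {
  dir=$1
  title=${2:-"Shared files"}
  (
    cd "$dir" || exit 1
    # -H .           HTML output with links relative to this directory
    # -I index.html  leave the index file itself out of the listing
    # -T "$title"    page title and heading
    # -o index.html  write to the file rather than stdout
    tree -H . -I index.html -T "$title" -o index.html
  )
}
```

Something like make_tree_index ~/Dropbox/Public/mydirectory "My shared files" would then rebuild the page whenever the directory changes.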

November 9, 2010

How open source and BioStar saved a project

This is the story of how an open source project and a science communication tool combined to save the day.
Read the rest…

November 9, 2010

A quick Bash tip: add an index.html file to a Dropbox public folder

You know that Dropbox is terrific, of course. No? Go and check it out now.

One issue: files in your Public folder have a public URL that you can send to other people. Unfortunately, directories do not. So how do you share a public directory full of files?

Answer: create an index.html file and share that. Let’s say that your files end in “.txt” and reside in ~/Dropbox/Public/entrez. Do this:

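# Build an ordered list of links to every .txt file in the folder
cd ~/Dropbox/Public/entrez
echo "<ol>" > index.html
# Loop over the glob directly; parsing the output of ls breaks on file names with spaces
for i in *.txt; do echo "<li><a href='$i'>$i</a></li>" >> index.html; done
echo "</ol>" >> index.html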

Now you can share the link to the index.html, which when clicked will display a list of links to all the other files in the directory.
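
If you find yourself doing this for several folders, the same idea can be wrapped in a small function. This is only a sketch: the name make_index and its two arguments (directory, file extension) are made up for illustration, and file names are written into the HTML as they are, with no escaping or URL-encoding.

```shell
# Sketch: write index.html linking to every file with a given extension.
# Assumptions: the function name and arguments are hypothetical;
# file names are not HTML-escaped or URL-encoded.
make_index() {
  dir=$1
  ext=${2:-txt}
  (
    cd "$dir" || exit 1
    {
      echo "<ol>"
      for f in *."$ext"; do
        [ -e "$f" ] || continue              # nothing matched: skip the literal pattern
        [ "$f" = "index.html" ] && continue  # never list the index itself
        printf '<li><a href="%s">%s</a></li>\n' "$f" "$f"
      done
      echo "</ol>"
    } > index.html
  )
}
```

For the example above, that would be make_index ~/Dropbox/Public/entrez txt. Grouping the output with a single redirect means index.html is rewritten in one go rather than appended to line by line.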

October 30, 2010

Findings increasingly novel, scientists say…

…was the tongue-in-cheek title of an image that I posted to Twitpic this week. It shows the usage of the word “novel” in PubMed article titles over time. As someone correctly pointed out at FriendFeed, it needs to be corrected for total publications per year.
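
The correction mentioned above amounts to dividing each year’s count of “novel” titles by the total number of titles for that year. A minimal sketch of that arithmetic, assuming two hypothetical tab-separated files with a year in the first column and a count in the second (the file layout and the function name normalise_counts are assumptions, not what was actually used for the image):

```shell
# Sketch: express per-year counts as a rate per 1000 publications.
# Assumptions: both inputs are tab-separated "year<TAB>count" files with no header;
# $1 holds the counts of interest, $2 the total publications per year.
normalise_counts() {
  awk -F'\t' '
    NR == FNR { total[$1] = $2; next }   # first file read: totals per year
    ($1 in total) && total[$1] > 0 {     # second file: counts of interest
      printf "%s\t%.2f\n", $1, 1000 * $2 / total[$1]
    }
  ' "$2" "$1"
}
```

Usage would be along the lines of normalise_counts novel_by_year.tsv total_by_year.tsv, giving one line per year with the number of “novel” titles per 1000 articles. Years missing from the totals file are silently dropped.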

It was inspired by a couple of items that caught my attention. First, a question at BioStar with the self-explanatory title Locations of plots of quantities of publicly available biological data. Second, an item at FriendFeed musing on the (over?) use of the word “insight” in scientific publications.

I’m sure that quite recently, I’ve read a letter to a journal which analysed the use of phrases such as “novel insights” in articles over time, but it’s currently eluding my search skills. So here’s my simple roll-your-own approach, using a little Ruby and R.
Read the rest…
