Tag Archives: data integration

Algorithms running day and night

Warning: contains murky, somewhat unstructured thoughts on large-scale biological data analysis

Picture this. It’s based on a true story, with names and details altered.

Alice, a biomedical researcher, performs an experiment to determine how gene expression in cells from a particular tissue is altered when the cells are exposed to an organic compound, substance Y. She collates a list of the most differentially expressed genes and notes, in passing, that the expression of Gene X is much lower in the presence of substance Y.

Bob, a bioinformatician in the same organisation but in a different city to Alice, is analysing a public dataset. This experiment looks at gene expression in the same tissue but under different conditions: normal compared with a disease state, Z Syndrome. He also notes that Gene X appears in his list – its expression is much higher in the diseased tissue.

Alice and Bob attend the annual meeting of their organisation, where they compare notes and realise the potential significance of substance Y in suppressing the expression of Gene X and so perhaps relieving the symptoms of Z Syndrome. On hearing this, the head of the organisation, Charlie, marvels at the serendipitous nature of the discovery. Surely, he muses, given the amount of publicly available experimental data, there must be a way to automate this kind of discovery by somehow “cross-correlating” everything with everything else until patterns emerge. What we need, states Charlie, is:

Algorithms running day and night, crunching all of that data

What’s Charlie missing?

Has our quest for completeness made things too complicated?

In my opinion, yes. Let me elaborate.

My current job is very much focused on “data integration”. What this means is that we have a large amount of diverse data from different “-omics” experiments: microarrays, protein mass spectrometry, DNA sequencing – really, whatever you like, but it’s all aimed at answering the same question. Namely: which of these biological entities (transcripts, proteins, metabolites) are markers for various human disease states?