I work with people who generate a lot of microarray data. One question that they often ask is: can we find the genes whose median expression changes two-fold or more between any two of several conditions?
For example, let’s say that we have 3 conditions: “normal”, “adenoma” and “cancer”. That gives us 3 pairwise comparisons: normal-adenoma, normal-cancer and adenoma-cancer. Here’s a Ruby solution to the problem.
First, I installed StatArray, a Ruby gem that provides statistical methods for array objects. It’s not been updated for 3 years, but seems to work.
require 'rubygems'
require 'statarray'
Next, credit to David Burger for posting this code solution for combinations in Ruby. You give it an array of elements and the number of elements (r) that you want to see in each combination; it returns arrays of length (r) with each combination. I’ve just wrapped it in a class called Combination:
class Combination
  def generate_combinations(array, r)
    n = array.length
    indices = (0...r).to_a
    final = (n - r...n).to_a
    while indices != final
      yield indices.map {|k| array[k]}
      i = r - 1
      while indices[i] == n - r + i
        i -= 1
      end
      indices[i] += 1
      (i + 1...r).each do |j|
        indices[j] = indices[i] + j - i
      end
    end
    yield indices.map {|k| array[k]}
  end
end
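As an aside, Ruby's core Array class has had a built-in combination method since 1.8.7, so on a reasonably recent Ruby the same pairs can probably be had without a custom class. A minimal sketch, assuming the condition names from this post:

```ruby
# Array#combination is part of Ruby's core library, so no gem is required.
conditions = ["normal", "adenoma", "cancer"]

# Every way of choosing 2 of the 3 conditions, as an array of 2-element arrays.
pairs = conditions.combination(2).to_a

pairs.each { |pair| puts pair.join(" - ") }
```

The Ruby documentation makes no promise about the order in which combinations are yielded, so sort the result if the order matters.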
So, let’s represent our covariates (normal, adenoma, cancer) as hash keys and their expression values, from several samples, as arrays which are the hash values. Then, we create a new hash with the same keys where the values are medians, calculated using methods from StatArray. Here, I’m using very silly dummy values for expression, which are supposed to be log base 2 values:
values = {"normal" => [1,2,3,4], "adenoma" => [5,6,7,8], "cancer" => [9,10,11,12]}
median = Hash.new
values.each_pair {|k,v| median[k] = v.to_statarray.median }
I’m sure that there’s a more elegant way to map the values hash to the medians hash, but that’s for another day.
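One possible tidier version, as a sketch only: build the medians hash in a single expression with each_with_object (Ruby 1.9 or later). To keep the example self-contained it uses a small pure-Ruby helper, median_of, which is a hypothetical name of my own and not part of StatArray; it assumes each array is non-empty.

```ruby
# Hypothetical helper (not from StatArray): median of a non-empty numeric array.
def median_of(numbers)
  sorted = numbers.sort
  mid = sorted.length / 2
  if sorted.length.odd?
    sorted[mid].to_f                        # odd count: the middle element
  else
    (sorted[mid - 1] + sorted[mid]) / 2.0   # even count: mean of the two middle elements
  end
end

values = {"normal" => [1,2,3,4], "adenoma" => [5,6,7,8], "cancer" => [9,10,11,12]}

# Build a new hash with the same keys, mapping each to the median of its array.
median = values.each_with_object({}) { |(k, v), h| h[k] = median_of(v) }
```

With the dummy values above, median ends up as {"normal" => 2.5, "adenoma" => 6.5, "cancer" => 10.5}.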
Finally, we generate all possible pairs of covariates and, for each pair, take the absolute difference between their median values. If we’re using log base 2, then a difference of one unit corresponds to a two-fold change in expression:
c = Combination.new
c.generate_combinations(median.keys,2) do |x|
  puts "abs(#{x.join(" - ")}) = #{(median[x[0]] - median[x[1]]).abs}"
end
Result:
abs(normal - adenoma) = 4.0
abs(normal - cancer) = 8.0
abs(adenoma - cancer) = 4.0
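To get back to the original question (which genes show a two-fold or greater change?), the same idea can be applied gene by gene, keeping only the pairs whose absolute log2 difference is at least 1. The following is a rough sketch: the gene names, the numbers and the medians_by_gene layout are all made up for illustration, and it uses Ruby's built-in Array#combination rather than the Combination class.

```ruby
# Hypothetical log2 median expression per gene and condition (made-up numbers).
medians_by_gene = {
  "geneA" => {"normal" => 5.0, "adenoma" => 5.4, "cancer" => 5.9},
  "geneB" => {"normal" => 3.0, "adenoma" => 4.5, "cancer" => 7.0}
}

threshold = 1.0  # one log2 unit corresponds to a two-fold change

# For each gene, collect the condition pairs that meet the threshold.
changed = {}
medians_by_gene.each do |gene, medians|
  hits = medians.keys.combination(2).select do |a, b|
    (medians[a] - medians[b]).abs >= threshold
  end
  changed[gene] = hits unless hits.empty?
end

# Report each hit with its log2 difference and the equivalent fold change (2 ** difference).
changed.each do |gene, hits|
  hits.each do |a, b|
    diff = (medians_by_gene[gene][a] - medians_by_gene[gene][b]).abs
    puts "#{gene}: #{a} vs #{b}, log2 difference #{diff}, fold change #{(2 ** diff).round(2)}"
  end
end
```

With these dummy numbers only geneB is reported, since none of the geneA differences reaches one log2 unit.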
Fin.