How-to: combinations of covariates using Ruby

I work with people who generate a lot of microarray data. One question that they often ask is: can we find those genes with a two-fold or greater change in median expression between two or more different conditions?

For example, let’s say that we have 3 conditions: “normal”, “adenoma” and “cancer”. That gives us 3 pairwise comparisons: normal-adenoma, normal-cancer and adenoma-cancer. Here’s a Ruby solution to the problem.

First, I installed StatArray, a Ruby gem that provides statistical methods for array objects. It’s not been updated for 3 years, but seems to work.

require 'rubygems'
require 'statarray'

Next, credit to David Burger for posting this code solution for combinations in Ruby. You give it an array of elements and the number of elements (r) that you want in each combination; it yields each combination to a block, as an array of length r. I’ve just wrapped it in a class called Combination:

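class Combination
  # Yields every combination of r elements from array to the given block.
  def generate_combinations(array, r)
    n = array.length
    # Without this guard, r > n never terminates.
    return if r < 0 || r > n
    indices = (0...r).to_a      # first combination: the first r positions
    final = (n - r...n).to_a    # last combination: the last r positions
    while indices != final
      yield indices.map {|k| array[k]}
      # Find the rightmost index that can still be incremented.
      i = r - 1
      while indices[i] == n - r + i
        i -= 1
      end
      indices[i] += 1
      # Reset the indices to its right so that they follow on consecutively.
      (i + 1...r).each do |j|
        indices[j] = indices[i] + j - i
      end
    end
    yield indices.map {|k| array[k]}
  end
end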
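
As a cross-check, Ruby's core Array#combination (available since 1.8.7) produces the same combinations without any extra class. A minimal sketch; the helper name covariate_pairs is made up for illustration:

```ruby
# Hypothetical helper: all unordered pairs of covariate names,
# using Ruby's built-in Array#combination instead of the class above.
def covariate_pairs(names)
  names.combination(2).to_a
end

covariate_pairs(["normal", "adenoma", "cancer"]).each do |pair|
  puts pair.join(" - ")
end
```

The hand-rolled version is still useful for seeing how the index-stepping works.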

So, let’s represent our covariates (normal, adenoma, cancer) as hash keys, with the expression values from several samples stored as arrays in the hash values. Then we create a new hash with the same keys, where the values are the medians, calculated using methods from StatArray. Here, I’m using very silly dummy values for expression, which are supposed to be log base 2 values:

values = {"normal" => [1,2,3,4], "adenoma" => [5,6,7,8], "cancer" => [9,10,11,12]}
median = Hash.new

values.each_pair do |k, v|
  median[k] = v.to_statarray.median
end

I’m sure that there’s a more elegant way to map the values hash to the medians hash, but that’s for another day.
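
For what it's worth, here is one possible tidier mapping, as a sketch rather than a recommendation. It assumes Ruby 2.4 or later for Hash#transform_values, and it swaps StatArray for a small plain-Ruby median so that it runs without the gem; median_of and medians_for are hypothetical names, not part of StatArray:

```ruby
# Hypothetical plain-Ruby median, standing in for StatArray's median
# so that this snippet runs without the gem. Assumes a non-empty array.
def median_of(numbers)
  sorted = numbers.sort
  mid = sorted.length / 2
  # Even length: mean of the two middle values; odd length: the middle value.
  sorted.length.even? ? (sorted[mid - 1] + sorted[mid]) / 2.0 : sorted[mid].to_f
end

# Build the medians hash in one expression (Hash#transform_values, Ruby 2.4+).
def medians_for(values)
  values.transform_values { |samples| median_of(samples) }
end

values = {"normal" => [1,2,3,4], "adenoma" => [5,6,7,8], "cancer" => [9,10,11,12]}
median = medians_for(values)
```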

Finally, we generate all possible combinations of two covariates, subtract one median from the other and take the absolute value. Since log2(a) - log2(b) = log2(a/b), a difference of one unit on the log base 2 scale corresponds to a two-fold change in expression, so we are looking for absolute differences of 1 or more:

c = Combination.new
c.generate_combinations(median.keys,2) do |x|
  puts "abs(#{x.join(" - ")}) = #{(median[x[0]] - median[x[1]]).abs}"
end

Result:

abs(normal - adenoma) = 4.0
abs(normal - cancer) = 8.0
abs(adenoma - cancer) = 4.0
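
To answer the original question directly, the differences can be filtered against a threshold instead of just printed. A minimal sketch, assuming log base 2 medians and using the built-in Array#combination; the helper pairs_with_fold_change and its threshold parameter are made up for illustration, and the medians are the ones from the dummy data above:

```ruby
# Hypothetical helper: keep only the pairs of conditions whose median
# log2 expression differs by at least `threshold` (1.0 = two-fold).
def pairs_with_fold_change(median, threshold = 1.0)
  median.keys.combination(2).select do |a, b|
    (median[a] - median[b]).abs >= threshold
  end
end

median = {"normal" => 2.5, "adenoma" => 6.5, "cancer" => 10.5}
pairs_with_fold_change(median).each do |a, b|
  puts "#{a} vs #{b}: at least two-fold"
end
```

With real data you would run this once per gene and keep the genes for which the returned list is not empty.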

Fin.