50% bananas


Today in “blog posts that have spent two years in the draft folder” – “Humans are 50% banana.”

“Humans are 50% banana.”

Perhaps you have heard this statement, or one like it. It seems to be widely quoted. As an example, it’s hard to go past this article from the UK tabloid The Mirror, which, in addition to the banana, also informs us that “the entire internet weighs about the same as one large strawberry”. I don’t even know where to begin with that one.

A couple of years ago, between jobs and with time on my hands, I thought I’d go in search of the source for this factoid.

I started with Twitter, of course.

1. Source of the quote

Lots of interest, but not many leads. Lars reminded me that the quote seems to originate with geneticist and author Steve Jones. He and others appear to have used it in various forms over time, but the oldest and most complete version dates from 2002 and goes like this:

We also share about 50% of our DNA with bananas and that doesn’t make us half bananas, either from the waist up or the waist down.

It’s worth noting that this was not long after the initial publication of the human genome, and certainly long before any banana genome sequences were available. We should also note that it’s intended to be humorous.

2. What does “we share 50% of our DNA” really mean?

Taken at face value, the statement above suggests that half of our total DNA sequence aligns to equivalent regions in the banana genome. A look at genome sizes shows that this cannot be the case: the entire banana genome is only about one-sixth the size of ours. The data below come from NCBI Genomes.

Organism                        Chromosomes (haploid)   Genome size (Mb)
Homo sapiens (human)            n = 23                  2994.61 (median)
Musa acuminata (dwarf banana)   n = 11                  472.231
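A quick back-of-the-envelope check using only the two genome sizes in the table: half of the human genome is about three times the size of the whole banana genome, and even if every banana base lined up with a different stretch of human DNA, that would cover only about 16% of ours.

```latex
\[
\tfrac{1}{2} \times 2994.61\ \text{Mb} \approx 1497.3\ \text{Mb} \approx 3.17 \times 472.231\ \text{Mb}
\qquad
\frac{472.231\ \text{Mb}}{2994.61\ \text{Mb}} \approx 0.158
\]
```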

Perhaps “we share 50% of our DNA” means that if we mapped human transcripts to the banana genome (or vice-versa), the mapped length would equal about 50% of the total transcript length. Might be fun – but no-one wants to do that for a dumb blog post (right?)
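To make that definition concrete, here is a minimal sketch of the arithmetic, assuming we already had per-transcript alignment results. The `TranscriptHit` record and every number in it are hypothetical; no such mapping was actually run for this post.

```python
from dataclasses import dataclass


@dataclass
class TranscriptHit:
    """Hypothetical record: one human transcript mapped to the banana genome."""
    transcript_id: str
    length: int          # total transcript length in bases
    aligned_bases: int   # bases covered by alignments to banana (0 if unmapped)


def mapped_fraction(hits: list[TranscriptHit]) -> float:
    """Fraction of all transcript bases that align to the other genome."""
    total = sum(h.length for h in hits)
    if total == 0:
        raise ValueError("no transcript length to divide by")
    # Crude guard: never count more aligned bases than the transcript has
    aligned = sum(min(h.aligned_bases, h.length) for h in hits)
    return aligned / total


# Made-up numbers, purely to show the calculation
example = [
    TranscriptHit("tx1", 2000, 900),
    TranscriptHit("tx2", 1500, 0),
    TranscriptHit("tx3", 500, 400),
]
print(f"{mapped_fraction(example):.1%}")  # 1300 / 4000 = 32.5%
```

Under this definition, “we share 50% of our DNA” would mean `mapped_fraction` comes out near 0.5. A real pipeline would merge overlapping alignment intervals properly rather than relying on the simple clipping used here.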

In “the old days”, DNA similarity between organisms was estimated by DNA-DNA hybridization: heat DNA from each species until the strands separate, then measure how much of the single-stranded DNA from species A re-anneals with that from species B. So far as I know, no-one has done that using humans and bananas. I’d love to be wrong.

In the absence of information, let’s create our own definition: “humans are 50% banana” means that 50% of human protein-coding genes have at least one ortholog in banana. Orthologs, you’ll recall, are genes in different species that derive from a common ancestral gene via speciation and (normally) retain the same function.
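Written as a formula, with $G_H$ standing for the set of human protein-coding genes and $O(g)$ for the set of banana orthologs of gene $g$ (notation introduced here just for this sketch):

```latex
\[
f_{\text{banana}} = \frac{\left|\{\, g \in G_H : O(g) \neq \varnothing \,\}\right|}{\left|G_H\right|}
\]
```

On this definition, “humans are 50% banana” would require $f_{\text{banana}} \approx 0.5$.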

3. In search of human-banana orthologs

There are essentially two ways that we might find orthologs in bananas. The computational approach is “reciprocal best hits”: align banana protein sequences to human protein sequences, and human protein sequences to banana protein sequences, then keep the pairs in which each protein is the other’s best-scoring match (usually subject to some score or similarity cutoff). Again, that might be fun for research, but not for blogging.
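For the curious, the pairing step of reciprocal best hits fits in a few lines. This is a minimal sketch that assumes the alignments have already been run (for example with BLAST or a similar aligner) and reduced to `(query, subject, score)` tuples; the protein IDs and scores below are made up, and real pipelines also apply e-value or coverage thresholds that are left out here.

```python
from collections.abc import Iterable

# Each hit: (query_id, subject_id, score), e.g. a bitscore from an aligner
Hit = tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> dict[str, str]:
    """Map each query to its single highest-scoring subject (first one wins on ties)."""
    best: dict[str, tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> set[tuple[str, str]]:
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


# Made-up hits, purely for illustration (IDs are hypothetical)
human_vs_banana = [
    ("HUMAN1", "BANANA1", 250.0),
    ("HUMAN1", "BANANA2", 90.0),
    ("HUMAN2", "BANANA2", 120.0),
]
banana_vs_human = [
    ("BANANA1", "HUMAN1", 245.0),
    ("BANANA2", "HUMAN3", 300.0),
    ("BANANA2", "HUMAN2", 118.0),
]
print(reciprocal_best_hits(human_vs_banana, banana_vs_human))
# {('HUMAN1', 'BANANA1')}
```

HUMAN2 is dropped because its best banana hit, BANANA2, prefers HUMAN3. Note that plain RBH only ever yields one-to-one pairs, so it cannot report a human gene with several banana orthologs.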

The second method is “see if someone has already done that for you.” When I started looking at this long ago, good online databases of orthologs from a comprehensive range of organisms were few and far between. I’ve since discovered the OMA (Orthologous MAtrix) browser. It makes the process as simple as visiting this page, searching for human and banana and clicking “get pairs”.

Result: a tab-delimited file containing (at the time of writing) 10 764 rows. Column 1 contains the human protein IDs. There are duplicates, of course, because a human gene may have multiple banana orthologs. So now, the moment you have been waiting for: the number of unique human protein-coding gene IDs that have an ortholog in banana, according to OMA, is…

…3 440.

Which, given that there are twenty thousand-ish human protein-coding genes, equates to around “17% banana”.
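The counting is easy to script. A minimal sketch, assuming the downloaded OMA pairs file has the human protein ID in the first tab-separated column (as described above) and using 20,000 as a rough count of human protein-coding genes; the script name, the file path and the skipping of `#` lines are assumptions, not details of OMA’s actual export.

```python
import sys
from collections.abc import Iterable

APPROX_HUMAN_PROTEIN_CODING_GENES = 20_000  # "twenty thousand-ish"


def unique_first_column(lines: Iterable[str]) -> set[str]:
    """Collect distinct IDs from column 1 of tab-delimited lines.

    Blank lines and lines starting with '#' are skipped; whether the
    downloaded file contains any such header lines is an assumption.
    """
    ids: set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.split("\t", 1)[0].strip())
    return ids


def percent_banana(n_with_ortholog: int,
                   n_total: int = APPROX_HUMAN_PROTEIN_CODING_GENES) -> float:
    """Percentage of human genes with at least one banana ortholog."""
    return 100 * n_with_ortholog / n_total


def main(argv: list[str]) -> None:
    if len(argv) != 2:
        print("usage: count_pairs.py PAIRS_FILE.tsv")  # hypothetical script name
        return
    with open(argv[1], encoding="utf-8") as handle:
        human_ids = unique_first_column(handle)
    print(f"{len(human_ids)} unique human IDs, ~{percent_banana(len(human_ids)):.0f}% banana")


if __name__ == "__main__":
    main(sys.argv)
```

With 3 440 unique IDs, `percent_banana(3440)` gives 17.2, which is where the “17% banana” figure comes from.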

So there you have it. Unless there’s a clever definition for which “humans are 50% banana” holds true – in which case, knock yourselves out in the comments.