BioStar users (of the world, unite)

Egon writes:

Can someone please plot the BioStar users on a Google Map?

Sounds like a challenge. Let’s go.

1. Harvesting user IP addresses
BioStar user profiles (here’s mine) include a location field. It’s free text and optional, which means that location is missing or inaccurate for many users. However, if you’re logged into BioStar (and perhaps, if you’re a moderator – I’m not sure), you’ll see a field that says:

Last activity: 4 hours ago from XXX.XXX.XXX.XXX

where “XXX.XXX.XXX.XXX” is either an IP address or, for your own page, the text “this IP address” (assuming your latest activity was from your current machine).

IP addresses can be used for geolocation – we’ll see how shortly. The problem is that they are only present when logged into BioStar, which uses OpenID for authentication. So to write code which automates the collection of user IP addresses, you’d have to convince BioStar that you were logged in.

I’m sure that it’s possible to write code which stores OpenID credentials and sends them to BioStar, but it would take some time to develop. So instead, I used a very ugly and largely manual approach. First, I wrote this simple Greasemonkey script:

// ==UserScript==
// @name           BioStar IP
// @namespace      http://twitter.com/neilfws
// @description    Get user IP
// @include        http://biostar.stackexchange.com/users/*
// ==/UserScript==

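// Find the DIV that holds the "Last activity ... from <IP>" text.
var d = document.evaluate("//div[@class='summaryinfo']",
                          document,
                          null,
                          XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
                          null);

// Skip pages where the DIV is missing, rather than throwing an error.
if (d.snapshotLength > 0) {
  console.log(d.snapshotItem(0).innerHTML);
}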

It captures the content of the DIV with class summaryinfo and writes it to the JavaScript console. That content looks something like this:

Last activity: <span title="2010-10-03 23:06:52Z UTC" class="relativetime">Oct 3 at 23:06</span> from XXX.XXX.XXX.XXX

Again, XXX.XXX.XXX.XXX is the IP address.
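
As a rough illustration of what can be recovered from that fragment, here is a minimal Ruby sketch that pulls out both the timestamp and the IP address. The helper name parse_summary and the sample address 192.0.2.10 (a reserved documentation address) are assumptions for the example, not part of the original workflow.

```ruby
# Sketch: extract the timestamp and IP address from one "summaryinfo" fragment.
# parse_summary is a hypothetical helper, not part of the original workflow.
SUMMARY_PATTERN = /title="([^"]+)".*?from\s+(\d{1,3}(?:\.\d{1,3}){3})/m

def parse_summary(html)
  match = SUMMARY_PATTERN.match(html)
  # No dotted address present, e.g. "this IP address" on your own page.
  return nil unless match
  { :seen => match[1], :ip => match[2] }
end

# Sample fragment modelled on the one above; the IP is a placeholder.
fragment = 'Last activity: <span title="2010-10-03 23:06:52Z UTC" ' \
           'class="relativetime">Oct 3 at 23:06</span> from 192.0.2.10'
p parse_summary(fragment)
```

Returning nil when no dotted address is found means fragments such as "from this IP address" are skipped rather than mis-parsed.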

So I opened Firefox, installed the Greasemonkey and Firebug extensions, installed my user script, navigated to the BioStar users page, opened the Firebug console and started clicking through users. By choosing “Persist” and increasing the console log limit, I was able to record the IP address of each user in the console. When finished, I copied the console contents to a text file.

There is no worse solution, for a bioinformatician, than one that involves manual labour, copy and paste. Currently, there are 17 pages of users (16 x 35 + 1 x 11 = 571 total). My file contains 567 of them: at least one did not display an IP address and perhaps I missed a couple. This is why we learn to script.
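
For what it's worth, one of the comments below suggests a shortcut: copy the authentication cookie from the browser and send it with each request. A minimal Ruby sketch of that idea follows, using the standard Net::HTTP library. The helper names and the cookie string are placeholders, the URL pattern is the one quoted in that comment, and nothing here has been run against BioStar, so treat it as an outline only.

```ruby
require "net/http"
require "uri"

# Sketch only: build an authenticated GET request for one user page.
# The cookie string is a placeholder; copy the real value from your browser.
BASE_URL = "http://biostar.stackexchange.com/users/"

# Hypothetical helper: returns the URI and a request carrying the cookie.
def build_profile_request(user_id, cookie)
  uri = URI.parse("#{BASE_URL}#{user_id}")
  request = Net::HTTP::Get.new(uri)
  request["Cookie"] = cookie
  [uri, request]
end

# Hypothetical helper: performs the request and returns the page body.
# Not called here, so the example runs without network access.
def fetch_profile(user_id, cookie)
  uri, request = build_profile_request(user_id, cookie)
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  response.body
end

uri, request = build_profile_request(181, "user=t=XXXX&s=XXXX")
puts "#{request.method} #{uri} (Cookie header set: #{!request['Cookie'].nil?})"
```

Looping fetch_profile over the user IDs and printing each summaryinfo fragment would stand in for the clicking and copying, assuming the account behind the cookie is allowed to see the addresses.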

2. Location using GeoIP
So how do we find location using IP? The answer is GeoIP.

First, head over to the MaxMind website and download their GeoIP C API. I installed it (for Ubuntu) like so:

wget http://geolite.maxmind.com/download/geoip/api/c/GeoIP.tar.gz
tar zxvf GeoIP.tar.gz
cd GeoIP-1.4.6
./configure --prefix=/opt/GeoIP
make
sudo make install
# install the city database
wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz
gunzip GeoLiteCity.dat.gz
sudo mv GeoLiteCity.dat /opt/GeoIP/share/GeoIP/

GeoIP comes with a free database of countries, located in /opt/GeoIP/share/GeoIP/GeoIP.dat. I also installed their free city database, as shown above.

Next, the Ruby gem for GeoIP:

[sudo] gem install mtodd-geoip -s http://gems.github.com/ -- --with-geoip-dir=/opt/GeoIP

Now, quick and very dirty Ruby code to read the text file containing IP addresses and look them up in the GeoIP database:

require "rubygems"
require "geoip"

ip  = "ip.txt"  # the text file containing IPs, copied from console.log
db  = GeoIP::City.new("/opt/GeoIP/share/GeoIP/GeoLiteCity.dat")

# File.foreach yields one line at a time (String#each was removed in Ruby 1.9)
File.foreach(ip) do |line|
  next unless line =~ /from\s+(\d+\.\d+\.\d+\.\d+)/
  lookup = db.look_up($1)
  next if lookup.nil?  # skip addresses with no database record
  locn = [lookup[:country_name], lookup[:country_code], lookup[:city],
          lookup[:latitude], lookup[:longitude]]
  puts locn.join("\t")
end

That prints out a tab-delimited file, which looks like this:

United States   US  East Lansing    42.7282981872559   -84.4881973266602
Italy           IT  Rome            41.9000015258789   12.4833002090454
Portugal        PT  Fafe            41.4500007629395   -8.16670036315918
China           CN  Wuhan           30.5832996368408   114.266700744629
United States   US  Oklahoma City   35.4715003967285   -97.5189971923828
...
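
Before plotting, a quick tally by country is a cheap sanity check on that output. The sketch below assumes the five-column layout shown above. The helper name count_by_country is made up for the example, and the sample rows are copied from the listing.

```ruby
# Sketch: count users per country from the tab-delimited GeoIP output.
# count_by_country is a hypothetical helper name.
def count_by_country(lines)
  counts = Hash.new(0)
  lines.each do |line|
    country = line.chomp.split("\t").first
    next if country.nil? || country.empty?  # skip blank or incomplete rows
    counts[country] += 1
  end
  # Most common country first; ties are broken alphabetically.
  counts.sort_by { |country, n| [-n, country] }
end

sample = [
  "United States\tUS\tEast Lansing\t42.7282981872559\t-84.4881973266602",
  "Italy\tIT\tRome\t41.9000015258789\t12.4833002090454",
  "United States\tUS\tOklahoma City\t35.4715003967285\t-97.5189971923828"
]
count_by_country(sample).each { |country, n| puts "#{country}\t#{n}" }
```

For the real data, pass File.readlines("biostar.tab") in place of the sample array, assuming the output was saved under that name.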

3. Plotting maps using R
Before we go all Google-y, let’s look at plotting geographical data using R. There are many libraries and mapping solutions, but here’s a simple script to plot our users on a world map. It requires the packages ggplot2 and maps. Assuming that the output from the Ruby script is saved in a file, biostar.tab:

library(ggplot2)
library(maps)

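# quote = "" stops apostrophes in city names from being read as quote characters
biostar <- read.table("biostar.tab", header = FALSE, stringsAsFactors = FALSE, sep = "\t", quote = "")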
colnames(biostar) <- c("country", "code", "city", "lat", "long")
world <- map_data("world")

png(file = "biostar.png", width = 1024, height = 768)
print(ggplot(world, aes(long, lat)) +
        geom_polygon(aes(group = group), fill = "darkslategrey") +
        geom_point(data = biostar, aes(long, lat), colour = "red"))
dev.off()

And here’s the result.

Figure: BioStar user locations

4. Plotting on a Google Map
There are many options for getting data into Google Maps. I figured that there must be a site where you can upload a simple CSV file containing latitude + longitude and display a Google Map. There is – it’s called ZeeMaps. It has many features – some free, some paid – which I have yet to investigate fully.

For CSV upload, your file requires a column headed “Name” (I chose the city in my file), plus columns of coordinates headed “Latitude” and “Longitude”. All you need to do is create a new map, upload the file and select “refresh”. Here’s the map that I created. Unfortunately, it cannot be embedded in this blog post, so a screenshot stands in for it. I have no idea if that link is permanent and I suspect that anyone can make alterations to the map.

Figure: BioStar users at ZeeMaps
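
The conversion from the tab-delimited file to a CSV file with those three headers is not shown in the post. One way it might be done is sketched below with Ruby's standard csv library. The helper name to_zeemaps_csv is an assumption, as is the choice to drop rows that lack coordinates.

```ruby
require "csv"

# Sketch: convert tab-delimited rows (country, code, city, lat, long) into
# CSV text with the Name, Latitude and Longitude headers described above.
# to_zeemaps_csv is a hypothetical helper name.
def to_zeemaps_csv(lines)
  CSV.generate do |csv|
    csv << ["Name", "Latitude", "Longitude"]
    lines.each do |line|
      _country, _code, city, lat, long = line.chomp.split("\t", -1)
      # Assumption: rows without coordinates are of no use on a map.
      next if lat.nil? || long.nil? || lat.empty? || long.empty?
      csv << [city, lat, long]
    end
  end
end

sample = [
  "United States\tUS\tEast Lansing\t42.7282981872559\t-84.4881973266602",
  "Italy\tIT\tRome\t41.9000015258789\t12.4833002090454"
]
puts to_zeemaps_csv(sample)
```

Using the csv library rather than joining fields with commas means any city name containing a comma is quoted correctly.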

Of course, IPs can be spoofed, users move around and the location of a machine might not reflect the location of the user. However, I think it’s a more reliable geolocation approach than an arbitrary text description. Now, if I could just automate that IP-harvesting code…

8 thoughts on “BioStar users (of the world, unite)”

  1. Jukka Matilainen

    Only moderators get to see the “Last activity” IP address for other users. Us mere mortals get that info only on our own user page.

    However, probably the easiest way to fetch the authenticated version of a page using a tool like wget or curl is just to copy the authentication cookie value from your browser and use that:

    curl -b "user=t=XXXXXXXXXXXXXXXXXXXXXXXXX&s=XXXXXXXXXXXXXXXXXXXXXXXXX" http://biostar.stackexchange.com/users/181

  2. Bob Muenchen

    Nice plot! I wasn’t sure what this line was doing:
    scale_colour_discrete(legend = FALSE)
    so I left it off and didn’t notice a difference.

    I’d like to use the map for a teaching example (with attribution of course). Could you please post biostar.tab?

    Thanks,
    Bob Muenchen
