Networks – social and biological – are all the rage just now. Indeed, a recent entry at Duncan’s QOTD described the “hairball” network representation as the dominant cultural icon in molecular biology.
I’ve not had occasion to explore networks “professionally”, but have always been fascinated by both networks and the tools used to analyse them. My grasp of graph theory, the mathematics behind networks, is more or less summarised by this Wikipedia page. I’ve also been exploring the igraph library and thought I’d share a few of my “experiments with igraph”. As I say, I’m learning as I go along, so none of this should be taken as professional advice.
Let’s start with my favourite network – FriendFeed of course – and ask a few questions about everyone’s favourite group, The Life Scientists (TLS).
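As a taste of what follows, here’s a minimal igraph sketch using a made-up edge list in place of real TLS data – the names, and the idea of an edge as “interacted with”, are purely illustrative:

library(igraph)

# toy edge list standing in for TLS interactions (names are invented)
edges <- data.frame(from = c("alice", "alice", "bob", "carol"),
                    to   = c("bob", "carol", "carol", "dave"))
g <- graph.data.frame(edges, directed = FALSE)

vcount(g)   # number of members (vertices)
ecount(g)   # number of interactions (edges)
degree(g)   # connections per member
plot(g)     # behold: a (very small) hairball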
Here’s the situation. Web applications built using a framework (e.g. Rails or Django) are great for fetching data from a database and rendering it. They’re not so great for crunching and charting the data. Conversely, R is great for crunching and charting, but doesn’t make for a great web application.
[Figure: index view for values in the demo application]
The idea, then, is to let each do what it does best and enable the passing of data between them. There isn’t a whole lot of literature on this topic, but there are a couple of guides:
- Slide 46 of this presentation by Mike Driscoll of Dataspora illustrates a different approach, where a Django-based web application sends data to RApache in CSV format.
As a first step towards understanding all of this, we can build a small demo application using Rails (version 2.3.5) which serves both JSON and CSV. We’ll see if we can get that into R, then see whether R can return results to Rails via RApache. Baby steps, so we’ll avoid the AJAX stuff for now and just use Rails rendering methods to serve JSON from a controller.
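To get a feel for the R side, here’s a minimal sketch of slurping JSON from the demo application into R using the rjson package – the URL is my assumed default for a local Rails app with a “values” resource:

library(rjson)

# hypothetical endpoint served by the Rails demo app
url    <- "http://localhost:3000/values.json"
json   <- paste(readLines(url), collapse = "")
values <- fromJSON(json)

# if each record parses to a flat list, the result converts readily to a data frame
# (Rails 2.3 may nest each record under its model name; adjust accordingly)
df <- do.call(rbind, lapply(values, as.data.frame))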
Visit this URL and you’ll find a perfectly formatted CSV file containing information about recent earthquakes. A nice feature of R is the ability to slurp such a URL straight into a data frame:
quakes <- read.csv("http://neic.usgs.gov/neis/gis/qed.asc", header = TRUE)

# column names
names(quakes)
# "Date" "TimeUTC" "Latitude" "Longitude" "Magnitude" "Depth"

# number of recent quakes
nrow(quakes)
# 3135

# biggest recent quake
subset(quakes, Magnitude == max(Magnitude, na.rm = TRUE))
#            Date     TimeUTC Latitude Longitude Magnitude Depth
# 2060 2010/02/27  06:34:14.0  -35.993   -72.828       8.8    35
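Once the data are in a frame, charting is just as direct – for example, one quick way to glance at the distribution of magnitudes:

# distribution of recent quake magnitudes
hist(quakes$Magnitude, breaks = 20,
     main = "Recent earthquakes", xlab = "magnitude")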
I hear a lot about the “web of data” and the “linked data web”, but honestly, I’ll be happy the day people start posting data as delimited plain text instead of HTML and PDF files.
“How can I make a graph that looks like this, ‘tweet density’ style, showing time intervals?”

He then helpfully describes his input data: a CSV file with the headers “time started”, “time finished” and “date”.
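Here’s one way to sketch such a plot in base R. The file name, and the assumption that times are HH:MM and dates are YYYY-MM-DD, are mine – note that read.csv() turns the spaces in those headers into dots:

intervals <- read.csv("intervals.csv")

start <- as.POSIXct(paste(intervals$date, intervals$time.started),
                    format = "%Y-%m-%d %H:%M")
end   <- as.POSIXct(paste(intervals$date, intervals$time.finished),
                    format = "%Y-%m-%d %H:%M")

# hour of day for each endpoint; one row of the plot per date
tod1 <- as.numeric(difftime(start, trunc(start, "days"), units = "hours"))
tod2 <- as.numeric(difftime(end,   trunc(end,   "days"), units = "hours"))
day  <- as.Date(intervals$date)

# blank canvas, then one horizontal segment per interval
plot(range(c(tod1, tod2)), range(as.numeric(day)), type = "n",
     xlab = "hour of day", ylab = "", yaxt = "n")
segments(tod1, as.numeric(day), tod2, as.numeric(day), lwd = 3)
axis(2, at = as.numeric(unique(day)), labels = format(unique(day)), las = 1)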