Archive for April 15th, 2010

April 15, 2010

I’d be more than happy with the unlinked data web

Visit this URL and you’ll find a perfectly-formatted CSV file containing information about recent earthquakes. A nice feature of R is the ability to slurp such a URL straight into a data frame:

quakes <- read.csv("http://neic.usgs.gov/neis/gis/qed.asc", header = T)
colnames(quakes)
# [1] "Date"      "TimeUTC"   "Latitude"  "Longitude" "Magnitude" "Depth"
# number of recent quakes
nrow(quakes)
# [1] 3135
# biggest recent quake
subset(quakes, quakes$Magnitude == max(quakes$Magnitude, na.rm = T))
#            Date    TimeUTC Latitude Longitude Magnitude Depth
# 2060 2010/02/27 06:34:14.0  -35.993   -72.828       8.8    35

I hear a lot about the “web of data” and the “linked data web” but honestly, I’ll be happy the day people start posting data as delimited, plain text instead of HTML and PDF files.

Tags: ,
Follow

Get every new post delivered to your Inbox.

Join 1,464 other followers