Visit this URL and you’ll find a perfectly formatted CSV file containing information about recent earthquakes. A nice feature of R is the ability to slurp such a URL straight into a data frame:
```r
quakes <- read.csv("http://neic.usgs.gov/neis/gis/qed.asc", header = TRUE)
colnames(quakes)
# [1] "Date"      "TimeUTC"   "Latitude"  "Longitude" "Magnitude" "Depth"

# number of recent quakes
nrow(quakes)
# [1] 3135

# biggest recent quake
subset(quakes, Magnitude == max(Magnitude, na.rm = TRUE))
#            Date    TimeUTC Latitude Longitude Magnitude Depth
# 2060 2010/02/27 06:34:14.0  -35.993   -72.828       8.8    35
```
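The feed URL above may no longer serve this file, so here is a minimal offline sketch of the same lookup. It swaps in R's built-in `datasets::quakes` table (1,000 seismic events near Fiji, with columns `lat`, `long`, `depth`, `mag` and `stations`) for the USGS feed. Note that assigning to a variable called `quakes`, as the snippet above does, masks that built-in dataset in your workspace.

```r
# Built-in Fiji earthquake data stands in for the USGS feed (an assumption for offline use)
fiji <- datasets::quakes

colnames(fiji)
# [1] "lat"      "long"     "depth"    "mag"      "stations"

nrow(fiji)
# [1] 1000

# All rows tied for the largest magnitude; columns can be named bare inside subset()
biggest <- subset(fiji, mag == max(mag, na.rm = TRUE))
print(biggest)

# which.max() returns only the first row if several quakes share the maximum
fiji[which.max(fiji$mag), ]
```

`subset()` keeps every tied row, while `which.max()` is handy when you only want one.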
I hear a lot about the “web of data” and the “linked data web”, but honestly, I’ll be happy the day people start posting data as delimited plain text instead of HTML and PDF files.
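Publishing a data frame as delimited plain text really is a one-liner in R. A minimal sketch, using the biggest-quake row from the example above and assuming a temporary file stands in for a web server, shows the CSV reading straight back into an equivalent data frame:

```r
# One-row data frame holding the biggest quake from the example above
quake <- data.frame(
  Date      = "2010/02/27",
  TimeUTC   = "06:34:14.0",
  Latitude  = -35.993,
  Longitude = -72.828,
  Magnitude = 8.8,
  Depth     = 35L,            # integer, matching what read.csv() infers for "35"
  stringsAsFactors = FALSE
)

# Publish as comma-delimited plain text (a temp file stands in for a web server)
path <- tempfile(fileext = ".csv")
write.csv(quake, path, row.names = FALSE)

# The file is human-readable...
cat(readLines(path), sep = "\n")
# "Date","TimeUTC","Latitude","Longitude","Magnitude","Depth"
# "2010/02/27","06:34:14.0",-35.993,-72.828,8.8,35

# ...and reads back into an equivalent data frame
back <- read.csv(path, stringsAsFactors = FALSE)
all.equal(quake, back)
# [1] TRUE
```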
Good point. It’d be great if more researchers followed this advice.
Plain text is the simple, open, future-proof and interoperable option.
Guys, Linked Data is not about rectangular matrices.
So what is Linked Data about, then, you might wonder? Well, try doing the following with CSV tables:
– find the number of people living within 10 km of the location of the earthquake
– link that to the average income per person
– and find which of your friends have published a paper together with someone within that 10 km range
Is it starting to make sense? Hints: unique identifiers, uniform API.
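To make the “unique identifiers” hint concrete, here is a minimal sketch in R of the first two tasks. Everything in it is hypothetical: the `places` and `income` tables, their `place_id` key and all the numbers are made-up toy values, and the distance uses the haversine formula with an assumed mean Earth radius of 6371 km. The epicentre is the biggest quake from the post.

```r
# Great-circle distance in km via the haversine formula (assumed radius 6371 km)
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))   # pmin() guards against rounding above 1
}

# Epicentre of the biggest quake in the post
epi_lat <- -35.993
epi_lon <- -72.828

# HYPOTHETICAL toy tables: made-up IDs, coordinates, populations and incomes
places <- data.frame(
  place_id   = c("p1", "p2", "p3"),
  lat        = c(-35.95, -36.00, -36.80),
  lon        = c(-72.80, -72.90, -73.05),
  population = c(1200L, 800L, 50000L),
  stringsAsFactors = FALSE
)
income <- data.frame(
  place_id       = c("p1", "p2", "p3"),
  income_per_cap = c(5000, 4500, 7000),
  stringsAsFactors = FALSE
)

# 1. People living within 10 km of the epicentre
places$dist_km <- haversine_km(epi_lat, epi_lon, places$lat, places$lon)
nearby <- subset(places, dist_km <= 10)
sum(nearby$population)
# [1] 2000

# 2. Link to income through the shared identifier, then average per person
linked <- merge(nearby, income, by = "place_id")
weighted.mean(linked$income_per_cap, linked$population)
# [1] 4800
```

The `merge()` step only works because both toy tables happen to share `place_id`; with independently published CSV files there is usually no such common key, which is the gap that globally unique identifiers are meant to close.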
Oh, and great example of the power of R!
I’m not criticising linked data (or saying anything about it at all), and I’m quite aware of its uses. My point is that most providers haven’t even solved the simple problem of serving regular data.
Understood, agreed, and supported.
Nice post – thanks.
You get the data you work on as tables in HTML or PDF files? Oh, I dream of getting data as HTML or PDF tables. I get them as boxes full of paper that need to be scanned, OCRed, and text-mined before I have anything that begins to resemble a table!
I know it may sound like I am aspiring to become the fifth Yorkshireman, but I am involved in a project where this is how the data were provided.