Web services are great. Pass them a URL. Structured data comes back. Parse it, analyse it, visualise it. Done.
Web scraping – interacting programmatically with a web page – is not so great. It requires more code and when the web page changes, the code breaks. However, in the absence of a web service, scraping is better than nothing. It can even be rather satisfying. Early in my bioinformatics career, the realisation that code, rather than a human, could submit forms and read the results was quite a revelation.
In this post: how to interact with a web page at the NCBI using the Mechanize library.
Since 2005, I have started almost every working day by using one Web application – an application that occupies a permanent browser tab on my work and home desktop machines. That application is Google Reader.
If you’re reading this, you’re probably aware that Google Reader will cease to exist on July 1, 2013. Others have ranted, railed against the corporate machine and expressed their sadness. I thought I’d try to explain why, for this working scientist at least, RSS and feed readers are incredibly useful tools which I think should be valued highly.
I think we’re both right. Michael’s perspective is that of an expert in high-throughput sequencing data; I’m just pleased to see an introduction to bioinformatics for non-specialists in a mainstream newspaper. And I note that they have corrected the figure caption which offended Michael.
As to the “deluge”: yes, there are other sciences that generate more data and yes, we probably don’t need to archive/analyse a lot of the raw data. However, I’d contend that the basic premise of the article is correct: we are sequencing faster than we can analyse. The solution, obviously, is more bioinformaticians.
Which topics are the most popular at the BioStar bioinformatics Q&A site?
One source of data is the tags used for questions. Tags are somewhat arbitrary of course, but fortunately BioStar has quite an active community, so “bad” tags are usually edited to improve them. Hint: if your question is “How to find SNPs”, then tagging it with “how, to, find, snps” won’t win you any admirers.
OK: we’re going to grab the tags then use a bunch of R packages (XML, wordcloud and ggplot2) to take a quick look.
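The post itself uses R for this; as a rough stand-in, here is a minimal Python sketch of the first step, counting how often each tag occurs. The sample XML and its `question`/`tag` element names are assumptions made up for illustration, not the actual BioStar markup.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data: the element names <question> and <tag> are
# assumptions for illustration, not the real BioStar markup.
SAMPLE_XML = """
<questions>
  <question><tag>snp</tag><tag>gwas</tag></question>
  <question><tag>blast</tag><tag>snp</tag></question>
  <question><tag>rna-seq</tag><tag>snp</tag><tag>blast</tag></question>
</questions>
"""


def count_tags(xml_text):
    """Return a Counter mapping each lower-cased tag to its frequency."""
    root = ET.fromstring(xml_text)
    return Counter(
        tag.text.strip().lower() for tag in root.iter("tag") if tag.text
    )


tag_counts = count_tags(SAMPLE_XML)

# Print tags from most to least frequent, one per line
for tag, n in tag_counts.most_common():
    print(f"{tag}\t{n}")
```

A frequency table like this is the kind of input that a word cloud or bar chart needs.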
I wonder if part of the drop off is live bloggers moving to platforms like Twitter? I can tell you it seemed like there were almost as many tweets for one SIG (#bosc2011) as for the whole of #ISMB / #ECCB2011, and I personally didn’t post anything to FriendFeed but posted lots on Twitter.
Well, there’s a problem with using Twitter for analysis of conference coverage. Let’s try searching for ISMB-related tweets using the twitteR package:
I’ve been a strong proponent of FriendFeed since its launch. Its technology, clean interface and “data first, then conversations” approach have made it a highly successful experiment in social networking for scientists (and other groups). So you may be surprised to hear that from today, I will no longer be importing items into FriendFeed, or participating in the conversations at other feeds.
Here’s a brief explanation and some thoughts on my online activity in the coming months.
In part 1, I described some frustrations arising out of a work project, using the ArrayExpress API. I find that one way to deal mentally with these situations is to spend some time on a fun project, using similar programming techniques. A potential downside of this approach is that if your fun project goes bad, you’re really frustrated. That’s when it’s time to abandon the digital world, go outside and enjoy nature.
Here, then, is why I decided to build another small project around FriendFeed, how its failure has led me to question the value of FriendFeed for the first time and why my time as a FriendFeed user might be up.
The API – Application Programming Interface – is, in principle, a wonderful thing. You make a request to a server using a URL and back come lovely, structured data, ready to parse and analyse. We’ve begun to demand that all online data sources offer an API and lament the fact that so few online biological databases do so.
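As a rough illustration of that request-and-parse pattern, here is a minimal Python sketch. The endpoint, parameter names and JSON fields are all hypothetical, and a canned response string stands in for the server so that the example runs offline; with a real service you would fetch the body with something like `urllib.request.urlopen(url)`.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: not a real service, used for illustration only.
BASE_URL = "https://api.example.org/experiments"


def build_request_url(base_url, **params):
    """Append URL-encoded query parameters (sorted by name) to base_url."""
    return f"{base_url}?{urlencode(sorted(params.items()))}"


def parse_response(body):
    """Extract (accession, samples) pairs from a JSON response body."""
    records = json.loads(body)["experiments"]
    return [(r["accession"], r["samples"]) for r in records]


# Parameter names here are assumptions, not part of any real API
url = build_request_url(BASE_URL, species="Homo sapiens", format="json")

# Canned response standing in for what a server might send back
canned_body = (
    '{"experiments": ['
    '{"accession": "EXP-1", "samples": 12}, '
    '{"accession": "EXP-2", "samples": 48}]}'
)
rows = parse_response(canned_body)

print(url)
for accession, samples in rows:
    print(accession, samples)
```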
Is it better, though, to have no API at all than one which is poorly implemented and leads to frustration? I’m beginning to think so, after recent experiences on both a work project and one of my “fun side projects”. Let’s start with the work project, an attempt to mine a subset of the ArrayExpress microarray database.
LinkedIn, the “professional” career-oriented social network, is one of those places on the Web where I maintain a profile for visibility. I have yet to gain any practical value whatsoever from it. That said, I know plenty of people who do find it useful – mostly, it seems, those living near the north-east or west coast of the USA.
My LinkedIn Network
LinkedIn have something of a reputation for innovation – see LinkedIn Labs, their small demonstration products, for example. The latest of these is named InMaps. It’s been popping up on blogs and Twitter for several days. Essentially, it creates a graph of your LinkedIn network, applies some community detection algorithm to cluster the members and displays the results as a pretty, interactive graphic that you can share.
What seems to have captured the imagination is that the graphs indicate communities that are instantly recognisable to the user. There’s mine on the right. It’s not a large, complex or especially interesting network but when I “eyeballed” it, I was immediately able to classify the three sub-graphs:
Orange – mostly people with whom I have worked or currently work, plus a few “random” contacts: note that this group is hardly interconnected at all
Green – people who work in bioinformatics or computational biology, particularly genomics: two major hubs connect me with this group
Blue – the largest, densest network is composed largely of what I’d call the “BioGang”: people that I interact with on Twitter and FriendFeed, many of whom I haven’t met in person
This confirms what I’ve long suspected: I prefer to network with smart strangers rather than with my immediate peers and colleagues. Or as Bill Joy said, “no matter who you are, most of the smartest people work for someone else.” I’ve seen this misquoted as “where you are”, which makes more sense to me.