Category Archives: open science

Converting a spreadsheet of SMILES: my first OSM contribution

I’ve long admired the work of the Open Source Malaria Project. Unfortunately, time and “day job” constraints prevent me from being as involved as I’d like.

So: I was happy to make a small contribution recently in response to this request for help:


“Open”: motivation versus definition

Tweet length: 140 characters. Quote + URL that I wanted to tweet: 160 characters. Solution: brief blog post.

the probability that people who can help each other can be connected has risen to the point that for many types of problem that they actually are

Please read the rest of Cameron’s thoughts on motivations for openness in research: Open is a state of mind.

Lots of “open goodness” in the AU/NZ region

January/February are exciting months for open [data|research|science|access] proponents in our region – by which I mean Australia and New Zealand.

First, we’ve enjoyed a speaking tour by Sir Tim Berners-Lee, during which he discussed the benefits of open data several times. I was able to attend two events in Sydney in person and a third, linux.conf.au, by video stream. The events were the work of many people, in particular Pia Waugh. Go follow her on Twitter, now.

Next – I wish I had been able to get to this one – the Open Research Conference on February 6-7, University of Auckland. I’m enjoying the high-quality live stream right now. Flying the flag for Sydney are Mat and Alex.

Not strictly under the “open” umbrella but worth a mention anyway: Software Carpentry is in town, February 7-8, just up the road from me at Macquarie University. Looking forward to hearing some reports from that.

Reproducibility: releasing code is just part of the solution

This week in Retraction Watch: Hypertension retracts paper over data glitch.

The retraction notice describes the “data glitch” in question (bold emphasis added by me):

…the authors discovered an error in the code for analyzing the data. The National Health and Nutrition Examination Survey (NHANES) medication data file had multiple observations per participant and was merged incorrectly with the demographic and other data files. Consequently, the sample size was twice as large as it should have been (24989 instead of 10198). Therefore, the corrected estimates of the total number of US adults with hypertension, uncontrolled hypertension, and so on, are significantly different and the percentages are slightly different.
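To see how a merge like the one described in the notice can inflate a sample, here is a minimal Python sketch. It is not the authors' code and does not use NHANES data: the toy records, the field names and the helper functions `naive_merge` and `collapse` are all hypothetical, invented for illustration. It assumes the problem was a one-to-many join, where each participant appears once in the demographic file but several times in the medication file.

```python
from collections import defaultdict

# Hypothetical toy data: one demographic row per participant...
demographics = [
    {"id": 1, "age": 54},
    {"id": 2, "age": 61},
    {"id": 3, "age": 47},
]

# ...but several medication rows for some participants.
medications = [
    {"id": 1, "drug": "A"},
    {"id": 1, "drug": "B"},
    {"id": 2, "drug": "A"},
    {"id": 3, "drug": "C"},
    {"id": 3, "drug": "D"},
    {"id": 3, "drug": "E"},
]


def naive_merge(left, right, key):
    """Inner join on key: emits one row per matching (left, right) pair."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index[l[key]]]


def collapse(rows, key, field):
    """Reduce to one row per key by gathering the field values into a list."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row[field])
    return [{key: k, field + "s": v} for k, v in grouped.items()]


# Joining directly duplicates participants who take more than one drug.
inflated = naive_merge(demographics, medications, "id")

# Collapsing to one row per participant first keeps the sample size intact.
corrected = naive_merge(demographics, collapse(medications, "id", "drug"), "id")

print(len(demographics), len(inflated), len(corrected))  # 3 6 3

# The figures quoted in the notice: how far from "twice as large"?
ratio = 24989 / 10198
print(round(ratio, 2))  # 2.45
```

A cheap guard is to compare the row count after a merge with the number of unique participant identifiers. If the two differ, something was duplicated.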

Let’s leave aside the observation that 24989 is not 2 x 10198. I tweeted:

Not that simple though, is it? Read on for the Twitter discussion.

My day out at #osddmalaria

Finally, I get around to telling you that…
…on Friday 24th February, I took a day out from my regular job to attend a meeting on Open Source Drug Discovery for Malaria. I should state straight away that whilst drug discovery and chem(o)informatics are topics that I find very interesting, I have no professional experience or connections in either area. However, it was an opportunity to learn more, listen to some great speakers, think about what bioinformaticians might be able to bring to the table and of course, finally meet Mat Todd in person. Mat, if you don’t know, is one of the few people on the planet who really does science online, as opposed to talking about science online.

Here’s what I learned – with just a little analysis using R later in the post, hence the statistics/R category.

Can a journal make a difference? Let’s find out.

Academic journals. Frankly, I’m not a big fan of any of them. There are too many. They cost too much. Much of what they publish is inconsequential, read by practically no-one or just downright incorrect. Much of the rest is badly written and boring. The people who publish them have an over-inflated sense of their own importance. They’re hidden behind paywalls. And governed by ludicrous metrics. The system by which articles are accepted or rejected is arcane and ridiculous. I mean, I could go on…

No, what really troubles me about journals is that they only tell a very small part of the story – the flashy, attention-grabbing part called “results”. We learn from high school onwards that a methods section should be sufficient for anyone to reproduce the results. This is one of the great lies of science. Go read any journal in your field and give it a try. It’s even the case in computation, an area which you might think less prone to the problems in reproducing wet-lab science (“the Milli-Q must have been off”).

We have this wonderful thing called the Web now. The Web doesn’t have a page limit, so you can describe things in as much detail as you wish. Better still, you can just post your methods and data there in full, for all to see, download and reproduce to their hearts’ content. You’d like some credit for doing that though, right?

So if you do research – any kind of research – that involves computation, your code is open-source, reusable, well-documented and robust (think: tests) and you want to share it with the world, head over to a new journal called BMC Open Research Computation, which is now open for submissions. Your friendly team of enlightened editors awaits.

More information at Science in the Open and Saaien Tist. Full disclosure: I’m on the editorial board of this journal and was invited to write a launch post.

One reason why scientists don’t comment at journals

This week, Nature announced a new online commenting facility and noted that:

Online discussions about our research papers are likely to be considerably more subdued, according to the experience of other publishers who already allow commenting.

They offer several reasons why this might be the case. Here’s one: it helps if you make it easy.

Wikification: thinking in public

Over the last 3 years, I’ve stored many small snippets of information in a set of Google Notebooks. Sample topics: notes for blog posts, programming skills that I’d like to learn and preliminary (or half-baked) ideas for research or software projects. I’ve learned that:

  • Whilst Google Notebook is great for scraping information from web pages, it leaves a lot to be desired in terms of editing and presentation
  • Ideas left in private notebooks quickly become dead ideas

Yes, you can publicise, tag and collaborate at a Google Notebook, but this doesn’t fit with my workflow – or that of many others, I suspect. So, I’ve taken as much of the material as I want to make public and dumped it on a wiki at Wikidot.com. By the way, if you’re looking for a free hosted wiki with plenty of features, you could do a lot worse.

If anything there interests you enough to add material, let me know and I’ll invite you as an editor (you’ll need to create a wikidot account if you don’t have one).

Uninspired? Attend PSB 2009 (virtually)

Christmas break too short? More tired after the holiday than you were before? Perhaps you’d like to be on a Pacific atoll; Hawaii, for example.

Cheer up – attend the 2009 Pacific Symposium on Biocomputing via the magic of FriendFeed. They’ve already had an excellent session on open science (more details here) and the fun continues through to January 9.

Thanks to Shirley, Cameron et al. for the virtual proceedings. Oh, and a belated happy New Year.