Git for bioinformaticians at the Bioinformatics FOAM meeting

Last week, I attended the annual Computational and Simulation Sciences and eResearch Conference, hosted by CSIRO in Melbourne. The meeting includes a workshop that we call Bioinformatics FOAM (Focus On Analytical Methods). This year it was run over 2.5 days (up from the previous 1.5 by popular request); one day for internal CSIRO stuff and the rest open to external participants.

I had the pleasure of giving a brief presentation on the use of Git in bioinformatics. Nothing startling; aimed squarely at bioinformaticians who may have heard of version control in general and Git in particular but who are yet to employ either. I’m excited because for once I am free to share, resulting in my first upload to Slideshare in almost 4.5 years. You can view it here, or at the Australian Bioinformatics Network Slideshare, or in the embed below.


3 thoughts on “Git for bioinformaticians at the Bioinformatics FOAM meeting”

  1. cotsapas

    So, you’ve finally convinced me to stop just editing scripts :) I’m using git locally only, with one repository per project. I still haven’t figured out if I need a “remote” repository (e.g. another local dir) to push to. I don’t think so, unless it’s a cross-project codebase sort of thing.

    What’s your feeling about large datasets in git? My project directories usually have src/ bin/ data/ analyses/ log/ dirs within them, with (sometimes large) data living in data/.

    1. nsaunders (Post author)

      I believe that Git can handle “quite large” files, but in general, I don’t believe in versioning data. I take the approach that if scripts operate on (unchanging) input files to generate output, then versions of output data can always be regenerated from versions of scripts.

      I initialize git in the top level directory of a project and add sub-directories which don’t contain code (data, scratch etc.) to my .gitignore file.
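
      For anyone setting this up, a minimal sketch of that layout might look like the following (the project name is only illustrative; the sub-directory names follow the layout described above):

      ```bash
      # From the top level of a project directory (name is illustrative)
      cd myproject/
      git init

      # Keep non-code sub-directories out of version control
      printf 'data/\nscratch/\nlog/\n' >> .gitignore

      # Track only the scripts and the ignore rules
      git add .gitignore src/ bin/
      git commit -m "Initial commit: scripts only, data ignored"
      ```

      With that in place, large files in data/ never enter the repository, and any version of the output can be regenerated by checking out the corresponding version of the scripts and re-running them on the unchanged inputs.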
