Posts tagged ‘how to’

April 24, 2012

Redmine + Gitolite integration

I’m a big fan of both Redmine, the project management web application and Git, the distributed version control system.

Recently, I learned that it’s possible to integrate Git into Redmine so that git repositories for a project can be created via the Redmine web interface. This is done using plugins which connect Redmine with git hosting software: either gitosis or more recently, gitolite.

Unfortunately, this is a deeply-confusing process for novices like myself. There are multiple forks of the plugins, long threads in the Redmine forums that discuss various hacks/tweaks to make things work and no one authoritative source of documentation. After much experimentation, this is what worked for me. I can’t guarantee success for you.

Read the rest…

February 2, 2012

Proteins in the PDB that differ by one amino acid

A question at BioStar: how to “return all pdb ids to a given one that differ only by one amino acid”?

My answer began: “I think it is not too much work to craft a solution using a few tools”, followed by some incomplete ideas. Let’s see if I was right.
Read the rest…

December 22, 2011

Sequencing for relics from the Sanger era part 1: getting the raw data

Sequencing in the good old days

In another life, way back in the mists of time, I did a Ph.D. Part of my project was to sequence a gene from a bacterium, which encoded an enzyme involved in nitrate metabolism. It took the best part of a year to obtain ~ 2 000 bp of DNA sequence: partly because I was rubbish at sequencing, but also because of the technology at the time. It was an elegant biochemical technique called the dideoxy chain termination method, or “Sanger sequencing” after its inventor. Sequence was visualized by exposing radioactively-labelled DNA to X-ray film, resulting in images like the one at left, from my thesis. Yes, that photograph is glued in place. The sequence was read manually, by placing the developed film on a light box, moving a ruler and writing down the bases.

By the time I started my first postdoc, technology had moved on a little. We still did Sanger sequencing but the radioactive label had been replaced with coloured dyes and a laser scanner, which allowed automated reading of the sequence. During my second postdoc, this same technology was being applied to the shotgun sequencing of complete bacterial genomes. Assembling the sequence reads into contigs was pretty straightforward: there were a few software packages around, but most people used a pipeline of Phred (to call base qualities), Phrap (to assemble the reads) and Consed (for manual editing and gap-filling strategy).

The last time I worked directly on a project with sequencing data was around 2005. Jump forward 5 years to the BioStar bioinformatics Q&A forum and you’ll find many questions related to sequencing. But not sequencing as I knew it. No, this is so-called next-generation sequencing, or NGS. Suddenly, I realised that I am no longer a sequencing expert. In fact:

I am a relic from the Sanger era

I resolved to improve this state of affairs. There is plenty of publicly-available NGS data, some of it relevant to my current work and my organisation is predicting a move away from microarrays and towards NGS in 2012. So I figured: what better way to educate myself than to blog about it as I go along?

This is part 1 of a 4-part series and in this installment, we’ll look at how to get hold of public NGS data.
Read the rest…

September 8, 2011

Interacting with bioinformatics webservers using R

In an ideal world, all bioinformatics tools would be made available via the Web as a web service with an API, as well as a standalone package to download for local use. This is rarely the case and sometimes, even where one or the other is available, factors such as cost come into play. So we resort to web scraping; writing code to interact with the code that lies behind a web server so as to submit queries, retrieve and parse results.

Normally, I’d use something like Ruby’s Mechanize library for this purpose. However, where the purpose is to retrieve delimited data for analysis using R, I figured it was time to try and achieve the entire process within R. So here’s how I used the RCurl and XML packages to interact with the WHAT IF server, which provides tools for the analysis of protein structure.
Read the rest…

July 7, 2011

R: calculations involving months

Ask anyone how much time has elapsed since September last year and they’ll probably start counting on their fingers: “October, November…” and tell you “just over 9 months.”

So, when faced as I was today with a data frame (named dates) like this:

pmid1       year1    month1     pmid2      year2    month2
21355427    2010     Dec        21542215   2011     Mar
21323727    2011     Feb        21521365   2011     Jun
21297532    2011     Feb        21336080   2011     Mar
21291296    2011     Apr        21591868   2011     Jun
...

How to add a 7th column, with the number of months between “year1/month1″ and “year2/month2″?
Read the rest…

Tags: ,
April 15, 2011

R 2.12 to 2.13 package upgrade

If you:

  • use Linux
  • have just upgraded your R installation from 2.12 to 2.13
  • installed some/all of your packages in your home area (e.g. ~/R/i486-pc-linux-gnu-library/2.12) and…
  • …are wondering why R can’t see them any more

just do this:

# at a shell prompt
cp -r ~/R/i486-pc-linux-gnu-library/2.12 ~/R/i486-pc-linux-gnu-library/2.13
# in R console
update.packages(checkBuilt=TRUE, ask=FALSE)
# back to the shell
rm -rf ~/R/i486-pc-linux-gnu-library/2.12

update: corrected a typo; of course you need “cp -r”

April 14, 2010

Plotting “time of day” data using ggplot2

William asks:

How can I make a graph that looks like this, “tweet density” style, showing time intervals?

He then helpfully describes his input data: a CSV file with headers “time started, time finished, date”.
Read the rest…

September 8, 2008

On parsing

Parsing – the act of ripping through a file, pulling out the relevant parts and doing something useful with them, is an integral part of bioinformatics. It can be a dull procedure. It can also be challenging, requiring creativity and imagination. Frequently as a bioinformatician, you will generate output from an unfamiliar program, or a colleague will bring you a file that you haven’t encountered. Your task is to figure out how the file is structured, which regular expressions are required to parse it, what kind of output to produce and most importantly, how to handle those rogue files which don’t obey the rules.

Here’s my top ten (language-agnostic) parsing tips, focusing only on non-XML text files.
Read the rest…

Tags: ,
March 7, 2008

Missing links: using SwissProt IDtracker in your code

The BioPerl Bio::DB::SwissProt module lets you fetch sequences from SwissProt by ID or AC and store them as sequence objects:

use Bio::DB::SwissProt;
my $sp = Bio::DB::SwissProt->new('-servertype' => 'expasy', 'hostlocation' => 'australia');
my $seq = $sp->get_Seq_by_id('myod1_pig');

If you obtained SwissProt identifiers from a database that hasn’t been updated for some time, you may find that the ID or AC has changed. For example at NLSdb, the ID from the example shown is given as “myod_pig”. In this case, BioPerl will throw an error like this:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: id does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_id /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:154
STACK: test.pl:3
-----------------------------------------------------------

SwissProt provides a web page named IDtracker to help you find new identifiers using old ones. Here’s how we can integrate the service into Perl.
Read the rest…

February 27, 2008

How to: map protein sequence onto chromosomal coordinates using BioPerl

My coding challenge this week: given a protein sequence and its exons, how do you map single amino acid residues to a location on a DNA sequence? It’s trickier than you might think. Read on for my latest BioPerl how-to.
Read the rest…

Follow

Get every new post delivered to your Inbox.

Join 1,464 other followers