Things to know about Lorne in the state of Victoria, Australia.
- It’s situated on the Great Ocean Road, a major visitor attraction and a great way to see the scenic coastline of the region
- It’s home to a number of life science conferences including Lorne Genome 2017
This week’s project then: use R to analyse coverage of the 2017 meeting on Twitter. I last did something similar for the ISMB meeting in 2012. How things have changed. Back then I prepared PDF reports using Sweave, retrieved tweets using the twitteR package and struggled with dates and times when plotting timelines. This time around I wrote RMarkdown in RStudio, tried out the newer rtweet package and, thanks to packages such as lubridate, the data munging is all so much cleaner and simpler.
So without further ado, here is the GitHub repository.
The report examines several aspects of the conference coverage under the broad headings of timeline, users, networks, retweets, favourites, quotes, media and text.
In 2015, I’d like to write, think and do more about things that I care about. One of those things happens to be the koala. Now, this being a blog about bioinformatics and computational biology, I can’t just start writing about any old thing that takes my fancy…I guess. So in this post I’m going to stretch the definition to include ecological informatics and tell you the story of how I achieved a long-held ambition using one of my favourite online resources, The Atlas of Living Australia. And then we’ll wrap up with a quick survey of the (sorry) state of marsupial genomics.
I’ve been complaining about this for years. They fixed it. The NCBI have reorganised their genomes FTP site and finally, Archaea are not lumped in with Bacteria.
Archaea are still included in the ASSEMBLY_BACTERIA directory; hopefully that’s next on the list.
[*] to be fair, they’ve always recognised Archaea – just not in a form that makes downloads convenient
6-way Venn banana
I thought nothing could top the classic “6-way Venn banana”, featured in The banana (Musa acuminata) genome and the evolution of monocotyledonous plants.
That is until I saw Figure 3 from Compact genome of the Antarctic midge is likely an adaptation to an extreme environment.
5-way Venn roadkill
What’s odd is that Figure 2 in the latter paper is a nice, clear R/ggplot2 creation, using facet_grid(), so someone knew what they were doing.
That aside, the Antarctic midge paper is an interesting read; go check it out.
This led to some amusing Twitter discussion which pointed me to A New Rose: The First Simple Symmetric 11-Venn Diagram [*].
[*] +1 for referencing The Damned, if indeed that was the intention.
I enjoyed this story from the OpenHelix blog today, describing a Microsoft Research project to mine DNA sequences from web pages and map them to UCSC genome builds.
Laura DeMare asks: what was the most-hit gene?
This bioinformatician, at least. Hate is a strong word. Perhaps “dislike” is better.
Short answer: because you can’t get data out of them easily, if at all. Longer answer:
Read the rest…
I subscribe to the Ensembl blog and found, in my feed reader this morning, a post which linked to the Variant Effect Predictor (VEP). The original blog post, strangely, has disappeared.
Not to worry: so, the VEP takes genotyping data in one of several formats, compares it with the Ensembl variation + core databases and returns a summary of how the variants affect transcripts and regulatory regions. My first thought – can I apply this to my own 23andme data?
Read the rest…
While we’re on the topic of mistaking Archaea for Bacteria, here’s an issue with the NCBI FTP site that has long annoyed me and one workaround. Warning: I threw this together minutes ago and it’s not fully tested.
Update July 7 2014: NCBI have changed things so code in this post no longer works
Read the rest…
It’s about one year since the science story dubbed #arseniclife hit the headlines. November 30th saw the release of a draft genome sequence for Halomonas sp. GFAJ-1, the bacterium behind the furore.
As Iddo pointed out on Twitter, sequencing the DNA from GFAJ-1 is itself strong evidence against arsenate in the DNA backbone, since the sequencing chemistry would be highly unlikely to work in that case. However, if like me you think that a new microbial genome provides the most fun to be had in bioinformatics [*], you’ll be excited by the availability of the data.
In this post then: where to get it, some very preliminary analysis and some things that you might like to do with it. Projects for your students, perhaps.
[*] note to self: why, then, am I working on colorectal cancer?
Read the rest…
File under: simple, but a useful reminder
UCSC Genome Bioinformatics is one of the go-to locations for genomic data. They are also kind enough to provide access to their MySQL database server:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A
However, users are given fair warning to “avoid excessive or heavy queries that may impact the server performance.” It’s not clear what constitutes excessive or heavy but if you’re in any doubt, it’s easy to create your own databases locally. It’s also easy to create only the tables that you require, as and when you need them.
As an example, here’s how you could create only the ensGene table for the latest hg19 database. Here, USER and PASSWD represent a local MySQL user and password with full privileges:
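# create database
mysql -u USER -pPASSWD -e 'create database hg19'
# obtain table schema
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/ensGene.sql
# create table
mysql -u USER -pPASSWD hg19 < ensGene.sql
# obtain and import table data
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/ensGene.txt.gz
gunzip ensGene.txt.gz
mysqlimport -u USER -pPASSWD --local hg19 ensGene.txt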
It’s very easy to automate this kind of process using shell scripts. All you need to know is the base URL for the data, http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/ and that there are two files with the same prefix per table: one for the schema (*.sql) and one with the data (*.txt.gz).
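As a rough illustration of that kind of automation, here is a minimal sketch of a pair of shell functions. The function names table_url and load_table are my own inventions for this example, not part of any UCSC tooling, and USER and PASSWD are the same placeholders as before. It assumes wget, gunzip and the MySQL client tools are installed and that the target database already exists locally.

```shell
#!/usr/bin/env bash
# Sketch only: fetch one UCSC table (schema + data) and load it into a
# local MySQL database. USER and PASSWD are placeholders for a local
# MySQL user and password; table_url and load_table are hypothetical
# helper names chosen for this example.

BASE_URL="http://hgdownload.cse.ucsc.edu/goldenPath"

# Print the download URL for one file belonging to a table.
# $1 = database (e.g. hg19), $2 = table (e.g. ensGene),
# $3 = file extension (sql or txt.gz)
table_url() {
    printf '%s/%s/database/%s.%s\n' "$BASE_URL" "$1" "$2" "$3"
}

# Download the schema and data files for one table, then create and
# populate the table. Stops at the first step that fails.
# $1 = database, $2 = table
load_table() {
    local db="$1" table="$2"
    wget -q "$(table_url "$db" "$table" sql)" &&
    wget -q "$(table_url "$db" "$table" txt.gz)" &&
    gunzip -f "$table.txt.gz" &&
    mysql -u USER -pPASSWD "$db" < "$table.sql" &&
    mysqlimport -u USER -pPASSWD --local "$db" "$table.txt"
}
```

With those functions defined, loading several tables becomes a one-line loop, for example: for t in ensGene refGene; do load_table hg19 "$t"; done (refGene is just an illustrative second table name). Keeping the URL construction in its own function makes it easy to check what will be downloaded before anything touches the network.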