How to: create a partial UCSC genome MySQL database

File under: simple, but a useful reminder

UCSC Genome Bioinformatics is one of the go-to locations for genomic data. They are also kind enough to provide access to their MySQL database server:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A

However, users are given fair warning to “avoid excessive or heavy queries that may impact the server performance.” It’s not clear what constitutes excessive or heavy but if you’re in any doubt, it’s easy to create your own databases locally. It’s also easy to create only the tables that you require, as and when you need them.
As an example, here’s how you could create only the ensGene table for the latest hg19 database. Here, USER and PASSWD represent a local MySQL user and password with full privileges:

# create database
mysql -u USER -pPASSWD -e 'create database hg19'
# obtain table schema
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/ensGene.sql
# create table
mysql -u USER -pPASSWD hg19 < ensGene.sql
# obtain and import table data
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/ensGene.txt.gz
gunzip ensGene.txt.gz
mysqlimport -u USER -pPASS --local hg19 ensGene.txt

It’s very easy to automate this kind of process using shell scripts. All you need to know is the base URL for the data, http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/ and that there are two files with the same prefix per table: one for the schema (*.sql) and one with the data (*.txt.gz).

2 thoughts on “How to: create a partial UCSC genome MySQL database”

brent

May 24, 2011 at 03:47

thanks Neil, this is something that’s simple now that I’ve seen it and also very useful.
maximilianhaeussler

June 9, 2011 at 18:03

It’s a lot easier to download the binary tables from via their rsync server see http://genomewiki.ucsc.edu/index.php/Browser_Installation#Download_assembly_database_tables

Comments are closed.

What You're Doing Is Rather Desperate

Notes from the life of a [data] scientist

How to: create a partial UCSC genome MySQL database

2 thoughts on “How to: create a partial UCSC genome MySQL database”

Share this:

Related

2 thoughts on “How to: create a partial UCSC genome MySQL database”