The three phases of MySQL usage

As Mike keeps reminding me, getting your data into database tables is A Good Thing. Like many people, my database of choice is MySQL – largely because it was the first one that I tried and it works for me.

However, I’m far from being an expert MySQL user. In fact, I’ve identified 3 stages in my use of MySQL over the years; see if you recognise yourself in any of them.

  1. The single table database
  2. Early in your MySQL career, you discover either “LOAD DATA INFILE” or the command-line tool “mysqlimport”. You realise that this is a quick and easy way to get all of those delimited text files straight into a database. Off you go, merrily importing everything on your hard drive into tables.
    At some stage, you begin to wonder whether listing the same identifier (gene name, PDB ID, species…) many times in multiple columns is sensible. This eureka moment leads you to…

  3. …Discovery of the join
  4. At some stage, you get your head around joining tables. At first, you can’t quite see the point – isn’t it easier to devise a query on a single table? Maybe so, but it’s inefficient and severely limits the combinations of data that your query can return. It’s a slow realisation but pretty soon, you’re chopping up your data into multiple tables and joining all over the place. Including examples where you probably don’t need to do so.

  5. The optimised database
  6. You begin to wonder why your MySQL queries are so slow. Sure, you have a table with 20 million rows, but your computer does everything else within seconds, so what’s up with MySQL?
    You head to the MySQL documentation website, try “optimization” as a query and discover a whole new world. OK, the various “my.ini” example files made little or no difference, but that 200x speed increase when you indexed multiple keys was definitely worth it.

Just starting out in MySQL? My advice: head over to Chapter 7: Optimization right now, skim through and if it doesn’t make complete sense yet, bookmark it for later.

4 thoughts on “The three phases of MySQL usage

  1. Chris Lasher

    Datbases are really a neat way to handle data and require a very different train of thought (based off of set theory) than I’m used to, and I’m beginning to see what problems they excel at solving, and which they don’t.

    I suppose these lessons here can be MySQL specific, but before we ever wrote our first query in our Data Management in Bioinformatics course, we learned about Entity/Relation theory, and database normalization. I really do not have this completely mastered yet, but I believe Prof. Ramakrishnan when he tells us that by the time you have your tables in place, it’s too late. Good databases come from thoughtful database design. (Note that’s not the same as perfect databases.) Minimizing data redundancy and preservation of dependencies should be key goals to keep in mind.

    We’ve been trying to put our class notes and efforts into a wikibook, which may or may not be of interest (it’s in very raw form right now): http://en.wikibooks.org/wiki/Data_Management_in_Bioinformatics

  2. Mike

    Hi Neil,

    I like the new look for your blog, I like this post too. Have you tried activerecord, or datamapper? I may sound like a broken record but they make database life much, much easier. If you’re using ruby of course.

    I’m not an expert, but I’ve heard that limiting the number of joins in your database can be good thing for very, very large datasets. I think that Google’s BigTable is (unsurprisingly) a database with one big table where everything is completely denormalised and thus making it much faster to parse. I’m still waiting for the activerecord plugin for this though…

  3. Euan

    Heh, yep, looks familiar!

    It’s easy to get carried away with step 3 and start to normalize obsessively and over-optimize… maybe there’s a step 4. I’m not sure what it is.

  4. agbiotec

    hehe… sure thing about the 3 steps Neil… and then, if you spend 5 years doing a whole thesis in SQL, you end up reading stuff like this http://tinyurl.com/6r3s67 (titled “The Art of SQL”) — and I am talking from very personal experience :-)

Comments are closed.