Last week I got to see Jay Pipes speak at a local PHP users group meeting. He did a 2-part talk on optimizing SQL (Join-Fu, as he calls it). I managed to keep up and follow part 1, but I have to admit that part 2 was largely over my head. Below are some of the key things I took away from Jay’s great presentation.
Vertical Partitioning
Partitioning vertically is a database optimizing technique where tables with many columns are split into multiple, smaller tables. Without partitioning, even normalized tables may be a mix of frequently and infrequently accessed columns. This makes it difficult to keep things loaded in memory and can affect overall performance.
By splitting the table in two, you can have a table of frequently accessed columns and a table of less frequently accessed columns. The tables can be merged together dynamically when needed, but most of the time you will just need the table with the frequently accessed data. This approach also gives you the option of using different table types (storage engines) for each of the tables. Of course, it’s probably easiest to implement vertical partitioning at the beginning of your project instead of coming back and making changes to the database structure later.
Horizontal Partitioning
Partitioning horizontally is a similar concept except that the table is split by rows rather than columns. This works well when certain rows are accessed more frequently than others. Depending on the size of the database, having a table of more frequently accessed rows can really speed things up.
The events registration system I created at MPOW can potentially benefit from this. We have over 10,000 registrations in the system, but most of those are for past events that we rarely need to access. Having a table of 100-200 registrations for upcoming events and a separate table for the thousands of past event registrations would probably provide a noticeable performance increase.
Query Cache
MySQL has a query cache that it uses to increase performance. If a particular query has already been run on the database, MySQL can return the cached result instead of having to parse it, optimize it, etc. I was already familiar with the query cache and we use it MPOW. However, I learned a couple of new tidbits from Jay:
- “Any modification to any table involved in the SELECT invalidates the stored result.” I guess I already knew this. What’s cool, however, is that partitioning your tables properly can reduce this. Making a change to a table containing upcoming event registrations would not affect the cache for the table containing past event registrations.
- Don’t use InnoDB tables when there are more than 5000 rows and you’re doing full table scans. It’s slooooow. Use MyISAM table types instead.
- If you are doing a SELECT on tables that change frequently, there is no point in storing the result in cache because it will soon be invalid. Use SELECT SQL_NO_CACHE to prevent MySQL from storing highly dynamic results and filling up the cache.
- MEMORY table types are fast and good for data that can be thrown away or re-computed again.
ANSI vs. Theta Coding Style
Jay put two different versions of a SELECT statement up on the screen and asked the group which style they used. One style used INNER JOINs (ANSI style) and the other used WHERE and AND clauses (Theta style or comma style). I think most of the room (including me) raised our hands for the latter. Jay made some good points about ANSI being easier to read and more portable. I need to force myself to try ANSI and get used to it. Theta style, however, is very natural for me and I suspect it will be hard for me to change.
The slides from Jay’s talk (including part 2) are up on his website.


Post a Comment