View and Union very slow

December 11th, 2008 robin Posted in MySQL Comments Off on View and Union very slow

I’ve found quite a few articles like this one from MySQL selling views as the perfect way to join archive data with live data to provide seamless access.

What I wanted: keep archive data in a compact, non modifiable table (I chose packed MyISAM tables), and live data in a transactional table (I chose InnoDB), possibly even on two different machines.  I was thrilled when I found the article linked above, with the sweet idea how to create a view which would combine both, and imagined that while it would make a performance hit, it shouldn’t be that bad, because the data in the packed, optimised MyISAM tables should have much faster access than the InnoDB, and given that the volume of the data is in there, and all select conditions are on indexed columns, it should be pretty fast.

WRONG!

Views using a UNION are very poorly optimised, and create complete temporary tables before matching any where conditions.  Even on a COUNT(*) which should not even require /any/ data to be retreived, it took significantly longer.

Here by example:

Get the sample employees database here, and load it.

Create a test database, and a salaries table in it:

(root@127.0.0.1) [employees]>CREATE DATABASE `test`;
(root@127.0.0.1) [employees]>CREATE TABLE `test`.`salaries` (
  `emp_no` int(11) NOT NULL,
  `salary` int(11) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  PRIMARY KEY (`emp_no`,`from_date`),
  KEY `emp_no` (`emp_no`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Then split the table salaries:

(root@127.0.0.1) [employees]>INSERT INTO `test`.`salaries` SELECT * FROM `employees`.`salaries` WHERE emp_no<242293;
(root@127.0.0.1) [employees]>DELETE FROM `employees`.`salaries` WHERE emp_no<242293;

How long does it take to count each one?

(root@127.0.0.1) [employees]>SELECT SQL_NO_CACHE COUNT(*) FROM `employees`.`salaries`;
+----------+
| COUNT(*) |
+----------+
|  1492975 |
+----------+
1 row in set (0.30 sec)
(root@127.0.0.1) [employees]>SELECT SQL_NO_CACHE COUNT(*) FROM `test`.`salaries`;
+----------+
| COUNT(*) |
+----------+
|  1351072 |
+----------+
1 row in set (0.29 sec)

That’s ok, and we see from the explain that it is indeed using a SIMPLE select, using the index on epm_no:

(root@127.0.0.1) [employees]>EXPLAIN SELECT SQL_NO_CACHE COUNT(*) FROM `test`.`salaries`;
+----+-------------+----------+-------+---------------+--------+---------+------+---------+-------------+
| id | select_type | table    | type  | possible_keys | key    | key_len | ref  | rows    | Extra       |
+----+-------------+----------+-------+---------------+--------+---------+------+---------+-------------+
|  1 | SIMPLE      | salaries | index | NULL          | emp_no | 4       | NULL | 1351439 | Using index |
+----+-------------+----------+-------+---------------+--------+---------+------+---------+-------------+
1 row in set (0.01 sec)

So let’s make a view and join the tables, and try that again on the view:

(root@127.0.0.1) [employees]>CREATE VIEW salaries_all AS SELECT * FROM `employees`.`salaries` UNION ALL SELECT * FROM `test`.`salaries`;
(root@127.0.0.1) [employees]>EXPLAIN SELECT SQL_NO_CACHE COUNT(*) FROM `employees`.`salaries_all`;
+----+--------------+------------+------+---------------+------+---------+------+---------+------------------------------+
| id | select_type  | table      | type | possible_keys | key  | key_len | ref  | rows    | Extra                        |
+----+--------------+------------+------+---------------+------+---------+------+---------+------------------------------+
|  1 | PRIMARY      | NULL       | NULL | NULL          | NULL | NULL    | NULL |    NULL | Select tables optimized away |
|  2 | DERIVED      | salaries   | ALL  | NULL          | NULL | NULL    | NULL | 2713588 |                              |
|  3 | UNION        | salaries   | ALL  | NULL          | NULL | NULL    | NULL | 1351439 |                              |
| NULL | UNION RESULT | <union2,3> | ALL  | NULL          | NULL | NULL    | NULL |    NULL |                              |
+----+--------------+------------+------+---------------+------+---------+------+---------+------------------------------+
4 rows in set (9.85 sec)

Oh my god!

Even a query with sub-queries is faster by a factor of 20!

(root@127.0.0.1) [employees]>SELECT SQL_NO_CACHE ( SELECT COUNT(*) FROM `test`.`salaries` ) + ( SELECT COUNT(*) FROM `employees`.`salaries` );
+----------------------------------------------------------------------------------------------+
| ( SELECT COUNT(*) FROM `test`.`salaries` ) + ( SELECT COUNT(*) FROM `employees`.`salaries` ) |
+----------------------------------------------------------------------------------------------+
|                                                                                      2844047 |
+----------------------------------------------------------------------------------------------+
1 row in set (0.56 sec)
(root@127.0.0.1) [employees]>EXPLAIN SELECT SQL_NO_CACHE ( SELECT COUNT(*) FROM `test`.`salaries` ) + ( SELECT COUNT(*) FROM `employees`.`salaries` );
+----+-------------+----------+-------+---------------+--------+---------+------+---------+----------------+
| id | select_type | table    | type  | possible_keys | key    | key_len | ref  | rows    | Extra          |
+----+-------------+----------+-------+---------------+--------+---------+------+---------+----------------+
|  1 | PRIMARY     | NULL     | NULL  | NULL          | NULL   | NULL    | NULL |    NULL | No tables used |
|  3 | SUBQUERY    | salaries | index | NULL          | emp_no | 4       | NULL | 2713588 | Using index    |
|  2 | SUBQUERY    | salaries | index | NULL          | emp_no | 4       | NULL | 1351439 | Using index    |
+----+-------------+----------+-------+---------------+--------+---------+------+---------+----------------+
3 rows in set (0.00 sec)

I guess for now I’ll have to wait till views are optimised before I can use them like this…