In the former posts I described:
- what's happening in the storage-engine (in this case: MyISAM) by executing the query: JOIN
- what's happening in the level above the storage-engine: JOIN (2)
- after reading a record from a table this record is inspected: does it fulfill the conditions given in the WHERE-clause? Here you'll find more: JOIN (3)
testbed
For this series of posts I've created the tables TestSmall and TestBig of identical structure and put some data in it.TestSmall:
MariaDB [TestOpt]> select count(*) from TestSmall;
+----------+
| count(*) |
+----------+
| 301036 |
+----------+
1 row in set (0.04 sec)
MariaDB [TestOpt]> desc TestSmall;
+-------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| Id | char(8) | YES | | NULL | |
| PZN | char(7) | YES | MUL | NULL | |
| EVP | decimal(7,2) | YES | | NULL | |
| HAP | decimal(7,2) | YES | | NULL | |
| ArtikelBez | varchar(40) | YES | | NULL | |
| ArtikelText | varchar(26) | YES | | NULL | |
| Hersteller | char(5) | YES | | NULL | |
+-------------+--------------+------+-----+---------+-------+
7 rows in set (0.01 sec)
MariaDB [TestOpt]> show index from TestSmall;
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| TestSmall | 1 | PZN | 1 | PZN | A | 301036 | NULL | NULL | YES | BTREE | | |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)
MariaDB [TestOpt]>
TestBig:
MariaDB [TestOpt]> select count(*) from TestBig;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (0.00 sec)
MariaDB [TestOpt]> desc TestBig;
+-------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| Id | char(8) | YES | | NULL | |
| PZN | char(7) | YES | MUL | NULL | |
| EVP | decimal(7,2) | YES | | NULL | |
| HAP | decimal(7,2) | YES | | NULL | |
| ArtikelBez | varchar(40) | YES | | NULL | |
| ArtikelText | varchar(26) | YES | | NULL | |
| Hersteller | char(5) | YES | | NULL | |
+-------------+--------------+------+-----+---------+-------+
7 rows in set (0.00 sec)
MariaDB [TestOpt]> show index from TestBig;
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| TestBig | 1 | PZN | 1 | PZN | A | 22321 | NULL | NULL | YES | BTREE | | |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)
MariaDB [TestOpt]>
observation
The investigation started with an SQL-statement with this result:MariaDB [TestOpt]> select SQL_NO_CACHE count(*) from TestSmall A join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367';
+----------+
| count(*) |
+----------+
| 14 |
+----------+
1 row in set (21.35 sec)
MariaDB [TestOpt]> select SQL_NO_CACHE count(*) from TestSmall A straight_join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367';
+----------+
| count(*) |
+----------+
| 14 |
+----------+
1 row in set (0.38 sec)
MariaDB [TestOpt]>
You can see the difference in the execution-time of these two statements: 21 secs. vs. 0.4 secs. And you can also see that these two statements are identical except for the usage of the keywords JOIN and STRAIGHT_JOIN.STRAIGHT_JOIN forces the database to use the tables in this order: TestSmall -> TestBig, whereas by using JOIN in the statement the database does the ordering by itself and it chooses this order: TestBig -> TestSmall.
I verified this by the use of the counter-variables I've introduced here. And also the database tells us this by using EXPLAIN:
MariaDB [TestOpt]> explain select SQL_NO_CACHE count(*) from TestSmall A join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: B
type: ALL
possible_keys: PZN
key: NULL
key_len: NULL
ref: NULL
rows: 10000000
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: A
type: ref
possible_keys: PZN
key: PZN
key_len: 8
ref: TestOpt.B.PZN
rows: 1
Extra: Using where
2 rows in set (0.00 sec)
MariaDB [TestOpt]>
As you can see the optimizer chooses the table B (=TestBig) as the leading table and scans the full table (10 mio. records). If a match is found it switches over to A (=TestSmall) and does an access via the index using the value found in the column PZN of the record from TestBig.And here is the EXPLAIN of the second statement:
MariaDB [TestOpt]> explain select SQL_NO_CACHE count(*) from TestSmall A straight_join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: A
type: ALL
possible_keys: PZN
key: NULL
key_len: NULL
ref: NULL
rows: 301036
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: B
type: ref
possible_keys: PZN
key: PZN
key_len: 8
ref: TestOpt.A.PZN
rows: 448
Extra: Using where
2 rows in set (0.01 sec)
MariaDB [TestOpt]>
In this case it starts with the table A (=TestSmall) and scans the full table (300K records). If a match is found it switches over to table B (=TestBig) and in it searches for the given PZN using the index.the question
Without some force to use the tables in a special order the optimizer chooses an order, which leads to a result taking much more time then the other way. But why? Why does the optimizer think it's plan for executing the query is faster?optimizer
Simplified this happens in a database-server: a user sends such an SQL-statement to the database-server. The database-server accepts the stream of characters and does the following steps:- checks the syntax
- checks the access-rights of the user who sent this
- searches for a good path to execute the query
- and finally executes the query following the path found
In this post I would like to describe what happens in the optimizer-stage of the server when it inspects this statement. This is not a general description of this optimizer nor a discussion of possible ways an optimizer can work. I will follow the route of this statement in the optimizer-stage in this server (I did this in MariaDB). And I want to come to the point where the decision for the wrong path is made and also show why this decision is made this way by the existing software.
book
To repeat myself: it would help us if we can find an introductory text about the optimizer used in MySQL/MariaDB. Unfortunately I didn't find much except this book: here and here. You will find a short introduction into this topic on page 170 of Sasha's book. The book was published in 2007 so keep in mind that the code in MySQL/MariaDB has evolved in the meantime but the book is still helpful (and this is valid not only for this topic).possible paths
In our statement there are two paths possible for executing the query: start with the table TestSmall and for all records found search for the corresponding records in TestBig Or you can start with the Table TestBig and look for all corresponding records in TestSmall. Sounds easy to find a good path because we have only 2 paths to choose from. But our SQL-statement is simple, think of a statement with 3 tables (6 possible paths) or 4 tables (24 possible paths) in it. And I've worked with statements including 30 tables (not all different) in it. The number of possible orderings of the tables can be calculated by this formula: n!, nicely the page contains some values to show how this number explodes .So it can be a bit of work to do the calculations and comparisons. And we have to keep in mind what this little drawing tells us.
costs
The server computes a value for a path. This value is called a cost but this is not what we have to pay for executing the statement. This value is simply a number; it is assumed that a higher workload for a path is equivalent to a higher cost. Let's say α and β are possible paths then we assume that this statement is true:I will soon show how costs are calculated in MariaDB.
functions
Let's look at the code doing this job. You will find most of the functions in the file sql/sql_select.cc (and sql_select.h). Here is the hierarchy:mysql_execute_command()
execute_sqlcom_select()
handle_select()
mysql_select()
JOIN::optimize()
JOIN::optimize_inner()
make_join_statistics()
choose_plan()
greedy_search() // old (or alternate) way: find_best()
best_extension_by_limited_search()
best_access_path()
matching_candidates_in_table
best_extension_by_limited_search()
best_access_path()
matching_candidates_in_table
best_access_path()
matching_candidates_in_table
best_extension_by_limited_search()
best_access_path()
matching_candidates_in_table
In our case most of the work was done in the function-tree starting with best_extension_by_limited_search(). Here the calculation of the cost of a path started with the table TestSmall. As the SQL-statement contains a restriction on the column Hersteller and we do not have an index on this column only a table-scan can be used for this part of the job and therefore the cost of reading the table this way is calculated. Then this path will be extended by adding the table TestBig to the calculations. Here we have to compare the cost of a full table-scan on TestBig with the cost of using the existing index on the column PZN for searching in TestBig for all the records with the given value (from the record in TestSmall) in the column PZN. The final value will be stored (as it's the first path this is the cost of the best path, currently). Then the computation starts with the table TestBig and calculates the cost for reading this table. Then this path will be extended by the table TestSmall and the costs of this (extended) path will be calculated.
As we now have two values we can do a comparison of these two values and find that the last value computed is lower than the first value so the last path will be stored as the better one. And as no other possible paths exist this will be the path executed.
class JOIN
We need some variables for storing intermediate and final results like costs or paths or tables involved in the query and so on. You can find these in the class JOIN defined in sql/sql_select.h. Some important variables in this class are:- table_count: number of tables in the join
- table: the tables in the join
- join: the tables in the order of the executed QEP (=query execution plan)
- best_positions: the best complete QEP so far
- best_ref: (partial) QEP, micht be reordered during the search
- best_positions: the best QEP so far
- positions: current join optimization state
- best_read: the cost of the current best QEP (DBL_MAX at start)
best plan
I would like to omit all the intermediate steps in calculating the cost of a path so let me jump right to the end of the calculations and show where the comparison/decision is made. It's in the function best_extension_by_limited_search() (in sql/sql_select.cc): if (current_read_time < join->best_read)
{
memcpy((uchar*) join->best_positions, (uchar*) join->positions,
sizeof(POSITION) * (idx + 1));
join->record_count= partial_join_cardinality;
join->best_read= current_read_time - 0.001;
}
Let's start with the if-statement. In inspecting our SQL-statement current_read_time has a value of approx. 24 mio, join->best_read a value of approx. 162 mio. (=the cost of the best plan so far). As the first value is less then the second we've found a better plan and therefore the following code-block is executed: it copies the QEP to join->best_positions and stores the corresponding cost in join->best_read.
numbers
Most of the computations are done in double-precision, but I will show only approximate values.Here is the code that computes these numbers (in sql/sql_select.cc some lines above the if-statement):
/* Compute the cost of extending the plan with 's' */
current_record_count= record_count * position->records_read;
current_read_time=read_time + position->read_time +
current_record_count / (double) TIME_FOR_COMPARE;
The old plan started with the table TestSmall and added TestBig to it. These are the computations:
current_record_count= record_count * position->records_read;
record_count: 301,036 TestSmall: number of records
position->records_read: 448 TestBig: number of records / number of different keys in index = 10 mio. / 22298
current_record_count = 134,864,128 result: 301,036 * 448
for every record in TestSmall we have to read 448 records in TestBig via the index
current_read_time=read_time + position->read_time + current_record_count / (double) TIME_FOR_COMPARE;
read_time: 66,334 current_read_time of TestSmall
position->read_time: 135,165,164 301,036 * (448 + 1)
current_record_count: 134,864,128 see above
TIME_FOR_COMPARE: 5 a constant
current_read_time= 162,204,324 result: 66,334 + 13,516,5164 + 134,864,128 / 5
So we have the cost of this path calculated to a value of approx. 162 mio.Now we check the second possibility: start with TestBig and add TestSmall to it:
current_record_count= record_count * position->records_read;
record_count: 1 TestSmall: number of records / number of different keys in index
position->records_read: 10,000,000 TestBig: number of records
current_record_count = 10,000,000 result
current_read_time=read_time + position->read_time + current_record_count / (double) TIME_FOR_COMPARE;
read_time: 2,211,613 current_read_time of TestBig
position->read_time: 20,000,000 10.000.000 * (1 + 1)
current_record_count: 10,000,000 see above
TIME_FOR_COMPARE: 5 a constant
current_read_time= 24,211,613 result: 2,211,613 + 20,000,000 + 10,000,000 / 5
So this path has an asscociated cost-valuet of approx. 24 mio.criticize the numbers
Something is wrong with these calculations. Calculating the costs resulted in numbers 162mio. and 24mio. whereas the execution results in a time-consumption of 0.4sec and 21sec., the costs differ in a factor of approx. 8 and the execution of the paths differ in a factor of 40 but in the opposite direction. The difference in the factors is not the problem, but the faster path has the higher costs (at least for the optimizer).It looks like the optimizer in MariaDB weights the cost of a table-scan much lower then using an index. Or I found a counter example without any relevance in practice.
conclusion
Finding a good cost-function is not an easy task. I've taken one query as an example and found that in this case the optimizer choose the wrong path (by a factor). I don't want to generalize this example, we must be content to find a good compromise. But the optimizer is a central point in the database-software so we should focus our interest on this topic.MySQL
I did the same tests with MySQL (version 5.5.8) and got similar results: the statement using the JOIN-keyword was much slower then the statement using the STRAIGHT_JOIN-keyword. And the EXPLAINs looked identical to the lists I posted above. But the MySQL-server was a bit faster then the MariaDB-server, which could have been caused by different compile-options (didn't check this).I didn't dive into the code of the optimizer in MySQL but the code looks similar (but not identical) to the code in MariaDB. The computations of the values representing costs as shown above look a bit different in MySQL. Here is the line where the decision to change the order of the tables is made:
if ((search_depth == 1) || (current_read_time < join->best_read))
with these values:
search_depth: 61
current_read_time: 12,211,613
join->best_read: 161,843,081
As you can see: the values differ but the relation is the same as above.correctness
This text is still a simplified presentation of what's going on in the software. In case something is wrong with my description please use the mail-function of this site and send me a mail describing my error. I will look at it. Thanks.update
ups, an error happened: as a thousands separator I used a dot, e.g. the number 301036 was written as 301.036. But it's also possible to write the same number in this form: 301,036 using a comma as the separator.The usage of a character as a separator differs from country to country. As this blog is written in English I had to use the separator-character of this language for the numbers. By mistake I used the conventions used in Germany (the dot as a thousands-separator). I corrected this to use a comma.