Monday 3 August 2015

optimizer: indexes (2)

some time ago I started to describe what's happening in the code of (mostly) MariaDB when it handles a SQL-statement. The SQL-statement in this series of posts is always the same, the structure of the tables involved and the data stored in the tables are identical, I've changed only a few parts around it and looked how the code reactss on these modifications. With this post I would like to continue this.

This text here belongs to the series that describes the handling of multiple indexes on one table. These are the older posts:
  • this series started here: indexes
  • one minor piece omitted in the above post was described here: 0xA8
  • another tiny piece omitted but described in detail here: 0xCA
In this post I would like to continue this description.


the environment

I want to describe the environment I'm using (again).

For this series of posts I'm looking at how the optimizer in the code of MariaDB1) inspects a SQL-statement and finds a good plan for collecting the data requested in the SQL-statement. In this post I look at how the optimizer treated indexes on tables so I created these indexes:
MariaDB [TestOpt]> show index from TestSmall;
Empty set (0.00 sec)

MariaDB [TestOpt]> show index from TestBig;  
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| TestBig |          1 | EVP        |            1 | EVP         | A         |       11350 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | Ind1       |            1 | PZN         | A         |       22321 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | Ind1       |            2 | ArtikelText | A         |     1666666 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | Ind1       |            3 | Hersteller  | A         |    10000000 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | HAP        |            1 | HAP         | A         |       10162 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | Ind2       |            1 | Hersteller  | A         |        2581 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | Ind2       |            2 | ArtikelText | A         |    10000000 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | Ind2       |            3 | PZN         | A         |    10000000 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | PZN        |            1 | PZN         | A         |       22321 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | Hersteller |            1 | Hersteller  | A         |        2581 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | PznHerst   |            1 | PZN         | A         |       22321 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | PznHerst   |            2 | Hersteller  | A         |     1111111 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | HerstPzn   |            1 | Hersteller  | A         |        2581 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | HerstPzn   |            2 | PZN         | A         |     1111111 |     NULL | NULL   | YES  | BTREE      |         |               |
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
14 rows in set (0.00 sec)

MariaDB [TestOpt]> 
As you can see, the table TestSmall contains no indexes at all, the table TestBig contains 8 indexes. Please do not try to see any meaning in these indexes, they are intended for the demonstration here.


the statement

This is the statement I want to follow it's way through the code of MariaDB:
MariaDB [(none)]> use TestOpt;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [TestOpt]> select SQL_NO_CACHE  count(A.PZN) from TestSmall A join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367';
+--------------+
| count(A.PZN) |
+--------------+
|           14 |
+--------------+
1 row in set (0.45 sec)

MariaDB [TestOpt]> explain select SQL_NO_CACHE  count(A.PZN) from TestSmall A join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 301036
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
         type: ref
possible_keys: Ind1,Ind2,PZN,Hersteller,PznHerst,HerstPzn
          key: PznHerst
      key_len: 14
          ref: TestOpt.A.PZN,const
         rows: 9
        Extra: Using where; Using index
2 rows in set (0.00 sec)

MariaDB [TestOpt]> 
I've included the explain of the statement.

You can see the optimizer chooses to start with the table TestSmall doing a full table-scan (no index available) and reading 300K records. After reading one record from TestSmall it applies the WHERE-clause to this record and when a match is found it switches over to the table TestBig and searches for records matching the value for Hersteller (from the WHERE-clause) and PZN (from the record just read from TestSmall) using the index PznHerst.


warning

As in the last posts I did all my tests with MariaDB, the code-examples presented here are taken from the source-code of MariaDB 10.0.10, but the explanation should also be valid for other versions and for MySQL2).

The example uses MyISAM as the storage engine for the tables and indexes, some (but not all) of the statements here are only valid for this engine.

Please do not generalize the results presented here. A lot of the data presented here depends on the data in the tables, and this is my data. If you do your own tests don't expect to see the same numbers or plans.


prerequisites

In this post I ended with the cost-values for accessing a table alone. Let me repeat the results:
TestSmall:
found_records  =    301,036
read_time      =      6,127
s->worst_seeks =     18,382
TestBig:
found_records  = 10,000,000
read_time      =    211,613
s->worst_seeks =    634,840
As there are indexes defined on the table TestBig the number of records to inspect when this table is accessed via an index are computed:
table TestBig:
index 3 = Ind2       = 204,922 records
index 5 = Hersteller = 455,132 records
index 7 = HerstPzn   = 380,971 records
Please keep in mind that these are estimated values.


choosing a plan

A lot of words until here, so I want to come to the topic of this post: how does the code make the decision for the best plan, the plan described in the explain above? Which alternatives are inspected? What are the arguments for this plan? Why are alternatives rejected?

Here is the function hierarchy of the handling of this statement during inspection and decision-making:
make_join_statistics()
    choose_plan()
        greedy_search()
            best_extension_by_limited_search()
                best_access_path()        --> (1)     will explain this some lines later
                --> (2)
                best_extension_by_limited_search()
                    best_access_path()    --> (3)
                --> (4)
                best_access_path()        --> (5)
                --> (6)
                best_extension_by_limited_search()
                    best_access_path()     --> (7)
                --> (8)
warning: this hierarchy is valid for the SQL-statement given above and for the environment I am working in. If you do your own tests this hierarchy may look differently. And in the hierarchy I've added some hints to steps which I would like to explain in some details later.

Let's start with an overview of the steps involved in the inspection of this statement:
The strategy is to start with a table and calculate the cost of accessing this table alone. This plan is extended by adding the second table to it and calculating the cost of accessing the records of the second table, finally the cost of accessing these 2 tables is calculated. And as no more table exist3) this calculation is finished.
Now the code tries the opposite way: the inspection begins with the second table and calculates the costs of accessing this table alone. Then this plan is extended by adding the first table and calculate the cost of accessing this table is calculated, finally the costs are combined.

As we now have two values a comparison is possible and the plan with the lower cost-value is chosen, the other plan and its values are forgotten. As no other possibilities exists the computation stops and we have a plan to execute. So it starts executing the plan computed.

Let's look a bit deeper into the code and at first I want to show the places where I will show results of computations soon.
The hints above marked as (1), (3), (5) and (7) refer to these lines of code in the function best_access_path():
  /* Update the cost information for the current partial plan */
  pos->records_read= records;
  pos->read_time=    best;
  pos->key=          best_key;
  pos->table=        s;
  pos->ref_depend_map= best_ref_depends_map;
  pos->loosescan_picker.loosescan_key= MAX_KEY;
  pos->use_join_buffer= best_uses_jbuf;
 

the other hints refer to the function best_extension_by_limited_search() and especially these lines of code:
      /* Find the best access method from 's' to the current partial plan */
      POSITION loose_scan_pos;
      best_access_path(join, s, remaining_tables, idx, disable_jbuf,
                       record_count, join->positions + idx, &loose_scan_pos);

      /* Compute the cost of extending the plan with 's' */

      current_record_count= record_count * position->records_read;
      current_read_time=read_time + position->read_time +
                        current_record_count / (double) TIME_FOR_COMPARE;

So let's dive into the code and look what the results of the computations are.

Everything starts from scratch with an empty plan, so the first table inspected is the table TestBig.
Stop! Why do we start with the table TestBig? If you look at the SQL-statement this table is found in the second place and its the table with the most records in it. So why?

If you look into the code of choose_plan() you see this call near the beginning:
  my_qsort2(join->best_ref + join->const_tables,
            join->table_count - join->const_tables, sizeof(JOIN_TAB*),
            jtab_sort_func, (void*)join->emb_sjm_nest);
In this function the entries are sorted using the comparison-function join_tab_cmp() (via the var. jtab_sort_func). And in this function the number of records are compared:
  if (jt1->found_records > jt2->found_records)
    return 1;
and as the values are 301,036 (for jt1 aka. TestSmall) and 204,922 (for jt2 aka. TestBig) it returns a value of 1 which will result in swapping the entries, so the JOIN_TAB representing the table TestBig is in the first position of the list and therefore the code starts the inspection with this table.

And for the table TestBig it calculates the costs, so here is the result of the step marked with (1):
(1) TestBig: 
records_read: 204,922
read_time:    19,858
key:          3 = Ind2

The whole cost for accessing this table alone is computed as:
(2)
current_record_count= 204,922
current_read_time   = 60,843

Let's add the next table to this partial plan. So we inspect the table TestSmall and found this path:
(3) TestSmall:
records_read: 301,036
read_time:    128,679
key;          NULL

Now we put all the numbers together and calculate the cost of accessing both tables by the ways found so far:
(4) TestBig + TestSmall:
current_record_count= 61,688,899,192 = 204,922 * 301,036
current_read_time   = 12,337,969,360 =  60,843 + 128,679 + 61,688,899,192/5

If you look at the numbers you will see that the values involved in the computation are a combination of the numbers taken from step (2) and (3).

Here the inspection of this plan ends and we store it:
        if (current_read_time < join->best_read)  
        {
          memcpy((uchar*) join->best_positions, (uchar*) join->positions, 
                 sizeof(POSITION) * (idx + 1));
          join->record_count= partial_join_cardinality;
          join->best_read= current_read_time - 0.001;
        }

The best plan so far has this (initial) value: join->best_read = DBL_MAX = 1.7976931348623157e+308
so our plan just computed has a better value and is therefore copied to join->best_positions.

So we go back to the beginning but in this case we start with the next table in the list: TestSmall. Here are the results:
(5) TestSmall
records_read: 301036
read_time:    6127
key:          NULL

The whole cost for accessing this table alone is computed as:
(6)
current_record_count= 301,036
current_read_time   = 66,334

We extend this plan by adding the table TestBig. For accessing this table these costs are computed:
(7) TestBig:
records_read: 9
read_time:    391,573
key:          6  = PznHerst

From these numbers we calculate the cost of the total plan:
(8)
current_record_count= 2,709,324 = 301,036 * 9
current_read_time   = 999,772  = 66,334 + 391,573 + 2,709,324/5

As no more table is in our list the inspection stops here and we can compare the cost of this plan with the best plan so far:
        if (current_read_time < join->best_read)  
        {
          memcpy((uchar*) join->best_positions, (uchar*) join->positions,   
                 sizeof(POSITION) * (idx + 1));
          join->record_count= partial_join_cardinality;
          join->best_read= current_read_time - 0.001;
        }

and as 999,772 is less than 12,337,969,360 the code copies the plan to the final position.

We would go back to the beginning but as there are no more tables on the list the computation stops here and we finally have a plan to execute.

That's all. The best plan is found and in the next step it will be executed.


correctness

This text is a simplified presentation of what's going on in the software. In case something is wrong with my description please use the mail-function of this site and send me a mail describing my error. I will look at it. Thanks.


some notes:
1) in this post I'm using version 10.0.10
2) I didn't check this, sorry
3) valid for this test-case = for this statement