Thursday 14 May 2015

optimizer: indexes

Some time ago I started posting articles about JOINs and the handling of such SQL-statements in MySQL and MariaDB. I want to stay with this topic, so I will use the same tables with the same data in them as before, and also the same statement; I only added some indexes. The indexes created here are used only for this demonstration, they do not have any other meaning. In this post I want to show how the optimizer handles this collection of indexes and chooses one of them.

For this example I did all my tests with MariaDB, the code-examples presented here are taken from the source-code of MariaDB, but the explanations should also be valid for MySQL.

Be careful when you generalize the results presented here. A lot of the data presented here depends on the data in the tables, and this is my data. If you do your own tests don't expect to see the same numbers (like values for keys, file-offsets, distribution of keys etc.).

Let me start with a short explanation of my environment.


environment

These are the tables I will use for this post (same as before):
MariaDB [TestOpt]> desc TestSmall;
+-------------+--------------+------+-----+---------+-------+
| Field       | Type         | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| Id          | char(8)      | YES  |     | NULL    |       |
| PZN         | char(7)      | YES  |     | NULL    |       |
| EVP         | decimal(7,2) | YES  |     | NULL    |       |
| HAP         | decimal(7,2) | YES  |     | NULL    |       |
| ArtikelBez  | varchar(40)  | YES  |     | NULL    |       |
| ArtikelText | varchar(26)  | YES  |     | NULL    |       |
| Hersteller  | char(5)      | YES  |     | NULL    |       |
+-------------+--------------+------+-----+---------+-------+
7 rows in set (0.02 sec)

MariaDB [TestOpt]> desc TestBig;  
+-------------+--------------+------+-----+---------+-------+
| Field       | Type         | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| Id          | char(8)      | YES  |     | NULL    |       |
| PZN         | char(7)      | YES  | MUL | NULL    |       |
| EVP         | decimal(7,2) | YES  | MUL | NULL    |       |
| HAP         | decimal(7,2) | YES  | MUL | NULL    |       |
| ArtikelBez  | varchar(40)  | YES  |     | NULL    |       |
| ArtikelText | varchar(26)  | YES  |     | NULL    |       |
| Hersteller  | char(5)      | YES  | MUL | NULL    |       |
+-------------+--------------+------+-----+---------+-------+
7 rows in set (0.01 sec)

MariaDB [TestOpt]> 
The table TestSmall contains approx. 300,000 records, the table TestBig approx. 10 million.

On the table TestSmall I didn't create any index, but on the table TestBig I created these:
MariaDB [TestOpt]> show index from TestSmall;
Empty set (0.00 sec)

MariaDB [TestOpt]> show index from TestBig;  
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| TestBig |          1 | EVP        |            1 | EVP         | A         |       11350 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | Ind1       |            1 | PZN         | A         |       22321 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | Ind1       |            2 | ArtikelText | A         |     1666666 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | Ind1       |            3 | Hersteller  | A         |    10000000 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | HAP        |            1 | HAP         | A         |       10162 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | Ind2       |            1 | Hersteller  | A         |        2581 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | Ind2       |            2 | ArtikelText | A         |    10000000 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | Ind2       |            3 | PZN         | A         |    10000000 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | PZN        |            1 | PZN         | A         |       22321 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | Hersteller |            1 | Hersteller  | A         |        2581 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | PznHerst   |            1 | PZN         | A         |       22321 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | PznHerst   |            2 | Hersteller  | A         |     1111111 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | HerstPzn   |            1 | Hersteller  | A         |        2581 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | HerstPzn   |            2 | PZN         | A         |     1111111 |     NULL | NULL   | YES  | BTREE      |         |               |
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
14 rows in set (0.00 sec)

MariaDB [TestOpt]> 
So you see a total of 8 indexes on TestBig; some of them are composite indexes which consist of 2 or 3 columns.

And this is the test-statement I want to use in this post:
MariaDB [TestOpt]> select SQL_NO_CACHE  count(A.PZN) from TestSmall A join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367';
+--------------+
| count(A.PZN) |
+--------------+
|           14 |
+--------------+
1 row in set (0.38 sec)

MariaDB [TestOpt]> explain select SQL_NO_CACHE  count(A.PZN) from TestSmall A join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'\G       
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 301036
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
         type: ref
possible_keys: Ind1,Ind2,PZN,Hersteller,PznHerst,HerstPzn
          key: PznHerst
      key_len: 14
          ref: TestOpt.A.PZN,const
         rows: 9
        Extra: Using where; Using index
2 rows in set (0.00 sec)

MariaDB [TestOpt]> 
I have already included the EXPLAIN for this statement.

The EXPLAIN tells us that when executing this query the server starts with the table TestSmall and does a full table-scan because it has no alternative for reading this table. All 300K records are read.
For the table TestBig it doesn't do a table-scan but an index-scan instead. It has 6 indexes to choose from (the indexes EVP and HAP contain only columns not usable in this query) and it chooses the index PznHerst.

Let me now do another test to verify this.


counters

In an older post I introduced some counter-variables to count the number of times a function is called at runtime. Using this approach I get the following result in /var/log/mysql/error.log after the SQL-statement is executed:
table = <TestSmall>    counter table-scan: 301037    index-scan: 0
table = <TestBig>      counter table-scan: 0         index-scan: 74
This verifies the EXPLAIN: the table TestSmall is read via a table-scan, the table TestBig is read via an index.

Let's use the debugger and set breakpoints. As the storage-engine used for these tests is MyISAM I will set the breakpoints in the file ha_myisam.cc in these functions:
  • ha_myisam::index_read_map(): sets the index and the key to search for and fetches the first record for this key
  • ha_myisam::index_next_same(): fetches the next record for the given index and key
  • ha_myisam::rnd_next(): reads one record in a table-scan
At runtime one can see: the server calls ha_myisam::rnd_next() reading record after record from TestSmall. When a record is found which fulfills the WHERE-condition1) the server switches over to TestBig and reads the matching records from this table via the index2). When all records for this combination of PZN and Hersteller are read from TestBig it switches back to TestSmall and continues by calling ha_myisam::rnd_next() again.
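
To make this access pattern a bit more tangible, here is a tiny standalone program that mimics the calls seen in the debugger (my own toy model with invented data, not code from the server): a scan over TestSmall and, for every matching record, a probe into the index PznHerst of TestBig using the value of PZN plus the constant '36367':
  #include <cstdio>
  #include <map>
  #include <string>
  #include <utility>
  #include <vector>

  struct RowSmall { std::string PZN, Hersteller; };

  int main()
  {
    // a few invented records of TestSmall
    std::vector<RowSmall> test_small= { {"0000001", "00020"},
                                        {"0000002", "99999"} };
    // the index PznHerst of TestBig, modeled as a multimap (PZN, Hersteller) -> row-id
    std::multimap<std::pair<std::string, std::string>, long> pzn_herst=
        { { {"0000001", "36367"}, 1 }, { {"0000001", "36367"}, 2 } };

    long count= 0;
    for (const RowSmall &a : test_small)               // ha_myisam::rnd_next() on TestSmall
    {
      if (a.Hersteller != "00020")                     // WHERE A.Hersteller = '00020'
        continue;
      auto range= pzn_herst.equal_range({ a.PZN, "36367" });
      for (auto it= range.first; it != range.second; ++it)
        count++;    // first hit ~ index_read_map(), following hits ~ index_next_same()
    }
    printf("count(A.PZN) = %ld\n", count);             // prints 2 for this toy data
    return 0;
  }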

When you do this test for yourself in the debugger you will quickly see that stepping through all records of TestSmall is boring, so I soon removed the breakpoint in ha_myisam::rnd_next().

Inspecting the parameters of the call of ha_myisam::index_read_map() I found that the parameter index contains the value 6, meaning that the index with the name PznHerst will be searched. The parameter key contains the value to search for. Please keep in mind that this buffer starts with a flag-byte, so your debugger might not show the value as a readable string if the flag-byte is zero. key contains the values for PZN and Hersteller, and both values are preceded by such a flag-byte. In our case the value for PZN is taken from the current record of the table TestSmall, and appended to it is the constant '36367' given in the SQL-statement. Assuming a single-byte character-set this also explains the key_len of 14 in the EXPLAIN above: 1 flag-byte plus 7 bytes for PZN, followed by 1 flag-byte plus 5 bytes for Hersteller.

That's the way the statement is handled.

Let's look into the code.


structs

The code uses some structs for storing data describing the indexes. Frequently used structs are:
  • KEYUSE (defined in sql_select.h)
  • KEY_FIELD (defined in sql_select.cc)
  • KEY (defined in structs.h)
  • KEY_PART (defined in opt_range.h)
and I've given the filename where you will find the definition. I don't want to describe these structs here, so please look into the files for the details.

There are more structs used in the code but these are the most relevant ones for the handling of indexes.
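
Just for orientation in the later sections, here is a heavily abridged sketch of KEYUSE showing only the members this post refers to (my own abridgement -- the real definition in sql_select.h has more members and may look different in your version):
  struct TABLE;               // server-internal types, only used as pointers here
  class  Item;

  struct KEYUSE               // abridged!
  {
    TABLE        *table;      // the table the index belongs to (here: TestBig)
    Item         *val;        // the value/expression the column is compared with
    unsigned int  key;        // number of the index within this table
    unsigned int  keypart;    // position of the column within this index
    /* ... more members ... */
  };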


code

You will find most of the code inspected here in sql/sql_select.cc. Let's omit some steps and jump directly to the function make_join_statistics().

The first step in this function is to allocate the memory needed for the structures:
  stat=(JOIN_TAB*) join->thd->calloc(sizeof(JOIN_TAB)*(table_count));
  stat_ref=(JOIN_TAB**) join->thd->alloc(sizeof(JOIN_TAB*)*
                                         (MAX_TABLES + table_count + 1));
  .......

After this step these structures are filled with initial values:
  for (s= stat, i= 0; (tables= ti++); s++, i++)
  {
     .....
  }

The next step is to call the function update_ref_and_keys(). In this function memory is allocated for 25 entries of type KEY_FIELD3) and the variables field and end point to this allocated area. The function add_key_fields() is called for the inspection of the WHERE-condition and the ON-condition, walking through the condition-tree recursively. If the currently inspected item cond is not of type COND_ITEM but of type FUNC_ITEM then the function add_key_field() is called, which stores the relevant information in the allocated area (and advances the pointer end). Here add_key_field() is called 4 times (see the sketch after the following list), for:
  • table A (=TestSmall) and column Hersteller
  • table B (=TestBig) and column Hersteller
  • table A (=TestSmall) and column PZN
  • table B (=TestBig) and column PZN
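
To see where these 4 calls come from it helps to draw the complete condition of our statement as a tree (my own visualization; the ON-part and the WHERE-part are shown together here):
  AND
   +-- A.PZN = B.PZN               (ON-condition)
   +-- A.Hersteller = '00020'      (WHERE-condition)
   +-- B.Hersteller = '36367'      (WHERE-condition)
add_key_fields() recurses into the AND-node; each equality below it is a FUNC_ITEM, and add_key_field() is called once for every column that appears in such an equality: twice for A.PZN = B.PZN (once per side) and once for each of the two comparisons with a constant, which gives the 4 calls listed above.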

Back in update_ref_and_keys() this information is inspected further:
  /* fill keyuse with found key parts */
  for ( ; field != end ; field++)
  {
    if (add_key_part(keyuse,field))  
      return TRUE;
  }
With the shown function-call the member-variable keyuse (found in the class JOIN) of type DYNAMIC_ARRAY4) will be filled with the information about fields that can be used for accessing the rows. The function add_key_part() is called 4 times with field pointing to the information just added in the step before. In the case of the table TestSmall the function add_key_part() is left immediately because no index exists for this table. In the case of the table TestBig it iterates:
  • over all index-definitions for this table
  • for each index over all columns defining this index
If an index-part is identical to key_field->field5) the code adds it to keyuse_array6) by calling add_keyuse(). After this step the function update_ref_and_keys() is left.
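
Condensed, this double loop in add_key_part() looks roughly like the following sketch (my own simplification of the code in sql_select.cc -- error handling and several special cases are stripped, and member names may differ slightly between versions):
  Field *field= key_field->field;
  TABLE *form= field->table;                          // here: TestBig
  for (uint key= 0; key < form->s->keys; key++)       // every index of this table
  {
    KEY *keyinfo= form->key_info + key;
    for (uint part= 0; part < keyinfo->user_defined_key_parts; part++)
    {                                                 // every column of this index
      if (field->eq(keyinfo->key_part[part].field))   // the column is part of the index
      {
        if (add_keyuse(keyuse_array, key_field, key, part))
          return TRUE;
      }
    }
  }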

So here is a list of the information contained in the array keyuse:
+---------+------------+------------+
| table   | index      | column     |
+---------+------------+------------+
| TestBig | Ind1       | Hersteller |
| TestBig | Ind2       | Hersteller |
| TestBig | Hersteller | Hersteller |
| TestBig | PznHerst   | Hersteller |
| TestBig | HerstPzn   | Hersteller |
| TestBig | Ind1       | PZN        |
| TestBig | Ind2       | PZN        |
| TestBig | PZN        | PZN        |
| TestBig | PznHerst   | PZN        |
| TestBig | HerstPzn   | PZN        |
+---------+------------+------------+
As you can see it contains all parts of an index with (maybe) usable information for handling this query.

Back in make_join_statistics() the function sort_and_filter_keyuse() is called. This function sorts the entries in the array keyuse using the comparison-function sort_keyuse() (in sql_select.cc). After sorting, the contents of the array keyuse look like this:
+---------+------------+------------+
| table   | index      | column     |
+---------+------------+------------+
| TestBig | Ind1       | PZN        |
| TestBig | Ind1       | Hersteller |
| TestBig | Ind2       | Hersteller |
| TestBig | Ind2       | PZN        |
| TestBig | PZN        | PZN        |
| TestBig | Hersteller | Hersteller |
| TestBig | PznHerst   | PZN        |
| TestBig | PznHerst   | Hersteller |
| TestBig | HerstPzn   | Hersteller |
| TestBig | HerstPzn   | PZN        |
+---------+------------+------------+
so the sort is done by table, index, column-number.

To simplify the handling in the next steps a NULL-entry is added to the end of the array, so keyuse now contains 11 entries.

Next the elements of keyuse are inspected further. By iterating over all entries in the array (except the recently inserted NULL-struct) each element is inspected and copied to a new position in the same array. The inspection compares the column-number of the current entry (var. use) with the column-number of the previous entry (var. prev)7):
          if ((prev->keypart+1 < use->keypart && skip_unprefixed_keyparts) ||
              (prev->keypart == use->keypart && found_eq_constant))
            continue;    /* remove */

In my case the entries 1 and 38) were not copied. Let's look at entry 1: it is part of the index Ind1, and this index consists of the 3 columns PZN, ArtikelText and Hersteller. In the list shown above you will find only the columns PZN and Hersteller of the index Ind1, the column ArtikelText is missing, so there is a gap (it's hard to find an entry in the index if you only have values for the first and the last column). During the handling of this index I looked at the code-lines above in the debugger and found that the value of prev->keypart was 0 and the value of use->keypart was 2 (and skip_unprefixed_keyparts is true), so the if() evaluates to true and the continue-statement is executed, which results in skipping (= not copying) the currently inspected entry. The same argument applies to the index Ind2, so entry 3 in the list is also removed9).

The array keyuse still contains 11 entries but in a modified order. At the end of the function the NULL-struct is copied to position 810) and the number of elements in the array is set to 811).

Next in the function make_join_statistics() the code inspects the tables and checks for useful indexes. First the table TestSmall is inspected, and as no indexes exist on this table it is skipped and the code continues with the table TestBig. For this table the struct JOIN_TAB->keyuse is already filled with data describing the indexes. The code iterates over all the columns the indexes are constructed of and sets some bitmaps. In the end only the map JOIN_TAB->const_keys is used and this bitmap will contain the value 0xA812).
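
The value 0xA8 is easier to read if you remember that bit i of such a key_map stands for the i-th index of the table in the order shown by SHOW INDEX above (0 = EVP, 1 = Ind1, 2 = HAP, 3 = Ind2, 4 = PZN, 5 = Hersteller, 6 = PznHerst, 7 = HerstPzn). Here is a tiny standalone helper (my own code, nothing from the server) that decodes it:
  #include <cstdio>

  int main()
  {
    const char *keys[]= { "EVP", "Ind1", "HAP", "Ind2",
                          "PZN", "Hersteller", "PznHerst", "HerstPzn" };
    unsigned long map= 0xA8;                 // JOIN_TAB->const_keys in this example
    for (int i= 0; i < 8; i++)
      if (map & (1UL << i))
        printf("bit %d is set -> index %s\n", i, keys[i]);
    return 0;
  }
For 0xA8 this prints the indexes Ind2, Hersteller and HerstPzn, which matches the indexes inspected further below; the same trick decodes the value 0xCA that shows up a bit later.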

In the next step the cost of walking through each table in a table-scan is computed. This will be done in the loop starting with:
  /* Calc how many (possible) matched records in each table */
  // for every table in the stmt do the calculation
  for (s=stat ; s < stat_end ; s++)
  {
     ........
  }
For each JOIN_TAB the member-variables found_records, read_time and worst_seeks are computed13). These represent the costs of a table-scan.
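
As far as I can see read_time comes from the handler's scan_time() and worst_seeks is derived from it as min(found_records/10, read_time*3) in make_join_statistics() -- treat this as my reading of the code, not as a definitive formula. A tiny standalone calculation (my own code) reproduces the numbers of note 13 for TestBig; the difference of 1 to the value given there comes from the fractional part of read_time that I omitted:
  #include <algorithm>
  #include <cstdio>

  int main()
  {
    // values from note 13 for the table TestBig
    double found_records= 10000000.0;
    double read_time    = 211613.0;                  // handler::scan_time()
    double worst_seeks  = std::min(found_records / 10.0, read_time * 3.0);
    if (worst_seeks < 2.0)                           // lower bound used by the server
      worst_seeks= 2.0;
    printf("worst_seeks = %.0f\n", worst_seeks);     // prints 634839
    return 0;
  }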

As there are indexes defined on the table TestBig the cost of accessing the data via different index-scans is computed as well. Here is the function-hierarchy:
make_join_statistics()
    get_quick_record_count()
        SQL_SELECT::test_quick_select()
            find_shortest_key()
                handler::records()
                handler::keyread_time()
            get_key_scans_params()
                // for each possible key these functions are called: 
                check_quick_select()
                    ha_myisam::multi_range_read_info_const()
                        DsMrr_impl::dsmrr_info_const()
                            handler::multi_range_read_info_const()
                                 ha_myisam::records_in_range()
                                     mi_records_in_range()
                                         _mi_record_pos()
                                         .........

In the function SQL_SELECT::test_quick_select() a variable param of type PARAM is created and filled with values. Space is allocated for 14 entries14) of type KEY_PART and param.key_parts points to this area. Next the code iterates over the 8 indexes of this table. The bitmap keys_to_use15) is used for selecting the indexes to be put into param. As it contains the value 0xA8 (= 10101000b) only the indexes 3, 5 and 7 are inspected, the others are omitted. For these 3 indexes the KEY_PART structs are filled with values and param.key_parts points to the first entry. But this is only preparatory work.

After the cost of a table-scan has been computed the cost of an index-scan is computed, and for this find_shortest_key() is called. In this function the code iterates over all indexes (again) using the map usable_keys16), which in this case has the value 0xCA (= 11001010b), so the indexes Ind1, Ind2, PznHerst and HerstPzn are inspected. And the winner is the index PznHerst.

Then the function get_key_scans_params() is called. Here the code iterates over the three entries just added to the variable param. The function check_quick_select() is called for each of the indexes 3 (= Ind2), 5 (= Hersteller) and 7 (= HerstPzn). The function ha_myisam::multi_range_read_info_const() calculates the estimated number of records to inspect when the index given in the parameter keyno is used in an index-scan, using the algorithm I've already described. These values are stored in TABLE->quick_rows[.]17).

This ends the inspection of the standalone tables.
The next steps involve combining the tables to find a good way to execute the query. This is similar to what I've already described, so I will stop here.


correctness

This text is still a simplified presentation of what's going on in the software. In case something is wrong with my description please use the mail-function of this site and send me a mail describing my error. I will look at it. Thanks.


some notes:
1) for this record the column Hersteller contains the value '00020'
2) you will find a more detailed description of this process here
3) the size of KEY_FIELD is 48 bytes, so 25*48 = 1200 bytes are allocated.
4) you will find the definition of DYNAMIC_ARRAY in include/my_sys.h
5) key_field is a parameter of type KEY_FIELD. In the calling function the variable is named field
6) keyuse_array reaches this function as a parameter from update_ref_and_keys(), there it is named keyuse
7) you will find these lines in the function sort_and_filter_keyuse() inside the for-loop
8) counting is zero-based
9) it's not removed, it will only be overwritten in a future step
10) so it's the 9th entry in the array
11) again counting is zero-based. The NULL-struct is now the 9th element in the array but the number of entries is set to 8.
12) it already contained this value before entering this part of the code
13) for TestSmall the values are:
   found_records=301,036
   read_time=6,127
   worst_seeks=18,382.
For TestBig the values are:
   found_records=10,000,000
   read_time=211,613
   worst_seeks=634,840.
And, as in the other posts before, I omitted fractional parts.
14) 14 is the total number of key columns over all indexes defined on TestBig
15) keys_to_use is given to this function as a parameter down from make_join_statistics(), where it is JOIN_TAB->const_keys
16) usable_keys is a parameter from the calling code in SQL_SELECT::test_quick_select(), where it is TABLE->covering_keys. And in the header-file table.h you will find this description:
 /* 
    Map of keys that can be used to retrieve all data from this table 
    needed by the query without reading the row.
  */
  key_map covering_keys;
17) in this case the values are 204922 (for index 3 = Ind2), 455132 (for index 5 = Hersteller) and 380971 (for index 7 = HerstPzn)