Saturday 24 January 2015

optimizer continued

I would like to continue my playing with the optimizer. For this and coming postings I will go back to MariaDB.

environment

The environment is the same as before, you will find the details here: JOIN. The database, tables, data in the tables etc. are the same as before. The only modifications here will be the indexes on the tables.

modifications

In the tests before I had an index on the column PZN of both tables and presented the results of a SQL-statement, in particular I was interested in the decision of the optimizer for one special execution-plan. In another posst I added an index on the column Hersteller on both tables and looked at the optimizer, how this additional information influences the decision.
Now I want to continue playing with indexes on these tables and look what the server does with my SQL-statement. So let me create some indexes:
MariaDB [(none)]> use TestOpt;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [TestOpt]> create index Hersteller on TestSmall(Hersteller);
Query OK, 301036 rows affected (2.12 sec)
Records: 301036  Duplicates: 0  Warnings: 0

MariaDB [TestOpt]> create index Hersteller on TestBig(Hersteller);
Query OK, 10000000 rows affected (1 min 28.10 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

MariaDB [TestOpt]> show index from TestSmall;
+-----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table     | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| TestSmall |          1 | Hersteller |            1 | Hersteller  | A         |        2813 |     NULL | NULL   | YES  | BTREE      |         |               |
+-----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)

MariaDB [TestOpt]> show index from TestBig;  
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| TestBig |          1 | Hersteller |            1 | Hersteller  | A         |        2581 |     NULL | NULL   | YES  | BTREE      |         |               |
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.01 sec)

MariaDB [TestOpt]> 

Please do not search for a meaning in this environment or statement or ...., it's just a game in a sandbox.

results

With this data let's execute our standard SQL-statement.
Here is the statement using the JOIN-keyword:
MariaDB [TestOpt]> select SQL_NO_CACHE count(A.PZN) from TestSmall A join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367';
+--------------+
| count(A.PZN) |
+--------------+
|           14 |
+--------------+
1 row in set (2 min 0.25 sec)

MariaDB [TestOpt]> explain select SQL_NO_CACHE count(A.PZN) from TestSmall A join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
         type: ref
possible_keys: Hersteller
          key: Hersteller
      key_len: 6
          ref: const
         rows: 30
        Extra: Using index condition
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
         type: ref
possible_keys: Hersteller
          key: Hersteller
      key_len: 6
          ref: const
         rows: 458262
        Extra: Using index condition; Using where
2 rows in set (0.00 sec)

MariaDB [TestOpt]> 
Looking at the EXPLAIN it tells me that the server starts executing this statement with the table TestSmall and uses the index Hersteller to fetch records with the condition A.Hersteller = '00020'. It expects to find 30 records1) in this table. For each record fetched it switches over to the table TestBig and reads records via the index Hersteller with the condition B.Hersteller = '36367'. The optimizer expects to find approx. 450,000 records1) in this table. It then tries to match the JOIN-condition.

What about the other way? In the statement above the server starts the execution with the table TestSmall and from this it switches over to TestBig. Let's try it the other way, start with TestBig and switch over to TestSmall, so here is the statement using STRAIGHT_JOIN:
MariaDB [TestOpt]> select SQL_NO_CACHE count(B.PZN) from TestBig B straight_join TestSmall A on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367';
+--------------+
| count(B.PZN) |
+--------------+
|           14 |
+--------------+
1 row in set (1 min 54.42 sec)

MariaDB [TestOpt]> explain select SQL_NO_CACHE count(B.PZN) from TestBig B straight_join TestSmall A on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
         type: ref
possible_keys: Hersteller
          key: Hersteller
      key_len: 6
          ref: const
         rows: 458262
        Extra: Using index condition
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
         type: ref
possible_keys: Hersteller
          key: Hersteller
      key_len: 6
          ref: const
         rows: 30
        Extra: Using index condition; Using where
2 rows in set (0.00 sec)

MariaDB [TestOpt]> 
Looking at the EXPLAIN you can see that it indeed started with table TestBig (= B).

The difference in execution-time is about 5%, but starting with the much bigger table is indeed faster. So let's look into the decision-making code, aka the optimizer.

optimizer

The question is: when the optimizer analyzes the statement and the possible plans where and why does it choose the wrong path?.

I've already posted some descriptions of the code of the optimizer2) and I don't want to repeat this here. And in this case the difference is not so big3) so I omit a lot of code-excerpts here. Let me jump into the calculations:
      /* Compute the cost of extending the plan with 's' */

      current_record_count= record_count * position->records_read;
      current_read_time=read_time + position->read_time +
                        current_record_count / (double) TIME_FOR_COMPARE;
You will find this code-snippet in the file sql/sql_select.cc in the function best_extension_by_limited_search().

Here are the results of this computation for the first plan inspected:
Testmall:
current_record_count= 30
current_read_time   = 37
TestSmall -> TestBig:
current_record_count= 13,747,860
current_read_time   = 16,497,499

The optimizer starts with the table Testsmall and estimates the number of records as 30. The cost for accessing these records is calculated to 37. Then it extends the plan by adding the table TestBig and computes the cost for this plan as 16,497,499. Done with this plan. It then keeps this plan as this plan is the best plan so far:
        if (current_read_time < join->best_read)
        {
          memcpy((uchar*) join->best_positions, (uchar*) join->positions,
                 sizeof(POSITION) * (idx + 1));
          join->record_count= partial_join_cardinality;
          join->best_read= current_read_time - 0.001;
        }
join->best_positions contains the best plan so far; after all computations and comparisons are done it contains the best plan.

Now it switches to the second plan which starts with the table TestBig:
TestBig:
current_record_count= 458,262
current_read_time   = 549,915

In the first step it's a big number: 549,915. So the decision is to stop further examining this plan:
      /* Expand only partial plans with lower cost than the best QEP so far */
      if (current_read_time >= join->best_read)
The if() expands to: if ( 549915 >= 30 ) and this expression is true so the code following in curly braces is executed: further inspection of this plan is stopped.

As no more possibilities exist the optimizer stops and the execution of the plan found starts.

So we've seen the cost of the complete plan starting with TestSmall, but not the cost of the complete plan starting with TestBig. Let's rewrite the statement so that the order of the tables is switched:
MariaDB [TestOpt]> explain select SQL_NO_CACHE count(B.PZN) from TestBig B join TestSmall A on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
         type: ref
possible_keys: Hersteller
          key: Hersteller
      key_len: 6
          ref: const
         rows: 30
        Extra: Using index condition
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
         type: ref
possible_keys: Hersteller
          key: Hersteller
      key_len: 6
          ref: const
         rows: 458262
        Extra: Using index condition; Using where
2 rows in set (0.01 sec)

MariaDB [TestOpt]> 

I've showed directly the EXPLAIN. In this listing you will see that it still starts the execution with the table TestSmall (= A), as before. Stepping into the code you will see that the calculation of costs indeed start with TestSmall and the plan that starts with TestBig is only partially inspected, same as before.

So why this?

ordering of tables

Before starting with the inspection of possible plans the optimizer-code rearranges the tables it has to inspect: it sorts them. By looking into the function choose_plan() you will see this function-call:
  my_qsort2(join->best_ref + join->const_tables,
            join->table_count - join->const_tables, sizeof(JOIN_TAB*),
            jtab_sort_func, (void*)join->emb_sjm_nest);
The function my_qsort2() is an extended version of the qsort-algorithm with some treatment of special cases. You will find the code in mysys/mf_qsort.c.

The variable jtab_sort_func contains the address of the comparison-function. In the case of the SQL-statement used here it's the address of the function join_tab_cmp()4). In this function the comparison is done based on the estimated number of records in the table, so the table TestSmall with it's number of (estimated) 30 records to fetch will come first. After modifying the function join_tab_cmp() a bit to return a different (=wrong) answer the table-order will not be changed. Now we can look into the cost-calculation again and see the costs for both possible paths computed up to the end. These are the values:
plan #1:
TestBig:
current_record_count= 458,262
current_read_time   = 549,915
TestBig -> TestSmall:
current_record_count= 13,747,860
current_read_time   = 17,505,609

plan #2:
TestSmall:
current_record_count= 30
current_read_time   = 37
TestSmall -> TestBig:
current_record_count= 13,747,860
current_read_time   = 16,497,499

After the computation of the second plan is finished the resulting valuess are compared. Some lines above I presented some code-lines:
        if (current_read_time < join->best_read)  
With the values given above this translates into
        if ( 16497499 < 17505609 )
This computes to true and explains why the plan starting with table TestSmall is chosen.

verifying

All the comments above are based on inspecting the code during execution-time. Let's look at it from a different point of view for verifying these statements.

First I want to tell the facts:

TestSmall:      60 records with Hersteller = '00020'
TestBig:   461,016 records with Hersteller = '36367'    


In the post table-scan or index-scan I introduced some counter-variables and I want to use them to verify what I've written in this post so far.

So let's start with the statement using a JOIN. With the counter-variables added to the MyISAM-engine and the statement executed the log-file contained these lines:
case JOIN:
table = <testsmall> counter table-scan: 0   index-scan: 61         = 60 + 1
table = <testbig>   counter table-scan: 0   index-scan: 27,661,020 = 60 * (461,016+1)
In the case of the statement with the STRAIGHT_JOIN I found these lines:
case STRAIGHT_JOIN:
table = <testbig>   counter table-scan: 0   index-scan: 461,017    = 461,016 + 1
table = <testsmall> counter table-scan: 0   index-scan: 28,121,976 = 461,016 * (60+1)
I've modified the output a bit to show where the numbers come from.

You see that no table-scan was done, the index is used instead. In the JOIN-case the server fetched 60 records from TestSmall and for every record fetched it fetches a lot of records from the table TestBig (including the fetch that detects the end-condition). For the STRAIGHT_JOIN-case it fetches record after record from TestBig and for every record fetched it fetches 60 records (plus one fetch for the end-condition) from TestSmall.

decision

Was the decision of the optimizer correct? Looking at the amount of time needed for executing the two statements one could clearly say: no. But if you look at the number of records fetched in both cases and compare these you will see that these numbers prefer the JOIN-case (the plan TestSmall -> TestBig).

The decision is based on estimations (to reduce the time needed for making the decision). And it's always the same code that will be executed on a variety of machines with different CPUs and hard-disk(s) of different speed. So one cannot expect to find the best execution-plan on all machines of this world. In the case presented here the 'not-so-good-plan' is chosen but the difference to the 'good plan' is negligible.


correctness

This text is still a simplified presentation of what's going on in the software. In case something is wrong with my description please use the mail-function of this site and send me a mail describing my error. I will look at it. Thanks.


some notes:
1) you will find a description on how this number is estimated here. The number differs a bit because the index was dropped and recreated and dropped and recreated and .....
2) you will find the descriptions here and here.
3) here you will find a much bigger discrepancy: optimizer aka JOIN (4).
4) there are other compariosn-functions too. If you look at the code of the given function you will find the other functions beneath it.

optimizer in MySQL (2)

this is a continuation of the topic of the previou post. In the previous post I showed how the optimizer in MySQL handled a special SQL-statement and chooses the wrong path (I also showed this for MariaDB). In the case of MariaDB I played a bit with the tables and added some indexes and showed how this changed the behaviour of the optimizer in MariaDB. In this post I want to do the same and show how the SQL-statement will be treated in MySQL. The environment will be the same as before.

index on Hersteller

As I did in MariaDB I will add an index on the column Hersteller on both tables:
mysql> use TestOpt;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> create index Hersteller on TestSmall(Hersteller);
Query OK, 301036 rows affected (4,00 sec)
Records: 301036  Duplicates: 0  Warnings: 0

mysql> show index from TestSmall;
+-----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table     | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| TestSmall |          1 | PZN        |            1 | PZN         | A         |      301036 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestSmall |          1 | Hersteller |            1 | Hersteller  | A         |        2813 |     NULL | NULL   | YES  | BTREE      |         |               |
+-----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0,00 sec)

mysql> create index Hersteller on TestBig(Hersteller);
Query OK, 10000000 rows affected (2 min 58,16 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

mysql> show index from TestBig;
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| TestBig |          1 | PZN        |            1 | PZN         | A         |       22321 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | Hersteller |            1 | Hersteller  | A         |        2581 |     NULL | NULL   | YES  | BTREE      |         |               |
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0,00 sec)

mysql> 

Now the database-server has some more information for finding a good path and indeed MySQL uses these indexes for executing the statement (same as in MariaDB). Here is the result (including the EXPLAIN):
mysql> select SQL_NO_CACHE count(A.PZN) from TestSmall A join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367';
+--------------+
| count(A.PZN) |
+--------------+
|           14 |
+--------------+
1 row in set (0,02 sec)

mysql> explain select SQL_NO_CACHE count(A.PZN) from TestSmall A join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
         type: ref
possible_keys: PZN,Hersteller
          key: Hersteller
      key_len: 6
          ref: const
         rows: 33
        Extra: Using index condition; Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
         type: ref
possible_keys: PZN,Hersteller
          key: PZN
      key_len: 8
          ref: TestOpt.A.PZN
         rows: 448
        Extra: Using where
2 rows in set (0,00 sec)

mysql>

As you can see the execution-time is very short. The execution starts with the table TestSmall (=A) and the server fetches the records with the condition Hersteller = '00020' using the index on Hersteller. For every record found it searches the corresponding record in the table TestBig (=B) using the index PZN on TestBig and searches for all records with the value given in the column PZN of the record of TestSmall. On the records found in TestBig the WHERE-condition is applied.

The result looks as expected. So why and how has the server chosen this plan?

function-hierarchy

Let's look into the code of MySQL 5.6.22 and restrict our view to the optimizer-stage:
mysql_execute_command()
    execute_sqlcom_select()
        handle_select()
            mysql_select()
                mysql_prepare_select()
                    JOIN::prepare()
                        setup_tables_and_check_access()
                            setup_tables()
                                setup_natural_join_row_types()
                        setup_fields()
                mysql_execute_select()
                    JOIN::optimize()
                        simplify_joins()
                            simplify_joins()
                        optimize_cond()
                            build_equal_items()
                            propagate_cond_constants()
                            remove_eq_conds()
                                internal_remove_eq_conds()
                       opt_sum_query()
                       make_join_statistics()
                           update_ref_and_keys()
                               add_key_fields()
                                   add_key_field()
                               add_key_part()
                           make_select()                                              twice: TestSmall, TestBig
                           get_quick_record_count()                                   twice: TestSmall, TestBig
                               SQL_SELECT::test_quick_select()
                                   get_best_group_min_max()
                                   get_key_scans_params()
                                       check_quick_select()
                                           eq_ranges_exceeds_limit()
                                           ha_myisam::multi_range_read_info_const()
                                               DsMrr_impl::dsmrr_info_const()
                                                   handler::multi_range_read_info_const()
                                                       ha_myisam::records_in_range()
                                                           mi_records_in_range()         for min_key and max_key
                                                               _mi_record_pos()
                                                                   _mi_pack_key()
                                                                   _mi_search_pos()      recursively
                                                                       _mi_fetch_keypage()
                                                                           key_cache_read()
                                                                               find_key_block()
                           optimize_keyuse()
                           Optimize_table_order::Optimize_table_order()
                           Optimize_table_order::choose_table_order()
                               Optimize_table_order::greedy_search()
                                   Optimize_table_order::best_extension_by_limited_search()
                                       Optimize_table_order::best_access_path()
                                       Optimize_table_order::best_extension_by_limited_search()
                                           Optimize_table_order::best_access_path()
                                           Optimize_table_order::consider_plan()
                                       Optimize_table_order::best_access_path()
                           JOIN::refine_best_rowcount()
                           JOIN::get_best_combination()
                       substitute_for_best_equal_field()
                           eliminate_item_equal()
                       Item_cond::update_used_tables()
                       JOIN::set_access_methods()
                       update_depend_map()
                       make_join_select()
                           make_cond_for_table()
                               make_cond_for_table_from_pred()
                           pushdown_on_conditions()
                       make_join_readinfo()
                           push_index_cond()
                               make_cond_for_index()
                           setup_join_buffering()
                       ha_make_pushed_joins()
                           plugin_foreach_with_mask()
                       pick_table_access_method()
                    JOIN::exec()           
                        do_select()
                            sub_select()
                                join_read_always_key()
                                     JOIN_TAB::use_order()
                                    handler::ha_index_init()
                                        ha_myisam::index_init()
                                    handler::ha_index_read_map()
                                        ha_myisam::index_read_map()
                                join_read_next_same()
                                    handler::ha_index_next_same()
                                        ha_myisam::index_next_same()
and so on record after record

some hints

The hierarchy shown above is valid for the SQL-statement used in this post. For other SQL-statements this can differ and if you look of the last post you will indeed see that this hierarchy is similar but not identical.

I hope the hierarchy shown is correct. I omitted some functions e.g. those who were entered and after a single check the code returns to the caller. In the case of a recursion I did include only the first call (for example _mi_search_pos()). Some functions are called twice, because the SQL-statement references two tables so this function and every function beneath it is called once for each table, I didn't include the second call.
As humans make errors maybe I made an error or overlooked a call. If you find such an error please drop me a line.

This looks similar to the code I presented for MariaDB 10.0.10 except for some rearrangement of code, some classes introduced in this version of MySQL and some minor modification which I already mentioned in the former post.

calculations

The calculation of the cost associated to a plan is almost identical to the calculation in MariaDB. So I will only give some hints, you can find the details in the other post.

For the plan currently under consideration the cost will be calculated in Optimize_table_order::best_extension_by_limited_search():
      /* Compute the cost of extending the plan with 's' */
      current_record_count= record_count * position->records_read;
      current_read_time=    read_time
                            + position->read_time
                            + current_record_count * ROW_EVALUATE_COST;

In the first iteration the optimizer starts with the table TestSmall, computes the cost for accessing this table and then extends this plan by accessing the table TestBig. So the cost for this plan is evaluated via:1)
first try: TestSmall --> TestBig
current_record_count= 14,784 = 33 * 448
current_read_time=    17,780 =  39 + 14,784 + 14,784 * 0.20                            

The next possible plan starts with the table TestBig, the cost for accessing this table is computed and the plan is extended by the table TestSmall. Now these costs are:
second try: TestBig --> TestSmall
current_record_count= 455,132 = 1 * 455,132
current_read_time=    546,158 = 0 + 455,132 + 455,132 * 0.20

Let me omit the comparison of the costs of both plans for some seconds. The numbers shown here are almost identical to the numbers in the post about the optimizer in MariaDB. There are some minor differences which I would like to explain. You have seen the number 39 some lines above and if you look at the numbers in MariaDB you will find a number 40 in the same position. In the case of MySQL this value is computed via 0 + 33 + 33*0.20, in the case of MariaDB this value is computed via the formula 0 + (33+1) + 33/5, they differ simply by a value of 1.
The same statement is valid for the value 17,780 in MySQL and 17,814 in MariaDB. In MySQL this value is computed via 39 + 14,784 + 14,784 * 0.20, in MariaDB as 40 + 14,817 + 14,784 / 5.
Other values like 33 and 455,132 are computed identical and have the same meaning as described in the other post (please look for the topics numbers and more numbers).

decision

After calculating the cost of the first plan this value and the plan is given to the function Optimize_table_order::consider_plan(). There the code looks like this:
  const bool chosen= read_time < join->best_read;
  trace_obj->add("chosen", chosen);
  if (chosen)
  {
As the best plan so far has not been computed (value join->best_read contains the initial value DBL_MAX) and the cost just computed is 17,780 so this plan is taken.

In the next iteration the cost of the other plan is computed and compared against the best plan so far. You will find the comparison in Optimize_table_order::best_extension_by_limited_search():
      /* Expand only partial plans with lower cost than the best QEP so far */
      if (current_read_time >= join->best_read)
      {
which translates into
      if ( 546,158 >= 17,780 )
      {
so the code in the braces is executed which means that this plan is no longer inspected.

Normally the next iteration will start with a different plan but there are no mor possibilities so the search for the best plan finishes here.

the wrong path

In the older post I showed why the optimizer has chosen the wrong path, in this case it has chosen a fast path. But what about the other path? Let's use some force and look at the result by executing this path:
mysql> select SQL_NO_CACHE count(B.PZN) from TestBig B straight_join TestSmall A on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367';
+--------------+
| count(B.PZN) |
+--------------+
|           14 |
+--------------+
1 row in set (11,53 sec)

mysql> explain select SQL_NO_CACHE count(B.PZN) from TestBig B straight_join TestSmall A on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
         type: ref
possible_keys: PZN,Hersteller
          key: Hersteller
      key_len: 6
          ref: const
         rows: 455132
        Extra: Using index condition; Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
         type: ref
possible_keys: PZN,Hersteller
          key: PZN
      key_len: 8
          ref: TestOpt.B.PZN
         rows: 1
        Extra: Using where
2 rows in set (0,00 sec)

mysql> 
The number is convincing, the optimizer has chosen the better path.


correctness

This text is still a simplified presentation of what's going on in the software. In case something is wrong with my description please use the mail-function of this site and send me a mail describing my error. I will look at it. Thanks.


some notes:
1) as I did in the other posts: most of the calculations are done as floating-point numbers, but I will present only the integer-part.

Friday 16 January 2015

optimizer in MySQL

This is just a repetition of an older post. In the older post I inspected the optimizer-stage within MariaDB for a SQL-statement where the optimizer chooses a slow path for executing this statement. I showed where the cost of each possible path is calculated and where the comparison (=the decision for the slower path) is made.

This was all done for MariaDB. So in this post I will dive into the code of MySQL and show where the calculations and decisions will happen here.

Management summary: it's similar but not identical.

This description is based on MySQL 5.6.22, the current stable version. For other versions of MySQL the code can change and my descriptions may be invalid.


some SQL

For this test I used the same tables with identical structures, data and indexes as in the older post.
So let's go into the details in MySQL. This is the statement I want to use:
mysql> use TestOpt;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select SQL_NO_CACHE count(A.PZN) from TestSmall A join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367';
+--------------+
| count(A.PZN) |
+--------------+
|           14 |
+--------------+
1 row in set (29,57 sec)

mysql> explain select SQL_NO_CACHE count(A.PZN) from TestSmall A join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
         type: ALL
possible_keys: PZN
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10000000
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
         type: ref
possible_keys: PZN
          key: PZN
      key_len: 8
          ref: TestOpt.B.PZN
         rows: 1
        Extra: Using where
2 rows in set (0,01 sec)

mysql> 

This is the standard-behaviour of the optimizer. So I used a little bit of force and helped the optimizer to choose another plan:
mysql> select SQL_NO_CACHE count(A.PZN) from TestSmall A straight_join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367';
+--------------+
| count(A.PZN) |
+--------------+
|           14 |
+--------------+
1 row in set (0,64 sec)

mysql> explain select SQL_NO_CACHE count(A.PZN) from TestSmall A straight_join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
         type: ALL
possible_keys: PZN
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 301036
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
         type: ref
possible_keys: PZN
          key: PZN
      key_len: 8
          ref: TestOpt.A.PZN
         rows: 448
        Extra: Using where
2 rows in set (0,01 sec)

mysql> 
In both cases I included the EXPLAIN of the statement.

By comparing the results of the optimizer in MySQL with the result of MariaDB you can see that in both cases the slower path was chosen for the standard SQL-statement, both optimizer have chosen the wrong path.

a remark: do not compare these number and say: XXXXXXX is faster than YYYYY. The reaon for this coul be based on different compile-time options, settings of some server-options etc, and I didn't check these. I'm sure both server can be optimized by modifying settings and options but that's not my topic here. I want to show where and how the optimizer makes the decision. And as I described this for MariaDB before, so in this post I want to show this for MySQL. I want to look into the code not to show improvements.


the code

Instead of long explanations here is the function-hierarchy for optimizing this SQL-statement in MySQL 5.6.22:
mysql_execute_command()
    execute_sqlcom_select()
        handle_select()
            mysql_select()
                mysql_execute_select()
                    JOIN::optimize()   
                        Prepared_stmt_arena_holder::Prepared_stmt_arena_holder()
                        simplify_joins()
                        record_join_nest_info()
                        optimize_cond()
                        JOIN::optimize_fts_limit_query()     
                        opt_sum_query()
                        make_join_statistics()
                            update_ref_and_keys()
                                add_key_fields()
                                add_key_part()
                            optimize_keyuse()
                            Optimize_table_order::choose_table_order()
                                Optimize_table_order::greedy_search()
                                    Optimize_table_order::best_extension_by_limited_search()
                                        Optimize_table_order::best_access_path()
                                        Optimize_table_order::best_extension_by_limited_search()
                                            Optimize_table_order::best_access_path()
                                            Optimize_table_order::consider_plan()
                                        Optimize_table_order::best_access_path()
                                        Optimize_table_order::best_extension_by_limited_search()
                                            Optimize_table_order::best_access_path()
                                            Optimize_table_order::consider_plan()
                            JOIN::get_best_combination()
                        substitute_for_best_equal_field()
                        Item_cond::update_used_tables()
                        JOIN::set_access_methods()
                        make_join_select() 
                        make_join_readinfo()
                        pick_table_access_method()
                    JOIN::exec()
                        JOIN::prepare_result()
                        do_select()
                            sub_select()
                                join_init_read_record()
To say it short: looks a bit different than in MariaDB. But it's not so much different.


cost-values

Let's restrict our view to the computation of the cost-value of a path. This computation is done in the function Optimize_table_order::best_extension_by_limited_search() (in sql/sql_planner.cc):
      /* Compute the cost of extending the plan with 's' */
      current_record_count= record_count * position->records_read;
      current_read_time=    read_time
                            + position->read_time
                            + current_record_count * ROW_EVALUATE_COST;

The strategy for the computation of a cost of a plan is the same as described for MariaDB: start with a table, calculate the cost for reading this table and then extend this plan by adding the next table (and then continue with adding the next table and so on). In our case here this means that the computation starts with the table TestSmall and computes the cost for this step1):
TestSmall:
      current_record_count= 301,036 = 1 * 301,036
      current_read_time   = 66,334 = 0 + 6,127 + 301,036 * 0.20

Then this plan will be extended by adding the table TestBig with this result:
TestBig:
      current_record_count= 134,864,128 = 301,036 * 448
      current_read_time   = 161,903,288 = 66,334 + 134,864,128 + 134,864,128 * 0.20
So the cost of this plan is calculated to a value of 161,903,288. As this is the first plan so it's the best plan now and the values and the plan are kept.

Now the computation starts with the alternate plan:
TestBig:
    current_record_count= 10,000,000 = 1 * 10,000,000
    current_read_time   = 2,211,613 =  0 + 211,613 + 10,000,000 * 0.20
extend by TestSmall:
    current_record_count= 10,000,000 = 10,000,000 * 1
    current_read_time   = 14,211,613 =  2,211,613 + 10,000,000 + 10,000,000 * 0.20
The cost of this alternate plan is calculated to a value of 14,211,613. Above I included both steps of this computation in one box.

If you look at the numbers and compare them with the numbers in the older post you see that the numbers are identical or differ only slightly. And the meaning of the numbers is the same, so I omit these explanations.


the decision

In MariaDB 10.0.10 the comparison of the best plan so far and the newly computed plan (precise: the cost-values of these plans) is done in the function best_extension_by_limited_search() (in sql/sql_select.cc):
        if (current_read_time < join->best_read)
        {
          memcpy((uchar*) join->best_positions, (uchar*) join->positions,
                 sizeof(POSITION) * (idx + 1));
          join->record_count= partial_join_cardinality;
          join->best_read= current_read_time - 0.001;
        }

In MySQL 5.6.22 things are a bit different: in Optimize_table_order::best_extension_by_limited_search() (in sql/sql_planner.cc) you will find these lines of code:
      else  //if ((current_search_depth > 1) && ...
      {
        consider_plan(idx, current_record_count, current_read_time,
                      &trace_one_table);
which simply calls the function Optimize_table_order::consider_plan() (in sql/sql_planner.cc):
  const bool chosen= read_time < join->best_read;
  trace_obj->add("chosen", chosen);
  if (chosen)
  {
    memcpy((uchar*) join->best_positions, (uchar*) join->positions,
            sizeof(POSITION) * (idx + 1));

    /*
      If many plans have identical cost, which one will be used
      depends on how compiler optimizes floating-point calculations.
      this fix adds repeatability to the optimizer.
      (Similar code in best_extension_by_li...)
    */
    join->best_read= read_time - 0.001;
    join->best_rowcount= (ha_rows)record_count;
  }

If you take the numbers shown above you will see that the variable chosen will be evaluated to true (because 14 mio. < 161 mio.) so the second plan is taken.


code changes

By comparing the sources of the two products one can see a lot of similarities. In the old post I compared the results (and only these) with MySQL 5.5.8. When I look into the code of this version I can see almost identical software to MariaDB 10.0.102). This differs a bit with MySQL 5.6.22. In the newer version the programmers at Oracle started to rearrange source-files, introduced new classes like Optimize_table_order, moved exisiting code to some member-functions of this class etc.

So the code is is diverging, but it still has a lot of similarities. But it looks like this will change.

I don't want to judge the changes made. In both products the software has evolved over time, together with these rearrangements I expect more deviations to come.

reviewing the numbers

If you compare the cost-values computed and the time spent for executing a statement you will see a big difference: the optimizer prefers the combination TestBig->TestSmall by a big difference (factor 11 approx.), whereas in practice the opposite plan is much, much faster (factor 40 approx.). Obviously something went wrong in the calculations.

The calculations are based mainly on two numbers:
  • how long does it take to read a table sequentially?
  • how many records have to be read via an index?
Reading a table by a full table-scan is computed to a value of 6,127 (table TestSmall) and 211,613 (table TestgBig). These numbers are returned by the function scan_time(). As our tables are of type MyISAM and this function is not implemented in this storage-engine the function in the base-class is called and it looks like this:
  virtual double scan_time()
  { return ulonglong2double(stats.data_file_length) / IO_SIZE + 2; }
So the length of the MYD-file is taken for computing these numbers3) 4).

For accessing the corresponding records via the index this formula is used: take the number of records in table A and multiply it with the expected number of records in table B for one entry in the index. In numbers this means:
  • 134,864,128 = 301,036 * 448 = (number of records in TestSmall) * (expected number of records for one entry in index in TestBig)
  • 10,000,000 = 10,000,000 * 1 = (number of records in TestBig) * (expected number of records for one entry in index in TestSmall)
and this part of the computation results in the big difference in the cost-values.

Looking at the time needed for the two statements it look like the balance between cost of a table-scan and cost of an index-access is disturbed. This maybe different in other storage-engines. Maybe it was in balance in the past but at least on my machine here it's totally wrong. More research on this is needed.

correctness

This text is still a simplified presentation of what's going on in the software. In case something is wrong with my description please use the mail-function of this site and send me a mail describing my error. I will look at it. Thanks.


some notes:
1) as before: the calculations are done as floating-point numbers, but I will present only the integer-part.
2) please restrict this statement to the optimizer-part of the software. I didn't inspect the whole code-base.
3) IO_SIZE has a value of 4,066, the file-sizes are 25,090,360 (=table TestSmall) and 866,761,264 (=table TestBig).
4) other storage-engines may handle this differently.

Thursday 1 January 2015

optimizer (2) aka JOIN (5)

This is my fifth post about the various aspects of this topic: JOIN.

In the former posts I described:
  • what's happening in the storage-engine (in this case: MyISAM) by executing the query: JOIN
  • what's happening in the level above the storage-engine: JOIN (2)
  • after reading a record from a table the WHERE-clause is applied to this record: JOIN (3)
  • where are different paths evaluated and one path chosen: optimizer

Be careful when you generalize the results presented in my posts. A lot of the data presented here depends on the data in the tables, and this is my data. If you do your own tests don't expect to see the same numbers (like values for keys, file-offsets, distribution of keys etc.).

index on Hersteller

Let me continue playing with the statement I've used before so I added indexes on the column Hersteller:
MariaDB [TestOpt]> create index Hersteller on TestSmall(Hersteller);
Query OK, 301036 rows affected (2.85 sec)
Records: 301036  Duplicates: 0  Warnings: 0

MariaDB [TestOpt]> create index Hersteller on TestBig(Hersteller);
Query OK, 10000000 rows affected (2 min 18.44 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

MariaDB [TestOpt]> show index from TestSmall;
+-----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table     | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| TestSmall |          1 | PZN        |            1 | PZN         | A         |      301036 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestSmall |          1 | Hersteller |            1 | Hersteller  | A         |        2813 |     NULL | NULL   | YES  | BTREE      |         |               |
+-----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)

MariaDB [TestOpt]> show index from TestBig;  
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| TestBig |          1 | PZN        |            1 | PZN         | A         |       22321 |     NULL | NULL   | YES  | BTREE      |         |               |
| TestBig |          1 | Hersteller |            1 | Hersteller  | A         |        2581 |     NULL | NULL   | YES  | BTREE      |         |               |
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)

MariaDB [TestOpt]> 

With these additional information available the database can execute the statement in a different way (I included the explain):
MariaDB [TestOpt]>MariaDB [TestOpt]> select SQL_NO_CACHE count(A.PZN) from TestSmall A join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367';
+--------------+
| count(A.PZN) |
+--------------+
|           14 |
+--------------+
1 row in set (0.01 sec)

MariaDB [TestOpt]> explain select SQL_NO_CACHE count(A.PZN) from TestSmall A join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
         type: ref
possible_keys: PZN,Hersteller
          key: Hersteller
      key_len: 6
          ref: const
         rows: 33
        Extra: Using index condition; Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
         type: ref
possible_keys: PZN,Hersteller
          key: PZN
      key_len: 8
          ref: TestOpt.A.PZN
         rows: 448
        Extra: Using where
2 rows in set (0.00 sec)

MariaDB [TestOpt]> 
As you can see the execution of this statement is much faster, even faster than the execution of the statement in the last post (20 secs., 0.4 secs.).

If you look at the explain-statement you can see that the optimizer has chosen the table TestSmall (=A) as the driving table and uses the index on Hersteller to search for entries with the value '00020' in this column. For every record found it looks for the corresponding record in the table TestBig (=B) using the value of the column PZN and searching through the index on the column PZN.

The last post inspected the 2 possible paths for executing this query: TestSmall -> TestBig and TestBig -> TestSmall. With these alternatives the optimizer had chosen the path TestBig -> TestSmall because the internal cost-calculations showed this as the bet path (only the index on PZN did exist). By adding the additional index the optimizer chooses the path that starts with TestSmall and I will show how the corresponding costs are calculated.

In the last post I had to use some force to start the execution with the table TestSmall, now let me use the same force to start the execution with the table TestBig:
MariaDB [TestOpt]> select SQL_NO_CACHE count(B.PZN) from TestBig B straight_join TestSmall A on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367';
+--------------+
| count(B.PZN) |
+--------------+
|           14 |
+--------------+
1 row in set (12.68 sec)

MariaDB [TestOpt]> explain select SQL_NO_CACHE count(B.PZN) from TestBig B straight_join TestSmall A on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
         type: ref
possible_keys: PZN,Hersteller
          key: Hersteller
      key_len: 6
          ref: const
         rows: 455132
        Extra: Using index condition; Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
         type: ref
possible_keys: PZN,Hersteller
          key: PZN
      key_len: 8
          ref: TestOpt.B.PZN
         rows: 1
        Extra: Using where
2 rows in set (0.01 sec)

MariaDB [TestOpt]> 
As you can see I've changed the SQL-statement a bit to let the table TestBig appear at the first position and for this to happen I used the keyword STRAIGHT_JOIN (without this it will start the execution with the table TestSmall). The result is clear: this way is sloooow.

It looks like the optimizer has chosen the right ordering of the tables to execute this statement. Let me now describe how this is done. The explanations are valid for MariaDB in the version 10.0.10, other versions can handle these things slightly diffrent. And MySQL may handle this different too.

function hierarchy

So where does the server calculate the cost of the SQL-statement given above? Let me directly jump into the code and show the function-hierarchy in MariaDB:
mysql_execute_command()
    execute_sqlcom_select()
        handle_select()
            mysql_select()
                JOIN::optimize()
                    JOIN::optimize_inner()
                        make_join_statistics()
                            JOIN_TAB::scan_time()
                                handler::scan_time()
                            make_select()
                            update_ref_and_keys()      
                            get_quick_record_count()             // executed twice: TestSmall and TestBig
                                SQL_SELECT::test_quick_select()
                                    get_key_scans_params()
                                        check_quick_select()
                                            ha_myisam::multi_range_read_info_const()            // starting from here it's specific to MyISAM
                                                DsMrr_impl::dsmrr_info_const()
                                                    handler::multi_range_read_info_const()
                                                        ha_myisam::records_in_range()
                                                             mi_records_in_range()
                                                                 _mi_record_pos()
                                                                     _mi_search_pos()           // with pos=0x7a2c00, key='00020'
                                                                         _mi_fetch_keypage()    // with page=root
                                                                 _mi_record_pos()
                                                                     _mi_search_pos()           // with pos=0x7a2c00, key='00020'
                                                                         _mi_fetch_keypage()    // with page=root
                            optimize_semijoin_nests()
                            choose_plan()
                                greedy_search()
                                    best_extension_by_limited_search()
                                        best_access_path()
                                        best_extension_by_limited_search()
                                            best_access_path()
                                        best_access_path()
                        make_join_readinfo()                
                        JOIN::save_explain_data()
                JOIN::exec()
                    JOIN::exec_inner()
                    JOIN::save_explain_data()
a hint: the function get_quick_record_count() (and all functions beneath it) is executed twice, but I didn't include this in the hierarchy.

best plan

I would like to omit all the intermediate steps in calculating the cost of a path but give an overview of the steps involved. The description is valid for the standard SQL-statement, the fast one, not for the statement using the STRAIGHT_JOIN keyword.

The calculation starts with the table TestSmall. The cost for reading this table is calculated and the result is a value of 40. Then this (partial) plan is extended by inspecting the table TestBig. The cost for reading both tables is calculated to the number 17,814. As this plan is completely inspected the computations stops and this value and the plan is stored because it's the best plan so far.

Next the calculation starts from the beginning again but with the table TestBig. Here the cost-value is calculated and the number is a value 546,159. As this number is greater than the value for the plan before the inspection of this plan ist stopped.

Here is the code that does the comparison/decision (in best_extension_by_limited_search() in sql/sql_select.cc):
      /* Expand only partial plans with lower cost than the best QEP so far */
      if (current_read_time >= join->best_read)
      { 
        DBUG_EXECUTE("opt", print_plan(join, idx+1,
                                       current_record_count,
                                       read_time,
                                       current_read_time,
                                       "prune_by_cost"););
        restore_prev_nj_state(s);
        restore_prev_sj_state(remaining_tables, s, idx);
        continue;
      }
join->best_read contains the cost of the best plan so far, current_read_time is the cost of the plan actually inspected, so the expression is:
      if ( 546,159 >= 17,814 )
      { 
        // stop extending this plan!
      }
and this decision is easy.

numbers

Let me do some refinement and explain where these numbers come from.

Here is the code that computes these numbers (in sql/sql_select.cc some lines above the if-statement shown):
      /* Compute the cost of extending the plan with 's' */

      current_record_count= record_count * position->records_read;
      current_read_time=read_time + position->read_time +
                        current_record_count / (double) TIME_FOR_COMPARE;

Please keep in mind that the numbers I will show you soon are estimations, as the calculation of these numbers should be done in a short time.

Here are the numbers for the first step (inspecting table TestSmall alone):
      current_record_count= record_count * position->records_read;
           current_record_count:            33       (1 * 33)              number of records in TestSmall
      current_read_time=read_time + position->read_time + current_record_count / (double) TIME_FOR_COMPARE;
           current_read_time:               40       (0 + 34 + 33 / 5)     
Next we add the table TestBig to our plan and now we get these numbers:
      current_record_count= record_count * position->records_read;
           current_record_count:            14,784   (33 * 448)           number of records in TestSmall * number of records in TestBig
      current_read_time=read_time + position->read_time + current_record_count / (double) TIME_FOR_COMPARE;
           current_read_time:               17,814   (40 + 14,817 + 14,784 / 5)
So the cost for our first plan is computed to a value of 17,814.

Let's switch over to the alternate plan: we will start with the table TestBig and will get these numbers:
      current_record_count= record_count * position->records_read;
           current_record_count:            455,132    (1 * 455,132)    number of records in TestBig
      current_read_time=read_time + position->read_time + current_record_count / (double) TIME_FOR_COMPARE;
           current_read_time:               546,159    (0 + 455,133 + 455,132 / 5)
and the computation stops here, because: 546,159 > 17,814.

By comparing these numbers it's easy to find the best plan. And the optimizer chooses the correct plan as I showed above.

more numbers

There are still some numbers in these computations which I didn't explain until now. These are estimations made from the index-file and as this example is using the MyISAM-engine you will find the indexes in the files TestSmall.MYI and TestBig.MYI. You can find an introduction into the structure of such an index-file here.

The numbers 33 is the estimated number of records in TestSmall with the property Hersteller = '00020', and 455,132 is the number of records in TestBig with the property Hersteller = '36367'. So how are these numbers computed?

If you look at the function-hierarchy above you will see a call of a function in the engine, in this case it's ha_myisam::multi_range_read_info_const(). This code is an engine-specific part in the optimizer, in this case it's MyISAM-specific. In this function 2 values are computed for each table: one value for the first and one value for the last entry in the tree with a given value, and the (estimated) number of records is the diffferenc of these numbers. To dive a bit deeper into the code let's begin with the table TestSmall: the function _mi_search_pos() starts at the root of the tree and it searches for the first occurrence of the value '00020'. This is done by recursively calling _mi_search_pos() again inspecting node after node. When the leaf is reached and the entry is found the code takes the position of the entry within this leaf and divides it by the total number of entries in this leaf. When it returns it does this again and again. In the end a number x is computed with the property: 0 <= x <= 1.0. In the function _mi_record_pos() this number is multiplied with the number of records in this table and this is the estimated position of the entry in the tree.

More details? OK, let's look in my example:
for min_key:
calling _mi_search_pos() with search_flag = HA_READ_KEY_EXACT
  0x7a2c001) (=root)            1 entry,   found 0 (the value in this node is > '00020')
  0x749400                     55 entries, found 0
  0x42b000                     55 entries, found 0
  0x41cc00                     returns 84, found 0 (in this case this describes the first entry)
  _mi_search_pos:    (0 + 1) / (84 + 1)                      ==> 0.011764705882352941
                     (0 + 0.011764705882352941) / (55 + 1)   ==> 0.00021008403361344536
                     (0 + 0.00021008403361344536) / (55 + 1) ==> 3.7515006002400958e-06
                     (0 + 3.7515006002400958e-06) / (1 + 1)  ==> 1.8757503001200479e-06
  _mi_record_pos:    1.8757503001200479e-06 * 301036 + 0.5   ==> 1
So the value for the starting-position is 1 (variable start_pos). Now we need the last occurrence of this value:
for max key:
calling _mi_search_pos() with search_flag = HA_READ_AFTER_KEY
  0x7a2c00 (=root)              1 entry,   found 0
  0x749400                     55 entries, found 0
  0x42b000                     55 entries, found 0
  0x41cc00                     84 entries, found the last value '00020' on position 60
  _mi_search_pos:    (60 + 1) / (84 + 1)                     ==> 0.71764705882352942
                     (0 + 0.71764705882352942) / (55 + 1)    ==> 0.012815126050420168 
                     (0 + 0.012815126050420168) / (55 + 1)   ==> 0.00022884153661464584
                     (0 + 0.00022884153661464584) / (55 + 1) ==> 0.00011442076830732292
  _mi_record_pos:    0.00011442076830732292 * 301036 + 0.5   ==> 34
the value end_point is 34. In the function mi_records_in_range() the estimation results in end_point - start_point = 34 -1 = 33 = estimated number of records in TestSmall with Hersteller = '00020', which is the number you've seen above.

So let's switch over to TestBig and do the comparable computation again:
for min key:
calling _mi_search_pos() with search_flag = HA_READ_KEY_EXACT
  0xfd7d4002) (=root)          37 entries, found '36367' on position 26
  0xdcc1800                    55 entries, found on 25
  0xdb07800                    55 entries, found on 49
  0xdb05800                    84 entries, found on 28
  _mi_search_pos:     (28 + 1) / (84 + 1)                   ==> 0.3411764705882353
                      (49 + 0.3411764705882353) / (55 + 1)  ==> 0.88109243697479001
                      (25 + 0.88109243697479001) / (55 + 1) ==> 0.4621623649459784
                      (26 + 0.4621623649459784) / (37 + 1)  ==> 0.69637269381436784
  _mi_record_pos:     0.69637269381436784 * 10 mio + 0,5    ==> 6,963,727

and for max_key:     
calling _mi_search_pos() with search_flag = HA_READ_AFTER_KEY
  0xfd7d400 (=root)            37 entries, found on 28
  0xe2fe000                    55 entries, found on 10
  0xe06e400                    55 entries, found on 41
  0xe06a400                    84 entries, found the last entry '36367' on position 4
  _mi_search_pos:      (4 + 1) / (84 + 1)                      ==> 0.058823529411764705
                       (41 + 0.058823529411764705) / (55 + 1)  ==> 0.73319327731092443
                       (10 + 0.73319327731092443) / (55 + 1)   ==> 0.19166416566626651
                       (28 + 0.19166416566626651) / (37 + 1)   ==> 0.74188589909648073
  _mi_record_pos:      0.74188589909648073 * 10 mio + 0,5      ==> 7,418,859
These values result in 7,418,859 - 6,963,727 = 455,132 = estimated number of records in TestBig with the property Hersteller = '36367'.

As written before these values are estimations, the exact numbers are 60 (for TestSmall) and 461,016 (for TestBig), so the estimation is not bad. Please keep in mind that the computations should be done fast, getting better estimations means to inspect much more nodes/leafs of the tree which will consume additional time.

The estimated values are stored in the struct TABLE at the end of the function check_quick_select(), there you can find this expression param->table->quick_rows[keynr], which stores the values computed. Later in the function best_access_path() these values are read from table->quick_rows[key] and at the end of this function stored in pos->records_read. Later they appear as shown in the computations above as position->records_read.

Almost done. But there are 3 numbers left in the computation: 34, 455,133 and 14,817. For the first two numbers it is easy to detect: as you guessed it these are the estimated values just explained but incremented by one. This incrementation is don in the function handler::read_time() (found in handler.h). The result of this addition is stored in the variable position->read_time.
The remaining number follows the same schema: 14,817 = 33 * (448 + 1). For every record found in TestSmall there exist (estimated) 448 records in TestBig. These 448 records are read via the index but an additional access is needed for returning HA_ERR_END_OF_FILE. At least this is my interpretation of the incrementation, and thi interpretation also applies to the other 2 numbers. As MyISAM has not implemented this function the function of the base-class hanlder is called, other engines implement this and therefore handle it with their own interpretation.

Now we are through. This is the algorithm that computes the cost-values for the statement given above.

MyISAM

A lot of my comments here in this post are specific for the MyISAM-engine because the tables used for the tests are of type MyISAM. Other storage-engines can handle things differently so don't apply the results presented here schematically to other engines.

And as a second argument a lot of the data presented here depends on the data in the tables, and this is my data. If you do your own tests don't expect to see the same numbers (like values for keys, file-offsets, distribution of keys etc.).

The explanations given here are valid for MariaDB in the version 10.0.10, other versions can handle these things slightly different. And MySQL may handle this different too.

index-files

It looks like adding one or more indexes to a table is a good thing, so why don't we create an index for every column of a table? Please keep in mind that maintaining an index does consume time: inserting a new record means updating all existing indexes for this table, this is also valid for deletes and maybe for updates too. So what you win for (some) selects you may loose in the other operations. Look carefully at what you need.


correctness

This text is still a simplified presentation of what's going on in the software. In case something is wrong with my description please use the mail-function of this site and send me a mail describing my error. I will look at it. Thanks.


some notes:
1) the value 0x7a2c00 is an offset in the file TestSmall.MYI. At this position the index-tree on the column PZN starts, all numbers under this value are the file-offsets where the corresponding nodes or finally a leaf can be found.
2) the value 0xfd7d400 is an offset in the file TestBig.MYI, analog to 1)