Friday 16 January 2015

optimizer in MySQL

This is just a repetition of an older post. In the older post I inspected the optimizer-stage within MariaDB for a SQL-statement where the optimizer chooses a slow path for executing this statement. I showed where the cost of each possible path is calculated and where the comparison (=the decision for the slower path) is made.

This was all done for MariaDB. So in this post I will dive into the code of MySQL and show where the calculations and decisions will happen here.

Management summary: it's similar but not identical.

This description is based on MySQL 5.6.22, the current stable version. For other versions of MySQL the code can change and my descriptions may be invalid.


some SQL

For this test I used the same tables with identical structures, data and indexes as in the older post.
So let's go into the details in MySQL. This is the statement I want to use:
mysql> use TestOpt;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select SQL_NO_CACHE count(A.PZN) from TestSmall A join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367';
+--------------+
| count(A.PZN) |
+--------------+
|           14 |
+--------------+
1 row in set (29,57 sec)

mysql> explain select SQL_NO_CACHE count(A.PZN) from TestSmall A join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
         type: ALL
possible_keys: PZN
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10000000
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
         type: ref
possible_keys: PZN
          key: PZN
      key_len: 8
          ref: TestOpt.B.PZN
         rows: 1
        Extra: Using where
2 rows in set (0,01 sec)

mysql> 

This is the standard-behaviour of the optimizer. So I used a little bit of force and helped the optimizer to choose another plan:
mysql> select SQL_NO_CACHE count(A.PZN) from TestSmall A straight_join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367';
+--------------+
| count(A.PZN) |
+--------------+
|           14 |
+--------------+
1 row in set (0,64 sec)

mysql> explain select SQL_NO_CACHE count(A.PZN) from TestSmall A straight_join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
         type: ALL
possible_keys: PZN
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 301036
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
         type: ref
possible_keys: PZN
          key: PZN
      key_len: 8
          ref: TestOpt.A.PZN
         rows: 448
        Extra: Using where
2 rows in set (0,01 sec)

mysql> 
In both cases I included the EXPLAIN of the statement.

By comparing the results of the optimizer in MySQL with the result of MariaDB you can see that in both cases the slower path was chosen for the standard SQL-statement, both optimizer have chosen the wrong path.

a remark: do not compare these number and say: XXXXXXX is faster than YYYYY. The reaon for this coul be based on different compile-time options, settings of some server-options etc, and I didn't check these. I'm sure both server can be optimized by modifying settings and options but that's not my topic here. I want to show where and how the optimizer makes the decision. And as I described this for MariaDB before, so in this post I want to show this for MySQL. I want to look into the code not to show improvements.


the code

Instead of long explanations here is the function-hierarchy for optimizing this SQL-statement in MySQL 5.6.22:
mysql_execute_command()
    execute_sqlcom_select()
        handle_select()
            mysql_select()
                mysql_execute_select()
                    JOIN::optimize()   
                        Prepared_stmt_arena_holder::Prepared_stmt_arena_holder()
                        simplify_joins()
                        record_join_nest_info()
                        optimize_cond()
                        JOIN::optimize_fts_limit_query()     
                        opt_sum_query()
                        make_join_statistics()
                            update_ref_and_keys()
                                add_key_fields()
                                add_key_part()
                            optimize_keyuse()
                            Optimize_table_order::choose_table_order()
                                Optimize_table_order::greedy_search()
                                    Optimize_table_order::best_extension_by_limited_search()
                                        Optimize_table_order::best_access_path()
                                        Optimize_table_order::best_extension_by_limited_search()
                                            Optimize_table_order::best_access_path()
                                            Optimize_table_order::consider_plan()
                                        Optimize_table_order::best_access_path()
                                        Optimize_table_order::best_extension_by_limited_search()
                                            Optimize_table_order::best_access_path()
                                            Optimize_table_order::consider_plan()
                            JOIN::get_best_combination()
                        substitute_for_best_equal_field()
                        Item_cond::update_used_tables()
                        JOIN::set_access_methods()
                        make_join_select() 
                        make_join_readinfo()
                        pick_table_access_method()
                    JOIN::exec()
                        JOIN::prepare_result()
                        do_select()
                            sub_select()
                                join_init_read_record()
To say it short: looks a bit different than in MariaDB. But it's not so much different.


cost-values

Let's restrict our view to the computation of the cost-value of a path. This computation is done in the function Optimize_table_order::best_extension_by_limited_search() (in sql/sql_planner.cc):
      /* Compute the cost of extending the plan with 's' */
      current_record_count= record_count * position->records_read;
      current_read_time=    read_time
                            + position->read_time
                            + current_record_count * ROW_EVALUATE_COST;

The strategy for the computation of a cost of a plan is the same as described for MariaDB: start with a table, calculate the cost for reading this table and then extend this plan by adding the next table (and then continue with adding the next table and so on). In our case here this means that the computation starts with the table TestSmall and computes the cost for this step1):
TestSmall:
      current_record_count= 301,036 = 1 * 301,036
      current_read_time   = 66,334 = 0 + 6,127 + 301,036 * 0.20

Then this plan will be extended by adding the table TestBig with this result:
TestBig:
      current_record_count= 134,864,128 = 301,036 * 448
      current_read_time   = 161,903,288 = 66,334 + 134,864,128 + 134,864,128 * 0.20
So the cost of this plan is calculated to a value of 161,903,288. As this is the first plan so it's the best plan now and the values and the plan are kept.

Now the computation starts with the alternate plan:
TestBig:
    current_record_count= 10,000,000 = 1 * 10,000,000
    current_read_time   = 2,211,613 =  0 + 211,613 + 10,000,000 * 0.20
extend by TestSmall:
    current_record_count= 10,000,000 = 10,000,000 * 1
    current_read_time   = 14,211,613 =  2,211,613 + 10,000,000 + 10,000,000 * 0.20
The cost of this alternate plan is calculated to a value of 14,211,613. Above I included both steps of this computation in one box.

If you look at the numbers and compare them with the numbers in the older post you see that the numbers are identical or differ only slightly. And the meaning of the numbers is the same, so I omit these explanations.


the decision

In MariaDB 10.0.10 the comparison of the best plan so far and the newly computed plan (precise: the cost-values of these plans) is done in the function best_extension_by_limited_search() (in sql/sql_select.cc):
        if (current_read_time < join->best_read)
        {
          memcpy((uchar*) join->best_positions, (uchar*) join->positions,
                 sizeof(POSITION) * (idx + 1));
          join->record_count= partial_join_cardinality;
          join->best_read= current_read_time - 0.001;
        }

In MySQL 5.6.22 things are a bit different: in Optimize_table_order::best_extension_by_limited_search() (in sql/sql_planner.cc) you will find these lines of code:
      else  //if ((current_search_depth > 1) && ...
      {
        consider_plan(idx, current_record_count, current_read_time,
                      &trace_one_table);
which simply calls the function Optimize_table_order::consider_plan() (in sql/sql_planner.cc):
  const bool chosen= read_time < join->best_read;
  trace_obj->add("chosen", chosen);
  if (chosen)
  {
    memcpy((uchar*) join->best_positions, (uchar*) join->positions,
            sizeof(POSITION) * (idx + 1));

    /*
      If many plans have identical cost, which one will be used
      depends on how compiler optimizes floating-point calculations.
      this fix adds repeatability to the optimizer.
      (Similar code in best_extension_by_li...)
    */
    join->best_read= read_time - 0.001;
    join->best_rowcount= (ha_rows)record_count;
  }

If you take the numbers shown above you will see that the variable chosen will be evaluated to true (because 14 mio. < 161 mio.) so the second plan is taken.


code changes

By comparing the sources of the two products one can see a lot of similarities. In the old post I compared the results (and only these) with MySQL 5.5.8. When I look into the code of this version I can see almost identical software to MariaDB 10.0.102). This differs a bit with MySQL 5.6.22. In the newer version the programmers at Oracle started to rearrange source-files, introduced new classes like Optimize_table_order, moved exisiting code to some member-functions of this class etc.

So the code is is diverging, but it still has a lot of similarities. But it looks like this will change.

I don't want to judge the changes made. In both products the software has evolved over time, together with these rearrangements I expect more deviations to come.

reviewing the numbers

If you compare the cost-values computed and the time spent for executing a statement you will see a big difference: the optimizer prefers the combination TestBig->TestSmall by a big difference (factor 11 approx.), whereas in practice the opposite plan is much, much faster (factor 40 approx.). Obviously something went wrong in the calculations.

The calculations are based mainly on two numbers:
  • how long does it take to read a table sequentially?
  • how many records have to be read via an index?
Reading a table by a full table-scan is computed to a value of 6,127 (table TestSmall) and 211,613 (table TestgBig). These numbers are returned by the function scan_time(). As our tables are of type MyISAM and this function is not implemented in this storage-engine the function in the base-class is called and it looks like this:
  virtual double scan_time()
  { return ulonglong2double(stats.data_file_length) / IO_SIZE + 2; }
So the length of the MYD-file is taken for computing these numbers3) 4).

For accessing the corresponding records via the index this formula is used: take the number of records in table A and multiply it with the expected number of records in table B for one entry in the index. In numbers this means:
  • 134,864,128 = 301,036 * 448 = (number of records in TestSmall) * (expected number of records for one entry in index in TestBig)
  • 10,000,000 = 10,000,000 * 1 = (number of records in TestBig) * (expected number of records for one entry in index in TestSmall)
and this part of the computation results in the big difference in the cost-values.

Looking at the time needed for the two statements it look like the balance between cost of a table-scan and cost of an index-access is disturbed. This maybe different in other storage-engines. Maybe it was in balance in the past but at least on my machine here it's totally wrong. More research on this is needed.

correctness

This text is still a simplified presentation of what's going on in the software. In case something is wrong with my description please use the mail-function of this site and send me a mail describing my error. I will look at it. Thanks.


some notes:
1) as before: the calculations are done as floating-point numbers, but I will present only the integer-part.
2) please restrict this statement to the optimizer-part of the software. I didn't inspect the whole code-base.
3) IO_SIZE has a value of 4,066, the file-sizes are 25,090,360 (=table TestSmall) and 866,761,264 (=table TestBig).
4) other storage-engines may handle this differently.