an unsystematic journey through the source-code of MySQL and MariaDB: November 2016

Some weeks ago I wanted to leave this topic and look at other parts of the code but I found some queries which looked interesting and therefore I stayed on my route. So I let me start this text with my usual introductory words.

warning

Before I dive into the code let me issue a warning: the functions, hierarchies and data-examples shown here are valid for the environment on my PC. If you do your own tests this may look a bit different.

environment

Again I'm using my usual environment which I already described in one of my older posts. It's the same tables, the same data in the tables etc.. I've changed the SQL-statement and therefore my focus changes.

the SQL-statement

So here is my statement which I want to inspect and describe my results here in this text:

MariaDB [TestOpt]> select SQL_NO_CACHE   B.PZN, B.ArtikelText  from TestBig B where B.PZN in ( select A.PZN from TestSmall A where A.Hersteller = '00020') and B.Hersteller = '36367';
+--------+----------------------------+
| PZN    | ArtikelText                |
+--------+----------------------------+
| 178324 | YPSIFLEX 10CMX4M EL FIXIER |
| 178324 | STIBIUM SULF AURANT D 6    |
| 87886  | GILOFA BAS 40 KNSTR 1KANDI |
| 178324 | SOFTCLIX II LANZETTEN      |
| 87952  | RUTIVISCAL N               |
| 178324 | CROTAMITON                 |
| 87952  | NOBASTRETCH KRAEFT 7MX6CM  |
| 87952  | STECHBECKEN 31CM STAHL     |
| 87923  | LIPP LOVE LUXUS SUN PROT12 |
| 87952  | TASTATURHILFE UNT NC21017  |
| 178324 | HOLLISTER KOL KLEBE F 2178 |
| 87952  | MEDI TEST GLUCOSE          |
| 178324 | ESTOSAN TUBE               |
| 87952  | FRUCHTS DUSCHGEL ORANGE    |
+--------+----------------------------+
14 rows in set (16.24 sec)

MariaDB [TestOpt]>

As you can see I'm using a subselect now and I'm interested in how the whole statement and especially the subselect is treated in the execution-stage of MariaDB¹⁾.

environment

Just one more repetition: this is how the 2 tables look like:

MariaDB [TestOpt]> desc TestSmall;
+-------------+--------------+------+-----+---------+-------+
| Field       | Type         | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| Id          | char(8)      | YES  |     | NULL    |       |
| PZN         | char(7)      | YES  |     | NULL    |       |
| EVP         | decimal(7,2) | YES  |     | NULL    |       |
| HAP         | decimal(7,2) | YES  |     | NULL    |       |
| ArtikelBez  | varchar(40)  | YES  |     | NULL    |       |
| ArtikelText | varchar(26)  | YES  |     | NULL    |       |
| Hersteller  | char(5)      | YES  |     | NULL    |       |
+-------------+--------------+------+-----+---------+-------+
7 rows in set (0.00 sec)

MariaDB [TestOpt]>

The table TestBig has an identical structure; TestSmall contains 301,036 records, TestBig contains 10 mio. records. The data is from an old project but the contents of the data is not of interest here.

There are no indexes defined on these tables. Silly, I know, but the database-server has to handle this situation.

explain

Let's look at what the server tells us how it handles the statement:

MariaDB [TestOpt]> explain select SQL_NO_CACHE   B.PZN, B.ArtikelText  from TestBig B where B.PZN in ( select A.PZN from TestSmall A where A.Hersteller = '00020') and B.Hersteller = '36367'\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: B
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10000000
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: <subquery2>
         type: eq_ref
possible_keys: distinct_key
          key: distinct_key
      key_len: 7
          ref: func
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 2
  select_type: MATERIALIZED
        table: A
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 301036
        Extra: Using where
3 rows in set (0.00 sec)
MariaDB [TestOpt]>

Oh, this looks different to the explains of the older examples. As each table is represented internally by an object of type JOIN_TAB we see that we have 3 JOIN_TABs in our example instead of the 2 objects in the older posts. So let's look into this situation and describe the results.

following the code

Before I jump into the code let me show you some numbers, they show what we have to expect.

some numbers

Let's split the SQL-statement into parts and look at the numbers these partial statements deliver:

MariaDB [TestOpt]> select A.PZN from TestSmall A where A.Hersteller = '00020';
+---------+
| PZN     |
+---------+
| 178324  |
| 3717968 |
| 3717974 |
..............
| 8425779 |
| 8425785 |
| 8494562 |
| 8494579 |
+---------+
60 rows in set (0.37 sec)

MariaDB [TestOpt]>

There are 60 records in TestSmall that match the WHERE-condition, so the inner SQL-statement (=the subselect) will deliver 60 records.

MariaDB [TestOpt]> select b.pzn from TestBig b where b.hersteller = '36367';
+--------+
| pzn    |
+--------+
| 535818 |
| 999999 |
| 999999 |
..............
| 19057  |
| 999999 |
| 999999 |
| 999999 |
| 999999 |
| 999999 |
| 999999 |
| 999999 |
+--------+
461016 rows in set (34.24 sec)

MariaDB [TestOpt]>

More then 400,000 records from TestBig will match the WHERE-condition and have to be matched for the IN-condition. A bit of work, this will take some time.

Let's look at how the server handles this statement.

overview

The function-hierarchy for the execution of this statement looks like this (simplified):

JOIN::exec()
    JOIN::exec_inner()
        do_select()
            sub_select()                                  level 1
                evaluate_join_record()
                    sub_select()                          level 2
                        join_tab_execution_startup()
                            sub_select()                  level 3
                                evaluate_join_record()

The server starts executing this statement by doing a table-scan on the table TestBig. When the first record matching the condition is found, it starts with the handling of a temp-file. In this step it reads the table TestSmall and putts the PZNs of the matching records into it. When this is done it checks if the PZN-value of a record from TestBig can be found in the temp-file and when found acts accordingly.

That's all.

The function sub_select() will appear multiple times in my text here so to avoid confusion let me introduce some conventions: I will write of level 1 when sub_select() handles the table TestBig, of level 2 when the temp-table is handled and of level 3 when the table TestSmall is handled. I hope this clears my text a bit.

jumping into the code

And now, really, I want to jump into the code. Let's start with the first call of the function sub_select().

level 1

When sub_select() is called for the first time it is given a JOIN_TAB representing the table TestBig (as expected from the explain). In this function it reads and inspects the first record from this table:

  if (rc != NESTED_LOOP_NO_MORE_ROWS)
  {
    error= (*join_tab->read_first_record)(join_tab); 
    ....
    rc= evaluate_join_record(join, join_tab, error);
  }

read_first_record() calls join_init_read_record() which calls rr_sequential() via the statement (*tab->read_record.read_record). So the first record from TestBig is read by this code. evaluate_join_record() inspects this records but in my case it didn't find a match.

So it continues with the main loop:

  while (rc == NESTED_LOOP_OK && join->return_tab >= join_tab)
  {
    .....
    error= info->read_record(info);
    ....
    rc= evaluate_join_record(join, join_tab, error);
  }

info->read_record() calls rr_sequential() which reads the next record from TestBig, evaluate_join_record() inspects the record read. When evaluate_join_record() finds a match this code is executed next:

in evaluate_join_record():
  if (select_cond)
  {
    select_cond_result= MY_TEST(select_cond->val_int());        applies the (partial) WHERE-condition
    .....
  }
  ....
    if (found)
    {
      ....
      /* A match from join_tab is found for the current partial join. */
      rc= (*join_tab->next_select)(join, join_tab+1, 0);

level 2

This last line shown calls sub_select() with a JOIN_TAB for a table named /tmp/#sql_247d_0²⁾ of type MEMORY (in older versions this type was called HEAP). So the server is now at level 2 and here is what the server does with this empty table in sub_select() in this level:

in sub_select():
  if (rc != NESTED_LOOP_NO_MORE_ROWS && 
      (rc= join_tab_execution_startup(join_tab)) < 0)
    DBUG_RETURN(rc);

The code prepares for switching to the 3rd level:

 in join_tab_execution_startup():
  .......
  else if (tab->bush_children)
  {
    /* It's a merged SJM nest */
    .....
      /*
        Now run the join for the inner tables. The first call is to run the
        join, the second one is to signal EOF (this is essential for some
        join strategies, e.g. it will make join buffering flush the records)
      */
      if ((rc= sub_select(join, join_tab, FALSE/* no EOF */)) < 0 ||
          (rc= sub_select(join, join_tab, TRUE/* now EOF */)) < 0)
      {
        ....
      }

level 3

As you can see sub_select() is called again, this time with JOIN_TAB representing the table TestSmall. It is called twice, the second time for clean-up-operations.

Let's look at what the first call of sub_select() does: similar steps as described for TestBig some lines above are done so it reads the first record from TestSmall via (*join_tab->read_first_record)() and inspects it in evaluate_join_record((). As no match is found in my case it continues with the main loop, again as described some lines above for the table TestBig. Interesting things happen again when a match is found:

in evaluate_join_record():
      /* A match from join_tab is found for the current partial join. *
      rc= (*join_tab->next_select)(join, join_tab+1, 0);

with this call in the last line a lot of things happen here:

evaluate_join_record()
    end_sj_materialize()                      via (*join_tab->next_select)()                  
        fill_record()
            Item_field::save_in_field()
                save_field_in_field()
        handler::ha_write_tmp_row()
            ha_heap::write_row()
                heap_write()

As the server is currently reading the table TestSmall and there are 60 records matching the condition so there are 60 values of PZN taken from these records and stored into this temp-table.
At the end of the table-scan of TestSmall the function sub_select() is left with a return-value of NESTED_LOOP_OK and called immediately again with a value TRUE for the parameter end_of_records, called from join_tab_execution_startup():

      if ((rc= sub_select(join, join_tab, FALSE/* no EOF */)) < 0 ||
          (rc= sub_select(join, join_tab, TRUE/* now EOF */)) < 0)

In this clean-up-operation there is not so much that happens:

join_tab_execution_startup()  
    sub_select()
        end_sj_materialize

So it returns to to the calling function soon.

back to level 2

Now we're back at sub_select(), the table TestSmall is already read and the temp-table is filled with the records from TestSmall.

a short break

OK, a short break and I want to take the time to remember what the server has done already.

We've read record after record from TestBig until we found the first record which matches the (partial) condition. We keep the PZN-value from this record, switch over to a temp-table and read the records from TestSmall, check these records and if a records matches the (partial) condition on TestSmall we put the PZN-value of this record (from TestSmall) into the temp-table.

And I've stopped at this step.

The next step will be to continue reading and inspecting the records from TestBig. If we found a matching record we will look for the PZN-value in the temp-table. As we already have a matching record read from TestBig so the server will immediately start with the lookup in the temp-table.

So we continue our journey at level 2.

continuing at level 2

Now we try to find the PZN-value (from the record from TestBig) in our temp-table.

sub_select()
    join_read_key()               via (*join_tab->read_first_record)(join_tab);
        join_read_key2()
            handler::ha_index_read_map()
                ha_heap::index_read_map()

In my case it did not find this value in the index so it returns the value HA_ERR_KEY_NOT_FOUND (converted to -1). Next evaluate_join_record() is called which checks the error-code and returns NESTED_LOOP_NO_MORE_ROWS. For this reason sub_select() is left and we're back in evaluate_join_record() of level 1.

back to level 1

So in sub_select() we're in the main-loop again, reading and evaluating record after record from TestBig (already described some lines above).

In evaluate_join_record() the record read is inspected and if it does not match the condition it returns. This cycle, reading and inspecting, continues until a match is found. In this case:

sub_select()                                   level 1
    evaluate_join_record()                     calls the next level via (*join_tab->next_select)(join, join_tab+1, 0);
        sub_select                             level 2   
            join_read_key()                    by reading the first record, via (*join_tab->read_first_record)(join_tab);
                join_read_key2()
                    handler::ha_index_read_map()
                        ha_heap::index_read_map()

so the server switches to level 2 and at this level in sub_select() it reaches the code for reading a first record from the temp-table using the PZN-value of the current record from TestBig as a key to look for (in the temp-table) in JOIN_TAB->ref->key_buff.

So now the steps are: read records from TestBig (in level 1). When a match is found in evaluate_join_record() the server switches to level 2 and look if the PZN-value can be found in the temp-table. These steps continues until all records from TestBig are read and treated.

I hope you did follow this text until here, it was a bit of work understanding and describing these steps.

correctness

This text is a simplified presentation of what's going on in the software. As I'm still learning the inner-workings of this software errors on my side can happen. In case something is wrong with my description please use the mail-function of this site and send me a mail describing my error. I will look at it. Thanks.

some notes:

1) on MySQL explain also delivers 3 JOIN_TABs, but they look differently

2) did I mention that this may look differently on your machine?

Tuesday 1 November 2016

subselect: execution

warning

environment

the SQL-statement

environment

explain

following the code

some numbers

overview

jumping into the code

level 1

level 2

level 3

back to level 2

a short break

continuing at level 2

back to level 1

correctness