Wednesday 8 June 2016

the real JOIN: the case with an index


This is another text where I want to describe how the server handles certain statements. In my older texts I described some parts of the action inside the server, the focus of my last texts were the handling of a JOIN-operation. In my text the real JOIN: the case without indexes I described the unusual case of JOINing 2 tables without any index that could help. For this text I want to show the usual case: doing the JOIN with a useful index.

When an SQL-statement is sent to the database-server a lot of things happen: among other things the statement is checked for syntax and access rights, a plan for execution has to be searched for and the pan found is executed. This text focuses on the execution-step and especially on the way the JOIN is done.


warning

Before I dive into the code let me to issue a warning: the functions, hierarchies and data-examples shown here are valid for the environment on my PC. If you do your own tests this may look a bit different.


environment

Again I'm using my usual environment which I already described in my last post. It's the same SQL-statement, the same tables, the same data in the tables etc.. Only my focus changes.


the statement

To start I want to show the statement used for this text:
MariaDB [TestOpt]> select SQL_NO_CACHE  count(A.PZN) from TestSmall A join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367';
+--------------+
| count(A.PZN) |
+--------------+
|           14 |
+--------------+
1 row in set (0.38 sec)

MariaDB [TestOpt]> explain select SQL_NO_CACHE  count(A.PZN) from TestSmall A join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 301036
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
         type: ref
possible_keys: PZN
          key: PZN
      key_len: 8
          ref: TestOpt.A.PZN
         rows: 448
        Extra: Using where
2 rows in set (0.00 sec)

MariaDB [TestOpt]> 
As you can see I've already included the explain.


explaining the explain

The execution of this statement is really fast compared to the time for answering the request in the case without any index (= 23 sec.). Let me describe in some words what the lines above mean.

The execution starts with reading the table TestSmall (aka A) in a table-scan. On every record read the WHERE-clause is applied, at least parts of the WHERE. When the record fulfills the condition the server extracts the value of the column PZN from this record, switches to the table TestBig (aka B) and reads all records from TestBig using the index PZN for the given PZN-value. On every record read from TestBig the (partial) WHERE-condition is applied. If a match is found this record from TestBig together with the record from TestSmall are matches for the query. When all records for the given PZN-value are read from TestBig the server returns to the table TestSmall and continues reading the next record, applying the WHERE etc. until the end of TestSmall is reached.

Let's look into the code for the first part, the table-scan on the table TestSmall. This is the hierarchy:
mysql_select()
    JOIN::exec()
        JOIN::exec_inner()
            do_select()
                sub_select()

Let's stay here and look for the action. The function sub_select() is called with a pointer to a structure of type JOIN_TAB as a parameter, in this JOIN_TAB you will find the routines to call for reading a record. This is the code in sub_select() that handles reading the first record from the table:
  if (rc != NESTED_LOOP_NO_MORE_ROWS)
  {
    error= (*join_tab->read_first_record)(join_tab);
    ....
    rc= evaluate_join_record(join, join_tab, error);
  }

some lines later this code handles all following read-operations:
  while (rc == NESTED_LOOP_OK && join->return_tab >= join_tab)
  {
    ....
    error= info->read_record(info);
    ....
    rc= evaluate_join_record(join, join_tab, error);
  }

In the JOIN_TAB-structure the variables read_first_record and read_record are set before entering this level.1). So here is an excerpt of the structure JOIN_TAB:
typedef struct st_join_table {
  ...
  READ_RECORD::Setup_func read_first_record;
  Next_select_func next_select;
  READ_RECORD read_record;
  ...
} JOIN_TAB;

After a record is read the function evaluate_join_record() is called. This function applies the partial WHERE-condition to this record and if it does not fulfill the condition the code returns to the calling function and there it continues by reading the net record from the table TestSmall. If the record from TestSmall fulfills the (partial) WHERE-condition it calls another function:
static enum_nested_loop_state
evaluate_join_record(JOIN *join, JOIN_TAB *join_tab,
                     int error)
{
    ....
  if (select_cond)
  {
    select_cond_result= MY_TEST(select_cond->val_int());      the test: does it fulfill the partial WHERE-condition ?

    /* check for errors evaluating the condition */
    ....
  }

    if (found)
    {
      ....
      /* A match from join_tab is found for the current partial join. */
      rc= (*join_tab->next_select)(join, join_tab+1, 0);  in this call use the pointer to the next JOIN_TAB
    ....
}

When a match is found in the test shown above the next level is called via JOIN_TAB->next_select() which in my case here is the function sub_select() again. So the hierarchy is now:
sub_select()
    evaluate_join_record()
        sub_select()
On the 2nd level one of the parameters in the call to sub_select() is a pointer to the JOIN_TAB representing the table TestBig (in the function-call I marked this in bold, it now points to the next JOIN_TAB in the list).

The same code in sub_select() is executed so we have the code for the first access and all following accesses to the table. As this is now the table TestBig the vars. JOIN_TAB.read_first_record and JOIN_TAB.read_record point to the functions for accessing the table via the index PZN. When these routines return NESTED_LOOP_NO_MORE_ROWS the code in sub_select() terminates and returns to evaluate_join_record() which itself returns to sub_select() and there the table-scan on the table TestSmall continues till the end of the file (again signalled by NESTED_LOOP_NO_MORE_ROWS).


remarks

That's it. If you compare this to the action described in my last post you will notice a difference in the amount of work which results in the time needed in handling the same query.


correctness

This text is a simplified presentation of what's going on in the software. As I'm still learning the inner-workings of this software errors on my side can happen. In case something is wrong with my description please use the mail-function of this site and send me a mail describing my error. I will look at it. Thanks.




some notes:
1) you will find a description of the function-hierarchies for accessing a table here and here and other texts on this site.