Wednesday, 5 November 2014

JOIN (2)

here I'm presentiung some more information about how MySQL/MariaDB handles a JOIN.

In my post JOIN I described what the storage-engine does in executing a simple (but non-trivial) SQL-statement. Here I take the same statement and describe what the database-server does. So in my last post my description started when the server calls a function in the engine, in this post my description will end when such a function is called. The last post described function-hierarchies in the MyISAM-engine, this post will describe hierarchies above the engine-level and therefore valid for all storage-engines.

environment

The environment is the same as in the post JOIN. The are two tables: TestSmall with approx. 300K records in it and TestBig with 10mio. records in it. The structure of the tables is the same as before. And on both tables there is an index on the column PZN.

statement

And here is the statement I want to inspect:
MariaDB [TestOpt]> select SQL_NO_CACHE count(A.PZN) from TestSmall A straight_join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367';
+--------------+
| count(A.PZN) |
+--------------+
|           14 |
+--------------+
1 row in set (0.48 sec)

MariaDB [TestOpt]> explain select SQL_NO_CACHE count(A.PZN) from TestSmall A straight_join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
         type: ALL
possible_keys: PZN
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 301036
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
         type: ref
possible_keys: PZN
          key: PZN
      key_len: 8
          ref: TestOpt.A.PZN
         rows: 448
        Extra: Using where
2 rows in set (0.00 sec)

MariaDB [TestOpt]> 
Also included is the EXPLAIN for this SQL-statement.

pseudocode

Here is what the server does (in pseudocode): the server scans the table TestSmall and fetches record after record (table-scan). For each record the WHERE-clause will be applied and if a match is found the server extracts from the current record the value of the column PZN, switches over to the table TestBig and searches all records with this value. For this operation it uses the index on the column PZN of TestBig. Again for each record found in TestBig the WHERE-clause is applied, in the case of a match this record is handled.
This process is continued in TestBig until all records with this PZN-value are examined. It then switches back to the table TestSmall and continues scanning through this table.
The whole process stops when all records in TestSmall are read.

hierarchy

Instead of long explanations here is the function-hierarchy for executing this statement:
 JOIN::exec()
     JOIN::exec_inner()
         do_select()
             sub_select()
                 // access the first record:
                 (*join_tab->read_first_record)() -> join_init_read_record()
                     init_read_record()
                         table->file->ha_rnd_init_with_error() -> handler::ha_rnd_init_with_error()
                             ha_rnd_init() -> handler::ha_rnd_init(
                                 rnd_init() -> ha_myisam::rnd_init
                         (*tab->read_record.read_record)() -> rr_sequential()
                              info->table->file->ha_rnd_next() -> handler::ha_rnd_next()
                                  rnd_next() -> ha_myisam::rnd_next()
                 evaluate_join_record()

                 // iterate: read all records until match
                info->read_record() -> rr_sequential()
                    info->table->file->ha_rnd_next() -> handler::ha_rnd_next()
                        rnd_next() -> ha_myisam::rnd_next()
                evaluate_join_record()
                // until END_OF_FILE

The whole work is done in the function sub_select(). The function is entered with 2 objects plus a boolean, let's look at the objects: an object of class JOIN and an object of struct JOIN_TAB. The class JOIN contains information about all tables to be accessed in executing this query, the struct JOIN_TAB contains information about one table, the table that is inspected by this call of sub_select().

If you want to look into the data of these parameters please add the following lines in the code of the function sub_select() (found in the file sql/sql_select.cc) after entrance into this function:
  TABLE ** ptrTables = join->table;
  char *ptrTable ;
  fprintf(stderr, "entering sub_select:\n\tstructure JOIN:\n");
  while ( *ptrTables != NULL )
  {
      ptrTable = (*ptrTables)->s->table_name.str;
      fprintf(stderr, "\t\ttable: <%s>\n", ptrTable);
      ++ptrTables;
  }
  char *ptr2ndTable = join_tab->table->s->table_name.str;
  fprintf(stderr, "\tstructure JOIN_TAB:\n\t\ttable: <%s>\n", ptr2ndTable);

And here is the output found in the log-file (on my machine it's in /var/log/mysql/error.log):
entering sub_select: 
    structure JOIN:
        table: <TestSmall>
        table: <TestBig>
    structure JOIN_TAB:
        table: <TestSmall>
As you can see the JOIN contains information about the tables TestSmall and TestBig (in this order), JOIN_TAB contains information about TestSmall and we will start with this table.

Coming back to the function-call-hierarchy given above let's break it into pieces and look at it. As reading the first record may need an extra step for initialization this access is handled separately:
in sub_select():
                 (*join_tab->read_first_record)() -> join_init_read_record()
                     init_read_record()
                         table->file->ha_rnd_init_with_error() -> handler::ha_rnd_init_with_error()
                             ha_rnd_init() -> handler::ha_rnd_init()
                                 rnd_init() -> ha_myisam::rnd_init()
                         (*tab->read_record.read_record)() -> rr_sequential()
                              info->table->file->ha_rnd_next() -> handler::ha_rnd_next()
                                  rnd_next() -> ha_myisam::rnd_next()
                 evaluate_join_record()
The last function in this hierarchy is evaluate_join_record(). In this function the record read is examined (= the WHERE-clause will be applied).

After the first record is read and processed all following records are read by this loop (again I show you the hierarchy of function-calls, not the full code executed):
in sub_select():
                // iterate: read all records until match
                info->read_record() -> rr_sequential()
                    info->table->file->ha_rnd_next() -> handler::ha_rnd_next()
                        rnd_next() -> ha_myisam::rnd_next()
                evaluate_join_record()
                // until END_OF_FILE
That's all.

But I've only shown how the table TestSmall is accessed. So where does the code jump over and accesses TestBig? This happens in the function evaluate_join_record(). Here the WHERE-clause will be applied to the current record (from TestSmall) and in case of a match the function sub_select() is called (a recursion). This is the line of code in evaluate_join_record() that calls sub_select():
      /* A match from join_tab is found for the current partial join. */
      rc= (*join_tab->next_select)(join, join_tab+1, 0);

On entering the function sub_select() again the parameters now look like:
entering sub_select: 
    structure JOIN:
        table: <TestSmall>
        table: <TestBig>
    structure JOIN_TAB:
        table: <TestBig>
You see that the data in JOIN is identical to the first entry but the value of the JOIN_TAB now points to data regarding TestBig.

As described above the code in sub_select() checks for the first record, now in the table TestBig:
in sub_select():
                        // search for the first record:
                        (*join_tab->read_first_record)() -> join_read_always_key()
                             table->file->ha_index_init() -> handler::ha_index_init()
                                 index_init() -> ha_myisam::index_init()
                             table->file->ha_index_read_map() -> handler::ha_index_read_map()
                                 index_read_map() -> ha_myisam::index_read_map()
                         evaluate_join_record()
Seems like an identical hierarchy but the functions called differ because of a different JOIN_TAB.

If there are more records in TestBig these are fetched by these function-calls (still in sub_select()):
in sub_select():
                         // iterate: search all records until a no-match
                         info->read_record() -> join_read_next_same()
                             table->file->ha_index_next_same() -> handler::ha_index_next_same()
                                 index_next_same() -> ha_myisam::index_next_same()
                         evaluate_join_record()
                         // until NESTED_LOOP_NO_MORE_ROWS

And this finishes our journey through the handling of the given SQL-statement.


correctness

This text is still a simplified presentation of what's going on in the software. In case something is wrong with my description please use the mail-function of this site and send me a mail describing my error. I will look at it. Thanks.