In my post JOIN I described what the storage-engine does in executing a simple (but non-trivial) SQL-statement. Here I take the same statement and describe what the database-server does. So in my last post my description started when the server calls a function in the engine, in this post my description will end when such a function is called. The last post described function-hierarchies in the MyISAM-engine, this post will describe hierarchies above the engine-level and therefore valid for all storage-engines.
environment
The environment is the same as in the post JOIN. The are two tables: TestSmall with approx. 300K records in it and TestBig with 10mio. records in it. The structure of the tables is the same as before. And on both tables there is an index on the column PZN.statement
And here is the statement I want to inspect:MariaDB [TestOpt]> select SQL_NO_CACHE count(A.PZN) from TestSmall A straight_join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367';
+--------------+
| count(A.PZN) |
+--------------+
| 14 |
+--------------+
1 row in set (0.48 sec)
MariaDB [TestOpt]> explain select SQL_NO_CACHE count(A.PZN) from TestSmall A straight_join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: A
type: ALL
possible_keys: PZN
key: NULL
key_len: NULL
ref: NULL
rows: 301036
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: B
type: ref
possible_keys: PZN
key: PZN
key_len: 8
ref: TestOpt.A.PZN
rows: 448
Extra: Using where
2 rows in set (0.00 sec)
MariaDB [TestOpt]>
Also included is the EXPLAIN for this SQL-statement.pseudocode
Here is what the server does (in pseudocode): the server scans the table TestSmall and fetches record after record (table-scan). For each record the WHERE-clause will be applied and if a match is found the server extracts from the current record the value of the column PZN, switches over to the table TestBig and searches all records with this value. For this operation it uses the index on the column PZN of TestBig. Again for each record found in TestBig the WHERE-clause is applied, in the case of a match this record is handled.This process is continued in TestBig until all records with this PZN-value are examined. It then switches back to the table TestSmall and continues scanning through this table.
The whole process stops when all records in TestSmall are read.
hierarchy
Instead of long explanations here is the function-hierarchy for executing this statement: JOIN::exec()
JOIN::exec_inner()
do_select()
sub_select()
// access the first record:
(*join_tab->read_first_record)() -> join_init_read_record()
init_read_record()
table->file->ha_rnd_init_with_error() -> handler::ha_rnd_init_with_error()
ha_rnd_init() -> handler::ha_rnd_init(
rnd_init() -> ha_myisam::rnd_init
(*tab->read_record.read_record)() -> rr_sequential()
info->table->file->ha_rnd_next() -> handler::ha_rnd_next()
rnd_next() -> ha_myisam::rnd_next()
evaluate_join_record()
// iterate: read all records until match
info->read_record() -> rr_sequential()
info->table->file->ha_rnd_next() -> handler::ha_rnd_next()
rnd_next() -> ha_myisam::rnd_next()
evaluate_join_record()
// until END_OF_FILE
The whole work is done in the function sub_select(). The function is entered with 2 objects plus a boolean, let's look at the objects: an object of class JOIN and an object of struct JOIN_TAB. The class JOIN contains information about all tables to be accessed in executing this query, the struct JOIN_TAB contains information about one table, the table that is inspected by this call of sub_select().
If you want to look into the data of these parameters please add the following lines in the code of the function sub_select() (found in the file sql/sql_select.cc) after entrance into this function:
TABLE ** ptrTables = join->table;
char *ptrTable ;
fprintf(stderr, "entering sub_select:\n\tstructure JOIN:\n");
while ( *ptrTables != NULL )
{
ptrTable = (*ptrTables)->s->table_name.str;
fprintf(stderr, "\t\ttable: <%s>\n", ptrTable);
++ptrTables;
}
char *ptr2ndTable = join_tab->table->s->table_name.str;
fprintf(stderr, "\tstructure JOIN_TAB:\n\t\ttable: <%s>\n", ptr2ndTable);
And here is the output found in the log-file (on my machine it's in /var/log/mysql/error.log):
entering sub_select:
structure JOIN:
table: <TestSmall>
table: <TestBig>
structure JOIN_TAB:
table: <TestSmall>
As you can see the JOIN contains information about the tables TestSmall and TestBig (in this order), JOIN_TAB contains information about TestSmall and we will start with this table.Coming back to the function-call-hierarchy given above let's break it into pieces and look at it. As reading the first record may need an extra step for initialization this access is handled separately:
in sub_select():
(*join_tab->read_first_record)() -> join_init_read_record()
init_read_record()
table->file->ha_rnd_init_with_error() -> handler::ha_rnd_init_with_error()
ha_rnd_init() -> handler::ha_rnd_init()
rnd_init() -> ha_myisam::rnd_init()
(*tab->read_record.read_record)() -> rr_sequential()
info->table->file->ha_rnd_next() -> handler::ha_rnd_next()
rnd_next() -> ha_myisam::rnd_next()
evaluate_join_record()
The last function in this hierarchy is evaluate_join_record(). In this function the record read is examined (= the WHERE-clause will be applied).After the first record is read and processed all following records are read by this loop (again I show you the hierarchy of function-calls, not the full code executed):
in sub_select():
// iterate: read all records until match
info->read_record() -> rr_sequential()
info->table->file->ha_rnd_next() -> handler::ha_rnd_next()
rnd_next() -> ha_myisam::rnd_next()
evaluate_join_record()
// until END_OF_FILE
That's all.But I've only shown how the table TestSmall is accessed. So where does the code jump over and accesses TestBig? This happens in the function evaluate_join_record(). Here the WHERE-clause will be applied to the current record (from TestSmall) and in case of a match the function sub_select() is called (a recursion). This is the line of code in evaluate_join_record() that calls sub_select():
/* A match from join_tab is found for the current partial join. */
rc= (*join_tab->next_select)(join, join_tab+1, 0);
On entering the function sub_select() again the parameters now look like:
entering sub_select:
structure JOIN:
table: <TestSmall>
table: <TestBig>
structure JOIN_TAB:
table: <TestBig>
You see that the data in JOIN is identical to the first entry but the value of the JOIN_TAB now points to data regarding TestBig.As described above the code in sub_select() checks for the first record, now in the table TestBig:
in sub_select():
// search for the first record:
(*join_tab->read_first_record)() -> join_read_always_key()
table->file->ha_index_init() -> handler::ha_index_init()
index_init() -> ha_myisam::index_init()
table->file->ha_index_read_map() -> handler::ha_index_read_map()
index_read_map() -> ha_myisam::index_read_map()
evaluate_join_record()
Seems like an identical hierarchy but the functions called differ because of a different JOIN_TAB.If there are more records in TestBig these are fetched by these function-calls (still in sub_select()):
in sub_select():
// iterate: search all records until a no-match
info->read_record() -> join_read_next_same()
table->file->ha_index_next_same() -> handler::ha_index_next_same()
index_next_same() -> ha_myisam::index_next_same()
evaluate_join_record()
// until NESTED_LOOP_NO_MORE_ROWS
And this finishes our journey through the handling of the given SQL-statement.