In my last texts I started with a SQL-statement and looked where and how this is handled inside the server. You can find my results of the handling of this statement in the optimizer-stage here: optimizer: no indexes. This optimizer-stage is followed by the execution-stage and you can find my results here: execution: with and without indexes. This last text describes how the server reads the records from the tables but the text omits the JOIN between the records of these two tables. And in this text I want to deliver this part.
In my last post I had two cases which I compared: I started without any indexes on the two tables involved in the SQl-statement, later I added one (usable) index on one of the tables. For this text I want to look only at the case without any index defined.
warning
Before I dive into the code let me to issue a warning: the functions, hierarchies and data-examples shown here are valid for the environment on my PC. If you do your own tests this may look a bit different.environment
Again I'm using my usual environment which I already described in my last post. It's the same SQL-statement, the same tables, the same data in the tables etc.. Only my focus changes.this text
And the focus of this text is the JOIN. Where is the code that really puts the records from the two tables together? And how is this done?So let's look into the code.
a reminder
Before I start let me show you the statement I'm inspecting including the explain:MariaDB [TestOpt]> select SQL_NO_CACHE count(A.PZN) from TestSmall A join TestBig B IGNORE INDEX (PZN) on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367';
+--------------+
| count(A.PZN) |
+--------------+
| 14 |
+--------------+
1 row in set (23.03 sec)
MariaDB [TestOpt]> explain select SQL_NO_CACHE count(A.PZN) from TestSmall A join TestBig B IGNORE INDEX (PZN) on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: A
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 301036
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: B
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 10000000
Extra: Using where; Using join buffer (flat, BNL join)
2 rows in set (0.01 sec)
MariaDB [TestOpt]>
Please look at the last line of the explain: this line describes what I'm looking at in this text.execution
Let me describe the execution of this statement in some form of pseudocode:In executing this explain the first step is reading the table TestSmall in a table-scan. Each record fetched is inspected, which means applying the (partial) WHERE-clause (aka comparing the value of the column Hersteller against the string-constant '00020'). If this record fulfils the condition it is stored in an internal buffer and the scan continues. In the end of this step the buffer contains the relevant records from TestSmall for the next step.
Next the table TestBig is read by a table-scan. On each record fetched the (partial) WHERE-clause is applied (aka comparing the value of the column Hersteller against the string-constant '36367'). If this is a valid record the JOIN is done with the records already in the buffer (the records from TestSmall). After this operation the table-scan continues until the end.
1st step: reading TestSmall and putting records into the buffer
So the steps of interest for this part of the text are:- where is the information put into the cache ?
- which information is put into the cache ?
- where is the memory for the cache allocated ?
put record in cache
If a record is found which fulfils the (partial) WHERE-clause (akaHere is the function hierarchy for putting the data into the cache:
do_select()
sub_select()
evaluate_join_record()
sub_select_cache()
JOIN_CACHE::put_record()
The records are fetched in sub_select() and inspected in evaluate_join_record(). If a record fulfils the WHERE-clause the function sub_select_cache() is called which calls put_record() to put the information from this record into the cache.Let me go back a bit: in handling any query each table involved is internally represented by an object of type JOIN_TAB, so for this query two of these exist: one for table TestSmall and one for table TestBig.
Back to the hierarchy: in the function evaluate_join_record() the record read is inspected. If it fulfils the WEHERE-clause the next level is called by this statement:
And this is the way the code accesses the cache involved:
typedef struct st_join_table {
.....
JOIN_CACHE *cache ;
.....
} JOIN_TAB;
class JOIN_CACHE :public Sql_alloc
{
....
/* Pointer to the beginning of the join buffer */
uchar *buff ;
....
};
And the data from TestSmall is put in a cache that is associated to the JOIN_TAB representing the table TestBig. And buff is the pointer to the memory-area used for this.
the information in the cache
So put_record() writes the data into the memory-area pointed to by buff. It doesn't do this directly but calls the function write_record_data() to do this. This data is written into the buffer1):- a flag: 0x80
- length of next entry: 0x0006
- the contents of the column PZN: '178324'
- length of the next entry: 0x0005
- the contents of the column Hersteller: '00020'
Here is the contents of the first entries in the buffer (all entries are in hex, only the first 3 of 60 entries are shown):
flag | length | PZN | length | Hersteller |
---|---|---|---|---|
0x80 | 0x06 0x00 | 0x31 0x37 0x38 0x33 0x32 0x34 | 0x05 0x00 | 0x30 0x30 0x30 0x32 0x30 |
0x80 | 0x07 0x00 | 0x33 0x37 0x31 0x37 0x39 0x36 0x38 | 0x05 0x00 | 0x30 0x30 0x30 0x32 0x30 |
0x80 | 0x07 0x00 | 0x33 0x37 0x31 0x37 0x39 0x37 0x34 | 0x05 0x00 | 0x30 0x30 0x30 0x32 0x30 |
allocating memory for the cache
Now some data is put into a buffer but where does this memory-area come from? Somewhere this area must be allocated, and this is the hierarchy:mysql_select()
JOIN::optimize()
JOIN::optimize_inner()
make_join_readinfo()
check_join_cache_usage_for_tables()
check_join_cache_usage()
JOIN_CACHE_BNL::init()
JOIN_CACHE::init()
JOIN_CACHE::alloc_buffer()
JOIN_CACHE::get_max_join_buffer_size()
The buffer is allocated in the optimizer-stage. The function alloc_buffer() reserves a memory-area of size 128K of bytes for this and stores the pointer to this cache in the var. buff as shown some lines above. The value 128K is returned from get_max_join_buffer_size() which gets this value from the expression
This finishes the first step: we have the buffer, all relevant data is put into it and TestSmall is read until the end. Now we switch over and start reading the records from TestBig.
reading TestBig and JOINing the records with the cache
Next the table TestBig is read in a table-scan and the WHERE-condition is applied to each record fetched. If the record read fulfils the condition (aka Hersteller = '36367') a record with the same value in the column PZN from TestSmall is needed for a successful JOIN. As the records from TestSmall are already in the cache this cache is searched. This is done in this code inside the function JOIN_CACHE::join_matching_records()2): while (!(error= join_tab_scan ->next())) outer loop
{
......
/* /* Read each possible candidate from the buffer and look for matches */ /*
while ((rec_ptr= get_next_candidate_for_match())) inner loop
{
/*
If only the first match is needed, and, it has been already found for
the next record read from the join buffer, then the record is skipped.
Also those records that must be null complemented are not considered
as candidates for matches.
*/
.........
}
}
OK, this doesn't explain too much so let's look into the details. Everything starts with reading the records from the table TestBig. For each record fetched the (partial) WHERE-condition is applied and if this condition is fulfilled we have a candidate for a JOIN with a record from TestSmall. This has been done within the function next(), in the box above I marked this with the words outer loop.
Of this record from TestBig the value of PZN is extracted and the list of records from TestSmall in the cache is searched for a match. The steps are:
- init the search through the cache
- get one entry from the cache
- compare the value of PZN with the value of PZN of the record from TestBig
- if they match: we found a record from TestSmall and a record from TestBig that do JOIN according to our SQL-statement
- continue with step 2 until all entries in the cache are checked
Here is the function hierarchy:
JOIN_CACHE::join_matching_records()
outer loop:
JOIN_TAB_SCAN::next()
JOIN_CACHE_BNL::prepare_look_for_matches()
JOIN_CACHE::reset()
inner loop:
JOIN_CACHE_BNL::get_next_candidate_for_match()
JOIN_CACHE_BNL::read_next_candidate_for_match(
JOIN_CACHE::get_record()
JOIN_CACHE::read_all_record_fields()
JOIN_CACHE::read_flag_fields()
JOIN_CACHE::read_record_field()
JOIN_CACHE::generate_full_extensions()
JOIN_CACHE::check_match()
skip_record()
end_send_group() in case of a match, called via: join_tab->next_select)(join, join_tab+1, 0);
end inner loop
end outer loop
In the function skip_record() the ON-clause of my SQL-statement is executed. This is the hierarchy for the comparison of the PZN-values:
Item_func_eq::val_int()
Arg_comparator::compare()
Arg_comparator::compare_string()
Item_field::val_str()
Item_field::val_str()
sortcmp()
And that's the whole process for JOINing the records.
Additionally I want to show you some numbers I found by working with this statement.
some numbers
I've added some variables to count the number of iterations in the inner loop and the outer loop described. These are the results in my test:counterOuterLoop: 461,016
counterInnerLoop: 27,660,960
Let's verify these numbers with some SQL-statements.
The number of records from TestSmall to be checked in the JOIN:
cMariaDB [TestOpt]> select SQL_NO_CACHE count(A.PZN) from TestSmall A where A.Hersteller = '00020';
+--------------+
| count(A.PZN) |
+--------------+
| 60 |
+--------------+
1 row in set (0.44 sec)
MariaDB [TestOpt]>
The number of records from TestBig to be checked in the JOIN:
cMariaDB [TestOpt]> select SQL_NO_CACHE count(B.PZN) from TestBig B where B.Hersteller = '36367';
+--------------+
| count(B.PZN) |
+--------------+
| 461016 |
+--------------+
1 row in set (12.78 sec)
MariaDB [TestOpt]>
The first number verifies the number given above in the description of the first step. You will find the second number as the value of the variable counterOuterLoop. And the value of the variable counterInnerLoop is simply the product of these 2 number, meaning the server fetches 10 mio. records from TestBig, found 461,016 matching records and iterates through the inner loop about 27.6 million times in the handling of my SQL-statement.
And that ends my journey for today.
correctness
This text is a simplified presentation of what's going on in the software. As I'm still learning the inner-workings of this software errors can happen. In case something is wrong with my description please use the mail-function of this site and send me a mail describing my error. I will look at it. Thanks.some notes:
1) please do not forget that this is the data found in my environment.
2) for the details of reading records from TestBig, the hierarchy of the code for doing this and some specialities omitted here please look at my last post.