Friday 14 November 2014

JOIN (3)

this is the third post with my observations when I inspect a simple SQL-statement. You can find the other posts here: JOIN and JOIN (2).
And this is the SQL-statement I'm looking at:
MariaDB [TestOpt]> select SQL_NO_CACHE count(A.PZN) from TestSmall A straight_join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367';
+--------------+
| count(A.PZN) |
+--------------+
|           14 |
+--------------+
1 row in set (0.48 sec)

MariaDB [TestOpt]> 

The first post described what is happening in the storage-engine (in my case this is of type MyISAM), the second post described what happened at the level(s) above. And in this post I will narrow my focus and look at a very tiny point: applying the WHERE-clause.

This is a specialized description, it's only valid for the SQL-statement given above. Don't generalize this description, but it should give you a point to start with your own examinations.

applying WHERE

As I've written in my last post you will find the code for handling this SQL-statement in the function sub_select() (in sql/sql_select.cc). The WHERE-clause is inspected in the function evaluate_join_record() (called from sub_select()). The whole WHERE-clause is handled by this single line in this function:

    select_cond_result= MY_TEST(select_cond->val_int());


As written before the execution of this query starts with the table TestSmall, reading record by record and applying the WHERE-clause for each record. When a match is found it switches over to the table TestBig and searches the corresponding records in this table. For each record found in TestBig the WHERE-clause is applied too.

So let's look at this code-line.

MY_TEST is simply a macro defined in my_global.h:

#define MY_TEST(a) ((a) ? 1 : 0)

it converts the result into a boolean-type.

select_cond is a pointer to an object of type COND, so let's look at this.

code

If we want to know what this COND contains I suggest to modify the code of MariaDB a bit (I did this in MariaDB but you can do this in MySQL too, it should be a similar task). You have to add these functions to the code in ha_myisam.cc:
  • display_functype()
  • ha_myisam::cond_push()
  • handle_Item()
These functions are described here: WHERE. Please do not forget to modify the header-file accordingly, and add the declaration of the 2 internal functions in the top of the cc-file.
Additionally I modified the function evaluate_join_record() (located in sql/sql_select.cc) and added the following 3 lines (marked in bold):
  DBUG_ENTER("evaluate_join_record");
  fprintf(stderr, "evaluate_join_record() for table: %s\n", join_tab->table->s->table_name.str);
  join_tab->table->file->cond_push( select_cond) ;
  fprintf(stderr, "end of COND.\n"); 
  DBUG_PRINT("enter", .............
The DBUG-statements should show you where to add these lines. The action happens in the call of the function cond_push() which writes the WHERE-clause to the file /var/log/mysql/error.log.
So please add these lines and functions, compile everything and start the server. We will now look what's happening when we execute the SQL-statement given above.

TestSmall

Let's inspect the first record from the table TestSmall. This is what this record looks like:
MariaDB [TestOpt]> select *  from TestSmall limit 1;
+---------+---------+------+------+--------------------------+---------------------+------------+
| Id      | PZN     | EVP  | HAP  | ArtikelBez               | ArtikelText         | Hersteller |
+---------+---------+------+------+--------------------------+---------------------+------------+
| 1000001 | 4972059 | 9.70 | 4.32 | PASCOSEDON TROPFEN 20 ML | PASCOSEDON TROPFEN  | 36350      |
+---------+---------+------+------+--------------------------+---------------------+------------+
1 row in set (0.00 sec)

MariaDB [TestOpt]> 

When the SQL-statement from the entry of this post is executed the function evaluate_join_record() is called and in the case of the first record evaluated the code added produces this output::
table_name = <TestSmall>    <select SQL_NO_CACHE count(*) from TestSmall A straight_join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'>
COND-ITEM   args: 0 type=[COND_AND_FUNC]    
FUNC-ITEM   [=] args: 2 type=[EQ_FUNC]  
FIELD-ITEM  [TestOpt] [A] [Hersteller]  str_value=<36350> name=<Hersteller>
STRING-ITEM     str_value=<00020>   name=<00020>
FUNC-ITEM   [isnotnull] args: 1 type=[ISNOTNULL_FUNC]   
FIELD-ITEM  [TestOpt] [A] [PZN] name=<PZN>
end of COND.

Maybe this form is more intuitv for you:

As you can see the WHERE-condition was modified by the server:
  • it contains only the part relevant for checking the table TestSmall
  • the check for PZN IS NOT NULL was added
Let's look into the function-hierarchy of the evaluation of this record:
TestSmall: check record with Id = 1000001 (compare: 36350 and  00020)
evaluate_join_record(): select_cond_result= MY_TEST(select_cond->val_int());
    Item_cond_and::val_int()
        Item::val_bool()
            Item_func_eq::val_int()
                Arg_comparator::compare()
                    Arg_comparator::compare_string()
                        Item_field::val_str()
                        Item_string::val_str()
                        sortcmp()
                            my_strnncollsp_simple()
The last function returns a non-zero-value which says that the two arguments compared are not equal so the whole expression is false (at the top-level there is an AND-operation. When one of the arguments is false then the whole AND results in a false and therefore the right branch of the tree is not evaluated).

With this result the record is thrown away and the next record from TestSmall is read. This process will be repeated until this record is read:
MariaDB [TestOpt]> select * from TestSmall where Hersteller = '00020' limit 1;
+---------+--------+------+------+--------------------------------+----------------------------+------------+
| Id      | PZN    | EVP  | HAP  | ArtikelBez                     | ArtikelText                | Hersteller |
+---------+--------+------+------+--------------------------------+----------------------------+------------+
| 1002100 | 178324 | 3.95 | 1.83 | TINY TOON ADVENTURES KIND 1 ST | TINY TOON ADVENTURES KIND  | 00020      |
+---------+--------+------+------+--------------------------------+----------------------------+------------+
1 row in set (0.01 sec)

MariaDB [TestOpt]>
When the WHERE-clause is applied to this record this is the function-hierarchy in this case:
TestSmall: check the record with Id = 1002100
evaluate_join_record(): select_cond_result= MY_TEST(select_cond->val_int());
    Item_cond_and::val_int()
        Item::val_bool()
            Item_func_eq::val_int()
                Arg_comparator::compare()
                    Arg_comparator::compare_string()
                        Item_field::val_str()
                        Item_string::val_str()
                        sortcmp()
                            my_strnncollsp_simple()
            Item::val_bool()
                Item_func_isnotnull::val_int()
                    Item_field::is_null()
                        Field::is_null()
As you can see the right branch of the tree is also checked for this record as the column Hersteller contains the correct value. So the server switches over to the table TestBig and searchess for records with the value '178324' in the column PZN (the value is taken from the current record from TestSmall).

TestBig

Again I will only look at the code that handles the WHERE-clause. So here is the first record from TestBig to evaluate:
MariaDB [TestOpt]> select * from TestBig where PZN = 178324 limit 1;
+----------+--------+--------+------+--------------------------------------+----------------------------+------------+
| Id       | PZN    | EVP    | HAP  | ArtikelBez                           | ArtikelText                | Hersteller |
+----------+--------+--------+------+--------------------------------------+----------------------------+------------+
| 01046189 | 178324 |  10.45 | 7.99 | MARLENE MS K1 AD K BLACK M      2 ST | BASILIKUM AETHERISCHES OEL | 20860      |
+----------+--------+--------+------+--------------------------------------+----------------------------+------------+
1 rows in set (0.00 sec)

MariaDB [TestOpt]>

This is the WHERE-statement as printed by the code shown above in the case of this record from the table TestBig:
table_name = <TestBig>  <select SQL_NO_CACHE count(A.PZN) from TestSmall A straight_join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '00020' and B.Hersteller = '36367'>
FUNC-ITEM   [=] args: 2 type=[EQ_FUNC]  
FIELD-ITEM  [TestOpt] [B] [Hersteller]  str_value=<20860>   name=<Hersteller>
STRING-ITEM     str_value=<36367>   name=<36367>
end of COND.

And here it is in a different form:

This hierarchy shows how this WHERE-clause is applied:
TestBig: check record with Id = 01046189 (compare: 20860 and 36367)
evaluate_join_record()
    Item_func_eq::val_int()
        Arg_comparator::compare()
            Arg_comparator::compare_string()
                sortcmp()
                    my_strnncollsp_simple()

The last function returns a non-zero value so this record is thrown away and the next record is read from the table TestBig (the next record with the value '178324' in the column PZN).

COND

On entering the function evaluate_join_record() the (partial) WHERE-clause is extracted, a variable of type COND * points to this object.

COND is nothing more than an alias for the class Item which is defined in the file sql/item.h. Item is the base-class for a lot of classes used for bringing the WHERE-clause into an internal form. You can find a class-hierrachy here: Item Class Reference. Don't let this page confuse you. As you've seen in the hierarchies above the server used some of them to construct our WHERE-clause. And thanks to inheritance and object-orientation it was fairly easy to apply this WHERE-clause to a record.


correctness

This text is still a simplified presentation of what's going on in the software. In case something is wrong with my description please use the mail-function of this site and send me a mail describing my error. I will look at it. Thanks.