Thursday, 11 June 2015

0xA8

In my last post I showed how the optimizer handles the case that multiple indexes on one table exist. In the text I described how accessing one table via an index-scan is inspected. Here is the code in the function SQL_SELECT::test_quick_select() that decides to inspect or to ignore an index:
int SQL_SELECT::test_quick_select(
                  ....
                  key_map keys_to_use,
                  ....)
{
    ....
    for (idx=0 ; idx < head->s->keys ; idx++, key_info++)
    {
      ....
      if (!keys_to_use.is_set(idx))
         continue;
      ....
    } 
    ....
}

and here is the function-hierarchy:
make_join_statistics()
   get_quick_record_count()
      SQL_SELECT::test_quick_select()

The var. keys_to_use is a bitmap (type key_map) and 64 bits long. Every bit represents an index which can or cannot be used. As the value of this variable was 0xA8 (= 010101000b) during my test only the indexes 3, 5 and 7 are inspected, the others are omitted.

In this post I want to show where this number 0xA8 came from. Here I take only a small piece from the much bigger mosaic but I want to inspects this tiny piece in more detail. And the var. keys_to_use from the code-excerpt above is a parameter in the function-call of SQL_SELECT::test_quick_select(), this function is called from get_quick_record_count() where this parameter is named keys, and this function is called from make_join_statistics() where this parameter is &s->const_keys where s is a pointer to a JOIN_TAB. So we will have to find out where JOIN_TAB.const_keys is set.

some hints

As in the last post I did all my tests with MariaDB, the code-examples presented here are taken from the source-code of MariaDB 10.0.10, but the explanations should also be valid für MySQL.

Please do not generalize the results presented here. A lot of the data presented here depends on the data in the tables, and this is my data. If you do your own tests don't expect to see the same numbers (like the values 0xA8 in JOIN_TAB->const_keys).

the statement

The whole statement is internally represented by an object of type JOIN, which among other variables contains a pointer to 2 objects 1) of type JOIN_TAB (in sql_select.h):
class JOIN :public Sql_alloc
{
   .... 
public:
  JOIN_TAB *join_tab, **best_ref;
   .... 
}

The struct JOIN_TAB has a member-variable const_keys, defined (in sql_select.h):
typedef struct st_join_table {
  ....
  key_map const_keys;   /**< Keys with constant part */
  ....
} JOIN_TAB;

The JOIN_TAB representing the table TestSmall is not of interest because on this table no indexes were defined, but for the JOIN_TAB representing the table TestBig this member-variable contains the value 0xA8. So here is the way how this variable is filled:
make_join_statistics()
    update_ref_and_keys()
        add_key_fields()            // calls itself recursively
            add_key_field()
                Field::get_possible_keys()

Each column of a table is represented internally by an object of type Field2) (in table.h):
/**
  This structure is shared between different table objects. There is one
  instance of table share per one table in the database.
*/

struct TABLE_SHARE
{
  ....
  /* The following is copied to each TABLE on OPEN */
  Field **field;
  ....
};

And as the value is returned from the function Field::get_possible_keys(), here is an excerpt of the comment and the function (in field.cc):
/*
  @brief
  Return possible keys for a field

  @details
  Return bit map of keys over this field which can be used by the range
  optimizer. For a field of a generic table such keys are all keys that starts
  from this field.
  ....
  @return map of possible keys
*/

key_map Field::get_possible_keys()
{
  DBUG_ASSERT(table->pos_in_table_list);
  return (table->pos_in_table_list->is_materialized_derived() ?
          part_of_key : key_start);
}

For the object representing the column Hersteller the function Field::get_possible_keys() returns the value 0xA8 (from the member-var. key_start3)). Here is an excerpt from add_key_field() to show how this variable is set:
      ....
      JOIN_TAB *stat=field->table->reginfo.join_tab;
      key_map possible_keys=field->get_possible_keys();
      ....
      if (is_const)
      {
        stat[0].const_keys.merge(possible_keys);
         ....
      }

In words: the function Field::get_possible_keys() returns the value 0xA8 for the object representing the column Hersteller, this value is stored in the var. possible_keys and some lines later stored in JOIN_TAB.const_keys.

So we can narrow down the question to: where does the Field-object gets this value?

Field

And the answer is: in handling of USE Database these values are set. Let me describe how this is done.

Here is the function-hierarchy:
do_handle_one_connection()
    do_command()
        dispatch_command()
            mysqld_list_fields()
                open_normal_and_derived_tables()
                    open_tables()
                        open_and_process_table()
                            open_table()
                                tdc_acquire_share()
                                    open_table_def()
                                        TABLE_SHARE::init_from_binary_frm_image()
                                            create_key_infos()
                                            make_field()

So here is the way to the value 0xA8:
in open_table_def() the file "./TestOpt/TestBig.frm" is opened and the first 64 bytes are read from this file. The len of this file is taken from the bytes 10 to 13 of this block. The 64 bytes just read are copied to the var. buf and the rest of the frm-file is appended to the var. buf by reading the rest of the file. The file is closed and the function init_from_binary_frm_image() is called.

In TABLE_SHARE::init_from_binary_frm_image() the internal copy of the frm-file is interpreted and the needed informations extracted. You will find a description of the structure of the frm-file here, so let's look into the file TestBig.frm; all comments are made by me and they are not complete4):
august@AMD4:~/MariaDB/data/TestOpt$ od -t x1 -Ax --read-bytes=1024 TestBig.frm
000000 fe 01 
       0a                             version of frm
       09 
       12 00                          len of extra2-segment (MariaDB)
       56 00                          offset -> disk_buff
       01 00 
       e9 11 00 00                    len of frm-file
       58 0f
000010 61 00                          len of record
       00 00 00 00                    max. rows
       00 00 00 00                    min. rows
       00                             flag: one record db
       02 f9 00 
       09 00                          DB create options
000020 00                             flag: new frm-file
       05 
       00 00 00 00                    avg. row len
       08
       00                             transaction and page checksum: HA_CHOICE_UNDEF
       00                             
       00                             row type: ROW_TYPE_DEFAULT
       00
       00 00                          stats_sample_pages
       00                             HA_STATS_AUTO_RECALC_DEFAULT
       00 00 58
000030 0f 00 00 
       aa 86 01 00                    MySQL version
       10 00 00 00                    len of extra-segment (starts in 0x100F, contains name of engine + more)
       00 00                          
       00 
       00 00                          key block size
# extra2-segment:
000040 00                             type
       10                             len of segment
       c1 a5 3b b2 da e1 11 e4 
       82 e3 bc 5f f4 85
000050 7f 33 
       1f 10 00 00                    var. pos = offset into frm-file  = 0x101F
                                      position of pos = 0x12 (value taken from 4-5) + constant FRM_HEADER_SIZE (=64d)
# disk_buff:
000056 08                             no. of indexes defined
       0e                             no. of columns used in indexes 5)
       00 00 
       35 00                          len of something
# first index: EVP
00005C 41 00                          flags   -> 0x1000 index has comments ???
       04 00                          len of key
       01                             no. of parts of this index
       00                             algorithm: HA_KEY_ALG_UNDEF
       00 00                          size of block
# first part of this index: column EVP
       03 80                          no. of column, ANDed with 0x3ff
       11 00                          offset in record (-1 -> 16d) ???
       00                             flag for key-part
       03 82                          key-type
       04 00                          len
# second index: Ind1
       69 00 
       26 00 
       03 
       00 
       00 00 
# first part of this index: column PZN
       02 80 
       0a 00 
       00 
       00 80 
       07 00 
# second part of this index: column ArtikelText
       06 80 42 00 00 00 80 1a 00 
# third part of this index: column Hersteller
       07 80 5d 00 00 00 80 05 00 
# third index (including description of parts):
       41 00 04 00 01 00 00 00        index HAP
       04 80 15 00 00 03 82 04 00     column HAP
# fourth index:
       69 00 26 00 03 00 00 00        index Ind2
       07 80 5d 00 00 00 80 05 00     column Hersteller
       06 80 42 00 00 00 80 1a 00     column ArtikelText
       02 80 0a 00 00 00 80 07 00     column PZN
# fifth index:
       41 00 07 00 01 00 00 00        index PZN
       02 80 0a 00 00 00 80 07 00     column PZN
# sixth index:
       41 00 05 00 01 00 00 00        index Hersteller
       07 80 5d 00 00 00 80 05 00     column Hersteller
# seventh index:
       41 00 0c 00 02 00 00 00        index PznHerst
       02 80 0a 00 00 00 80 07 00     column PZN
       07 80 5d 00 00 00 80 05 00     column Hersteller
# eigth index:
000100 41 00 0c 00 02 00 00 00        index HerstPzn
       07 80 5d 00 00 00 80 05 00     column Hersteller
       02 80 0a 00 00 00 80 07 00     column PZN

# the name of the indexes defined:
       ff 45 56 50                                EVP
       ff 49 6e 64 31                             Ind1
       ff 48 41 50                                HAP
       ff 49 6e 64 32                             Ind2
       ff 50 5a 4e                                PZN
000130 ff 48 65 72 73 74 65 6c 6c 65 72           Hersteller
       ff 50 7a 6e 48 65 72 73 74                 PznHerst
       ff 48 65 72 73 74 50 7a 6e                 HerstPzn
       ff 00 00
000150 00

Looks confusing. It's not, let me try to explain some parts of this bunch of bytes. On Position 6 of the frm-file we see a value of 0x0056, this is an offset into the frm-file and in the code this area is called disk_buff. The array disk_buff starts with the number of keys (=indexes) defined for our table: 8, followed by the number of columns used in these indexes: 14. A record_offset is computed and the result is 4014d (= 0xfae) which points into a part of the frm-file not shown here. The type of the engine and some details about this engine are stored there.

Let's look into the array disk_buff (starting at file-offset 0x0056). The first bytes of disk_buff are omitted and the function create_key_infos()6) is called with the parameter strpos starting with disk_buff[6]. In this function it iterates over the indexes and in pseudocode it looks like this:
  # iterate over the 8 indexes defined
  for (i=0 ; i < keys ; i++, keyinfo++)        
  {
     interpret the next 8 bytes 
     store the values in keyinfo
     # iterate over all columns this index is defined of
     for (j=keyinfo->user_defined_key_parts ; j-- ; key_part++)
     {
         the next 9 bytes contain information about the column this index is build of
         will be stored in keypart
     }
     keyinfo->key_part points to keypart
   }
The last step is to copy the names of the indexes.

Here is the data it operates on:
august@AMD4:~/MariaDB/data/TestOpt$  od -t x1 -Ax --skip-bytes=0x113f  TestBig.frm
column #1: Id
00113f 00 00 00                               
       08 00                                  len of field
       02 00 00                               position of record within buffer
       00 80                                  flag
       00                                     unireg_type 
       00
       00                                     interval-no.
       fe                                     MYSQL_TYPE_STRING
       08 00
00114f 00 
column #2: PZN
       00 00 00 07 00 0a 00 00 00 80 00 00 00 fe 08 00 00 
column #3: EVP
       00 00 00 09 00 11 00 00 03 82 00 00 00 f6 08 00 00
column #4: HAP
       00 00 00 09 00 15 00 00 03 82 00 00 00 f6 08 00 00 
column #5: ArtikelBez
       00 00 00 28 00 19 00 00 00 80 00 00 00 0f 08 00 00 
column #6: ArtikelText
       00 00 00 1a 00 42 00 00 00 80 00 00 00 0f 08 00 00
column #7: Hersteller
       00 00 00 05 00 5d 00 00 00 80 00 00 00 fe 08 00 00
names of the columns:
       ff                                       delimiter
       49 64 ff                                 Id
       50 5a 4e ff                              PZN
       45 56 50 ff                              EVP
       48 41 50 ff                              HAP
       41 72 74 69 6b 65 6c 42 65 7a ff         ArtikelBez
       41 72 74 69 6b 65 6c 54 65 78 74 ff      ArtikelText
       48 65 72 73 74 65 6c 6c 65 72 ff         Hersteller
       00
0011e9

For this function the work is done and its returning.

Back in TABLE_SHARE::init_from_binary_frm_image() some work is done like copying the names of the columns defined for this table, but this is not my topic here so I omit these details. Later again it iterates over all columns defined and calls the function make_field() to create a new object of type Field (or derived) for each column.

With the Filed-objects constructed the code again iterates over all indexes defined and it sets the bits in the var. key_start:
    for (uint key=0 ; key < keys ; key++,keyinfo++)                     // iterate over the indexes defined
    {
      .....
      for (i=0; i < key_parts; key_part++, i++)                         // iterate over the parts of this index
      {
        Field *field;
        ....
        field= key_part->field= share->field[key_part->fieldnr-1];      // get the Field-object for this part of the index
        ....
        if (i == 0)                                                     // if this is the first part of the index
          field->key_start.set_bit(key);                                // set the corresponding bit in key_start
        ....
      }
      ....
    }

By inspecting the indexes 3 (=Ind2), 5 (=Hersteller) and 7 (=HerstPzn) it sets the corresponding bit in the var. key_start of the Field-object representing the column Hersteller as these indexes start with the column Hersteller. So the result will be the value 010101000b) = 0xA8.

And this finishes my description.

correctness

This text is still a simplified presentation of what's going on in the software. In case something is wrong with my description please use the mail-function of this site and send me a mail describing my error. I will look at it. Thanks.

some notes:
1) the pointer join_tab points to the first of (in our case) 2 JOIN_TABs. You can look at JOIN->table_count for the correct number of tables.
2) these objects are created when you USE a database.
3) here it the definition taken from field.h:
 class Field
{
  ......
  /* Field is part of the following keys */
  key_map key_start, part_of_key, part_of_key_not_clustered;
  ....
};
4) I didn't want to document the structure of the frm-file, I only wanted to know where the value 0xA8 came from
5) up to 512 columns can be used because one bit in the byte before can be used.
6) it's in table.cc