an unsystematic journey through the source-code of MySQL and MariaDB: June 2014

this is my fourth post about this topic: storing data in a column-oriented form. You can find the other posts here: columns, columns (2) and columns (3).

Everything presented here is based on MyISAM. So in the first post the data was stored exactly as a MyISAM-file, only accessing the data was a little bit modified. In the second post an algorithm was introduced in which an additional file was created and in this file the values of one column were stored additionally to storing the record in the MYD-file. In the third post the algorithm was modified to speed things up again.

In this post I like to improve the algorithm again and want to use it in the form of an index, something that I avoided until now.

warning

The goal of this post is to show how to speed-up a SELECT-statements (and if this is possible). The code presented here is not production-ready, it is (as I've written many times before) a proof of concept. It is still not generally usable, it cannot handle UPDATEs or DELETEs, lacks error-checking, handling of all possible cases, handling of concurrent access (no locking) and so on. And it is not thoroughly tested.

In short: don't use this code (except for experimenting). Nevertheless I hope to show what we can expect of the algorithm I will present.

algorithm

The algorithm used here is nearly identical to the algorithm presented in the post column 3. What I introduce here as a new feature is sorting the entries in the key-file. The sorting-algorithm is silly, I know, but it works. Writing this sorting-code was faster than writing the full featured B-tree-code that is used normally in such a place. And as sorting is not the focus of interest in the current implementation I have no intention to change the algorithm.

In MySQL/MariaDB the data is organized as a database (= a folder) and in this database as tables, each table is represented by 3 files (MYD contains the data, MYI contains the index(es), frm contains internal data) . This is valid for the MyISAM-engine, other engines can handle this differently. We will base our test-engine on MyISAM so we will follow this route.

code

For this post I created a storage-engine MYCOLS3. For some internal reasons it was created as a static engine, not as a loadable plugin. You will find the code here.

Please remove the older engines MYCOLS1 and MYCOLS2 from your project: there are classes in MYCOLS2 and MYCOLS3 with identical names, so the linker will complain about this if you do not remove them.

As this storage-engine is derived from the MyISAM-engine we inherit a lot of code from this engine but some things must be implemented in an extended way. As with MYCOLS2 the functions create(), write_row(), rnd_init(), rnd_next(), rnd_end() and cond_push() (plus some more) are implemented in an identical or similar way. New with this algo are the implementation of the functions index_read_map(), index_init(), index_next(), index_next_same() and index_end(). These function are called when the database-server accesses the data via an index (instead of a full table-scan) and they return the key of an index (=part of the record) or the full record.

It's easy to add a storage engine to MariaDB/MySQL but it's not so easy to add a new type of index to an existing storage engine. Following the description of the statement to create an index you can add USING index_type but you cannot add a type of your own. In include/my_base.h you will find the definitions of possible index types, in sql/lex.h you will find the posible keywords and the checking and usage of an index is coded deeply in the code of the MyIsam-engine. So in the algo I decided to create tables with an index in duplicated form: when an index on the column Hersteller is created (our usual example, again) then the standard-index of MyIsam is created plus additionally the data of this column is stored in the form of this algo. When the index is accessed our code takes over and handles this using the key- and data-files created by our algo.

To simplify coding some parts of the code in MYCOLSS3 are hardcoded so they fit nicely in my environment. If you want to play with the code or do your own tests you have to check the code and adapt it for your environment. An example: in the tests following soon I will create tables with 2 simple indexes: on the column PZN (this one will receive the number 0 by the database-server) and on the column Hersteller (this will get the number 1 internally). In my coding I assume that using the index with the number 1 is the one to use with my 'index' and therefore my code takes over.
Please excuse for beeing so lazy.

testing

For a test I created 2 tables in the database Testcol using my own storage-engine:

MariaDB [(none)]> use TestCol;
Database changed
MariaDB [TestCol]> create table TestMyCols3Small (                                                                                                                                                                             
    ->                 Id   char(8),
    ->                 PZN  char(7),
    ->                 EVP  char(8),
    ->                 HAP  char(8),
    ->                 ArtikelBez char(40),
    ->                 ArtikelText char(26),
    ->                 Hersteller  char(5)  comment 'columnar_storage',
    ->                 index PZN (pzn),
    ->                 index Hersteller (Hersteller) )
    ->                 engine=MYCOLS3
    -> ;
Query OK, 0 rows affected (0.04 sec)

MariaDB [TestCol]> create table TestMyCols3Big (                                                                                                                                                                             
    ->                 Id   char(8),
    ->                 PZN  char(7),
    ->                 EVP  char(8),
    ->                 HAP  char(8),
    ->                 ArtikelBez char(40),
    ->                 ArtikelText char(26),
    ->                 Hersteller  char(5)  comment 'columnar_storage',
    ->                 index PZN (pzn),
    ->                 index Hersteller (Hersteller) )
    ->                 engine=MYCOLS3
    -> ;
Query OK, 0 rows affected (0.07 sec)

MariaDB [TestCol]>

In these CREATE-statements I added 2 indexes to the definition of each table. I will use them soon with the test-statements.

Let's put the usual data into our tables:

MariaDB [TestCol]> insert into TestMyCols3Small  select * from TestOK.ABDAPZN;
Query OK, 301036 rows affected (32.04 sec)
Records: 301036  Duplicates: 0  Warnings: 0

MariaDB [TestCol]> insert into TestMyCols3Big  select * from TestOK.ABDAOK;
Query OK, 10000000 rows affected (1 hour 35 min 42.11 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

MariaDB [TestCol]>

Again these operations are slow with the current implementation of this algo, but we are looking for results elsewhere.

So let's check how fast this algorithm is:

MariaDB [TestCol]> select SQL_NO_CACHE count(*) from TestSmall where Hersteller = '08650';     
+----------+
| count(*) |
+----------+
|     1334 |
+----------+
1 row in set (0.03 sec)

MariaDB [TestCol]>  select SQL_NO_CACHE count(*) from TestBig where Hersteller = '36440';
+----------+
| count(*) |
+----------+
|   315857 |
+----------+
1 row in set (0.55 sec)

MariaDB [TestCol]> set optimizer_switch='engine_condition_pushdown=on';
Query OK, 0 rows affected (0.00 sec)

MariaDB [TestCol]> select SQL_NO_CACHE count(*)  from TestMyCols3Small where Hersteller = '08650';
+----------+
| count(*) |
+----------+
|     1334 |
+----------+
1 row in set (0.01 sec)

MariaDB [TestCol]> select SQL_NO_CACHE count(*) from TestMyCols3Big where Hersteller = '36440';
+----------+
| count(*) |
+----------+
|   315857 |
+----------+
1 row in set (0.32 sec)

MariaDB [TestCol]>

Please notice the keyword SQL_NO_CACHE in the SELECT-list. This allows us to repeat a statement and the MariaDB/MySQL-server will not use the result from its internal cache but read the data from the table or index again (caching by the underlying OS still happens).

If you compare the values presented here with the values of the tests in the older posts you see that the statements are executed faster because of the use of the index Hersteller instead of a full table-scan. And after engine-condition-pushdown is enabled in the database-server the statement is analyzed by our algo, internal classes are created and in the index-functionss our code handles the fetching of the data by using the 'index' of this algo. And this is faster then the conventional index.

Looks good, but what is it in the case of a more complex statement:

MariaDB [TestCol]> select SQL_NO_CACHE count(*) from TestSmall A  join TestBig B on (A.PZN = B.PZN) where A.Hersteller = '08650' and B.Hersteller = '36440';
+----------+
| count(*) |
+----------+
|      112 |
+----------+
1 row in set (0.07 sec)

MariaDB [TestCol]> select SQL_NO_CACHE count(*) from TestMyCols3Small A  join TestMyCols3Big B on (A.PZN = B.PZN) where A.Hersteller = '08650' and B.Hersteller = '36440';
+----------+
| count(*) |
+----------+
|      112 |
+----------+
1 row in set (0.05 sec)

MariaDB [TestCol]>

Looks good, it looks like it is fast and our 'index' is faster then the original one (in this case).

And finally a statement that should be well suited for the layout of the data with our algorithm:

MariaDB [TestCol]> set optimizer_switch='engine_condition_pushdown=off';
Query OK, 0 rows affected (0.00 sec)

MariaDB [TestCol]> select SQL_NO_CACHE Hersteller, count(*) from TestMyCols3Big where Hersteller<> 'ABCDE' group by Hersteller into outfile 'x.x';
Query OK, 2584 rows affected (19.86 sec)

MariaDB [TestCol]> set optimizer_switch='engine_condition_pushdown=on';
Query OK, 0 rows affected (0.00 sec)

MariaDB [TestCol]> select SQL_NO_CACHE Hersteller, count(*) from TestMyCols3Big where Hersteller<> 'ABCDE' group by Hersteller into outfile 'y.y';
Query OK, 2584 rows affected (10.04 sec)

MariaDB [TestCol]>

Twice as fast. Good.

With engine-condition-pushdown off the standard index will be taken, with engine-condition-pushdown on our 'index' will be taken. The output is written into a file to remove most of the output from the screen. And it is easier to compare the contents of the files (=results of the queries) later. Using a condition is a restriction of the (current) implementation of the algo: without it our index-handling would not be executed.

Tuesday, 24 June 2014

columns (4)

warning

algorithm

code

testing