This post describes some of the observations I made while studying the source code. The earlier post described the effect of caching when reading from the hard disk; this article describes the same effect for writing to the disk.
standard-result
Let's start directly with a result. For all tests I've used the tables TestStat and TestDyn as described in the earlier post. Here are the results with a server out of the box:
MariaDB [TestCache]> use TestCache;
Database changed
MariaDB [TestCache]> truncate table TestDyn;
Query OK, 0 rows affected (0.19 sec)
MariaDB [TestCache]> insert into TestDyn select * from TestOK.ABDAOK;
Query OK, 10000000 rows affected (48.63 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [TestCache]> truncate table TestDyn;
Query OK, 0 rows affected (0.18 sec)
MariaDB [TestCache]> insert into TestDyn select * from TestOK.ABDAOK;
Query OK, 10000000 rows affected (47.76 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [TestCache]> truncate table TestDyn;
Query OK, 0 rows affected (0.19 sec)
MariaDB [TestCache]> insert into TestDyn select * from TestOK.ABDAOK;
Query OK, 10000000 rows affected (49.14 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [TestCache]> truncate table TestDyn;
Query OK, 0 rows affected (0.18 sec)
MariaDB [TestCache]> insert into TestDyn select * from TestOK.ABDAOK;
Query OK, 10000000 rows affected (47.15 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [TestCache]> truncate table TestStat;
Query OK, 0 rows affected (0.20 sec)
MariaDB [TestCache]> insert into TestStat select * from TestOK.ABDAOK;
Query OK, 10000000 rows affected (31.75 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [TestCache]> truncate table TestStat;
Query OK, 0 rows affected (0.20 sec)
MariaDB [TestCache]> insert into TestStat select * from TestOK.ABDAOK;
Query OK, 10000000 rows affected (31.87 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [TestCache]> truncate table TestStat;
Query OK, 0 rows affected (0.19 sec)
MariaDB [TestCache]> insert into TestStat select * from TestOK.ABDAOK;
Query OK, 10000000 rows affected (31.29 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [TestCache]> quit
Bye
august@AMD4:~/MariaDB/pgm$
As you can see, writing to a static (= fixed-length) table is again faster than writing to a dynamic (= variable-length) table, because of the more complex structure of a dynamic table. Please keep in mind that this INSERT-statement needs a read-operation before the write-operation; this adds to the time needed for the whole statement, so the effect of caching happens twice. For this reason the numbers for reading and writing can't be compared.
This is how the tables look on the OS-side after inserting the records:
august@AMD4:~/MariaDB/data/TestCache$ ls -l
total 1880292
-rw-rw---- 1 august august 65 Mar 18 15:47 db.opt
-rw-rw---- 1 august august 685 Mar 18 15:50 TestDyn.frm
-rw-rw---- 1 august august 895389376 Apr 1 16:01 TestDyn.MYD
-rw-rw---- 1 august august 1024 Apr 1 16:04 TestDyn.MYI
-rw-rw---- 1 august august 679 Mar 18 15:50 TestStat.frm
-rw-rw---- 1 august august 1030000000 Apr 1 16:03 TestStat.MYD
-rw-rw---- 1 august august 1024 Apr 1 16:04 TestStat.MYI
august@AMD4:~/MariaDB/data/TestCache$
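A quick division shows what these file sizes mean per record; this is my own sanity check, not part of the original measurements. The static format pads every row to the same length, which is why TestStat.MYD is the bigger file even though it is the faster one to write, as the timings above show.

#include <stdio.h>

int main(void) {
    long long rows = 10000000LL;                               /* records inserted       */
    printf("static:  %lld bytes/row\n", 1030000000LL / rows);  /* 103 bytes, fixed       */
    printf("dynamic: %.1f bytes/row\n", 895389376.0 / rows);   /* ~89.5 bytes on average */
    return 0;
}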
my goal
I want to show you how the storage-engine carries out the write-operation and particularly how the engine uses caches for the write-operation.
into the code
Let's step into the code and look at how the server does the write-operation:
- xxxx::start_bulk_insert() is called at the beginning of the process. It initializes everything needed.
- xxxx::write_row() writes one row to the buffer or the disk. This function is called repeatedly until all records are written to disk.
- xxxx::end_bulk_insert() is called at the end of the write-operation. Here you can clean things up.
The description above is valid in the case of writing multiple records to disk. When you insert a single record things are simpler, as you can see here.
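Before stepping through the real functions, here is a minimal, self-contained sketch of the pattern these three calls implement: allocate a buffer once, append rows into it, touch the disk only when a row no longer fits, and flush the rest at the end. All names are illustrative; the real implementation is the IO_CACHE machinery in mf_iocache.c that we will meet below.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* a toy write cache, modeled loosely on IO_CACHE; illustrative names */
typedef struct {
    unsigned char *buffer;     /* start of the cache             */
    unsigned char *write_pos;  /* next free byte in the cache    */
    unsigned char *write_end;  /* end of the usable cache area   */
    FILE *file;
    int flush_count;           /* how often the disk was touched */
} write_cache;

/* the start_bulk_insert() step: set up the buffer once */
static void cache_init(write_cache *c, FILE *f, size_t size) {
    c->buffer = malloc(size);
    c->write_pos = c->buffer;
    c->write_end = c->buffer + size;
    c->file = f;
    c->flush_count = 0;
}

/* the my_b_flush_io_cache() step: write the filled part to disk */
static void cache_flush(write_cache *c) {
    size_t used = (size_t)(c->write_pos - c->buffer);
    if (used) {
        fwrite(c->buffer, 1, used, c->file);
        c->write_pos = c->buffer;
        c->flush_count++;
    }
}

/* the write_row() step: memcpy into the cache; only a full cache
   leads to a disk access (assumes len <= cache size)             */
static void cache_write(write_cache *c, const void *rec, size_t len) {
    if (c->write_pos + len > c->write_end)
        cache_flush(c);
    memcpy(c->write_pos, rec, len);
    c->write_pos += len;
}

/* the end_bulk_insert() step: flush the remainder, clean up */
static void cache_end(write_cache *c) {
    cache_flush(c);
    free(c->buffer);
}

int main(void) {
    FILE *f = fopen("demo.dat", "wb");
    write_cache c;
    char row[103] = {0};              /* one fixed-length record */
    cache_init(&c, f, 128 * 1024);    /* the 128K default        */
    for (int i = 0; i < 10000; i++)
        cache_write(&c, row, sizeof(row));
    cache_end(&c);
    fclose(f);
    printf("flushes: %d\n", c.flush_count);
    return 0;
}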
This is the function hierarchy using MyISAM in the case of fixed-length records.
init:
ha_myisam.cc : ha_myisam::start_bulk_insert()
mi_extra.c : mi_extra()
mf_iocache.c : init_io_cache() inits a buffer (=cache) of size 128K
writing rows:
ha_myisam.cc : ha_myisam::write_row()
mi_write.c : mi_write()
mi_statrec.c : _mi_write_static_record()
my_sys.h : my_b_write()
mf_iocache.c : my_b_flush_io_cache()
mysql_file.h : mysql_file_write()
mysql_file.h : inline_mysql_file_write()
my_write.c : my_write() writes buffer (approx. 128K) to disk
the final step:
ha_myisam.cc : ha_myisam::end_bulk_insert()
mi_extra.c : mi_extra()
mf_iocache.c : end_io_cache()
mf_iocache.c : my_b_flush_io_cache()
mysql_file.h : mysql_file_write()
mysql_file.h : inline_mysql_file_write()
my_write.c : my_write() writes buffer (max. 128K) to disk
This is the function hierarchy using MyISAM in the case of variable-length records:
init:
ha_myisam.cc : ha_myisam::start_bulk_insert()
mi_extra.c : mi_extra()
mf_iocache.c : init_io_cache() inits a buffer (=cache) of size 128K
writing rows:
ha_myisam.cc : ha_myisam::write_row()
mi_write.c : mi_write()
mi_dynrec.c : _mi_write_dynamic_record()
mi_dynrec.c : write_dynamic_record()
mi_dynrec.c : _mi_write_part_record()
my_sys.h : my_b_write()
mf_iocache.c : my_b_flush_io_cache()
mysql_file.h : mysql_file_write()
mysql_file.h : inline_mysql_file_write()
my_write.c : my_write() writes buffer (approx. 128K) to disk
the clean-up:
ha_myisam.cc : ha_myisam::end_bulk_insert()
mi_extra.c : mi_extra()
mf_iocache.c : end_io_cache()
mf_iocache.c : my_b_flush_io_cache()
mysql_file.h : mysql_file_write()
mysql_file.h : inline_mysql_file_write()
my_write.c : my_write() calls write() of C std-lib which writes the current contents of the buffer to disk
modifications
As in the case of the read-caches I want to know how this buffer is used and especially what happens when I change the size of the buffer. So let's again modify the code a bit. I will present the places where I added or changed code and mark added/changed lines in bold, but only those modifications that are needed for our monitoring of write-caching. I've already described the reason and some aspects of these modifications here, so please look in my former post.
First: when the cache is unable to satisfy the request, the data must be written to the datafile on disk, so let's count how many times the function write() of the C-library is called.
Modify the file storage/myisam/ha_myisam.cc near the top of this file:
extern int counterReadStaticRecord; // from the former test
extern int counterReadDynamicRecord;
extern int counterWriteStaticRecord;
extern int counterWriteDynamicRecord;
In the same file please look for the function start_bulk_insert() and modify it like this:
void ha_myisam::start_bulk_insert(ha_rows rows, uint flags)
{
DBUG_ENTER("ha_myisam::start_bulk_insert");
counterWriteStaticRecord = counterWriteDynamicRecord = 0;
THD *thd= current_thd;
and next look for end_bulk_insert() in the same file:
int ha_myisam::end_bulk_insert()
{
DBUG_ENTER("ha_myisam::end_bulk_insert");
mi_end_bulk_insert(file);
int err=mi_extra(file, HA_EXTRA_NO_CACHE, 0);
fprintf(stderr, "no. of writes:\tstatic=%d\tdynamic=%d\n", counterWriteStaticRecord, counterWriteDynamicRecord);
if (!err && !file->s->deleting)
modify the file storage/myisam/mi_statrec.c near the function _mi_write_static_record():
//##########################################################
// the original code:
//#define my_b_write(info,Buffer,Count)
// ((info)->write_pos + (Count) <=(info)->write_end ?
// (memcpy((info)->write_pos, (Buffer), (size_t)(Count)),
// ((info)->write_pos+=(Count)),0) :
// (*(info)->write_function)((info),(uchar *)(Buffer),(Count)))
int counterWriteStaticRecord;
static inline int AQ_my_b_write(IO_CACHE *info, uchar *Buffer, ulong Count)
{
if ( (info)->write_pos + (Count) <=(info)->write_end )
{
memcpy((info)->write_pos, (Buffer), (size_t)(Count));
((info)->write_pos+=(Count));
return 0;
}
counterWriteStaticRecord++;
return (*(info)->write_function)((info),(uchar *)(Buffer),(Count));
}
In the function _mi_write_static_record() I changed some lines:
if (info->opt_flag & WRITE_CACHE_USED)
{ /* Cash in use */
// if (my_b_write(&info->rec_cache, record,
// info->s->base.reclength))
//
if (AQ_my_b_write(&info->rec_cache, record, info->s->base.reclength))
goto err;
OK, for static tables we are done, so let's move on to dynamic tables. In the file storage/myisam/mi_dynrec.c near the function _mi_write_part_record():
/* Write a block to datafile */
/*
#define my_b_write(info,Buffer,Count) \
((info)->write_pos + (Count) <=(info)->write_end ?\
(memcpy((info)->write_pos, (Buffer), (size_t)(Count)),\
((info)->write_pos+=(Count)),0) : \
(*(info)->write_function)((info),(uchar *)(Buffer),(Count)))
*/
int counterWriteDynamicRecord;
static inline int AQ_my_b_write(IO_CACHE *info, uchar *Buffer, ulong Count)
{
if ( (info)->write_pos + (Count) <=(info)->write_end )
{
memcpy((info)->write_pos, (Buffer), (size_t)(Count));
((info)->write_pos+=(Count));
return 0;
}
counterWriteDynamicRecord++;
return (*(info)->write_function)((info),(uchar *)(Buffer),(Count));
}
int _mi_write_part_record(MI_INFO *info,
and in the function _mi_write_part_record():
// else if (my_b_write(&info->rec_cache,(uchar*) *record-head_length,
// length+extra_length+del_length))
else if (AQ_my_b_write(&info->rec_cache,(uchar*) *record-head_length, length+extra_length+del_length))
goto err;
You will find the reason for the usage of AQ_my_b_write() in my former post.
more modifications
Now we are able to count the number of writes, but we have to add some more modifications to change the size of the buffer. In the file sql/sys_vars.cc:
static Sys_var_ulong Sys_read_buff_size(
"read_buffer_size",
"Each thread that does a sequential scan allocates a buffer of "
"this size for each table it scans. If you do many sequential scans, "
"you may want to increase this value",
SESSION_VAR(read_buff_size), CMD_LINE(REQUIRED_ARG),
VALID_RANGE(IO_SIZE*2, INT_MAX32), DEFAULT(128*1024),
BLOCK_SIZE(IO_SIZE));
As you can see the default value is 128*1024 = 128K (I marked the entry in bold). For the tests I will change this value, recompile and do my tests again. And finally, in include/my_global.h:
// #define IO_SIZE 4096
#define IO_SIZE 64
Now recompile everything (the last modification leads to a recompilation of almost everything) and do the test.
results
Each test is done 3 times; it starts with truncating the target-table, so we start with a table of size 0. The values presented here are the averages of the 3 tests. The tests started with a cache of size 128 bytes and ended with a size of 1GB. The SQL-statements used are the same as those used for the standard-result above.
conclusion
The default value of these buffers is 128K bytes. If you look at the picture above you will see again that this is a good compromise. And keep in mind that RAM is a finite resource, and you're not alone on this server. Each INSERT-statement of the test-examples uses 2 buffers, one for reading the data from the source table and one for writing the data to the target table, so even such a simple statement needs double the amount of RAM. And think of situations where 1000 users are active on your server, running multiple and more complex statements: then you will need a lot of RAM, and the graph shows that you will not gain much performance. In these days of modern economies, and especially the accounting practices of bigger companies, you have to pay for the RAM you assign to your server, even if this billing is only done internally. This can add up to significant numbers.
I think the value of 128K for the size of the cache is a good compromise.
If you are interested in more details about my tests you may look into this text-file.
some numbers
Let's first check the number of writes as reported by the code we added to the original source. The values are:
number of writes with different sizes of buffer:
buffer size    static    dynamic
128b 3906250 3293831
256b 3906250 3293831
512b 2000000 1679340
1K 1000000 855184
2K 500000 432774
4K 250000 217485
8K 125000 109022
16K 62822 54580
32K 31411 27307
64K 15705 13658
128K 7856 6830
256K 3928 3415
512K 1964 1707
1M 982 853
2M 491 426
4M 245 213
8M 122 106
16M 61 53
32M 30 26
64M 15 13
128M 7 6
256M 3 3
512M 1 1
1G 0 0
In the last line you see why my testing stopped here without further increasing the size of the buffer: with a buffer-size of 1GB the whole file fits into the buffer.
A write is counted whenever the buffer is full and has to be written to disk. These numbers do not include the final write()-call, in which the remaining contents of the buffer are written to disk from end_bulk_insert(). This explains the values of 0 writes in the last line.
In the first lines you will also see identical numbers for the cache-sizes of 128 and 256 bytes. The reason for this is that a minimum buffer of 256 bytes is used even when a smaller value is configured.
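I haven't traced the exact line responsible, but a clamp in init_io_cache() would produce exactly this behaviour. Here is a sketch of such a clamp; the name min_cache and the factor 4 are assumptions on my part, chosen because IO_SIZE*4 = 256 matches the measured values with IO_SIZE set to 64:

#include <stdio.h>

#define IO_SIZE 64   /* our modified value from include/my_global.h */

static unsigned long effective_cache_size(unsigned long requested) {
    unsigned long min_cache = IO_SIZE * 4;   /* assumed lower bound */
    return requested < min_cache ? min_cache : requested;
}

int main(void) {
    printf("%lu %lu %lu\n",
           effective_cache_size(128),    /* -> 256 */
           effective_cache_size(256),    /* -> 256 */
           effective_cache_size(512));   /* -> 512 */
    return 0;
}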
If you multiply the number of writes by the size of the buffer you will always get a value that is smaller than the size of the corresponding file. The difference is less than the size of the buffer used in the test; this remainder fits in the buffer and is written to disk in the call of end_bulk_insert().
And you will also see that doubling the size of the cache halves the number of writes.
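Both observations can be checked with simple arithmetic: the number of counted writes is roughly the file size divided by the buffer size. Here is a small stand-alone check, using the file sizes from the ls listing above; the small deviation from the measured values comes from the not-fully-used first buffer described in the hint below and from the uncounted final flush:

#include <stdio.h>

int main(void) {
    long long stat_bytes = 1030000000LL;   /* size of TestStat.MYD */
    long long dyn_bytes  = 895389376LL;    /* size of TestDyn.MYD  */
    long long buf        = 128 * 1024;     /* default buffer size  */

    printf("static:  %lld flushes (measured: 7856)\n", stat_bytes / buf);
    printf("dynamic: %lld flushes (measured: 6830)\n", dyn_bytes / buf);
    return 0;
}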
a hint
When the write-buffer is accessed for the first time it is created with a size of 128K by default. But this size is not fully used; the server uses a little bit less (and uses the buffer in full in all following rounds). I will try to explain this behaviour with an example. In one test my file (= table) initially had a size of 133900 bytes. When I insert data into this table it is appended at the end of this file (no free space in this table). Now a buffer of 128K is allocated and the numbers are as follows:
size of file:   133900 bytes = 0x20b0c
size of buffer: 131072 bytes = 0x20000, of which the server uses only 128244 bytes (= 0x1f4f4)
end of file (0x20b0c) + used part of the buffer (0x1f4f4) = 0x40000
As you can see we are now at a round number (at least in hex). But why did the server waste these 2828 bytes?
Data is read from the hard-disk in blocks and it is written to the hard-disk in blocks too. Positioning somewhere inside such a block and writing some bytes at this place means that the block must be read, the data written at the position in the block, and then the block written back to its original place. When we follow the strategy of the server then with the first usage of the buffer we have to do the same as described, but all following cases will begin writing on a block boundary, so the read-operation is not needed there. This trick speeds up the operation a bit (not tested).
The block-sizes of hard-disks are currently 512 bytes, 1K, 2K or (maybe) 4K. So the server subtracts a little bit from the end so that the addition results in a multiple of the block-size (and it assumes a 4K block). Here is the code, taken from init_io_cache() (in mf_iocache.c):
if (type == WRITE_CACHE)
info->write_end=
info->buffer+info->buffer_length- (seek_offset & (IO_SIZE-1));
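With the numbers from the example this expression can be re-computed in isolation. A stand-alone check, assuming the unmodified IO_SIZE of 4096:

#include <stdio.h>

#define IO_SIZE 4096   /* the original value from include/my_global.h */

int main(void) {
    unsigned long seek_offset   = 133900;      /* initial file size, 0x20b0c */
    unsigned long buffer_length = 128 * 1024;  /* 131072 = 0x20000           */

    /* the same expression as in init_io_cache(): cut off the part of the
       buffer that would make the first flush end unaligned */
    unsigned long cut  = seek_offset & (IO_SIZE - 1);   /* 2828             */
    unsigned long used = buffer_length - cut;           /* 128244 = 0x1f4f4 */

    printf("cut: %lu, used: %lu, first flush ends at 0x%lx\n",
           cut, used, seek_offset + used);              /* 0x40000 */
    return 0;
}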
functions
Above I've written about functions of the storage-engine that are called in the execution of the INSERT-statement. Here is a listing of these functions as they are called by the server:
<TestMaxVar> ha_maxtempl::extra
<TestMaxVar> ha_maxtempl::column_bitmaps_signal
<TestMaxVar> ha_maxtempl::extra
<TestMaxVar> ha_maxtempl::extra
<TestMaxVar> ha_maxtempl::store_lock
<TestMaxVar> ha_maxtempl::external_lock
<TestMaxVar> ha_maxtempl::start_bulk_insert
<TestMaxVar> ha_maxtempl::write_row this function is called multiple times, once for each row
<TestMaxVar> ha_maxtempl::end_bulk_insert
<TestMaxVar> ha_maxtempl::enable_indexes
<TestMaxVar> ha_maxtempl::extra
<TestMaxVar> ha_maxtempl::extra
<TestMaxVar> ha_maxtempl::reset
<TestMaxVar> ha_maxtempl::extra
<TestMaxVar> ha_maxtempl::external_lock
<TestMaxVar> ha_maxtempl::extra
<TestMaxVar> ha_maxtempl::reset
For this test I created a table TestMaxVar in the database TestCache. Then I inserted 10 records into this table using the INSERT-statement used above, but restricted to 10 records. You can see which functions are called, how often they are called and in which order they are called. The table TestMaxVar uses the storage engine MAXTEMPL.
In the case of inserting only a single row it looks like this:
insert into TestMaxVar values(
'01000000', '999999', '132.36', '0.00', 'UROSAFE BEINBEUTEL 7732C 1 L', 'SPRING DE LU AM K1 NO TOP5', '25677' );
<TestMaxVar> ha_maxtempl::extra
<TestMaxVar> ha_maxtempl::column_bitmaps_signal
<TestMaxVar> ha_maxtempl::extra
<TestMaxVar> ha_maxtempl::extra
<TestMaxVar> ha_maxtempl::store_lock
<TestMaxVar> ha_maxtempl::external_lock
<TestMaxVar> ha_maxtempl::write_row
<TestMaxVar> ha_maxtempl::extra
<TestMaxVar> ha_maxtempl::external_lock
<TestMaxVar> ha_maxtempl::extra
<TestMaxVar> ha_maxtempl::reset
As you can see it's a bit simpler. And the functions start_bulk_insert() and end_bulk_insert() are missing in this case, so no caching is done here.