Sunday, 9 February 2014

TXT

I would like to present a TXT-storage-engine for MySQL and MariaDB.

why

Think of 2 DB-systems, developed independently of each other (e.g. in 2 different companies). Now you get the job to synchronize the data of these 2 systems, which will continue to exist in the future. This means every night extracting the relevant data (not all, only some tables and only the changed records or new or deleted or ...), write it into a file, move this file to another place and insert htis data into the second DB-system (including error-reporting and -handling). And naturally this has to be done in the opposite direction too.

Usually the file for transferring the data is a text-file, so a structure is developed between the two teams, usually in the form of fixed column-width text. Other possible formats would be DIF (long time ago) or CSV, today this would be XML.

And it requires that at least 2 programs have to be developed, running on 2 different computers (maybe different operating systems, different programming languages), doing 4 jobs: reading and writing to/from system 1 and reading and writing to/from system 2.

I've written multiple programs for such jobs in the past, so it's time to think about a better way to handle this.

concept

The idea is to have such a text-file and to use it directly within MySQL and MariaDB without any special programs, just using plain SQL.

This engine is based on the same idea as the CSV-engine, only the file-format changes. And the CSV-engine is already part of MySQL and MariaDB.

restrictions

This code contains no logic in the handling of the data. The code simply takes the data given to a function and writes it to the datafile. Or it reads the data from the datafile and gives it back to the caller in the server-code.
If you have special relations between the data you have to implement it using a script language or a program in a conventional language. I can't handle this here in a storage engine because I don't know of any restrictions on your DB-system.

credits

The code I like to present here is base on the CSV-engine of Sasha Pachev. You will find his book here or here. Thanks to Sasha for the work.

code

You will find the code of this engine here. Handle it as described in the post about the DUMMY-engine.

tests

After compiling and installing let's start with the first test:
MariaDB [(none)]> create database TestTXT;
Query OK, 1 row affected (0.00 sec)

MariaDB [(none)]> use TestTXT;
Database changed
MariaDB [TestTXT]> create table TestTxtl (                                                                                                                                                                             
    ->                 Id   char(8),
    ->                 PZN  char(7),
    ->                 EVP  char(8),
    ->                 HAP  char(8),
    ->                 ArtikelBez varchar(40),
    ->                 ArtikelText varchar(26),
    ->                 Hersteller  char(5) )
    ->                 engine=TXT
    -> ;
ERROR 1178 (42000): The storage engine for the table doesn't support nullable columns
MariaDB [TestTXT]/gt;

OK, the CREATE-statement allows NULLs but in text-files NULLs are not possible. So let's fix this in the CREATE-statement and try it again:
MariaDB [TestTXT]> create table TestTxt (                                                                                                                                                                             
    ->                 Id   char(8) not null,
    ->                 PZN  char(7) not null,
    ->                 EVP  char(8) not null,
    ->                 HAP  char(8) not null,
    ->                 ArtikelBez varchar(40) not null,
    ->                 ArtikelText varchar(26) not null,
    ->                 Hersteller  char(5) not null)
    ->                 engine=TXT
    -> ;
ERROR 1178 (42000): The storage engine for the table doesn't support column-type VARCHAR
MariaDB [TestTXT]>

OK, OK, server you're right, we can't handle variable length columns, the columns have to be of fixed width. Let's fix this:
MariaDB [TestTXT]> create table TestTxt (                                                                                                                                                                             
    ->                 Id   char(8) not null,
    ->                 PZN  char(7) not null,
    ->                 EVP  char(8) not null,
    ->                 HAP  char(8) not null,
    ->                 ArtikelBez char(40) not null,
    ->                 ArtikelText char(26) not null,
    ->                 Hersteller  char(5) not null)
    ->                 engine=TXT
    -> ;
Query OK, 0 rows affected (0.05 sec)

MariaDB [TestTXT]>

Uff, this works. So let's drop this table immediately because this example uses only char-types. I want to use a more complex example by using different types for the columns.
MariaDB [(none)]> create database TestTXT;
Query OK, 1 row affected (0.00 sec)

MariaDB [(none)]> use TestTXT;
Database changed
MariaDB [TestTXT]> create table TestTxt (                                                                                                                                                                             
    ->                 Id   numeric(8,0)        not null,
    ->                 PZN  char(7)             not null,
    ->                 EVP  decimal(8,2)        not null,
    ->                 HAP  decimal(9,2)        not null,
    ->                 ArtikelBez char(40)      not null,
    ->                 Datum date               not null,
    ->                 Uhrzeit time             not null,
    ->                 DatumZeit  datetime      not null,
    ->                 Tstamp timestamp         not null,
    ->                 PreciseTime datetime(6)  not null)
    ->                 engine=TXT
    -> ;
Query OK, 0 rows affected (0.04 sec)

MariaDB [TestTXT]> insert into TestTxt values( 12345678, 1234567, 123456.78, 1234567.89, '1234567890123456789012345678901234567890', 
    ->     '2014-02-04', '14:59', '2014-02-04 14:59:00', current_timestamp, current_timestamp)
    -> ;
Query OK, 1 row affected (0.00 sec)

MariaDB [TestTXT]> insert into TestTxt values( 12345678, 1234567, 123456.78, 1234567.89, '1234567890123456789012345678901234567890', 
    ->     '2014-02-04', '14:59', '2014-02-04 14:59:00', current_timestamp, '2014-02-05 19:51:17.123456')
    -> ;
Query OK, 1 row affected (0.00 sec)

MariaDB [TestTXT]> select * from TestTxt;
+----------+---------+-----------+------------+------------------------------------------+------------+----------+---------------------+---------------------+----------------------------+
| Id       | PZN     | EVP       | HAP        | ArtikelBez                               | Datum      | Uhrzeit  | DatumZeit           | Tstamp              | PreciseTime                |
+----------+---------+-----------+------------+------------------------------------------+------------+----------+---------------------+---------------------+----------------------------+
| 12345678 | 1234567 | 123456.78 | 1234567.89 | 1234567890123456789012345678901234567890 | 2014-02-04 | 14:59:00 | 2014-02-04 14:59:00 | 2014-02-05 19:51:23 | 2014-02-05 19:51:23.000000 |
| 12345678 | 1234567 | 123456.78 | 1234567.89 | 1234567890123456789012345678901234567890 | 2014-02-04 | 14:59:00 | 2014-02-04 14:59:00 | 2014-02-05 19:52:17 | 2014-02-05 19:51:17.123456 |
+----------+---------+-----------+------------+------------------------------------------+------------+----------+---------------------+---------------------+----------------------------+
2 rows in set (0.00 sec)

MariaDB [TestTXT]>

Let's look into the datafile:
MariaDB [(none)]> select @@datadir;
+----------------------------+
| @@datadir                  |
+----------------------------+
| /home/august/MariaDB/data/ |
+----------------------------+
1 row in set (0.00 sec)

MariaDB [(none)]> 

In the directory given above I found some directories. Among others there is one with the name TestTXT (= the database we created, see above). In this directory I see some files:
august@AMD4:~/MariaDB/data/TestTXT$ ls -l
insgesamt 12
-rw-rw---- 1 august august  65 Jan 31 15:31 db.opt
-rw-rw---- 1 august august 731 Feb  5 19:51 TestTxt.frm
-rw-rw---- 1 august august 324 Feb  5 19:52 TestTxt.txt
august@AMD4:~/MariaDB/data/TestTXT$ 

The file TestTxt.frm contains the internal information about the table TestTxt. TestTxt.txt contains the data, so let's look into it (I've added column-headers to add some clarity, these headers are not in the data-file):
<- Id  -><-PZN-><- EVP  -><-  HAP  -><-          ArtikelBez                -><- Datum-><-Uhrzeit><-    DatumZeit  -><-     Tstamp    -><-    PreciseTime       ->
 123456781234567 123456.78 1234567.8912345678901234567890123456789012345678902014-02-0414:59:00002014-02-04 14:59:002014-02-05 19:51:232014-02-05 19:51:23.000000
 123456781234567 123456.78 1234567.8912345678901234567890123456789012345678902014-02-0414:59:00002014-02-04 14:59:002014-02-05 19:52:172014-02-05 19:51:17.123456

remark

Don't use this engine for your everyday work, it's not intended for this purpose. The code does not support indexes nor is it designed for speed - and there is no intention to implement this in the future. It' should only be used for transporting data to/from a system.


some details

Some hints about SQL-operations follow.

INSERT

Naturally you can insert data into a table of type TXT. Look at the examples above to see this. And also you can do this with statements like:
 INSERT INTO TableDest SELECT * FROM TableSource;

SELECT

Naturally you can read data from a table of type TXT

UPDATE

By intention updating data in a table of type TXT is not supported.

DELETE

By intention deleting data in a table of type TXT is not supported.


the code

I would like to give some descriptions of some of the functionss of this class.

write_row()

This function is called when an INSERT-statement is executed. It does some housekeeping and calls the function write_line().

write_line()

This functions does the work of writing the data at the end of the datafile. It's always writing at the end of the data-file.
The function is given the data in a buffer. The code iterates on all columns of this table and handles the data according to the type of the column. It converts the data from the buffer into plain text and adjusts the length. It simply returns the data in the text-buffer to the caller.

fetch_line()

This function is called from the function rnd_next(), which is the function for handling a table-scan row by row. And a table-scan is the only way to access the data in a table of type TXT.

update_row()

It simply returns with an error-code.

delete_row()

This function too returns with an error-code.

delete_all_rows()

This function is called when a statement TRUNCATE table is executed. It simply calls the OS to change the file-size to 0.

create()

This function checks all the columns given in the CREATE-statement on validity and returns with an error-code if it finds an unsupported type. It finally calls the create-function of MySQL/MariaDB for wrting the frm-file and for creating the txt-file.