Sunday 23 February 2014

dBASE

I would like to present a tiny storage-engine for MySQL and MariaDB that stores the data in the dBASE-format. This format and the software around it were quite popular a long time ago.

You can find a detailed description of dBASE in the Wikipedia.

why

A lot of software can read and write dBASE-files directly. So you handle your data in MySQL or MariaDB and (as an example) want to write some standard-letters using MS Word or LibreOffice. One solution could be to extract the data from the tables in MariaDB or MySQL and put them into a DBF-file. This file can then be accessed from your favorite word-processing-software.
Other applications are possible. In a former project we had to give some data to a call-center-company for telephone-calls for reminding people of some action. This was done in a similar way.

concept

The idea for this engine is the same as for the TXT-engine I presented earlier.
You can simply extract the data from an existing application, which writes the data in the DBF-format. Transfer this file to MySQL/MariaDB and read the data from the DBF-file into your preferred format for further processing.
OR:
Handle your data in MariaDB/MySQL. Extract the data from these tables and write it into one or more tables of type DBF. Transfer these files to the other application that can read and handle DBF-files.

hint

You do have to create your DBF-tables in MariaDB/MySQL before you can work with them, this step tells MySQL/MariaDB about these tables and also creates a DBF-file with a structure dBASE can understand. Later you can replace the DBF-file in the data-directory of MariaDB/MySQL with another DBF-file of identical structure but different content. If you replace it with a DBF-file of a different structure MySQL/MariaDB cannot handle this.

compatibility

I created some tables within MariaDB and tested the DBF-files with an old copy of dBASE IV. I will show you the results. No problems occured.

restrictions

The purpose of this engine is not to use it as a production engine. The intention is for making the transfer of data between two systems easier.
So this engine does not support indexes; it does not read or write any NDX, MNDX, NTX and so on files. And I do not intent to add this feature in the future.

code

You will find the code of this engine here. Handle it as described in the post about the DUMMY-engine.

tests

I did some tests by reading from and writing to such a table in the DBF-format. Then I transferred the tables to Windows XP. For this OS I have an old copy of dBASE IV (running in a DOS-box on XP) and with this software I tested my DBF-files and modified some data. After transferring the DBF-files back to MariaDB I looked at these table again.

The tests step by step: in MariaDB I created a database and in this I created a table, inserted some records into this table, checked the result via SELECT, did an UPDATE on one record and checked the result again. Finally I executed a FLUSH-statement (this is important, because it clears the internal caches of MariaDB).
MariaDB [(none)]> create database TestDBF;
Query OK, 1 row affected (0.00 sec)

MariaDB [(none)]> use TestDBF;
Database changed
MariaDB [TestDBF]> create table TestDbf (                                                                                                                                                                             
    ->                 Id   numeric(8,0) not null,
    ->                 PZN  char(7) not null,
    ->                 EVP  decimal(8,2) not null,
    ->                 HAP  decimal(9,2) not null,
    ->                 ArtikelBez char(40) not null,
    ->                 Datum date not null )
    ->                 engine=DBF
    -> ;
Query OK, 0 rows affected (0.05 sec)

MariaDB [TestDBF]> insert into TestDbf values( 12345678, 1234567, 123456.78, 1234567.89, '1234567890123456789012345678901234567890', '2014-02-04') ;
Query OK, 1 row affected (0.00 sec)

MariaDB [TestDBF]> insert into TestDbf values( 23456789, 2345678, 234567.89, 2345678.90, '2345678901234567890123456789012345678901', '2014-02-05') ;
Query OK, 1 row affected (0.00 sec)

MariaDB [TestDBF]> insert into TestDbf values( 34567890, 3456789, 345678.90, 3456789.01, '3456789012345678901234567890123456789012', '2014-02-06') ;
Query OK, 1 row affected (0.00 sec)

MariaDB [TestDBF]> select * from TestDbf;
+----------+---------+-----------+------------+------------------------------------------+------------+
| Id       | PZN     | EVP       | HAP        | ArtikelBez                               | Datum      |
+----------+---------+-----------+------------+------------------------------------------+------------+
| 12345678 | 1234567 | 123456.78 | 1234567.89 | 1234567890123456789012345678901234567890 | 2014-02-04 |
| 23456789 | 2345678 | 234567.89 | 2345678.90 | 2345678901234567890123456789012345678901 | 2014-02-05 |
| 34567890 | 3456789 | 345678.90 | 3456789.01 | 3456789012345678901234567890123456789012 | 2014-02-06 |
+----------+---------+-----------+------------+------------------------------------------+------------+
3 rows in set (0.01 sec)

MariaDB [TestDBF]> desc TestDbf;
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| Id         | decimal(8,0) | NO   |     | NULL    |       |
| PZN        | char(7)      | NO   |     | NULL    |       |
| EVP        | decimal(8,2) | NO   |     | NULL    |       |
| HAP        | decimal(9,2) | NO   |     | NULL    |       |
| ArtikelBez | char(40)     | NO   |     | NULL    |       |
| Datum      | date         | NO   |     | NULL    |       |
+------------+--------------+------+-----+---------+-------+
6 rows in set (0.01 sec)

MariaDB [TestDBF]> update TestDbf set HAP = 47.11 where Datum = '2014-02-05';
Query OK, 1 row affected, 3 warnings (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 3

MariaDB [TestDBF]> select * from TestDbf;
+----------+---------+-----------+------------+------------------------------------------+------------+
| Id       | PZN     | EVP       | HAP        | ArtikelBez                               | Datum      |
+----------+---------+-----------+------------+------------------------------------------+------------+
| 12345678 | 1234567 | 123456.78 | 1234567.89 | 1234567890123456789012345678901234567890 | 2014-02-04 |
| 23456789 | 2345678 | 234567.89 |      47.11 | 2345678901234567890123456789012345678901 | 2014-02-05 |
| 34567890 | 3456789 | 345678.90 | 3456789.01 | 3456789012345678901234567890123456789012 | 2014-02-06 |
+----------+---------+-----------+------------+------------------------------------------+------------+
3 rows in set (0.00 sec)

MariaDB [TestDBF]> flush tables TestDbf;
Query OK, 0 rows affected (0.00 sec)

MariaDB [TestDBF]> 

Staying in MariaDB I created another table and inserted 10 Mio. records into this DBF-table. On this table I did some calculations, a little bit more complicated than the statements before. For further comparisons the output was written to a file. The same operation was done on the original table which is of type MyIsam.
MariaDB [TestDBF]> create table TestDbf2 (                                                                                                                                                                             
    ->                 Id   numeric(8,0) not null,
    ->                 PZN  char(7) not null,
    ->                 EVP  decimal(8,2) not null,
    ->                 HAP  decimal(9,2) not null,
    ->                 ArtikelBez char(40) not null,
    ->                 ArtikelText char(26) not null,
    ->                 Hersteller  char(5) not null)
    ->                 engine=DBF
    -> ;
Query OK, 0 rows affected (0.03 sec)

MariaDB [TestDBF]> insert into TestDbf2 select * from TestOK.ABDAOK;
Query OK, 10000000 rows affected (3 min 34.64 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

MariaDB [TestDBF]> select Hersteller, count(*) from TestDbf2 group by Hersteller into outfile 'x.x' ;
Query OK, 2581 rows affected (1 min 12.91 sec)

MariaDB [TestDBF]> select Hersteller, count(*) from TestOK.ABDAOK group by Hersteller into outfile 'y.y'; 
Query OK, 2581 rows affected (17.27 sec)

MariaDB [TestDBF]> 

Now let's switch to the OS-level and look into the data-directory:
august@AMD4:~/MariaDB/data/TestDBF$ ls -l
insgesamt 1064532
-rw-rw---- 1 august august         65 Feb 21 11:20 db.opt
-rw-rw---- 1 august august 1090000257 Feb 21 11:29 TestDbf2.DBF
-rw-rw---- 1 august august        665 Feb 21 11:25 TestDbf2.frm
-rw-rw---- 1 august august        483 Feb 21 11:24 TestDbf.DBF
-rw-rw---- 1 august august        603 Feb 21 11:21 TestDbf.frm
-rw-rw-rw- 1 august august      26545 Feb 21 11:31 x.x
-rw-rw-rw- 1 august august      26545 Feb 21 11:32 y.y
august@AMD4:~/MariaDB/data/TestDBF$ cmp x.x y.y
august@AMD4:~/MariaDB/data/TestDBF$ 

A you can see the table TestDbf2 is about 1GB in size. And the output-file generated by the SQL-statements are identical in size. The compariosn-operation in the last line above shows that these files are identical (I marked these in bold).

I copied the two DBF-files to Windoes XP and looked at these tables using the old dBASE-program. Let me begin with the table TestDbf:
                                   Hilfe: F1
            Geben Sie einen dBASE-Befehl ein, und drücken Sie RETURN
. use testdbf
. list stru
Datensatzformat der dB-Datei    : C:\DBASE\TESTDBF.DBF
Anzahl der Datensätze           :       4
Datum der letzten Aktualisierung: 22.02.14
Feld   Feldname    Typ        Länge    Dez    Index
    1  ID          Numerisch      9               N
    2  PZN         Zeichen        7               N
    3  EVP         Numerisch     10      2        N
    4  HAP         Numerisch     11      2        N
    5  ARTIKELBEZ  Zeichen       40               N
    6  DATUM       Datum          8               N
* Gesamt *                       86

. list
Satz#           ID PZN            EVP         HAP ARTIKELBEZ
           DATUM
      1   12345678 1234567  123456,78  1234567,89 123456789012345678901234567890
1234567890 04.02.14
      2   23456789 2345678  234567,89       47,11 234567890123456789012345678901
2345678901 05.02.14
      3   34567890 3456789  345678,90  3456789,01 345678901234567890123456789012
3456789012 06.02.14

.

Compare this to the CREATE-statement and the output of the SELECT-statement some lines above and you see the data after the UPDATE-operation, but in this case presented by dBASE. Next I added another record to this table:


Let's verify our action within dBASE:
. list
Satz#           ID PZN            EVP         HAP ARTIKELBEZ
           DATUM
      1   12345678 1234567  123456,78  1234567,89 123456789012345678901234567890
1234567890 04.02.14
      2   23456789 2345678  234567,89       47,11 234567890123456789012345678901
2345678901 05.02.14
      3   34567890 3456789  345678,90  3456789,01 345678901234567890123456789012
3456789012 06.02.14
      4  112233445 4712345       4,71       12,75 this is a test for dBase
           21.02.14

.

Then I did some more tests within dBASE with the second table.
. use TestDbf2
. list stru
Datensatzformat der dB-Datei    : C:\DBASE\TESTDBF2.DBF
Anzahl der Datensätze           : 10000000
Datum der letzten Aktualisierung: 21.02.14
Feld   Feldname    Typ        Länge    Dez    Index
    1  ID          Numerisch      9               N
    2  PZN         Zeichen        7               N
    3  EVP         Numerisch     10      2        N
    4  HAP         Numerisch     11      2        N
    5  ARTIKELBEZ  Zeichen       40               N
    6  ARTIKELTEX  Zeichen       26               N
    7  HERSTELLER  Zeichen        5               N
* Gesamt *                      109

. count
10000000 Datensätze
. sum EVP, HAP
10000000 Datensätze summiert
          EVP           HAP
 609393106,71  286165123,23
.

OK, let's go back to MariaDB and copy the file TestDbf.DBF to the data-directory of MariaDB. Because I've modified this file I want to look into this file to see what MariaDB tells us about it. Here is the content of the table TestDbf:
MariaDB [TestDBF]> select * from TestDbf;
+----------+---------+-----------+------------+------------------------------------------+------------+
| Id       | PZN     | EVP       | HAP        | ArtikelBez                               | Datum      |
+----------+---------+-----------+------------+------------------------------------------+------------+
| 12345678 | 1234567 | 123456.78 | 1234567.89 | 1234567890123456789012345678901234567890 | 2014-02-04 |
| 23456789 | 2345678 | 234567.89 |      47.11 | 2345678901234567890123456789012345678901 | 2014-02-05 |
| 34567890 | 3456789 | 345678.90 | 3456789.01 | 3456789012345678901234567890123456789012 | 2014-02-06 |
| 99999999 | 4712345 |      4.71 |      12.75 | this is a test for dBase                 | 1914-02-21 |
+----------+---------+-----------+------------+------------------------------------------+------------+
4 rows in set (0.00 sec)

MariaDB [TestDBF]>

Everything looks fine except the last value (therefore I highlighted it). dBASE handles years with only 2 digits except you tell it otherwise, that means I've forgotten to say it: set century on. My fault, that's why dBASE created the entry 14 as 1914 (please look at the picture above). It' an old program, written before the year 2000.

Now we look at the table TestDbf2, let's do the SUM-operation again in MariaDB:
MariaDB [TestDBF]> select sum(EVP), sum(HAP) from TestDbf2 ;
+--------------+--------------+
| sum(EVP)     | sum(HAP)     |
+--------------+--------------+
| 609393106.71 | 286165123.23 |
+--------------+--------------+
1 row in set (1 min 12.10 sec)

MariaDB [TestDBF]>

Identical result, no complains.

remark

Don't use this engine for your everyday work, it's not intended for this purpose. The code does not support indexes nor is it designed for speed - and there is no intention to implement this in the future. It' should only be used for transporting data to/from a system.
An example: I did the test with the GROUP BY-operation. This took about 17 seconds on the MyISAM-table and 72 second on the DBF-table. And I have no intention to speed this up to a comparable value.

some details

Contrary to the TXT-engine I presented before you can do INSERT, SELECT, UPDATE and DELETE-operations with this engine.

And please look into the file ha_dbf.h. There you will find links to detailled descriptions of the structure of a DBF-file.

the code

I want to give some hints to some of the functions of the class of this engine.

create()

This function is called when a CREATE TABLE-statement is executed. At first it checks the column-definitions for validity in the dBASE-environment. Any type that is valid in MariaDB/MySQL but not supported by dBASE is flagged as an error. Then it creates the DBF-file and writes the first part of the header, including some information on length of structures. And finally it iterates on all the columns again and creates and writes the structure for each column to the DBF-file.

open()

This functions allocates the memory for the header-inormation from the DBF-file, opens the file and reads the header into the allocated memory.

copy_fields2buffer()

This is an internal function so it is private to the class. It copies the information from the MyISAM-format (=internal format of the server) to a buffer. This buffer will later be written to the DBF-file on disk.

correct_DBF_header()

After write-operations the header of the DBF-file has to be corrected. This function is private too

Sunday 9 February 2014

TXT

I would like to present a TXT-storage-engine for MySQL and MariaDB.

why

Think of 2 DB-systems, developed independently of each other (e.g. in 2 different companies). Now you get the job to synchronize the data of these 2 systems, which will continue to exist in the future. This means every night extracting the relevant data (not all, only some tables and only the changed records or new or deleted or ...), write it into a file, move this file to another place and insert htis data into the second DB-system (including error-reporting and -handling). And naturally this has to be done in the opposite direction too.

Usually the file for transferring the data is a text-file, so a structure is developed between the two teams, usually in the form of fixed column-width text. Other possible formats would be DIF (long time ago) or CSV, today this would be XML.

And it requires that at least 2 programs have to be developed, running on 2 different computers (maybe different operating systems, different programming languages), doing 4 jobs: reading and writing to/from system 1 and reading and writing to/from system 2.

I've written multiple programs for such jobs in the past, so it's time to think about a better way to handle this.

concept

The idea is to have such a text-file and to use it directly within MySQL and MariaDB without any special programs, just using plain SQL.

This engine is based on the same idea as the CSV-engine, only the file-format changes. And the CSV-engine is already part of MySQL and MariaDB.

restrictions

This code contains no logic in the handling of the data. The code simply takes the data given to a function and writes it to the datafile. Or it reads the data from the datafile and gives it back to the caller in the server-code.
If you have special relations between the data you have to implement it using a script language or a program in a conventional language. I can't handle this here in a storage engine because I don't know of any restrictions on your DB-system.

credits

The code I like to present here is base on the CSV-engine of Sasha Pachev. You will find his book here or here. Thanks to Sasha for the work.

code

You will find the code of this engine here. Handle it as described in the post about the DUMMY-engine.

tests

After compiling and installing let's start with the first test:
MariaDB [(none)]> create database TestTXT;
Query OK, 1 row affected (0.00 sec)

MariaDB [(none)]> use TestTXT;
Database changed
MariaDB [TestTXT]> create table TestTxtl (                                                                                                                                                                             
    ->                 Id   char(8),
    ->                 PZN  char(7),
    ->                 EVP  char(8),
    ->                 HAP  char(8),
    ->                 ArtikelBez varchar(40),
    ->                 ArtikelText varchar(26),
    ->                 Hersteller  char(5) )
    ->                 engine=TXT
    -> ;
ERROR 1178 (42000): The storage engine for the table doesn't support nullable columns
MariaDB [TestTXT]/gt;

OK, the CREATE-statement allows NULLs but in text-files NULLs are not possible. So let's fix this in the CREATE-statement and try it again:
MariaDB [TestTXT]> create table TestTxt (                                                                                                                                                                             
    ->                 Id   char(8) not null,
    ->                 PZN  char(7) not null,
    ->                 EVP  char(8) not null,
    ->                 HAP  char(8) not null,
    ->                 ArtikelBez varchar(40) not null,
    ->                 ArtikelText varchar(26) not null,
    ->                 Hersteller  char(5) not null)
    ->                 engine=TXT
    -> ;
ERROR 1178 (42000): The storage engine for the table doesn't support column-type VARCHAR
MariaDB [TestTXT]>

OK, OK, server you're right, we can't handle variable length columns, the columns have to be of fixed width. Let's fix this:
MariaDB [TestTXT]> create table TestTxt (                                                                                                                                                                             
    ->                 Id   char(8) not null,
    ->                 PZN  char(7) not null,
    ->                 EVP  char(8) not null,
    ->                 HAP  char(8) not null,
    ->                 ArtikelBez char(40) not null,
    ->                 ArtikelText char(26) not null,
    ->                 Hersteller  char(5) not null)
    ->                 engine=TXT
    -> ;
Query OK, 0 rows affected (0.05 sec)

MariaDB [TestTXT]>

Uff, this works. So let's drop this table immediately because this example uses only char-types. I want to use a more complex example by using different types for the columns.
MariaDB [(none)]> create database TestTXT;
Query OK, 1 row affected (0.00 sec)

MariaDB [(none)]> use TestTXT;
Database changed
MariaDB [TestTXT]> create table TestTxt (                                                                                                                                                                             
    ->                 Id   numeric(8,0)        not null,
    ->                 PZN  char(7)             not null,
    ->                 EVP  decimal(8,2)        not null,
    ->                 HAP  decimal(9,2)        not null,
    ->                 ArtikelBez char(40)      not null,
    ->                 Datum date               not null,
    ->                 Uhrzeit time             not null,
    ->                 DatumZeit  datetime      not null,
    ->                 Tstamp timestamp         not null,
    ->                 PreciseTime datetime(6)  not null)
    ->                 engine=TXT
    -> ;
Query OK, 0 rows affected (0.04 sec)

MariaDB [TestTXT]> insert into TestTxt values( 12345678, 1234567, 123456.78, 1234567.89, '1234567890123456789012345678901234567890', 
    ->     '2014-02-04', '14:59', '2014-02-04 14:59:00', current_timestamp, current_timestamp)
    -> ;
Query OK, 1 row affected (0.00 sec)

MariaDB [TestTXT]> insert into TestTxt values( 12345678, 1234567, 123456.78, 1234567.89, '1234567890123456789012345678901234567890', 
    ->     '2014-02-04', '14:59', '2014-02-04 14:59:00', current_timestamp, '2014-02-05 19:51:17.123456')
    -> ;
Query OK, 1 row affected (0.00 sec)

MariaDB [TestTXT]> select * from TestTxt;
+----------+---------+-----------+------------+------------------------------------------+------------+----------+---------------------+---------------------+----------------------------+
| Id       | PZN     | EVP       | HAP        | ArtikelBez                               | Datum      | Uhrzeit  | DatumZeit           | Tstamp              | PreciseTime                |
+----------+---------+-----------+------------+------------------------------------------+------------+----------+---------------------+---------------------+----------------------------+
| 12345678 | 1234567 | 123456.78 | 1234567.89 | 1234567890123456789012345678901234567890 | 2014-02-04 | 14:59:00 | 2014-02-04 14:59:00 | 2014-02-05 19:51:23 | 2014-02-05 19:51:23.000000 |
| 12345678 | 1234567 | 123456.78 | 1234567.89 | 1234567890123456789012345678901234567890 | 2014-02-04 | 14:59:00 | 2014-02-04 14:59:00 | 2014-02-05 19:52:17 | 2014-02-05 19:51:17.123456 |
+----------+---------+-----------+------------+------------------------------------------+------------+----------+---------------------+---------------------+----------------------------+
2 rows in set (0.00 sec)

MariaDB [TestTXT]>

Let's look into the datafile:
MariaDB [(none)]> select @@datadir;
+----------------------------+
| @@datadir                  |
+----------------------------+
| /home/august/MariaDB/data/ |
+----------------------------+
1 row in set (0.00 sec)

MariaDB [(none)]> 

In the directory given above I found some directories. Among others there is one with the name TestTXT (= the database we created, see above). In this directory I see some files:
august@AMD4:~/MariaDB/data/TestTXT$ ls -l
insgesamt 12
-rw-rw---- 1 august august  65 Jan 31 15:31 db.opt
-rw-rw---- 1 august august 731 Feb  5 19:51 TestTxt.frm
-rw-rw---- 1 august august 324 Feb  5 19:52 TestTxt.txt
august@AMD4:~/MariaDB/data/TestTXT$ 

The file TestTxt.frm contains the internal information about the table TestTxt. TestTxt.txt contains the data, so let's look into it (I've added column-headers to add some clarity, these headers are not in the data-file):
<- Id  -><-PZN-><- EVP  -><-  HAP  -><-          ArtikelBez                -><- Datum-><-Uhrzeit><-    DatumZeit  -><-     Tstamp    -><-    PreciseTime       ->
 123456781234567 123456.78 1234567.8912345678901234567890123456789012345678902014-02-0414:59:00002014-02-04 14:59:002014-02-05 19:51:232014-02-05 19:51:23.000000
 123456781234567 123456.78 1234567.8912345678901234567890123456789012345678902014-02-0414:59:00002014-02-04 14:59:002014-02-05 19:52:172014-02-05 19:51:17.123456

remark

Don't use this engine for your everyday work, it's not intended for this purpose. The code does not support indexes nor is it designed for speed - and there is no intention to implement this in the future. It' should only be used for transporting data to/from a system.


some details

Some hints about SQL-operations follow.

INSERT

Naturally you can insert data into a table of type TXT. Look at the examples above to see this. And also you can do this with statements like:
 INSERT INTO TableDest SELECT * FROM TableSource;

SELECT

Naturally you can read data from a table of type TXT

UPDATE

By intention updating data in a table of type TXT is not supported.

DELETE

By intention deleting data in a table of type TXT is not supported.


the code

I would like to give some descriptions of some of the functionss of this class.

write_row()

This function is called when an INSERT-statement is executed. It does some housekeeping and calls the function write_line().

write_line()

This functions does the work of writing the data at the end of the datafile. It's always writing at the end of the data-file.
The function is given the data in a buffer. The code iterates on all columns of this table and handles the data according to the type of the column. It converts the data from the buffer into plain text and adjusts the length. It simply returns the data in the text-buffer to the caller.

fetch_line()

This function is called from the function rnd_next(), which is the function for handling a table-scan row by row. And a table-scan is the only way to access the data in a table of type TXT.

update_row()

It simply returns with an error-code.

delete_row()

This function too returns with an error-code.

delete_all_rows()

This function is called when a statement TRUNCATE table is executed. It simply calls the OS to change the file-size to 0.

create()

This function checks all the columns given in the CREATE-statement on validity and returns with an error-code if it finds an unsupported type. It finally calls the create-function of MySQL/MariaDB for wrting the frm-file and for creating the txt-file.