Sunday 23 February 2014

dBASE

I would like to present a tiny storage-engine for MySQL and MariaDB that stores the data in the dBASE-format. This format and the software around it were quite popular a long time ago.

You can find a detailed description of dBASE in the Wikipedia.

why

A lot of software can read and write dBASE-files directly. So you handle your data in MySQL or MariaDB and (as an example) want to write some standard-letters using MS Word or LibreOffice. One solution could be to extract the data from the tables in MariaDB or MySQL and put them into a DBF-file. This file can then be accessed from your favorite word-processing-software.
Other applications are possible. In a former project we had to give some data to a call-center-company for telephone-calls for reminding people of some action. This was done in a similar way.

concept

The idea for this engine is the same as for the TXT-engine I presented earlier.
You can simply extract the data from an existing application, which writes the data in the DBF-format. Transfer this file to MySQL/MariaDB and read the data from the DBF-file into your preferred format for further processing.
OR:
Handle your data in MariaDB/MySQL. Extract the data from these tables and write it into one or more tables of type DBF. Transfer these files to the other application that can read and handle DBF-files.

hint

You do have to create your DBF-tables in MariaDB/MySQL before you can work with them, this step tells MySQL/MariaDB about these tables and also creates a DBF-file with a structure dBASE can understand. Later you can replace the DBF-file in the data-directory of MariaDB/MySQL with another DBF-file of identical structure but different content. If you replace it with a DBF-file of a different structure MySQL/MariaDB cannot handle this.

compatibility

I created some tables within MariaDB and tested the DBF-files with an old copy of dBASE IV. I will show you the results. No problems occured.

restrictions

The purpose of this engine is not to use it as a production engine. The intention is for making the transfer of data between two systems easier.
So this engine does not support indexes; it does not read or write any NDX, MNDX, NTX and so on files. And I do not intent to add this feature in the future.

code

You will find the code of this engine here. Handle it as described in the post about the DUMMY-engine.

tests

I did some tests by reading from and writing to such a table in the DBF-format. Then I transferred the tables to Windows XP. For this OS I have an old copy of dBASE IV (running in a DOS-box on XP) and with this software I tested my DBF-files and modified some data. After transferring the DBF-files back to MariaDB I looked at these table again.

The tests step by step: in MariaDB I created a database and in this I created a table, inserted some records into this table, checked the result via SELECT, did an UPDATE on one record and checked the result again. Finally I executed a FLUSH-statement (this is important, because it clears the internal caches of MariaDB).
MariaDB [(none)]> create database TestDBF;
Query OK, 1 row affected (0.00 sec)

MariaDB [(none)]> use TestDBF;
Database changed
MariaDB [TestDBF]> create table TestDbf (                                                                                                                                                                             
    ->                 Id   numeric(8,0) not null,
    ->                 PZN  char(7) not null,
    ->                 EVP  decimal(8,2) not null,
    ->                 HAP  decimal(9,2) not null,
    ->                 ArtikelBez char(40) not null,
    ->                 Datum date not null )
    ->                 engine=DBF
    -> ;
Query OK, 0 rows affected (0.05 sec)

MariaDB [TestDBF]> insert into TestDbf values( 12345678, 1234567, 123456.78, 1234567.89, '1234567890123456789012345678901234567890', '2014-02-04') ;
Query OK, 1 row affected (0.00 sec)

MariaDB [TestDBF]> insert into TestDbf values( 23456789, 2345678, 234567.89, 2345678.90, '2345678901234567890123456789012345678901', '2014-02-05') ;
Query OK, 1 row affected (0.00 sec)

MariaDB [TestDBF]> insert into TestDbf values( 34567890, 3456789, 345678.90, 3456789.01, '3456789012345678901234567890123456789012', '2014-02-06') ;
Query OK, 1 row affected (0.00 sec)

MariaDB [TestDBF]> select * from TestDbf;
+----------+---------+-----------+------------+------------------------------------------+------------+
| Id       | PZN     | EVP       | HAP        | ArtikelBez                               | Datum      |
+----------+---------+-----------+------------+------------------------------------------+------------+
| 12345678 | 1234567 | 123456.78 | 1234567.89 | 1234567890123456789012345678901234567890 | 2014-02-04 |
| 23456789 | 2345678 | 234567.89 | 2345678.90 | 2345678901234567890123456789012345678901 | 2014-02-05 |
| 34567890 | 3456789 | 345678.90 | 3456789.01 | 3456789012345678901234567890123456789012 | 2014-02-06 |
+----------+---------+-----------+------------+------------------------------------------+------------+
3 rows in set (0.01 sec)

MariaDB [TestDBF]> desc TestDbf;
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| Id         | decimal(8,0) | NO   |     | NULL    |       |
| PZN        | char(7)      | NO   |     | NULL    |       |
| EVP        | decimal(8,2) | NO   |     | NULL    |       |
| HAP        | decimal(9,2) | NO   |     | NULL    |       |
| ArtikelBez | char(40)     | NO   |     | NULL    |       |
| Datum      | date         | NO   |     | NULL    |       |
+------------+--------------+------+-----+---------+-------+
6 rows in set (0.01 sec)

MariaDB [TestDBF]> update TestDbf set HAP = 47.11 where Datum = '2014-02-05';
Query OK, 1 row affected, 3 warnings (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 3

MariaDB [TestDBF]> select * from TestDbf;
+----------+---------+-----------+------------+------------------------------------------+------------+
| Id       | PZN     | EVP       | HAP        | ArtikelBez                               | Datum      |
+----------+---------+-----------+------------+------------------------------------------+------------+
| 12345678 | 1234567 | 123456.78 | 1234567.89 | 1234567890123456789012345678901234567890 | 2014-02-04 |
| 23456789 | 2345678 | 234567.89 |      47.11 | 2345678901234567890123456789012345678901 | 2014-02-05 |
| 34567890 | 3456789 | 345678.90 | 3456789.01 | 3456789012345678901234567890123456789012 | 2014-02-06 |
+----------+---------+-----------+------------+------------------------------------------+------------+
3 rows in set (0.00 sec)

MariaDB [TestDBF]> flush tables TestDbf;
Query OK, 0 rows affected (0.00 sec)

MariaDB [TestDBF]> 

Staying in MariaDB I created another table and inserted 10 Mio. records into this DBF-table. On this table I did some calculations, a little bit more complicated than the statements before. For further comparisons the output was written to a file. The same operation was done on the original table which is of type MyIsam.
MariaDB [TestDBF]> create table TestDbf2 (                                                                                                                                                                             
    ->                 Id   numeric(8,0) not null,
    ->                 PZN  char(7) not null,
    ->                 EVP  decimal(8,2) not null,
    ->                 HAP  decimal(9,2) not null,
    ->                 ArtikelBez char(40) not null,
    ->                 ArtikelText char(26) not null,
    ->                 Hersteller  char(5) not null)
    ->                 engine=DBF
    -> ;
Query OK, 0 rows affected (0.03 sec)

MariaDB [TestDBF]> insert into TestDbf2 select * from TestOK.ABDAOK;
Query OK, 10000000 rows affected (3 min 34.64 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

MariaDB [TestDBF]> select Hersteller, count(*) from TestDbf2 group by Hersteller into outfile 'x.x' ;
Query OK, 2581 rows affected (1 min 12.91 sec)

MariaDB [TestDBF]> select Hersteller, count(*) from TestOK.ABDAOK group by Hersteller into outfile 'y.y'; 
Query OK, 2581 rows affected (17.27 sec)

MariaDB [TestDBF]> 

Now let's switch to the OS-level and look into the data-directory:
august@AMD4:~/MariaDB/data/TestDBF$ ls -l
insgesamt 1064532
-rw-rw---- 1 august august         65 Feb 21 11:20 db.opt
-rw-rw---- 1 august august 1090000257 Feb 21 11:29 TestDbf2.DBF
-rw-rw---- 1 august august        665 Feb 21 11:25 TestDbf2.frm
-rw-rw---- 1 august august        483 Feb 21 11:24 TestDbf.DBF
-rw-rw---- 1 august august        603 Feb 21 11:21 TestDbf.frm
-rw-rw-rw- 1 august august      26545 Feb 21 11:31 x.x
-rw-rw-rw- 1 august august      26545 Feb 21 11:32 y.y
august@AMD4:~/MariaDB/data/TestDBF$ cmp x.x y.y
august@AMD4:~/MariaDB/data/TestDBF$ 

A you can see the table TestDbf2 is about 1GB in size. And the output-file generated by the SQL-statements are identical in size. The compariosn-operation in the last line above shows that these files are identical (I marked these in bold).

I copied the two DBF-files to Windoes XP and looked at these tables using the old dBASE-program. Let me begin with the table TestDbf:
                                   Hilfe: F1
            Geben Sie einen dBASE-Befehl ein, und drücken Sie RETURN
. use testdbf
. list stru
Datensatzformat der dB-Datei    : C:\DBASE\TESTDBF.DBF
Anzahl der Datensätze           :       4
Datum der letzten Aktualisierung: 22.02.14
Feld   Feldname    Typ        Länge    Dez    Index
    1  ID          Numerisch      9               N
    2  PZN         Zeichen        7               N
    3  EVP         Numerisch     10      2        N
    4  HAP         Numerisch     11      2        N
    5  ARTIKELBEZ  Zeichen       40               N
    6  DATUM       Datum          8               N
* Gesamt *                       86

. list
Satz#           ID PZN            EVP         HAP ARTIKELBEZ
           DATUM
      1   12345678 1234567  123456,78  1234567,89 123456789012345678901234567890
1234567890 04.02.14
      2   23456789 2345678  234567,89       47,11 234567890123456789012345678901
2345678901 05.02.14
      3   34567890 3456789  345678,90  3456789,01 345678901234567890123456789012
3456789012 06.02.14

.

Compare this to the CREATE-statement and the output of the SELECT-statement some lines above and you see the data after the UPDATE-operation, but in this case presented by dBASE. Next I added another record to this table:


Let's verify our action within dBASE:
. list
Satz#           ID PZN            EVP         HAP ARTIKELBEZ
           DATUM
      1   12345678 1234567  123456,78  1234567,89 123456789012345678901234567890
1234567890 04.02.14
      2   23456789 2345678  234567,89       47,11 234567890123456789012345678901
2345678901 05.02.14
      3   34567890 3456789  345678,90  3456789,01 345678901234567890123456789012
3456789012 06.02.14
      4  112233445 4712345       4,71       12,75 this is a test for dBase
           21.02.14

.

Then I did some more tests within dBASE with the second table.
. use TestDbf2
. list stru
Datensatzformat der dB-Datei    : C:\DBASE\TESTDBF2.DBF
Anzahl der Datensätze           : 10000000
Datum der letzten Aktualisierung: 21.02.14
Feld   Feldname    Typ        Länge    Dez    Index
    1  ID          Numerisch      9               N
    2  PZN         Zeichen        7               N
    3  EVP         Numerisch     10      2        N
    4  HAP         Numerisch     11      2        N
    5  ARTIKELBEZ  Zeichen       40               N
    6  ARTIKELTEX  Zeichen       26               N
    7  HERSTELLER  Zeichen        5               N
* Gesamt *                      109

. count
10000000 Datensätze
. sum EVP, HAP
10000000 Datensätze summiert
          EVP           HAP
 609393106,71  286165123,23
.

OK, let's go back to MariaDB and copy the file TestDbf.DBF to the data-directory of MariaDB. Because I've modified this file I want to look into this file to see what MariaDB tells us about it. Here is the content of the table TestDbf:
MariaDB [TestDBF]> select * from TestDbf;
+----------+---------+-----------+------------+------------------------------------------+------------+
| Id       | PZN     | EVP       | HAP        | ArtikelBez                               | Datum      |
+----------+---------+-----------+------------+------------------------------------------+------------+
| 12345678 | 1234567 | 123456.78 | 1234567.89 | 1234567890123456789012345678901234567890 | 2014-02-04 |
| 23456789 | 2345678 | 234567.89 |      47.11 | 2345678901234567890123456789012345678901 | 2014-02-05 |
| 34567890 | 3456789 | 345678.90 | 3456789.01 | 3456789012345678901234567890123456789012 | 2014-02-06 |
| 99999999 | 4712345 |      4.71 |      12.75 | this is a test for dBase                 | 1914-02-21 |
+----------+---------+-----------+------------+------------------------------------------+------------+
4 rows in set (0.00 sec)

MariaDB [TestDBF]>

Everything looks fine except the last value (therefore I highlighted it). dBASE handles years with only 2 digits except you tell it otherwise, that means I've forgotten to say it: set century on. My fault, that's why dBASE created the entry 14 as 1914 (please look at the picture above). It' an old program, written before the year 2000.

Now we look at the table TestDbf2, let's do the SUM-operation again in MariaDB:
MariaDB [TestDBF]> select sum(EVP), sum(HAP) from TestDbf2 ;
+--------------+--------------+
| sum(EVP)     | sum(HAP)     |
+--------------+--------------+
| 609393106.71 | 286165123.23 |
+--------------+--------------+
1 row in set (1 min 12.10 sec)

MariaDB [TestDBF]>

Identical result, no complains.

remark

Don't use this engine for your everyday work, it's not intended for this purpose. The code does not support indexes nor is it designed for speed - and there is no intention to implement this in the future. It' should only be used for transporting data to/from a system.
An example: I did the test with the GROUP BY-operation. This took about 17 seconds on the MyISAM-table and 72 second on the DBF-table. And I have no intention to speed this up to a comparable value.

some details

Contrary to the TXT-engine I presented before you can do INSERT, SELECT, UPDATE and DELETE-operations with this engine.

And please look into the file ha_dbf.h. There you will find links to detailled descriptions of the structure of a DBF-file.

the code

I want to give some hints to some of the functions of the class of this engine.

create()

This function is called when a CREATE TABLE-statement is executed. At first it checks the column-definitions for validity in the dBASE-environment. Any type that is valid in MariaDB/MySQL but not supported by dBASE is flagged as an error. Then it creates the DBF-file and writes the first part of the header, including some information on length of structures. And finally it iterates on all the columns again and creates and writes the structure for each column to the DBF-file.

open()

This functions allocates the memory for the header-inormation from the DBF-file, opens the file and reads the header into the allocated memory.

copy_fields2buffer()

This is an internal function so it is private to the class. It copies the information from the MyISAM-format (=internal format of the server) to a buffer. This buffer will later be written to the DBF-file on disk.

correct_DBF_header()

After write-operations the header of the DBF-file has to be corrected. This function is private too