Thursday 21 August 2014

encryption

the data of a company is usually stored in databases, your personal data is normally not stored in a database (only some people do this). Protecting the data in a database is essential for companies. You need an argument for this statement? Please allow me to do some 'name-dropping': Edward Snowden showed us interesting things. And companies sometime wonder about competitors and their advantages: espionage.

Some weeks ago I was asked by a company to help in a project. The company that asked is working in the business for arranging contracts between companies and software developers, so getting the right information about the job to do wasn't easy in the first discussion. A week later the job description was clear: implementing an encryption-function in MySQL/MariaDB. By asking Mr. Google I found this site: MariaDB Database Encryption. This describes in detail one approach for use in the engines InnoDB and XtraDB.
There was more information available: Mr. Google also told me this about encryption in MariaDB. I continued searching and started thinking about possible solutions.

I like to present some of my thoughts and experiments. As I said in other posts before: the code I will present here is only a proof of concept, it's not ready for commercial use.

As my focus is on the code of MariaDB/MySQL and not in the algorithm or code of any encryption-procedures I've chosen a simple function for simulating encryption: XOR. It's easy to implement, fast, encrypting and decrypting is identical (so I have only one routine to code) and the encrypted text is broken in some seconds (or less). Let's simply use this algorithm (you shouldn't use it for encrypting your data).

functions

MariaDB/MySQL contains functions for encrypting and decrypting your data. You will find a list of these functions here: 12.13 Encryption and Compression Functions. This page also contains examples on how to use these.

As my "encryption" is XOR I would like to demonstrate this using this approach. So I have to write an UDF which implements this. You can find a description of UDFs here:CREATE FUNCTION UDF and an introduction into the rules for writing your own UDF here: 24.3 Adding New Functions to MySQL. And finally there is an example which you will find in the directory sql of the source-code: udf_example.c.

You will find the source-code of my little UDF in the archive in the file: string_xor.c1). Compile it and copy it to the directory where the database-server expects it:
august@AMD4:~/workspace/MariaDB/mariadb-10.0.10/sql$ gcc -shared -o string_xor.so string_xor.c  -I ../include
august@AMD4:~/workspace/MariaDB/mariadb-10.0.10/sql$ cp string_xor.so ~/MariaDB/pgm/lib/plugin/
august@AMD4:~/workspace/MariaDB/mariadb-10.0.10/sql$ 

So let's test this little routine:
 MariaDB [(none)]> CREATE FUNCTION string_xor RETURNS STRING SONAME 'string_xor.so';
Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> select hex(string_xor('ABCDE'));
+--------------------------+
| hex(string_xor('ABCDE')) |
+--------------------------+
| 1417161110               |
+--------------------------+
1 row in set (0.03 sec)

MariaDB [(none)]> select string_xor(unhex(1417161110));
+-------------------------------+
| string_xor(unhex(1417161110)) |
+-------------------------------+
| ABCDE                         |
+-------------------------------+
1 row in set (0.00 sec)

MariaDB [(none)]> 

Using the hex()- und unhex()-functions was necessary because the box above does not show correctly the output of the function string_xor(). If I omit the hex()-function it looks like this:


Thius output looks like encrypted. So let's continue our tests:
MariaDB [(none)]> create database TestEnc;
Query OK, 1 row affected (0.00 sec)

MariaDB [(none)]> use TestEnc;
Database changed
MariaDB [TestEnc]> create table TestNormal (                                                                                                                                                                             
    ->                 Id   char(8),
    ->                 PZN  char(7),
    ->                 EVP  char(8),
    ->                 HAP  char(8),
    ->                 ArtikelBez char(40),
    ->                 ArtikelText char(26),
    ->                 Hersteller  char(5) )
    ->                 engine=MYISAM
    -> ;
Query OK, 0 rows affected (0.05 sec)

MariaDB [TestEnc]> create table TestEnc (                                                                                                                                                                             
    ->                 Id   char(8),
    ->                 PZN  char(7),
    ->                 EVP  char(8),
    ->                 HAP  char(8),
    ->                 ArtikelBez char(40),
    ->                 ArtikelText char(26),
    ->                 Hersteller  char(5) )
    ->                 engine=MYISAM
    -> ;
Query OK, 0 rows affected (0.18 sec)

MariaDB [TestEnc]> insert into TestNormal values ( '01000000', '999999', '132.36', '0.00', 'UROSAFE BEINBEUTEL 7732C      1  L', 'SPRING DE LU AM K1 NO TOP5', '25677'); 
Query OK, 1 row affected (0.00 sec)

MariaDB [TestEnc]> insert into TestEnc values ( '01000000', '999999', '132.36', '0.00', 'UROSAFE BEINBEUTEL 7732C      1  L', 'SPRING DE LU AM K1 NO TOP5', string_xor('25677'));
Query OK, 1 row affected (0.00 sec)

MariaDB [TestEnc]> 

As you can see in the last lines I've inserted a row into each table. For showing the result of these INSERT-statements I did some SELECTs. In the result I've marked the values for the column Hersteller in bold. So let's look what is stored in the table:
MariaDB [(none)]>MariaDB [TestEnc]> select * from TestNormal;
+----------+--------+--------+------+------------------------------------+----------------------------+------------+
| Id       | PZN    | EVP    | HAP  | ArtikelBez                         | ArtikelText                | Hersteller |
+----------+--------+--------+------+------------------------------------+----------------------------+------------+
| 01000000 | 999999 | 132.36 | 0.00 | UROSAFE BEINBEUTEL 7732C      1  L | SPRING DE LU AM K1 NO TOP5 | 25677      |
+----------+--------+--------+------+------------------------------------+----------------------------+------------+
1 row in set (0.00 sec)

MariaDB [TestEnc]> select * from TestEnc;   
+----------+--------+--------+------+------------------------------------+----------------------------+------------+
| Id       | PZN    | EVP    | HAP  | ArtikelBez                         | ArtikelText                | Hersteller |
+----------+--------+--------+------+------------------------------------+----------------------------+------------+
| 01000000 | 999999 | 132.36 | 0.00 | UROSAFE BEINBEUTEL 7732C      1  L | SPRING DE LU AM K1 NO TOP5 | g`cbb      |
+----------+--------+--------+------+------------------------------------+----------------------------+------------+
1 row in set (0.00 sec)

MariaDB [TestEnc]> select Id, PZN, EVP, HAP, ArtikelBez, ArtikelText, string_xor(Hersteller) as Hersteller from TestEnc;
+----------+--------+--------+------+------------------------------------+----------------------------+------------+
| Id       | PZN    | EVP    | HAP  | ArtikelBez                         | ArtikelText                | Hersteller |
+----------+--------+--------+------+------------------------------------+----------------------------+------------+
| 01000000 | 999999 | 132.36 | 0.00 | UROSAFE BEINBEUTEL 7732C      1  L | SPRING DE LU AM K1 NO TOP5 | 25677      |
+----------+--------+--------+------+------------------------------------+----------------------------+------------+
1 row in set (0.01 sec)

MariaDB [TestEnc]> 
As you can see accessing the column Hersteller of our table TestEnc shows an encrypted value, you must use the decryption-routine to get the real value of this column.

Let's look into the MYD-files of the two tables:
august@AMD4:~/MariaDB/data/TestEnc$ od -t x1 -Ax TestNormal.MYD
000000 01                              flag
       30 31 30 30 30 30 30 30         Id
       39 39 39 39 39 39 20            PZN
000010 31 33 32 2e 33 36 20 20         EVP
       30 2e 30 30 20 20 20 20         HAP
000020 55 52 4f 53 41 46 45 20         ArtikelBez
       42 45 49 4e 42 45 55 54
000030 45 4c 20 37 37 33 32 43 
       20 20 20 20 20 20 31 20
000040 20 4c 20 20 20 20 20 20 
       53 50 52 49 4e 47 20 44         ArtikelText
000050 45 20 4c 55 20 41 4d 20 
       4b 31 20 4e 4f 20 54 4f
000060 50 35 
       32 35 36 37 37                  Hersteller
000067

august@AMD4:~/MariaDB/data/TestEnc$ od -t x1 -Ax TestEnc.MYD
000000 01                              flag
       30 31 30 30 30 30 30 30         Id
       39 39 39 39 39 39 20            PZN
000010 31 33 32 2e 33 36 20 20         EVP
       30 2e 30 30 20 20 20 20         HAP
000020 55 52 4f 53 41 46 45 20         ArtikelBez
       42 45 49 4e 42 45 55 54
000030 45 4c 20 37 37 33 32 43 
       20 20 20 20 20 20 31 20
000040 20 4c 20 20 20 20 20 20 
       53 50 52 49 4e 47 20 44         ArtikelText
000050 45 20 4c 55 20 41 4d 20 
       4b 31 20 4e 4f 20 54 4f
000060 50 35 
       67 60 63 62 62                  Hersteller (encrypted)
000067
august@AMD4:~/MariaDB/data/TestEnc$ 

At the end of the tables you see the value of the column Hersteller. For table TestNormal you see it in plain text, for table TestEnc you see the encrypted values. To verify our little UDF let's do some calculations:
       value        decrypt                   encrypt
       67           67 XOR 55 = 32 = '2'      32 XOR 55 = 67
       60           60 XOR 55 = 35 = '5'      35 XOR 55 = 60
       63           63 XOR 55 = 36 = '6'      36 XOR 55 = 63
       62           62 XOR 55 = 37 = '7'      37 XOR 55 = 62

final words: can we use this approach? Maybe. Please keep in mind that every SQL-statement must contain the call of the encryption- or decryption-function for every column that contains encrypted values. In most of the cases this has to be done in the client-software. Using a hex-editor or decompiler (e.g. for Java) or some other tools this can easily be detected. So this approach is no real protection.

trigger

Let's switch over to the server-side and implement encryption via a trigger. You will find a description here: CREATE TRIGGER Statement.

Here is the first step of the example using a trigger:
MariaDB [TestEnc]> create table TestTrigger (                                                                                                                                                                             
    ->                 Id   char(8),
    ->                 PZN  char(7),
    ->                 EVP  char(8),
    ->                 HAP  char(8),
    ->                 ArtikelBez char(40),
    ->                 ArtikelText char(26),
    ->                 Hersteller  char(5) )
    ->                 engine=MYISAM
    -> ;
Query OK, 0 rows affected (0.04 sec)

MariaDB [TestEnc]> delimiter //    
MariaDB [TestEnc]> CREATE TRIGGER test_xor BEFORE INSERT ON TestTrigger
    ->     FOR EACH ROW
    ->     BEGIN
    ->         SET NEW.Hersteller = string_xor( NEW.Hersteller);
    ->     END;//
Query OK, 0 rows affected (0.08 sec)

MariaDB [TestEnc]> delimiter ;
MariaDB [TestEnc]> insert into TestTrigger values ( '01000000', '999999', '132.36', '0.00', 'UROSAFE BEINBEUTEL 7732C      1  L', 'SPRING DE LU AM K1 NO TOP5', '25677');
Query OK, 1 row affected (0.00 sec)

MariaDB [TestEnc]> select * from TestTrigger;
+----------+--------+--------+------+------------------------------------+----------------------------+------------+
| Id       | PZN    | EVP    | HAP  | ArtikelBez                         | ArtikelText                | Hersteller |
+----------+--------+--------+------+------------------------------------+----------------------------+------------+
| 01000000 | 999999 | 132.36 | 0.00 | UROSAFE BEINBEUTEL 7732C      1  L | SPRING DE LU AM K1 NO TOP5 | g`cbb      |
+----------+--------+--------+------+------------------------------------+----------------------------+------------+
1 row in set (0.00 sec)

MariaDB [TestEnc]> 

If you look at the result of the SELECT you see that only the column Hersteller is encrypted. And if you look at the definition of the trigger you see that it uses the UDF string_xor() defined before and it calls this function only for the column Hersteller.

Looks fine but let's look into the MYD-file:
august@AMD4:~/MariaDB/data/TestEnc$ l -l TestTrigger*
-rw-rw---- 1 august august  679 Aug 19 13:57 TestTrigger.frm
-rw-rw---- 1 august august  103 Aug 19 13:58 TestTrigger.MYD
-rw-rw---- 1 august august 1024 Aug 19 13:58 TestTrigger.MYI
-rw-rw---- 1 august august  337 Aug 19 13:57 TestTrigger.TRG
august@AMD4:~/MariaDB/data/TestEnc$ od -t x1 -Ax TestTrigger.MYD
000000 01 30 31 30 30 30 30 30 30 39 39 39 39 39 39 20
000010 31 33 32 2e 33 36 20 20 30 2e 30 30 20 20 20 20
000020 55 52 4f 53 41 46 45 20 42 45 49 4e 42 45 55 54
000030 45 4c 20 37 37 33 32 43 20 20 20 20 20 20 31 20
000040 20 4c 20 20 20 20 20 20 53 50 52 49 4e 47 20 44
000050 45 20 4c 55 20 41 4d 20 4b 31 20 4e 4f 20 54 4f
000060 50 35 67 60 63 62 62
000067
august@AMD4:~/MariaDB/data/TestEnc$ 

If you look at the hex-dump you will see exactly the same hex-values as in the example above (table TestEnc.MYD) so this approach worked.

How can we access this column with the encrypted value? Unfortunately there is no trigger for SELECT in MariaDB/MySQL. Other database-software also does not have such a trigger as you can see here for Oracle and Microsoft. So we have to create a stored procedure that contains this SELECT-statement and does the processing on the encrypted columns, returning the row after processing it.2). But I don't want to do this here but instead switch over to a different approach.

continued

Next I like to describe an approach that uses a storage engine so this post will be continued.


some notes:
1) I will give you the link to the archive in the coming post.
2) for protecting the data in your database this approach has some advantages: you can put all SQL-statements in stored-procedures and the clients are only allowed to call these stored-procedures. You can even let these stored procedures call other stored procedures which belong to a different user but have the right to access tables and manipulate data. So from the client side you will only see a stored procedure which contains a one-liner (the call of another stored procedure, which is invisible to the client).