this is the second part of my posting about encrypting data in
MariaDB/
MySQL. You will find the first part
here.
In this post I like to demonstrate an approach using a storage-engine.
MINTEMPL
Basis of my code is the engine
MINTEMPL which I will extend a bit. You can download the original files from
here. As this code is derived from
MyISAM so my modified
MINTEMPL-engine will also create the standard
MYD and
MYI-files.
You can copy and paste the code presented in the next lines and insert it into the proper places of the original code of
MINTEMPL or use this
archive, which contains all the code.
what to add: if you use the original code of
MINTEMPL you need to modify 3 files of the
MINTEMPL-source. Let me start with the header
ha_mintempl.h. Please add these line to this file:
public:
int write_row(uchar * buf);
int update_row(const uchar * old_data, uchar * new_data);
int rnd_next( uchar *);
int index_first(uchar *);
int index_next(uchar *);
private:
void encrypt_full_row( void );
void decrypt_full_row( void );
String * do_encrypt( String *);
String * do_decrypt( String *);
Next in the source-file
ha_mintempl.cc please add these lines:
int ha_mintempl::write_row(uchar * buf)
{
DBUG_ENTER("ha_mintempl::write_row");
my_bitmap_map *old_sets[2];
encrypt_full_row();
DBUG_RETURN( ha_myisam::write_row(buf) );
}
int ha_mintempl::rnd_next(uchar *buf)
{
DBUG_ENTER("ha_mintempl::rnd_next");
int result = ha_myisam::rnd_next(buf);
decrypt_full_row();
DBUG_RETURN( result );
}
int ha_mintempl::update_row(const uchar *old_data, uchar *new_data)
{
DBUG_ENTER("ha_mintempl::update_row");
encrypt_full_row();
DBUG_RETURN( ha_myisam::update_row(old_data, new_data ) );
}
int ha_mintempl::index_first(uchar *buf)
{
DBUG_ENTER("ha_mintempl::index_first");
int retCode = ha_myisam::index_first( buf );
decrypt_full_row();
DBUG_RETURN( retCode );
}
int ha_mintempl::index_next(uchar *buf)
{
DBUG_ENTER("ha_mintempl::index_next");
int retCode = ha_myisam::index_next( buf );
decrypt_full_row();
DBUG_RETURN( retCode );
}
void ha_mintempl::encrypt_full_row( void )
{
my_bitmap_map *old_sets[2];
dbug_tmp_use_all_columns(table, old_sets, table->read_set, table->write_set);
for ( Field **ptr2Col = (Field**) table->field; *ptr2Col ; ptr2Col++)
{
if ( !((*ptr2Col)->is_null()) )
{
String oldData; // = new String();
(*ptr2Col)->val_str(&oldData, &oldData); // first argument is marked as: __attribute__((unused))
String * newData = do_encrypt( &oldData );
(*ptr2Col)->store( newData->ptr(), newData->length(), system_charset_info);
}
}
dbug_tmp_restore_column_maps(table->read_set, table->write_set, old_sets);
return;
}
void ha_mintempl::decrypt_full_row( void )
{
my_bitmap_map *old_sets[2];
dbug_tmp_use_all_columns(table, old_sets, table->read_set, table->write_set);
for ( Field **ptr2Col = (Field**) table->field; *ptr2Col ; ptr2Col++)
{
if ( !((*ptr2Col)->is_null()) )
{
String oldData; // = new String();
(*ptr2Col)->val_str(&oldData, &oldData); // first argument is marked as: __attribute__((unused))
String * newData = do_decrypt( &oldData );
(*ptr2Col)->store( newData->ptr(), newData->length(), system_charset_info);
}
}
dbug_tmp_restore_column_maps(table->read_set, table->write_set, old_sets);
return;
}
/*
* our simple and silly 'encryptiont'-routine
*/
String *ha_mintempl::do_encrypt (String *data)
{
// do an XOR with the char 0x55
char *tempPtr = (char *)data->ptr();
for ( int i = 0; i < (int) data->length(); i++)
{
tempPtr[i] ^= 0x55;
}
return data;
}
/*
* our simple and silly 'decryptiont'-routine
*/
String *ha_mintempl::do_decrypt (String *data)
{
// do an XOR with the char 0x55
char *tempPtr = (char *)data->ptr();
for ( int i = 0; i < data->length(); i++)
{
tempPtr[i] ^= 0x55;
}
return data;
}
As we will (later) play with indexes we have to link our engine statically to the database server. So please modify the file
CMakeLists.txt to this content:
# for dynamical linking:
# SET(MINTEMPL_PLUGIN_DYNAMIC "ha_mintempl")
# SET(MINTEMPL_SOURCES ha_mintempl.cc ha_mintempl.h)
# MYSQL_ADD_PLUGIN(mintempl ${MINTEMPL_SOURCES} STORAGE_ENGINE MODULE_ONLY)
# for static linking:
SET(MINTEMPL_SOURCES ha_mintempl.cc ha_mintempl)
MYSQL_ADD_PLUGIN(mintempl ${MINTEMPL_SOURCES}
STORAGE_ENGINE
MANDATORY )
TARGET_LINK_LIBRARIES(mintempl)
Compile everything, start the new database server and we are ready for some tests.
With this code presented above I will encrypt the full row, not only a single column. It's easy to implement this for a single column but I want to show how to encrypt a full row. So let's start with the tests: I will create 2 tables, one unencrypted table (
TestNormalFull) for comparison and one encrypted table (
TestEncFull), insert a line into each table and then read from each table:
MariaDB [TestEnc]> create table TestNormalFull (
-> Id char(8),
-> PZN char(7),
-> EVP char(8),
-> HAP char(8),
-> ArtikelBez char(40),
-> ArtikelText char(26),
-> Hersteller char(5) )
-> engine=MYISAM
-> ;
Query OK, 0 rows affected (0.06 sec)
MariaDB [TestEnc]> create table TestEncFull (
-> Id char(8),
-> PZN char(7),
-> EVP char(8),
-> HAP char(8),
-> ArtikelBez char(40),
-> ArtikelText char(26),
-> Hersteller char(5) )
-> engine=MINTEMPL
-> ;
Query OK, 0 rows affected (0.05 sec)
MariaDB [TestEnc]> insert into TestNormalFull select * from TestOK.ABDAOK limit 1;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
MariaDB [TestEnc]> insert into TestEncFull select * from TestOK.ABDAOK limit 1;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
MariaDB [TestEnc]> select * from TestNormalFull;
+----------+--------+--------+------+------------------------------------+----------------------------+------------+
| Id | PZN | EVP | HAP | ArtikelBez | ArtikelText | Hersteller |
+----------+--------+--------+------+------------------------------------+----------------------------+------------+
| 01000000 | 999999 | 132.36 | 0.00 | UROSAFE BEINBEUTEL 7732C 1 L | SPRING DE LU AM K1 NO TOP5 | 25677 |
+----------+--------+--------+------+------------------------------------+----------------------------+------------+
1 row in set (0.00 sec)
MariaDB [TestEnc]> select * from TestEncFull;
+----------+--------+--------+------+------------------------------------+----------------------------+------------+
| Id | PZN | EVP | HAP | ArtikelBez | ArtikelText | Hersteller |
+----------+--------+--------+------+------------------------------------+----------------------------+------------+
| 01000000 | 999999 | 132.36 | 0.00 | UROSAFE BEINBEUTEL 7732C 1 L | SPRING DE LU AM K1 NO TOP5 | 25677 |
+----------+--------+--------+------+------------------------------------+----------------------------+------------+
1 row in set (0.00 sec)
MariaDB [TestEnc]>
As you can see the
SQL-statements contain nothing special for encrypting or decrypting values. And the result of the 2 queries is identical. Let's look at the
MYD-files to verify that the data in the table
TestEncFull.MYD is indeed encrypted:
august@AMD4:~/MariaDB/data/TestEnc$ od -t x1 -Ax TestNormalFull.MYD
000000 01 flag
30 31 30 30 30 30 30 30 Id
000000 01 flag
30 31 30 30 30 30 30 30 Id
39 39 39 39 39 39 20 PZN
000010 31 33 32 2e 33 36 20 20 EVP
30 2e 30 30 20 20 20 20 HAP
000020 55 52 4f 53 41 46 45 20 ArtikelBez
42 45 49 4e 42 45 55 54
000030 45 4c 20 37 37 33 32 43
20 20 20 20 20 20 31 20
000040 20 4c 20 20 20 20 20 20
53 50 52 49 4e 47 20 44 ArtikelText
000050 45 20 4c 55 20 41 4d 20
4b 31 20 4e 4f 20 54 4f
000060 50 35
32 35 36 37 37 Hersteller
000067
august@AMD4:~/MariaDB/data/TestEnc$ od -t x1 -Ax TestEncFull.MYD
000000 01 flag
65 64 65 65 65 65 65 65 Id
6c 6c 6c 6c 6c 6c 20 PZN
000010 64 66 67 7b 66 63 20 20 EVP
65 7b 65 65 20 20 20 20 HAP
000020 00 07 1a 06 14 13 10 75 ArtikelBez
17 10 1c 1b 17 10 00 01
000030 10 19 75 62 62 66 67 16
75 75 75 75 75 75 64 75
000040 75 19 20 20 20 20 20 20
06 05 07 1c 1b 12 75 11 ArtikelText
000050 10 75 19 00 75 14 18 75
1e 64 75 1b 1a 75 01 1a
000060 05 60
67 60 63 62 62 Hersteller
000067
august@AMD4:~/MariaDB/data/TestEnc$
You see the data in
TetEncFull.MYD everything is "encrypted". If you compare the values of the column
Hersteller with the values presented in the
first post you see that they are identical. The only thing not encrypted in the file
TestEncFull.MYD are the space-characters added by the server but this can also be done within the code with slight modifications (untested).
dynamic: let's repeat this test, but this time with tables of dynamic type:
MariaDB [TestEnc]> create table TestNormalDyn (
-> Id char(8),
-> PZN char(7),
-> EVP char(8),
-> HAP char(8),
-> ArtikelBez varchar(40),
-> ArtikelText varchar(26),
-> Hersteller char(5) )
-> engine=MYISAM
-> ;
Query OK, 0 rows affected (0.05 sec)
MariaDB [TestEnc]> create table TestEncDyn (
-> Id char(8),
-> PZN char(7),
-> EVP char(8),
-> HAP char(8),
-> ArtikelBez varchar(40),
-> ArtikelText varchar(26),
-> Hersteller char(5) )
-> engine=MINTEMPL
-> ;
Query OK, 0 rows affected (0.04 sec)
MariaDB [TestEnc]> insert into TestNormalDyn select * from TestOK.ABDAOK limit 2;
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
MariaDB [TestEnc]> insert into TestEncDyn select * from TestOK.ABDAOK limit 2;
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
MariaDB [TestEnc]> select * from TestNormalDyn;
+----------+--------+--------+-------+------------------------------------+----------------------------+------------+
| Id | PZN | EVP | HAP | ArtikelBez | ArtikelText | Hersteller |
+----------+--------+--------+-------+------------------------------------+----------------------------+------------+
| 01000000 | 999999 | 132.36 | 0.00 | UROSAFE BEINBEUTEL 7732C 1 L | SPRING DE LU AM K1 NO TOP5 | 25677 |
| 01000001 | 535818 | 0.00 | 18.91 | PK FUER TIERE 250 ML | URETERSCHIENEAC4P66 | 36367 |
+----------+--------+--------+-------+------------------------------------+----------------------------+------------+
2 rows in set (0.00 sec)
MariaDB [TestEnc]> select * from TestEncDyn;
+----------+--------+--------+-------+------------------------------------+----------------------------+------------+
| Id | PZN | EVP | HAP | ArtikelBez | ArtikelText | Hersteller |
+----------+--------+--------+-------+------------------------------------+----------------------------+------------+
| 01000000 | 999999 | 132.36 | 0.00 | UROSAFE BEINBEUTEL 7732C 1 L | SPRING DE LU AM K1 NO TOP5 | 25677 |
| 01000001 | 535818 | 0.00 | 18.91 | PK FUER TIERE 250 ML | URETERSCHIENEAC4P66 | 36367 |
+----------+--------+--------+-------+------------------------------------+----------------------------+------------+
2 rows in set (0.00 sec)
MariaDB [TestEnc]>
And here is how the
MYD-files look in hex:
august@AMD4:~/MariaDB/data/TestEnc$ od -t x1 -Ax TestNormalDyn.MYD
000000 03 00 60 00 block-header for record #1
0c 80
30 31 30 30 30 30 30 30 Id
39 39 39 39 39 39 20 PZN
06 31 33 32 2e 33 36 EVP
04 30 2e 30 30 HAP
22 55 52 4f 53 41 46 45 ArtikelBez (len + text)
20 42 45 49 4e 42 45 55
54 45 4c 20 37 37 33 32
43 20 20 20 20 20 20 31
20 20 4c
1a 53 50 52 49 4e 47 20 ArtikelText (len + text)
44 45 20 4c 55 20 41 4d
20 4b 31 20 4e 4f 20 54
4f 50 35
32 35 36 37 37 Hersteller
01 00 4d block-header for record #2
0c 80
30 31 30 30 30 30 30 31 Id
35 33 35 38 31 38 20 PZN
04 30 2e 30 30 EVP
05 31 38 2e 39 31 HAP
17 50 4b 20 46 55 45 52 ArtikelBez (len + text)
20 54 49 45 52 45 20 20
20 20 32 35 30 20 4d 4c
13 55 52 45 54 45 52 53 ArtikelText (len + text)
43 48 49 45 4e 45 41 43
34 50 36 36
33 36 33 36 37 Hersteller
0000b4
august@AMD4:~/MariaDB/data/TestEnc$ od -t x1 -Ax TestEncDyn.MYD
000000 03 00 60 00 block-header for record #1
0c 80
65 64 65 65 65 65 65 65 Id
6c 6c 6c 6c 6c 6c 20 PZN
06 64 66 67 7b 66 63 EVP (len + text)
04 65 7b 65 65 HA (len + text)
22 00 07 1a 06 14 13 10 ArtikelBez ( len + text)
75 17 10 1c 1b 17 10 00
01 10 19 75 62 62 66 67
16 75 75 75 75 75 75 64
75 75 19
1a 06 05 07 1c 1b 12 75 ArtikelText (len + text)
11 10 75 19 00 75 14 18
75 1e 64 75 1b 1a 75 01
1a 05 60
67 60 63 62 62 Hersteller
01 00 4d block-header for record #2
0c 80
65 64 65 65 65 65 65 64 Id
60 66 60 6d 64 6d 20 PZN
04 65 7b 65 65 EVP (len + text)
05 64 6d 7b 6c 64 HAP (len + text)
17 05 1e 75 13 00 10 07 ArtikelBez (len + text)
75 01 1c 10 07 10 75 75
75 75 67 60 65 75 18 19
13 00 07 10 01 10 07 06 ArtikelText (len + text)
16 1d 1c 10 1b 10 14 16
61 05 63 63
66 63 66 63 62 Hersteller
0000b4
august@AMD4:~/MariaDB/data/TestEnc$
UPDATE: we did an
INSERT and a
SELECT, but what about
UPDATE? This is a bit more complicated in the case of a dynamic table but only internally. So let's test this and especially make one column a bit longer:
MariaDB [TestEnc]> update TestNormalDyn set ArtikelBez = concat(ArtikelBez, '-', 'ABCDEFGHIJK');
Query OK, 2 rows affected, 1 warning (0.00 sec)
Rows matched: 2 Changed: 2 Warnings: 1
MariaDB [TestEnc]> update TestEncDyn set ArtikelBez = concat(ArtikelBez, '-', 'ABCDEFGHIJK');
Query OK, 2 rows affected, 1 warning (0.00 sec)
Rows matched: 2 Changed: 2 Warnings: 1
MariaDB [TestEnc]> select * from TestNormalDyn;
+----------+--------+--------+-------+------------------------------------------+----------------------------+------------+
| Id | PZN | EVP | HAP | ArtikelBez | ArtikelText | Hersteller |
+----------+--------+--------+-------+------------------------------------------+----------------------------+------------+
| 01000000 | 999999 | 132.36 | 0.00 | UROSAFE BEINBEUTEL 7732C 1 L-ABCDE | SPRING DE LU AM K1 NO TOP5 | 25677 |
| 01000001 | 535818 | 0.00 | 18.91 | PK FUER TIERE 250 ML-ABCDEFGHIJK | URETERSCHIENEAC4P66 | 36367 |
+----------+--------+--------+-------+------------------------------------------+----------------------------+------------+
2 rows in set (0.00 sec)
MariaDB [TestEnc]> select * from TestEncDyn;
+----------+--------+--------+-------+------------------------------------------+----------------------------+------------+
| Id | PZN | EVP | HAP | ArtikelBez | ArtikelText | Hersteller |
+----------+--------+--------+-------+------------------------------------------+----------------------------+------------+
| 01000000 | 999999 | 132.36 | 0.00 | UROSAFE BEINBEUTEL 7732C 1 L-ABCDE | SPRING DE LU AM K1 NO TOP5 | 25677 |
| 01000001 | 535818 | 0.00 | 18.91 | PK FUER TIERE 250 ML-ABCDEFGHIJK | URETERSCHIENEAC4P66 | 36367 |
+----------+--------+--------+-------+------------------------------------------+----------------------------+------------+
2 rows in set (0.00 sec)
MariaDB [TestEnc]>
Looks OK, but let's dive into the hex-values. With the help of
the post od and
this information we can see:
august@AMD4:~/MariaDB/data/TestEnc$ od -t x1 -Ax TestNormalDyn.MYD
000000 05 block-type (first part of record #1)
00 66 len of record
00 57 len of this block
00 00 00 00 00 00 00 b4 ptr. to next block of this record
0c 80 flags
30 31 30 30 30 30 30 30 Id
39 39 39 39 39 39 20 PZN
06 31 33 32 2e 33 36 EVP (len + text)
04 30 2e 30 30 HAP (len + text)
28 55 52 4f 53 41 46 45 ArtikelBez (len + text)
20 42 45 49 4e 42 45 55
54 45 4c 20 37 37 33 32
43 20 20 20 20 20 20 31
20 20 4c 2d 41 42 43 44
45
1a 53 50 52 49 4e 47 20 ArtikelText (len + text))
44 45 20 4c 55 20 41 4d
20 block ends here
05 00 59 00 43 00 00 00 record #2 (first part of this record)
00 00 00 00 c8
0c 80 flags
30 31 30 30 30 30 30 31 Id
35 33 35 38 31 38 20 PZN
04 30 2e 30 30 EVP (len + text)
05 31 38 2e 39 31 HAP (len + text)
23 50 4b 20 46 55 45 52 ArtikelBez (len + text)
20 54 49 45 52 45 20 20
20 20 32 35 30 20 4d 4c
2d 41 42 43 44 45 46 47
48 49 4a 4b
13 55 52 ArtikelText (len + text)
block ends here
0000b4 09 block-type (final part of record #1)
00 0f len of data
01 no. of filler-bytes at the end of this block
4b 31 20 4e 4f 20 54 4f ArtikelText continued
50 35
32 35 36 37 37 Hersteller
00 filler-byte
09 00 16 02 final part of record #2
45 54 45 52 53 43 48 49 ArtikelText continued
45 4e 45 41 43 34 50 36
36
33 36 33 36 37 Hersteller
00 00 filler-bytes
end of block, end of record
august@AMD4:~/MariaDB/data/TestEnc$ od -t x1 -Ax TestEncDyn.MYD
000000 05 00 66 00 57 00 00 00 record #1 (first part of this record)
00 00 00 00 b4
0c 80 flags
65 64 65 65 65 65 65 65 Id
6c 6c 6c 6c 6c 6c 20 PZN
06 64 66 67 7b 66 63 EVP (len + text)
04 65 7b 65 65 HAP (len + text)
28 00 07 1a 06 14 13 10 ArtikelBez (len + text)
75 17 10 1c 1b 17 10 00
01 10 19 75 62 62 66 67
16 75 75 75 75 75 75 64
75 75 19 78 14 17 16 11
10
1a 06 05 07 1c 1b 12 75 ArtikelText (len + text)
11 10 75 19 00 75 14 18
75 block ends here
05 00 59 00 43 00 00 00 record #2 (first part of this record)
00 00 00 00 c8
0c 80 flags
65 64 65 65 65 65 65 64 Id
60 66 60 6d 64 6d 20 PZN
04 65 7b 65 65 EVP (len + text)
05 64 6d 7b 6c 64 HAP (len + text)
23 05 1e 75 13 00 10 07 ArtikelBez (len + text)
75 01 1c 10 07 10 75 75
75 75 67 60 65 75 18 19
78 14 17 16 11 10 13 12
1d 1c 1f 1e
13 00 07 ArtikelText (len + text)
block ends here
09 00 0f 01 final part of record #1
1e 64 75 1b 1a 75 01 1a ArtikelText coninued
05 60
67 60 63 62 62 Hersteller
00 filler
block ends here, record ends here
09 00 16 02 final part of record #2
10 01 10 07 06 16 1d 1c ArtikelText continued
10 1b 10 14 16 61 05 63
63
66 63 66 63 62 Hersteller
00 00 filler
block ends here, record ends here
So this is handled even with the "encrypted" text.
DELETE should be no problem because the blocks are marked as free and put into a linked list, so I didn't test this.
index: so let's inspect what happens with an index:
MariaDB [TestEnc]> show index from TestNormalDyn;
Empty set (0.00 sec)
MariaDB [TestEnc]> show index from TestEncDyn;
Empty set (0.00 sec)
MariaDB [TestEnc]> create index Hersteller on TestNormalDyn(Hersteller);
Query OK, 2 rows affected (0.04 sec)
Records: 2 Duplicates: 0 Warnings: 0
MariaDB [TestEnc]> create index Hersteller on TestEncDyn(Hersteller);
Query OK, 2 rows affected (0.05 sec)
Records: 2 Duplicates: 0 Warnings: 0
MariaDB [TestEnc]>
We've seen that our data is "encrypted" in the data-file but a copy of the values of the column
Hersteller must appear in the index-file. You will find a description of the structure of the
MYI-file
here. Let's look into the two index-files we just filled with data (I've shortened the output and marked the relevant parts in bold):
august@AMD4:~/MariaDB/data/TestEnc$ od -t x1 -Ax TestNormalDyn.MYI
*
000400 00 1a 01 32 35 36 37 37 00 00 00 00 00 00 01 33
000410 36 33 36 37 00 00 00 00 00 64 00 00 00 00 00 00
000420 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
000800
august@AMD4:~/MariaDB/data/TestEnc$ od -t x1 -Ax TestEncDyn.MYI
*
000400 00 1a 01 66 63 66 63 62 00 00 00 00 00 64 01 67
000410 60 63 62 62 00 00 00 00 00 00 00 00 00 00 00 00
000420 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
000800
august@AMD4:~/MariaDB/data/TestEnc$
As you can see the values of the column
Hersteller are stored encrypted in the file
TestEncDyn.MYI but the order of appearance has changed compared to the file
TextNormalDyn.MYI:
MariaDB [(none)]> select string_xor( unhex( 6663666362 ) ), string_xor( unhex( 6760636262 ) );
+-----------------------------------+-----------------------------------+
| string_xor( unhex( 6663666362 ) ) | string_xor( unhex( 6760636262 ) ) |
+-----------------------------------+-----------------------------------+
| 36367 | 25677 |
+-----------------------------------+-----------------------------------+
1 row in set (0.01 sec)
MariaDB [(none)]>
Instead of explaining this now please allow me to do some more tests on the
SQL-level. First of all I need some more data in the table:
MariaDB [TestEnc]> truncate table TestNormalDyn;
Query OK, 0 rows affected (0.00 sec)
MariaDB [TestEnc]> truncate table TestEncDyn;
Query OK, 0 rows affected (0.00 sec)
MariaDB [TestEnc]> insert into TestNormalDyn select * from TestOK.ABDAOK;
Query OK, 10000000 rows affected (2 min 7.74 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [TestEnc]> insert into TestEncDyn select * from TestOK.ABDAOK;
Query OK, 10000000 rows affected (2 min 13.34 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [TestEnc]> show indexes from TestNormalDyn;
+---------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| TestNormalDyn | 1 | Hersteller | 1 | Hersteller | A | 2581 | NULL | NULL | YES | BTREE | | |
+---------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)
MariaDB [TestEnc]> show index from TestEncDyn;
+------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| TestEncDyn | 1 | Hersteller | 1 | Hersteller | A | 2581 | NULL | NULL | YES | BTREE | | |
+------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)
MariaDB [TestEnc]>
Now we have 10 mio. records in both tables and the index is active on both tables. Let's do some
SELECTs using the index:
MariaDB [TestEnc]> select * from TestNormalDyn order by Hersteller limit 10;
+----------+--------+-------+-------+--------------------------------------+----------------------------+------------+
| Id | PZN | EVP | HAP | ArtikelBez | ArtikelText | Hersteller |
+----------+--------+-------+-------+--------------------------------------+----------------------------+------------+
| 01001854 | 999999 | 11.95 | 8.84 | TINCTURA LAVANDULAE 1/5 250 ML | DANSAC DUO GES HAFTP50 TRA | 00020 |
| 01008613 | 999999 | 0.00 | 1.37 | MARCOUMAR 50 ST | WIBA DEDUCO D 7 URINBTL AN | 00020 |
| 01013982 | 999999 | 7.25 | 42.00 | TETESEPT HAAR KRAFT 40 ST | ACIDUM PHOS D 3 | 00020 |
| 01021789 | 999999 | 20.90 | 6.26 | CHRUETERMAENN CASSIA 10 ML | COMPRESSANA NOV K2 A-T4SIL | 00020 |
| 01024190 | 548755 | 91.07 | 50.15 | LM 150 CARBO VEGETAB ARCA 10 ML | HYPOTHALAM SUIS INJ ORG | 00020 |
| 01024645 | 999999 | 0.00 | 15.20 | OROS ROLLSTUHLKISS 40X43X5 1 ST | STOKOSEPT UNPARF FLASCHE | 00020 |
| 01025398 | 999999 | 0.00 | 0.00 | LIDOCAIN 10 G | FURANTHRIL 500 | 00020 |
| 01025951 | 999999 | 0.00 | 39.74 | HIRNSTAMM GL D-R 10X1 ML | SYMPHYTUM COMP | 00020 |
| 01028733 | 999999 | 2.67 | 3.81 | LM ALLIUM CEPA VI 5 G | NORTA BA TI 2W M 5 CH14 | 00020 |
| 01029951 | 374002 | 0.00 | 6.56 | HAMAMELIS INJ 50 ST | TAGONIS | 00020 |
+----------+--------+-------+-------+--------------------------------------+----------------------------+------------+
10 rows in set (0.00 sec)
MariaDB [TestEnc]> select * from TestEncDyn order by Hersteller limit 10;
+----------+--------+-------+-------+--------------------------------------+----------------------------+------------+
| Id | PZN | EVP | HAP | ArtikelBez | ArtikelText | Hersteller |
+----------+--------+-------+-------+--------------------------------------+----------------------------+------------+
| 01000393 | 999999 | 27.28 | 10.06 | GINSENG D200 10 ML | DUSCHHOCKER 83047 | 41200 |
| 01001895 | 999999 | 0.00 | 10.17 | PAEONIA OFFIC C 3 10 ML | PAPRIKA SCHARF | 41200 |
| 01005147 | 999999 | 0.00 | 0.00 | VITAMIN D3 OEL LOES 100000 100 G | MEDIVEN 2 AD VK GF SCHWA 5 | 41200 |
| 01008793 | 999999 | 20.07 | 0.00 | EYE CARE MAKE UP ROSE 16 35 G | SANFT U SICHER IL ZW WEI57 | 41200 |
| 01011168 | 999999 | 0.00 | 50.29 | CEPA D12 20 ML | GALIUM APARINE D 4 | 41200 |
| 01011673 | 999999 | 0.00 | 9.30 | KARIL 50 4X5 ST | GUELDENMOOR 77 KR COCKTAIL | 41200 |
| 01013738 | 999999 | 24.70 | 6.26 | COMPACT KOLOS BTL32MM 3594 10 ST | LATEXHANDSCHUHE UNST GROSS | 41200 |
| 01018792 | 999999 | 0.00 | 3.04 | PRAECIGLUCON 30 ST | LM GINKGO BILOBA XVIII | 41200 |
| 01020311 | 265307 | 41.40 | 0.00 | ELEGANCE/2 ADVK OSP MOCCA6 2 ST | SANKT SEVERIN KRAEUTERMASS | 41200 |
| 01022366 | 999999 | 4.79 | 11.20 | LEYHS KINDER GESICHTSCREME 100 ML | KOMPRESSEN GR 2 TUERKIS | 41200 |
+----------+--------+-------+-------+--------------------------------------+----------------------------+------------+
10 rows in set (0.00 sec)
MariaDB [TestEnc]>
As you can see the results differ. Instead of explaining the difference now let me show you another observation:
MariaDB [TestEnc]> select count(*) from TestNormalDyn where Hersteller = '25677';
+----------+
| count(*) |
+----------+
| 17083 |
+----------+
1 row in set (0.06 sec)
MariaDB [TestEnc]> select count(*) from TestEncDyn where Hersteller = '25677';
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
MariaDB [TestEnc]> select count(*) from TestEncDyn ignore index (Hersteller) where Hersteller = '25677';
+----------+
| count(*) |
+----------+
| 17083 |
+----------+
1 row in set (42.18 sec)
MariaDB [TestEnc]> select count(*) from TestEncDyn where Hersteller = string_xor('25677');
+----------+
| count(*) |
+----------+
| 17083 |
+----------+
1 row in set (0.04 sec)
MariaDB [TestEnc]>
explanations: when you look at the box above you will see 4
SQL-statements which do the same but their results can differ. For the first statement I want to say: the result on the table
TestNormalDyn is correct (please trust me). The second statement found nothing in the table
TestEncDyn, a surprising result. You have to look at the hex-dump of the file
TestEncDyn.MYI some lines above to understand this result. The database-server walks through the index
Hersteller of the table
TestEncDyn to find an entry '
25677' and it couldn't find one because all entries in
TestEncDyn.MYI are encrypted so the result 0 is correct (but it's not what we want). The third statement tells the server to ignore the index so it does a full table scan on
TestEncDyn.MYD instead and returns the correct value, and for this operation it needs some more time so we loose the advantage of an index. The fourth statement shows a fast way to the correct result but we are back to the approach using an UDF which was no good idea.
Now please look at the test before this one, the test including
orde by Hersteller in the
SQL-statement. There you can see different result-sets. This can be explained with the information given before: in both cases the database-server uses the index because the index contains the values of the column
Hersteller in sorted order. So the server fetches the first entries from the index and reads the corresponding records from the table. In the unencrypted case the first entry in the index has the value '
00020' and in the encrypted case it starts with the value '
adgee' which is the encrypted value for '
41200'. This explains the differences in the result-sets.
index-handling code: in the approach presented here the data is encrypted and then given to the proper routines of the engine for further processing so it appears in the index in encrypted form. For encrypting data used in an index without impairing the operability of an index we have to modify the code of the index-handler. One of the more central atomic operations of an index are comparisons so let me describe the needed modifications in some sort of pseudocode: let's assume A and B are rows of a table, we currently do the comparison in the form:
if ( A.column < B.column ) then ....
instead we have to code it in this form:
if ( decrypt(A.column) < decrypt( B.column) ) then ....
This would solve the problem shown above with the
SQL-statement that contains the
order by Hersteller part but it has to be done on multiple places.
other engines
I've chosen
MyISAM as the basis for my demonstration here because this engine is feature-rich so implementing encryption is a non-trivial job. For implementing encryption in any other storage engine similar code has to be written. These modification can be done in the functions I've used or in lower level functions of each engine as the link given in the first post showed
this link). This could also solve the problem with the index-handling that I've showed.
protection
The data of a database is stored in files. Accessing the files can be restricted via the operating system but an administrator of such a machine should be able to access these files and copy them anyway. When some of the problems of the approach described here (e.g. indexes) are solved then the approach presented here offers some additional protection.
But copying a file is not the only attack possible on the data. Following the approach presented here the data is stored on disk in encrypted form but for handling the data and working on the query the database-server needs the data in unencrypted form. So on disk data is encrypted, in RAM the data is unencrypted. The story of the
Heartbleed-bug showed that data in RAM can sometimes be read by some tricks. Other attacks were made via
SQL-injection. More attacks are possible.
Encrypting the data on the file-level is not the last word on data-protection. And if you look at my suggestions for modifying the index-code you will add an entry-point for breaking the encryption because in the index you can find hints about the sort-order (think of the history of the
Enigma and
Bletchley Park). Encryption is never completed.
finally
I started this post by telling the story of a phone-call I received some weeks ago. Core-topic of the discussions was a project about encrypting data in
MariaDB/
MySQL and I was interested in this project. But I fell through the sieve in the first round, I didn't get the contract.