Thursday 28 August 2014

encryption (3)

this is the third part of my posting about encrypting data in MariaDB/MySQL. You will find the other parts here: first post and second post.

missing

In the last post I described an approach using a storage engine for encrypting the data. But the examples showed tables consisting of columns with types CHAR and VARCHAR. Other types are possible but were omitted, so I decided to implement, test and demonstrate these types too.

So I will present here a test-case where the table contains columns of these types: INTEGER, DECIMAL, NUMERIC, DATETIME, TIMESTAMP. For handling these types I had to modify my code a bit so I updated the archive containing the code. You will still find the archive here.

why MyISAM

I used MyISAM as the basis of my code because by inheritance I got a relative feature-rich engine in a fast way. And when you write your own storage engine you have to deal with MyISAM anyway because the data of a record is given from the server to your engine in a buffer in which ist is formatted in MyISAM-style.And you'll have to give the data back from the engine to the server in this format.

code

As I have to modify the code I'll take the code from the second post as the basis for my decription here.

In the file ha_mintempl.h please add these lines at the proper place:
 private:
  void     do_encrypt ( uchar *, int);
  void     do_decrypt ( uchar *, int);

In the file ha_mintempl.cc please add these functions at some place:
void ha_mintempl::do_encrypt ( uchar *buf, int len)
{
 // do an XOR with the char 0x55
 for ( int i = 0; i < len; i++)
 {
  buf[i] ^= 0x55;
 }
 return ;
}

void ha_mintempl::do_decrypt ( uchar *buf, int len)
{
 // do an XOR with the char 0x55
 for ( int i = 0; i < len; i++)
 {
  buf[i] ^= 0x55;
 }
 return ;
}

You also have to replace these functions with this new code:
void ha_mintempl::encrypt_full_row( void )
{
    my_bitmap_map *old_sets[2];

    dbug_tmp_use_all_columns(table, old_sets, table->read_set, table->write_set);
    for ( Field **ptr2Col = (Field**) table->field; *ptr2Col ; ptr2Col++)
    {
        if ( !((*ptr2Col)->is_null()) )
        {
            if ( (*ptr2Col)->type() != MYSQL_TYPE_VARCHAR )
            {
                do_encrypt ( (*ptr2Col)->ptr, (*ptr2Col)-> pack_length_in_rec());
            }
            else
            {
                String oldData; //  = new String();
                (*ptr2Col)->val_str(&oldData, &oldData);           // first argument is marked as:  __attribute__((unused))
                String * newData = do_encrypt( &oldData );
                (*ptr2Col)->store( newData->ptr(), newData->length(), system_charset_info);
            }
        }
    }
    dbug_tmp_restore_column_maps(table->read_set, table->write_set, old_sets);
    return;
}
void ha_mintempl::decrypt_full_row( void )
{
    my_bitmap_map *old_sets[2];

    dbug_tmp_use_all_columns(table, old_sets, table->read_set, table->write_set);
    for ( Field **ptr2Col = (Field**) table->field; *ptr2Col ; ptr2Col++)
    {
 if ( !((*ptr2Col)->is_null()) )
 {
            if ( (*ptr2Col)->type() != MYSQL_TYPE_VARCHAR )
     {
  do_decrypt ( (*ptr2Col)->ptr, (*ptr2Col)-> pack_length_in_rec());
     }
     else
     {
                String oldData; //  = new String();
  (*ptr2Col)->val_str(&oldData, &oldData);           // first argument is marked as:  __attribute__((unused))
  String * newData = do_decrypt( &oldData );
  (*ptr2Col)->store( newData->ptr(), newData->length(), system_charset_info);
     }
 }
    }
    dbug_tmp_restore_column_maps(table->read_set, table->write_set, old_sets);
    return;
}

As you can see a type of VARCHAR is handled as before only for other types a pointer to the internal representation in the buffer is read and the len (in internal representation of the data). With these information the data is encrypted directly in the buffer, no conversion between types is needed because this encryption will result in errors: think of an integer, read from the buffer in string-format, which is "encrypted" in our way and then given back via a store()-function. By checking this data an error occurs.......
Handling a VARCHAR is a bit more difficult because you shouldn't modify the len-byte(s), so this is handled the old-way.

testing

create: let's create tables with some more column-types: TestNormalCols is a table of the standard MyISAM-type, used for comparison, TestEncCols shall contain the same data but in our encrypted form:
 MariaDB [TestEnc]> create table TestNormalCols (                                                                                                                                                                             
    ->                 Id   char(8),
    ->                 PZN  integer,
    ->                 EVP  decimal(6, 2),
    ->                 HAP  numeric(7,2),
    ->                 ArtikelBez varchar(40),
    ->                 ArtikelText varchar(26),
    ->                 Hersteller  char(5),
    ->                 InsertedOn datetime,
    ->                 InsertedTStamp timestamp )
    ->                 engine=MYISAM
    -> ;
Query OK, 0 rows affected (0.04 sec)

MariaDB [TestEnc]> desc TestNormalCols;
+----------------+--------------+------+-----+-------------------+-----------------------------+
| Field          | Type         | Null | Key | Default           | Extra                       |
+----------------+--------------+------+-----+-------------------+-----------------------------+
| Id             | char(8)      | YES  |     | NULL              |                             |
| PZN            | int(11)      | YES  |     | NULL              |                             |
| EVP            | decimal(6,2) | YES  |     | NULL              |                             |
| HAP            | decimal(7,2) | YES  |     | NULL              |                             |
| ArtikelBez     | varchar(40)  | YES  |     | NULL              |                             |
| ArtikelText    | varchar(26)  | YES  |     | NULL              |                             |
| Hersteller     | char(5)      | YES  |     | NULL              |                             |
| InsertedOn     | datetime     | YES  |     | NULL              |                             |
| InsertedTStamp | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+----------------+--------------+------+-----+-------------------+-----------------------------+
9 rows in set (0.01 sec)

MariaDB [TestEnc]> create table TestEncCols (                                                                                                                                                                             
    ->                 Id   char(8),
    ->                 PZN  integer,
    ->                 EVP  decimal(6, 2),
    ->                 HAP  numeric(7,2),
    ->                 ArtikelBez varchar(40),
    ->                 ArtikelText varchar(26),
    ->                 Hersteller  char(5),
    ->                 InsertedOn datetime,
    ->                 InsertedTStamp timestamp )
    ->                 engine=MINTEMPL
    -> ;
Query OK, 0 rows affected (0.04 sec)

MariaDB [TestEnc]> 

INSERT and SELECT: next we will insert one record in both tables and check what's in the table:
 MariaDB [TestEnc]> insert into TestNormalCols values ( '01000000', 999999, 132.36, 0.00, 'UROSAFE BEINBEUTEL 7732C      1  L', 'SPRING DE LU AM K1 NO TOP5', '25677', '2014-08-26 14:59:00', current_timestamp);
Query OK, 1 row affected (0.01 sec)

MariaDB [TestEnc]> insert into TestEncCols values ( '01000000', 999999, 132.36, 0.00, 'UROSAFE BEINBEUTEL 7732C      1  L', 'SPRING DE LU AM K1 NO TOP5', '25677', '2014-08-26 14:59:00', current_timestamp);
Query OK, 1 row affected (0.00 sec)

MariaDB [TestEnc]> select * from TestNormalCols;
+----------+--------+--------+------+------------------------------------+----------------------------+------------+---------------------+---------------------+
| Id       | PZN    | EVP    | HAP  | ArtikelBez                         | ArtikelText                | Hersteller | InsertedOn          | InsertedTStamp      |
+----------+--------+--------+------+------------------------------------+----------------------------+------------+---------------------+---------------------+
| 01000000 | 999999 | 132.36 | 0.00 | UROSAFE BEINBEUTEL 7732C      1  L | SPRING DE LU AM K1 NO TOP5 | 25677      | 2014-08-26 14:59:00 | 2014-08-26 15:45:21 |
+----------+--------+--------+------+------------------------------------+----------------------------+------------+---------------------+---------------------+
1 row in set (0.00 sec)

MariaDB [TestEnc]> select * from TestEncCols;
+----------+--------+--------+------+------------------------------------+----------------------------+------------+---------------------+---------------------+
| Id       | PZN    | EVP    | HAP  | ArtikelBez                         | ArtikelText                | Hersteller | InsertedOn          | InsertedTStamp      |
+----------+--------+--------+------+------------------------------------+----------------------------+------------+---------------------+---------------------+
| 01000000 | 999999 | 132.36 | 0.00 | UROSAFE BEINBEUTEL 7732C      1  L | SPRING DE LU AM K1 NO TOP5 | 25677      | 2014-08-26 14:59:00 | 2014-08-26 15:45:27 |
+----------+--------+--------+------+------------------------------------+----------------------------+------------+---------------------+---------------------+
1 row in set (0.00 sec)

MariaDB [TestEnc]> 

Looks identical except for the timestamp, but let's dive into the hex-values.

into hex

Our single records looked identical when selected from both tables. To verify that the values are indeed encrypted in the data-file we have to look into the files TestNormalCols.MYD and TestEncCols.MYD.
For understanding the contents of the MYD-files we need some explanations which you can find here od and here 20.1.2 Physical Attributes of Columns.
august@AMD4:~/MariaDB/data/TestEnc$ od -t x1 -Ax TestNormalCols.MYD
000000 03 00 64 00                              block-header
       00 00                                    flags
       30 31 30 30 30 30 30 30                  Id
       3f 42 0f 00                              PZN = 0x0f423f = 999999
       80 84 24                                 EVP = flag + 132 (=0x84) + 36 (=0x24) 
       80 00 00 00                              HAP = flag + 0 + 0
       22 55 52 4f 53 41 46 45                  ArtikelBez
       20 42 45 49 4e 42 45 55 
       54 45 4c 20 37 37 33 32 
       43 20 20 20 20 20 20 31 
       20 20 4c 
       1a 53 50 52 49 4e 47 20                  ArtikelText
       44 45 20 4c 55 20 41 4d 
       20 4b 31 20 4e 4f 20 54 
       4f 50 35 
       32 35 36 37 37                           Hersteller
       6c b4 c9 66 51 12 00 00                  InsertedOn = 0x125166c9b46c = 20140826145900 = YYYYMMDDHHMMSS
       71 8f fc 53                              InsertedTStamp = 0x53fc8f71 = 1409060721 = Tue, 26 Aug 2014 13:45:21 GMT
                                                               = no. of seconds since 1970 (Unix-like)

august@AMD4:~/MariaDB/data/TestEnc$ od -t x1 -Ax TestEncCols.MYD
000000 03 00 64 00                              block-header 
       00 00                                    flags
       65 64 65 65 65 65 65 65                  Id
       6a 17 5a 55                              PZN
       d5 d1 71                                 EVP
       d5 55 55 55                              HAP
       22 00 07 1a 06 14 13 10                  ArtikelBez                  
       75 17 10 1c 1b 17 10 00 
       01 10 19 75 62 62 66 67 
       16 75 75 75 75 75 75 64 
       75 75 19 
       1a 06 05 07 1c 1b 12 75                  ArtikelText
       11 10 75 19 00 75 14 18 
       75 1e 64 75 1b 1a 75 01
       1a 05 60 
       67 60 63 62 62 39                        Hersteller
       e1 9c 33 04 47 55 55                     InsertedOn
       22 da a9 06                              InsertedTStamp
000068
august@AMD4:~/MariaDB/data/TestEnc$ 

verification

For the table TestEncCols the data looks encrypted, but let me test this for 2 columns.

PZN: the type of this column is INTEGER and these are stored as 4-byte entries, so let's take the bytes and decrypt them:
PZN:   6a 17 5a 55
       6a:     6a XOR 55 = 3f
       17:     17 XOR 55 = 42
       5a:     5a XOR 55 = 0f
       55:     55 XOR 55 = 00
As you can see the value from the table TestEncCols are identical after you decrypted them.

InsertedTStamp: let's look at another column, one of type TIMESTAMP:
InsertedTStamp:   22 da a9 06 
       22:     22 XOR 55 = 77
       da:     da XOR 55 = 8f
       a9:     a9 XOR 55 = fc
       06:     06 XOR 55 = 53
In the table TestNormalCols you see a value of 0x53fc8f71 in the column InsertedTStamp, in TestEncCols you see a value of 0x53fc8f77 after decryption. The difference is a value of 6, this number tells you that it took me 6 seconds to start the second test after the first one.
Converting the given hex-values into decimals and then using these numbers in this site we can convert these numbers into a human readable form. Looks like the encryption worked.

I've only used INTEGERs for verifying the encryption, you may do these steps for the other types by yourself.

final words

Looks like encrypting data is possible within MariaDB/MySQL. Naturally problems will arise during realisation because this is normal in software-projects, but I'm sure these can be solved. Just do it.