Encryption: Difference between revisions

From wiki.zmanda.com
(reformat, move some stuff out, move some stuff in)
Line 1: Line 1:
==Need for encryption ==
<div style="float: right">__TOC__</div>
Two types of encryption:<br/>
Backup encryption is used for two purposes:
1) '''Transport encryption'''<br/>
; transport encryption :
Prevent eavesdropping on the network. Amanda solution: kerberos, ssh<br>
Encryption of data in transit between the server and client to prevent eavesdropping on the network.
;data encryption :
Encryption of the dumped data on tape (or other backup medium) provides protection in case the tape falls into the wrong hands.


2) '''Data encryption'''<br/>
It is becoming routine to hear of companies losing control of massive amounts of critical data when backup tapes stray from the company's control; for example:
Provides protection in case a tape falls into the hands of the wrong party. Amanda solution: the encryption feature newly added in 2.5.
* http://www.washingtonpost.com/wp-dyn/content/article/2005/12/27/AR2005122700959.html


In recent incidents, a credit card company and a hotel lost backup tapes that were not encrypted. As a result, critical customer information was put at risk. Reference: http://www.washingtonpost.com/wp-dyn/content/article/2005/12/27/AR2005122700959.html
= Comparison =


[[*Recent amanda-user discussion on encryption*]]
{| border="1" style="background:white; color:black"
!Transport!!Data!!Tag
|-
|<nowiki>No</nowiki>||No||A
|-
|<nowiki>Yes</nowiki>||No||B
|-
|<nowiki>No</nowiki>||Yes||C
|-
|<nowiki>Yes</nowiki>||Yes||D
|-
|<nowiki>Yes</nowiki>||Yes||E (public-key)
|}
In scenarios A and C, an eavesdropper on the network can observe the data.  In B, D, and E, an eavesdropper cannot.


In scenarios A and B, one can retrieve the data from backup tapes without any need for keys.  This is perhaps obvious, but very important for backups.


==What is needed to recover encrypted tapes ==
In C, D, and E, the backup tapes are encrypted.  Thus, you could store them somewhere you trust people not to destroy them but do not trust them to read them.  This can make some sense for particularly sensitive data.
To properly retrieve any encrypted data, the following are needed:<br/>


#the key (the private key in the public-key encryption case)
In E, the backup data doesn't appear on the server in plaintext.  This can be useful for backing up clients with particularly sensitive data.  To really make sense, the client should be configured so that it will only honor dump requests to a preconfigured set of public keys.
#the passphrase
#the "crypt" program used. Amanda dump file header indicates what crypt program was used. For example:
  AMANDA: FILE 20051215 boston.zmanda.com /usr/tmp/gpa2 lev 0 comp .gz program /bin/gtar crypt
enc client_encrypt /usr/local/sbin/amcrypt client_decrypt_option -d
  To restore, position tape at start of file and run:
dd if=<tape> bs=32k skip=1 | /usr/local/sbin/amcrypt -d |  /usr/bin/gzip -dc |  /bin/gtar -f...
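The order of the stages in the restore pipeline above can be tried without a tape drive. The following is a minimal sketch, not Amanda's own tooling: it fabricates a dump file with a 32k placeholder header, and `toy_crypt` (a hypothetical rot13 byte filter) stands in for `amcrypt`, so it illustrates only the stage order, not real encryption.

```shell
#!/bin/sh
# Sketch: skip the 32k header block, then decrypt, then uncompress.
# Assumption: toy_crypt is a hypothetical stand-in for amcrypt
# (rot13 is its own inverse, so the same filter "encrypts" and "decrypts").
set -eu
LC_ALL=C; export LC_ALL

toy_crypt() { tr 'A-Za-z' 'N-ZA-Mn-za-m'; }

work=$(mktemp -d)
payload='files from /usr/tmp/gpa2'

# Backup side: compress first, then "encrypt", behind a 32k header block.
{
    dd if=/dev/zero bs=32k count=1 2>/dev/null
    printf '%s\n' "$payload" | gzip -c | toy_crypt
} > "$work/dumpfile"

# Restore side: same stage order as dd | amcrypt -d | gzip -dc.
restored=$(dd if="$work/dumpfile" bs=32k skip=1 2>/dev/null | toy_crypt | gzip -dc)
printf '%s\n' "$restored"
rm -rf "$work"
```

Swapping the last two stages on the restore side makes gzip fail, which is why the crypt program recorded in the dump file header has to run before decompression.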


* '''If the key or passphrase is lost or misplaced, the data cannot be recovered.'''
== Symmetric vs. Public-key encryption ==
* There is no back-door to the encryption algorithm.
Symmetric encryption is also known as single-key (or secret-key) encryption. The same key is used for encryption as well as decryption.
* Proper key management strategy should be in your plan before using data encryption for backup.<br/><br/>
Pros:
#just one key to manage
#faster
Cons:
#need to share the key between two parties through a secured channel
#for automatic backups, the passphrase needs to be stored somewhere.


==Server-side and client side encryption==
Public/private key encryption is also known as asymmetric encryption. A public key is used for encryption while a distinct private key is used for decryption.  The systems doing the encryption do not need the private key, so the private key can be stored e.g., in a lockbox until a restore (with the attendant decryption) is required.
Pros:
#no secret is needed for encrypting (only the public key).
#if the public key is lost, it can be regenerated from the private key
#no need to use passphrase during encryption
Cons:
#computationally expensive, thus slower[**]
#data is encrypted for a specific person/group. Only the specific person with the right private key can decrypt the data.
#potential man-in-the-middle attack


*a new dumptype option, encrypt is added.
[**] it has been pointed out that computational resources don't matter that much: most systems generate a symmetric session key, which is encrypted using the public key. Hence the slow part is limited to the encryption of the session key, while the actual data is encrypted using the fast symmetric algorithm.
*specify either client or server side in the dumptype (not both):
**encrypt client or encrypt server
*specify client side encryption program:
**client_encrypt  "your encryption program"
***a sample encryption/decryption program amcrypt is provided. amcrypt is a wrapper of aespipe.
***aespipe supports AES128, AES192 and AES256 and it uses SHA-256, SHA-384 and SHA-512 respectively.
***any encryption/decryption program can be used as long as it reads from stdin and writes to stdout.
**client_decrypt_option "decrypt parameter" #default to -d
*specify server side encryption program:
**server_encrypt "your encryption program"
***can use amcrypt as in the case of client encryption.
**server_decrypt_option "decrypt parameter" #default to -d


* The logic assumes compression then encryption during backup (thus decryption then uncompression during restore). Specifying client encryption together with server compression is not supported.
= Transport Encryption Support =
To set up transport encryption between UNIX hosts, the simplest solution is to set up SSH authentication ([[How To:Set up transport encryption with SSH]]).  The SSH authentication driver multiplexes all of its communication over a single SSH channel, with the result that all data is encrypted.


* dumptype sample:
Alternately, Kerberos authentication can optionally support encryption, although this is not a well-supported option and consumes a significant amount of computing power on both ends of the connection.
define dumptype server-encrypt-fast {
      global
      program "GNUTAR"
      comment "dump with fast client compression and server symmetric encryption"
      compress client fast
      encrypt  server
      server_encrypt "/usr/local/sbin/amcrypt"
      server_decrypt_option "-d"
}


define dumptype client-encrypt-nocomp {
= Data Encryption Support =
      global
Amanda 2.5.0 and later support encryption in a fashion similar to compression.  It can be performed either on the server or the client, and is controlled in dumptype definitions by the ''encrypt client'' or ''encrypt server'' directives.  See [[How To:Set up data encryption]] for more details.
      program "GNUTAR"
      comment "dump with no compression and client symmetric encryption"
      compress none
      encrypt client
      client_encrypt "/usr/local/sbin/amcrypt"
      client_decrypt_option "-d"
}
 
* To restore a client-encrypted tape, do either of the following:
1. Take the physical tape to the client machine and do the restore there, where the key (am_key.gpg) and passphrase (.am_passphrase) are available.<br/>
 
'''or''' <br/>
 
2. take the key and passphrase to the server machine where the tape is located.<br/>
 
===Additional packages needed===
* aespipe http://loop-aes.sourceforge.net/aespipe/aespipe-v2.3b.tar.bz2 and the bz2aespipe-wrapper that comes with it. It gets patched as described later.
* the wrapper-script amcrypt, as listed below,
* GNU-PG http://www.gnupg.org/(en)/download/index.html. This should be part of most current operating systems already.
* uuencode (from the sharutils package on Linux distributions).
 
===Setup===
 
* Configure and compile aespipe:
 
tar -xjf aespipe-v2.3b.tar.bz2
cd aespipe-v2.3b
./configure
make
make install
 
* Generate and store the gpg-key for the Amanda user:
 
# taken from the aespipe-README
head -c 2925 /dev/random | uuencode -m - | head -n 66 | tail -n 65 | \
gpg --symmetric -a > ~amanda/.gnupg/am_key.gpg
 
*This will ask for a passphrase. Remember this passphrase as you will need it in the next step.
Store the passphrase inside the home-directory of the Amanda user and protect it with proper permissions:
 
  echo my_secret_passphrase > ~amanda/.am_passphrase
chown amanda:disk ~amanda/.am_passphrase
chmod 600 ~amanda/.am_passphrase
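Creating the file with `echo` and tightening permissions afterwards leaves a brief window in which the passphrase may be readable by other users. A minimal sketch of an alternative, assuming a POSIX shell; the directory is a temporary, hypothetical stand-in for the Amanda user's home, and the passphrase is a placeholder:

```shell
#!/bin/sh
# Sketch: create the passphrase file with owner-only permissions from the start.
# Assumption: $home_dir is a hypothetical stand-in for ~amanda.
set -eu
home_dir=$(mktemp -d)
pass_file="$home_dir/.am_passphrase"

# The subshell keeps the restrictive umask from affecting later commands.
( umask 077 && printf '%s\n' 'my_secret_passphrase' > "$pass_file" )

# The first ten characters of the ls -l output are the permission string.
mode=$(ls -l "$pass_file" | cut -c1-10)
printf '%s\n' "$mode"
rm -rf "$home_dir"
```

The file needs only to be readable by the Amanda user, since the wrapper opens it with a plain input redirection.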
 
*We need this file because we don't want to have to enter the passphrase manually every time we run amdump. We have to patch bz2aespipe to read the passphrase from a file. I have called that file ~amanda/.am_passphrase.
 
*Store the key and the passphrase in some other place as well; without this information you can't access any tapes that have been encrypted with them (this is exactly why we are doing all this, isn't it? ;) ).
 
* Create amcrypt (it will also be available from SourceForge and in the RPMs) as below:
#!/bin/sh
#
# Original wrapper by Paul Bijnens
#
# adapted by Stefan G. Weichinger
# to enable gpg-encrypted dumps via aespipe
# also adapted by Matthieu Lochegnies for server-side encryption
prefix=/usr/local
exec_prefix=${prefix}
sbindir=${exec_prefix}/sbin
AMANDA_HOME=~amanda
AM_AESPIPE=${exec_prefix}/sbin/amaespipe
AM_PASSPHRASE=$AMANDA_HOME/.am_passphrase
$AM_AESPIPE "$@" 3< $AM_PASSPHRASE
rc=$?
exit $rc
 
 
* Create amaespipe (it will also be available from SourceForge and in the RPMs), which is based on the wrapper script bz2aespipe that comes with the aespipe tarball:
#! /bin/sh
# FILE FORMAT
# 10 bytes: constant string 'bz2aespipe'
# 10 bytes: itercountk digits
# 1 byte: '0' = AES128, '1' = AES192, '2' = AES256
# 1 byte: '0' = SHA256, '1' = SHA384, '2' = SHA512, '3' = RMD160
# 24 bytes: random seed string
# remaining bytes are bzip2 compressed and aespipe encrypted
# These definitions are only used when encrypting.
  # Decryption will autodetect these definitions from archive.
ENCRYPTION=AES256
HASHFUNC=SHA256
ITERCOUNTK=100
AMANDA_HOME=~amanda
WAITSECONDS=1
GPGKEY="$AMANDA_HOME/.gnupg/am_key.gpg"
FDNUMBER=3
PATH=/usr/bin:/usr/local/bin
export PATH
if test x$1 = x-d ; then
    # decrypt
    n=`head -c 10 - | tr -d -c 0-9a-zA-Z`
    if test x${n} != xbz2aespipe ; then
        echo "bz2aespipe: wrong magic - aborted" >/dev/tty
        exit 1
    fi
    itercountk=`head -c 10 - | tr -d -c 0-9`
    if test x${itercountk} = x ; then itercountk=0; fi
    n=`head -c 1 - | tr -d -c 0-9`
    encryption=AES128
    if test x${n} = x1 ; then encryption=AES192; fi
    if test x${n} = x2 ; then encryption=AES256; fi
    n=`head -c 1 - | tr -d -c 0-9`
    hashfunc=SHA256
    if test x${n} = x1 ; then hashfunc=SHA384; fi
    if test x${n} = x2 ; then hashfunc=SHA512; fi
    if test x${n} = x3 ; then hashfunc=RMD160; fi
    seedstr=`head -c 24 - | tr -d -c 0-9a-zA-Z+/`
    aespipe -K ${GPGKEY} -p ${FDNUMBER} -e ${encryption} -H ${hashfunc} -S ${seedstr} -C ${itercountk} -d
else
    # encrypt
    echo -n bz2aespipe
    echo ${ITERCOUNTK} | awk '{printf "%10u", $1;}'
    n=`echo ${ENCRYPTION} | tr -d -c 0-9`
    aesstr=0
    if test x${n} = x192 ; then aesstr=1; fi
    if test x${n} = x256 ; then aesstr=2; fi
    n=`echo ${HASHFUNC} | tr -d -c 0-9`
    hashstr=0
    if test x${n} = x384 ; then hashstr=1; fi
    if test x${n} = x512 ; then hashstr=2; fi
    if test x${n} = x160 ; then hashstr=3; fi
    seedstr=`head -c 18 /dev/urandom | uuencode -m - | head -n 2 | tail -n 1`
    echo -n ${aesstr}${hashstr}${seedstr}
    aespipe -K ${GPGKEY} -p ${FDNUMBER} -e ${ENCRYPTION} -H ${HASHFUNC} -S ${seedstr} -C ${ITERCOUNTK} -w ${WAITSECONDS}
fi
exit 0
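The header layout documented in the script's comments (46 bytes in total) can be inspected without decrypting anything. A minimal sketch, assuming a file that starts with such a header; the sample file is fabricated here for illustration, with a placeholder seed string:

```shell
#!/bin/sh
# Sketch: read the fixed-size header fields written by the encrypt branch.
set -eu
sample=$(mktemp)
# 10 bytes magic, 10 bytes iteration count, 1 byte cipher, 1 byte hash, 24 bytes seed
printf 'bz2aespipe%10u%s%s%s' 100 2 0 'ABCDEFGHIJKLMNOPQRSTUVWX' > "$sample"
printf 'encrypted-bytes-would-follow' >> "$sample"

header=$(head -c 46 "$sample")
magic=$(printf '%s' "$header" | cut -c1-10)
iter=$(printf '%s' "$header" | cut -c11-20 | tr -d -c 0-9)
case $(printf '%s' "$header" | cut -c21) in
    1) cipher=AES192 ;;
    2) cipher=AES256 ;;
    *) cipher=AES128 ;;
esac
case $(printf '%s' "$header" | cut -c22) in
    1) hash=SHA384 ;;
    2) hash=SHA512 ;;
    3) hash=RMD160 ;;
    *) hash=SHA256 ;;
esac
seed=$(printf '%s' "$header" | cut -c23-46)

summary="$magic iter=$iter $cipher $hash seed=$seed"
printf '%s\n' "$summary"
rm -f "$sample"
```

The defaults in the two `case` statements mirror the decrypt branch of the script, which falls back to AES128 and SHA256 when the code byte is not recognized.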
 
 
 
Changes from bz2aespipe:
* Decreased WAITSECONDS: No need to wait for 10 seconds to read the passphrase.
* Removed bzip2 from the pipes: AMANDA triggers GNU-zip-compression by itself, no need to do this twice (slows down things, blows up size).
* Added options -K and -p: This enables aespipe to use the generated gpg-key and tells it the number of the file-descriptor to read the passphrase from.
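The `-p` option relies on the shell opening the passphrase file on a numbered file descriptor, which is what the `3< $AM_PASSPHRASE` redirection in amcrypt does. A minimal sketch of that mechanism, with a hypothetical `read_secret` function standing in for aespipe and a temporary file standing in for the passphrase file:

```shell
#!/bin/sh
# Sketch: pass a secret on fd 3 while stdin carries the data stream.
set -eu
pass_file=$(mktemp)
printf '%s\n' 'my_secret_passphrase' > "$pass_file"

# Hypothetical consumer: reads one line from fd 3, then the data from stdin.
read_secret() {
    IFS= read -r secret <&3
    data=$(cat)
    printf 'data=%s secret-length=%s\n' "$data" "${#secret}"
}

# Only the length of the secret is printed, never the secret itself.
result=$(printf 'dump-bytes' | read_secret 3< "$pass_file")
printf '%s\n' "$result"
rm -f "$pass_file"
```

Keeping the passphrase off stdin and off the command line means it neither mixes with the dump data nor shows up in the process list.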
   
You may set various parameters inside bz2aespipe. You may also call bz2aespipe with various command-line parameters to choose
the encryption algorithm, hash function, etc. For a start I have chosen to call bz2aespipe without command-line options.

Revision as of 22:58, 31 May 2007

Backup encryption is used for two purposes:

transport encryption

Encryption of data in transit between the server and client to prevent eavesdropping on the network.

data encryption

Encryption of the dumped data on tape (or other backup medium) provides protection in case the tape falls into the wrong hands.

It is becoming routine to hear of companies losing control of massive amounts of critical data when backup tapes stray from the company's control; for example: http://www.washingtonpost.com/wp-dyn/content/article/2005/12/27/AR2005122700959.html

Comparison

Transport  Data  Tag
No         No    A
Yes        No    B
No         Yes   C
Yes        Yes   D
Yes        Yes   E (public-key)

In scenarios A and C, an eavesdropper on the network can observe the data. In B, D, and E, an eavesdropper cannot.

In scenarios A and B, one can retrieve the data from backup tapes without any need for keys. This is perhaps obvious, but very important for backups.

In C, D, and E, the backup tapes are encrypted. Thus, you could store them somewhere you trust people not to destroy them but do not trust them to read them. This can make some sense for particularly sensitive data.

In E, the backup data doesn't appear on the server in plaintext. This can be useful for backing up clients with particularly sensitive data. To really make sense, the client should be configured so that it will only honor dump requests to a preconfigured set of public keys.

Symmetric vs. Public-key encryption

Symmetric encryption is also known as single-key (or secret-key) encryption. The same key is used for encryption as well as decryption.

Pros:

  1. just one key to manage
  2. faster

Cons:

  1. need to share the key between two parties through a secured channel
  2. for automatic backups, the passphrase needs to be stored somewhere.

Public/private key encryption is also known as asymmetric encryption. A public key is used for encryption while a distinct private key is used for decryption. The systems doing the encryption do not need the private key, so the private key can be stored, e.g., in a lockbox until a restore (with the attendant decryption) is required.

Pros:

  1. no secret is needed for encrypting (only the public key).
  2. if the public key is lost, it can be regenerated from the private key
  3. no need to use passphrase during encryption

Cons:

  1. computationally expensive, thus slower[**]
  2. data is encrypted for a specific person/group. Only the specific person with the right private key can decrypt the data.
  3. potential man-in-the-middle attack

[**] it has been pointed out that computational resources don't matter that much: most systems generate a symmetric session key, which is encrypted using the public key. Hence the slow part is limited to the encryption of the session key, while the actual data is encrypted using the fast symmetric algorithm.
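The hybrid scheme described in this footnote can be sketched with the `openssl` command-line tool. This is an illustration under stated assumptions, not how Amanda or amcrypt works: it assumes OpenSSL is installed, all file names are hypothetical, and a throwaway RSA key pair is generated only to keep the example self-contained.

```shell
#!/bin/sh
# Sketch: symmetric session key for the data, public key only for the session key.
set -eu
work=$(mktemp -d)
printf 'backup payload\n' > "$work/plain.txt"

# Key pair; in practice the private key would stay offline until a restore.
openssl genrsa -out "$work/priv.pem" 2048 2>/dev/null
openssl rsa -in "$work/priv.pem" -pubout -out "$work/pub.pem" 2>/dev/null

# Fast part: encrypt the data with a random AES session key.
session_key=$(openssl rand -hex 32)
iv=$(openssl rand -hex 16)
openssl enc -aes-256-cbc -K "$session_key" -iv "$iv" \
    -in "$work/plain.txt" -out "$work/data.enc"

# Slow part: only the short session key goes through the public-key operation.
printf '%s' "$session_key" |
    openssl pkeyutl -encrypt -pubin -inkey "$work/pub.pem" -out "$work/key.enc"
unset session_key

# Restore: unwrap the session key with the private key, then decrypt the data.
recovered_key=$(openssl pkeyutl -decrypt -inkey "$work/priv.pem" -in "$work/key.enc")
openssl enc -d -aes-256-cbc -K "$recovered_key" -iv "$iv" \
    -in "$work/data.enc" -out "$work/restored.txt"
restored=$(cat "$work/restored.txt")
printf '%s\n' "$restored"
rm -rf "$work"
```

The public-key operation handles only a 64-character key regardless of how large the backup is, which is why the cost noted in the list of cons is usually small in practice.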

Transport Encryption Support

To set up transport encryption between UNIX hosts, the simplest solution is to set up SSH authentication (How To:Set up transport encryption with SSH). The SSH authentication driver multiplexes all of its communication over a single SSH channel, with the result that all data is encrypted.

Alternately, Kerberos authentication can optionally support encryption, although this is not a well-supported option and consumes a significant amount of computing power on both ends of the connection.

Data Encryption Support

Amanda 2.5.0 and later support encryption in a fashion similar to compression. It can be performed either on the server or the client, and is controlled in dumptype definitions by the encrypt client or encrypt server directives. See How To:Set up data encryption for more details.