Sunday, February 6, 2011

Backing up with rsync


Rsync is a very powerful copy/transfer/backup tool and probably one of the least known *nix tools. It is also available for Windows (via Cygwin). The main purpose of rsync is synchronizing files between computers, and a backup is really nothing more than that. Rsync ships by default with virtually every Linux distribution, though sadly it is less common on low-end NAS devices.

What makes rsync so appealing is that, apart from its very efficient transfer mechanism (only new or modified files are transferred), it can make full snapshots of a backup while hard-linking unmodified files to the previous backup. A hard link is simply an additional name for the same file on disk, so the space is allocated just once. You see the files as normal and can delete them separately, but the disk space is freed only when you delete the last hard link. Note that hard links are not supported on FAT, but they are on NTFS and on any Linux/Unix file system (ext2, etc.).

As a result you get a set of full snapshots of the backed-up directories, while only new or modified files take up space. You can delete snapshots individually to free up space, even from the middle of the chain, without harming the other snapshots. This layout is also very convenient to browse when you need to recover something.
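To see the hard-link behaviour described above for yourself, the following sketch creates a file, gives it a second name with ln, and checks that both names share one inode; deleting one name leaves the data reachable through the other. It assumes GNU coreutils `stat` (on BSD or macOS the link count is `stat -f %l` instead), and the file names are made up for the demo.

```shell
#!/bin/sh
# Hard links: two names, one inode, disk space allocated once.
# Assumes GNU coreutils stat (-c); file names are hypothetical.
dir=$(mktemp -d)
echo "important data" > "$dir/original.txt"

# Create a second name (hard link) for the same file.
ln "$dir/original.txt" "$dir/snapshot_copy.txt"

# The file now has a link count of 2, and both names point to one inode.
links=$(stat -c %h "$dir/original.txt")
same_inode=$([ "$(stat -c %i "$dir/original.txt")" = "$(stat -c %i "$dir/snapshot_copy.txt")" ] && echo yes || echo no)
echo "link count: $links, same inode: $same_inode"

# Deleting one name does not free the data; the other name still works.
rm "$dir/original.txt"
remaining=$(cat "$dir/snapshot_copy.txt")
echo "still readable: $remaining"

rm -rf "$dir"
```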

Here is an example of what it looks like:
# du -sh *
130G    2010-02-10_19-30-01
16K     2010-02-11_19-30-02
16K     2010-02-12_19-30-01
16K     2010-02-13_19-30-01
849M    2010-02-14_19-30-02
16K     2010-02-15_19-30-01

As you can see, the initial backup here was 130 GB. Then nothing changed for several days, so each of those backups takes only 16 kB (virtually no space), and on February 14th an update added 849 MB. Each folder nevertheless contains the full file structure.
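The tiny 16K figures come from how du accounts for hard links: within a single invocation, GNU du (and BusyBox du) counts each inode only once, charging it to the first directory where it meets it. The experiment below, with made-up directory names snap1 and snap2 standing in for two snapshots, reproduces this with a 1 MiB file; measured on its own, the second "snapshot" still reports the full size, because it really is a complete tree.

```shell
#!/bin/sh
# Why unchanged snapshots look almost empty in `du -sh *`.
# Assumes a du that detects hard links (GNU coreutils and BusyBox do).
base=$(mktemp -d)
mkdir "$base/snap1" "$base/snap2"

# 1 MiB of random data, so filesystem compression cannot shrink it.
dd if=/dev/urandom of="$base/snap1/big.bin" bs=1024 count=1024 2>/dev/null

# "Unchanged" second snapshot: the same file hard-linked, not copied.
ln "$base/snap1/big.bin" "$base/snap2/big.bin"

cd "$base" || exit 1
du -sk snap1 snap2                 # snap2 is charged almost nothing
both=$(du -sk snap1 snap2)
snap1_kb=$(printf '%s\n' "$both" | awk '$2 == "snap1" { print $1 }')
snap2_kb=$(printf '%s\n' "$both" | awk '$2 == "snap2" { print $1 }')

# Measured alone, snap2 is charged for the whole file.
snap2_alone_kb=$(du -sk snap2 | awk '{ print $1 }')

cd / && rm -rf "$base"
```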

I've written a simple bash script that does this with rsync. Again, on Windows you can run it under Cygwin. You can grab it here.

The first parameter is the source and the second is the destination. Both use the format rsync accepts: for local directories you just give the path; for remote ones you give [user@]server_host:dir_on_server if you connect over SSH, or [user@]server_host::module[/dir] if you connect to an rsync daemon. If you use SSH, key-based authentication is very convenient here. The script also takes two optional arguments: additional options to pass to rsync (quote them if they contain spaces) and the minimum interval between backups, in hours.
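The script itself is only linked from this post, so the following is not it. As an illustration, here is a minimal sketch of how such a snapshot backup could work, written as a POSIX shell function with the hypothetical name backup_rsync and the same argument order (SRC DEST [RSYNC_OPTS] [MIN_HOURS]). It uses rsync's `--link-dest` option, which hard-links files that are unchanged relative to a previous snapshot; that is one common way to get the behaviour described above, not necessarily what the original script does. It assumes a local DEST directory, a `find` that supports `-maxdepth` and `-mmin` (GNU or BusyBox), and timestamped snapshot names like those in the listing.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the author's backup_rsync.sh.
# Assumes DEST is a LOCAL directory; the original also accepts remote ones.
# Usage: backup_rsync SRC DEST [RSYNC_OPTS] [MIN_HOURS]
backup_rsync() {
    src=$1 dest=$2 opts=${3:-} min_hours=${4:-0}
    if [ -z "$src" ] || [ -z "$dest" ]; then
        echo "usage: backup_rsync SRC DEST [RSYNC_OPTS] [MIN_HOURS]" >&2
        return 2
    fi
    mkdir -p "$dest" || return 1
    dest=$(cd "$dest" && pwd) || return 1   # --link-dest needs an absolute path

    # Newest existing snapshot (timestamp names sort chronologically).
    latest=$(ls -1 "$dest" \
        | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{2}-[0-9]{2}-[0-9]{2}$' \
        | sort | tail -n 1)

    # MIN_HOURS: skip if the newest snapshot is younger than that.
    if [ -n "$latest" ] && [ "$min_hours" -gt 0 ] &&
       [ -n "$(find "$dest/$latest" -maxdepth 0 -mmin -"$((min_hours * 60))")" ]; then
        echo "snapshot $latest is younger than $min_hours h, skipping" >&2
        return 0
    fi

    now=$(date +%Y-%m-%d_%H-%M-%S)
    if [ -e "$dest/$now" ]; then
        echo "snapshot $now already exists" >&2
        return 1
    fi

    # Unchanged files become hard links into the previous snapshot.
    if [ -n "$latest" ]; then
        set -- --link-dest="$dest/$latest"
    else
        set --
    fi
    # $opts is unquoted on purpose so "-v --compress" splits into two options.
    rsync -a --delete $opts "$@" "$src/" "$dest/in-progress/" || return 1

    # Publish only after rsync succeeded, then stamp the snapshot's mtime
    # (rsync -a copied the source directory's mtime onto it).
    mv "$dest/in-progress" "$dest/$now" && touch "$dest/$now"
}
```

For example, `backup_rsync "$HOME" /mnt/backup/home "-v" 20` (a made-up destination path) would create a new snapshot unless the newest one is less than 20 hours old. Writing into a temporary in-progress directory and renaming it only after rsync succeeds keeps a failed run from becoming the `--link-dest` base for the next backup.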

Some examples.
1. To back up your home directory:
./backup_rsync.sh $HOME server:/path/to/backup
2. To back up the directory d:\documents, run this at the Cygwin prompt:
./backup_rsync.sh /cygdrive/d/documents server:/path/to/backup
Note 1. If you back up more than one directory, use a separate destination directory on the server for each one.

Note 2. To run the script in Cygwin from a scheduler or at startup, pass it as a parameter to bash, like this:
c:\cygwin\bin\bash.exe -l /cygdrive/c/path/to/script.sh

I use it to back up my family's computers and, at work, some servers and the source machines for linked clones on VMware ESX Server. It has been working perfectly for quite a long time.


