Transfer less over the Internet with De-duplication
De-duplication is the process of eliminating duplicate blocks of data whilst still retaining enough redundancy for a top-level Disaster Recovery plan.
There are two types of de-duplication that we are concerned with, and this is how they work:
Client Side De-duplication
Each file is broken down into small blocks. There is a database that compares all the blocks and determines which are the same across multiple files. Only one copy of each block needs to be transferred across the Internet to be stored on our servers.
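As a rough illustration of the idea above, here is a minimal Python sketch of client-side de-duplication. It is not the actual client software: the 4-byte block size, the use of SHA-256 as the block checksum, and the names split_into_blocks and blocks_to_transfer are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny block size so the example is easy to follow


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Break a file's contents into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def blocks_to_transfer(files: dict[str, bytes]) -> dict[str, bytes]:
    """Return one copy of each distinct block, keyed by its checksum."""
    unique: dict[str, bytes] = {}
    for content in files.values():
        for block in split_into_blocks(content):
            digest = hashlib.sha256(block).hexdigest()  # assumed checksum choice
            unique.setdefault(digest, block)  # keep only the first copy seen
    return unique


# Two files that share their first and last blocks.
files = {
    "a.doc": b"HEADaaaaFOOT",
    "b.doc": b"HEADbbbbFOOT",
}
total_blocks = sum(len(split_into_blocks(c)) for c in files.values())
to_send = blocks_to_transfer(files)
print(f"{len(to_send)} of {total_blocks} blocks need to be transferred")
```

In this toy case the two files contain six blocks in total, but only four distinct blocks would need to cross the Internet.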
Server Side De-duplication
Our system analyses all requests from the client software in your build and compares them to the existing blocks already sitting on our servers. There is a database on our end that maps the data’s checksum to its storage location and reference count. When attempting to store another copy of the same block, instead of allocating more disk space the de-dupe code just increments the reference count on the existing data.
Hence, the back-end is able to rebuild multiple versions of a file using only the reference counts and the pointers to the blocks that make up the version that is needed.
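The server-side behaviour described above can be sketched in a few lines of Python. This is only an assumption-laden model, not the real back-end: the BlockStore class, its method names, the in-memory dictionaries standing in for the database, and SHA-256 as the checksum are all hypothetical.

```python
import hashlib


class BlockStore:
    """Toy model of a store that keeps one copy of each block plus a reference count."""

    def __init__(self) -> None:
        self.blocks: dict[str, list] = {}      # checksum -> [data, reference count]
        self.files: dict[str, list[str]] = {}  # file version -> ordered checksums

    def put_block(self, data: bytes) -> str:
        checksum = hashlib.sha256(data).hexdigest()  # assumed checksum choice
        if checksum in self.blocks:
            self.blocks[checksum][1] += 1      # duplicate: just bump the count
        else:
            self.blocks[checksum] = [data, 1]  # new block: store it once
        return checksum

    def put_file(self, name: str, blocks: list[bytes]) -> None:
        # A file version is recorded as pointers (checksums) to its blocks.
        self.files[name] = [self.put_block(b) for b in blocks]

    def rebuild(self, name: str) -> bytes:
        # Follow the pointers to reassemble the requested version.
        return b"".join(self.blocks[c][0] for c in self.files[name])

    def ref_count(self, data: bytes) -> int:
        return self.blocks[hashlib.sha256(data).hexdigest()][1]


store = BlockStore()
store.put_file("report.doc@v1", [b"HEAD", b"old ", b"FOOT"])
store.put_file("report.doc@v2", [b"HEAD", b"new ", b"FOOT"])

print(len(store.blocks))               # distinct blocks actually stored
print(store.ref_count(b"HEAD"))        # how many times the shared block is referenced
print(store.rebuild("report.doc@v1"))  # an older version is still recoverable
```

Both versions can be rebuilt even though the shared first and last blocks are stored only once.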
Not only does this save an enormous amount of space in storage (multiples, rather than percentages) but it massively reduces the load on a client’s Internet connection. No data is transferred until the software and the Offsite Backup database server have corresponding quotas for new and changed blocks.
A very basic example for those new to de-duplication
To give a basic real-world example, let’s say you have a client with 100 Microsoft Word documents and each is made up of 3 blocks. All of these files are different, but the first and last blocks are the same across all files.
Only 1 version of the first and last blocks needs to be backed up, as well as each of the middle blocks. This means that only 102 blocks of data need to be transferred across the Internet instead of the full 300 – or a 66% reduction.
Now let’s say your client has 5 staff who each work on a different laptop, but each of them has the same 100 documents. Only one set of documents needs to be transferred, which means that out of the 1500 blocks (500 files) only 102 blocks of data are actually transferred across the Internet. This is a reduction of about 93% in the amount of traffic.
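For anyone who wants to check the arithmetic in this example, the short Python sketch below recomputes both figures. The reduction helper is a hypothetical name introduced just for this calculation.

```python
def reduction(total_blocks: int, transferred_blocks: int) -> float:
    """Percentage of blocks that did not need to be transferred."""
    return 100 * (total_blocks - transferred_blocks) / total_blocks


docs, blocks_per_doc, laptops = 100, 3, 5

# One shared first block, one shared last block, and 100 distinct middle blocks.
unique_blocks = 2 + docs

one_laptop_total = docs * blocks_per_doc              # 300 blocks
five_laptops_total = one_laptop_total * laptops       # 1500 blocks

single = reduction(one_laptop_total, unique_blocks)   # 198 of 300 saved
five = reduction(five_laptops_total, unique_blocks)   # 1398 of 1500 saved

print(f"One laptop:   {single:.1f}% reduction")
print(f"Five laptops: {five:.1f}% reduction")
```

The one-laptop case works out to 66%, and the five-laptop case to a little over 93%.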
Of course it doesn’t always happen this nicely but you get the idea!
