Data Deduplication with Linux
Posted on: 08/25/2011 07:24 PM

Linux Journal posted an article on data deduplication with Lessfs

Data Deduplication with Linux

In recent years, the storage industry has been busy providing some of the most advanced features to its customers, including data deduplication. Data deduplication is a unique data compression technique used to eliminate redundant data and decrease the total capacities consumed on an enabled storage volume. A volume can refer to a disk device, a partition or a grouped set of disk devices all represented as single device. During the process of deduplication, redundant data is deleted, leaving a single copy of the data to be stored on the storage volume.

One ideal use-case scenario is when multiple copies of a large e-mail message are distributed and stored on a mail server. An e-mail message the size of just a couple megabytes does not seem too bad, but if it were sent and forwarded to more than 100 recipients—that's more than 200MB of copies of the same file(s).

Another great example is in the arena of host virtualization. In recent years, virtualization has been the hottest trend in server administration. If you are deploying multiple virtual guests across a network that may share the same common operating system image, data deduplication significantly can reduce the total size of capacity consumed to a single copy and, in turn, reference the differences when and where needed.

Printed from Linux Compatible (