Skip to main content

IBM Hash Suffix Array Delta Compression

A new differential compression algorithm that combines the hash value and suffix array techniques.

Date Posted: November 27, 2007

alphaworks tab navigation

 

What is IBM Hash Suffix Array Delta Compression?

This new differential compression algorithm combines the hash value techniques and suffix array techniques of previous work. Differential compression refers to encoding a file (called a version file) as a set of changes with respect to another file (a reference file). Previous differential compression algorithms can be shown empirically to run in linear time, but they have certain drawbacks: They do not find the best matches for every offset of the version file. This algorithm finds the best matches for every offset of the version file with respect to a certain level of detail and above a certain length threshold.

The algorithm has two variations that depend on how we choose the block size. If the block size is kept fixed, the compression performance of this algorithm is similar to that of the greedy algorithm, without the expensive space and time requirements. If the block size is varied linearly with the reference file size, this algorithm can run in linear time and constant space. It can also be shown empirically that this algorithm is faster and more compressed than certain differential compression algorithms.

How does it work?

This technology's utility sources are -c89 standard compatible. IBM® Hash Suffix Array Delta Compression is compiled for Windows®, Linux®, and Solaris systems. It is believed that, with little or no modification, it can also be compiled for other systems upon request.

The utility uses bzip2 (1.0.2) as the second-level compression algorithm.

Further information is available in the following paper: An approximation to the greedy algorithm for differential compression.

The following code shows how to run this algorithm:

Example (on Linux):

To compress:
./hsadelta -c -r linux-2.2.26.tar 
   -v linux-2.6.22.9.tar -o linux-delta

To reconstruct:

./hsadelta -d -r linux-2.2.26.tar 
   -v linux-2.6.22.9-reconstruct.tar 
   -o linux-delta

About the technology author(s)

Karan Gupta is a software engineer at IBM Research in San Jose, CA. He can be reached through e-mail.

Ramesh C. Agarwal is an IBM Fellow Emeritus at IBM Research in San Jose, CA. He can be reached through e-mail.

Trademarks




Related technologies