A CDE Definition
A compression technique for storing multiple backup versions. If backups are routinely made of the same files and databases, there may be only a small number of changes between versions with most of the data in the latest backup identical to the previous backup.
Capacity optimization works by breaking data files into parts. If a part in the new backup file is identical to a part already stored from the previous backup, it is replaced with a reference to the original rather than stored again. Whereas normal data compression typically reduces a text or database file to around 40 to 50% of the original, capacity optimization can reduce a new backup version to as little as 5% of the original. See data compression.
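The part-and-reference idea above can be illustrated with a minimal Python sketch. It is not any particular product's method: the fixed 4,096-byte part size, the SHA-256 digests used as references, and the names `CHUNK_SIZE`, `dedup_backup` and `restore_backup` are all assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed part size in bytes


def dedup_backup(data: bytes, store: dict) -> list:
    """Split data into fixed-size parts and store only the parts not seen before.

    Returns the list of digests (the "references") that describes this backup.
    """
    refs = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk  # new part: keep the actual bytes
        refs.append(digest)        # known or new, the backup only records a reference
    return refs


def restore_backup(refs: list, store: dict) -> bytes:
    """Rebuild a backup version by following its references into the store."""
    return b"".join(store[digest] for digest in refs)


if __name__ == "__main__":
    store = {}
    # First backup: ten distinct parts.
    version1 = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))
    refs1 = dedup_backup(version1, store)
    stored_after_v1 = len(store)

    # Second backup: identical except for one changed part.
    version2 = version1[:CHUNK_SIZE] + b"x" * CHUNK_SIZE + version1[2 * CHUNK_SIZE:]
    refs2 = dedup_backup(version2, store)

    print("parts stored for version 1:", stored_after_v1)
    print("new parts stored for version 2:", len(store) - stored_after_v1)
    assert restore_backup(refs2, store) == version2
```

In this sketch the second version adds only one new part to the store; the other nine are references to data already held. Fixed-size parts are the simplest choice but stop matching when bytes are inserted or deleted ahead of them, which is why a real system might choose part boundaries differently.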
There are two categories of data compression. The first reduces the size of a single file to save storage space and transmit faster. The second combines a group of files into one for storage and transmission convenience.
#1 - Compressing a Single File
The JPEG image, MPEG video, MP3 audio and G.7xx voice formats are widely used "lossy" methods that analyze which pixels, video frames or sound waves can be removed forever without the average person noticing (see lossy compression). GIF images have no loss of pixels but may have a loss of colors (see GIF).
JPEG files can be reduced by as much as 80%; MPEG enables a two-hour HD movie to fit on a single disc, and MP3 sparked a revolution by reducing the size of CD music by 90%. For a list of compression methods, see codec examples. See JPEG, GIF, MPEG, MP3, G.7xx and interframe coding.
#2 - Compressing a Group of Files (Archiving)
The second "lossless" category compresses and restores data without the loss of a single bit. Although this is widely used for documents, this method is not aware of the content's purpose. It merely looks for repeatable patterns of 0s and 1s, and the more patterns, the higher the compression ratio. Text documents compress the most, while binary and already-compressed files (JPEG, MPEG, etc.) compress the least.
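As a rough illustration of the point above, the following Python sketch uses the standard `zlib` module to compress repetitive text and random bytes. The random bytes are only a stand-in for already-compressed data, and the sample text and the helper name `compressed_fraction` are assumptions made for the example.

```python
import os
import zlib

# Repetitive text: full of repeatable patterns.
text = b"the quick brown fox jumps over the lazy dog. " * 200

# Random bytes stand in for already-compressed data (JPEG, MPEG, etc.),
# which contains almost no repeatable patterns.
random_like = os.urandom(len(text))


def compressed_fraction(data: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, 9)) / len(data)


if __name__ == "__main__":
    # Lossless: decompressing returns every original bit.
    assert zlib.decompress(zlib.compress(text, 9)) == text

    print("text compresses to   %.1f%% of original" % (100 * compressed_fraction(text)))
    print("random compresses to %.1f%% of original" % (100 * compressed_fraction(random_like)))
```

The text shrinks to a small fraction of its size, while the random data stays at roughly its original size or grows slightly because of format overhead.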
Although lossless methods such as the ZIP format are used to reduce the size of a single, huge file, they are also widely used to compress several files into one "archive." It is more convenient to store, and considerably more convenient to transmit, a single file than to keep track of multiple files. See lossless compression, archive, archive formats and capacity optimization.
Lossless Methods (Dictionary and Statistical)
The widely used dictionary method creates a list of repeatable phrases. For example, GIF images are compressed with the LZW dictionary method (see LZW), while ZIP and JAR archives typically use Deflate, which combines an LZ77-style dictionary with Huffman coding. The statistical method converts characters into variable length strings of bits based on frequency of use (see Huffman coding).
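The two lossless methods can be sketched in Python as follows. This is a teaching sketch, not the exact encoding used by GIF or ZIP: the LZW functions emit a plain list of integer codes rather than packed bits, and the function names `lzw_compress`, `lzw_decompress` and `huffman_codes` are assumptions made for the example.

```python
import heapq
from collections import Counter


def lzw_compress(data: bytes) -> list:
    """Dictionary method: replace repeated phrases with integer codes."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate              # keep extending a known phrase
        else:
            codes.append(table[current])     # emit code for the longest known phrase
            table[candidate] = len(table)    # add the new phrase to the dictionary
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list) -> bytes:
    """Rebuild the same dictionary while decoding, so it never has to be stored."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):             # phrase defined by the step being decoded
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


def huffman_codes(data: bytes) -> dict:
    """Statistical method: frequent bytes get shorter bit strings."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap items: (count, tie-breaker, {symbol: bits so far}).
    heap = [(count, idx, {sym: ""}) for idx, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        count1, _, left = heapq.heappop(heap)    # the two least frequent groups
        count2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + bits for sym, bits in left.items()}
        merged.update({sym: "1" + bits for sym, bits in right.items()})
        heapq.heappush(heap, (count1 + count2, next_id, merged))
        next_id += 1
    return heap[0][2]


if __name__ == "__main__":
    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_compress(sample)
    assert lzw_decompress(codes) == sample
    print(len(sample), "bytes became", len(codes), "LZW codes")
    print(huffman_codes(b"aaaabbc"))
```

In the Huffman example, the byte `a` occurs most often and receives a one-bit code, while `b` and `c` receive two bits each.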