μ-law algorithm
.m2ts
3GP and 3G2
3ivx
7-Zip
A-law
A-law algorithm
ALZip
AMV video format
APNG
ARC (file format)
ARJ
Acoustics
Adaptive DPCM
Adaptive Huffman coding
Adaptive Multi-Rate Wideband
Adaptive Multi-Rate audio codec
Adaptive Transform Acoustic Coding
Advanced Audio Coding
Advanced Systems Format
Algebraic Code Excited Linear Prediction
Algorithmic complexity theory
Algorithmic information theory
Apple Lossless
Archive Utility
Arithmetic coding
Ark (computing)
Au file format
Audio Interchange File Format
Audio Lossless Coding
Audio Video Interleave
Audio Video Standard
Audio codec
Audio compression (data)
Audio signal processing
Average bitrate
BMP file format
Bandwidth (computing)
Bandwidth compression
BetterZip
Bink Video
Bit
Bit rate
Blu-code
Burrows–Wheeler transform
Burrows-Wheeler transform
Byte pair encoding
Bzip2
CELT
Cabinet (file format)
Calgary Corpus
Canterbury Corpus
Carnegie Mellon University
Chain code
Chroma subsampling
CineForm
Cinepak
Claude Shannon
Code
Code-excited linear prediction
Coding theory
Color space
Commercial software
Companding
Comparison of audio codecs
Comparison of file archivers
Comparison of video codecs
Compress
Compression artifact
Computer network
Computer science
Constant bitrate
Container format (digital)
Context mixing
Context tree weighting
Convolution
CoreAVC
Cryptography
DEFLATE
DEFLATE (algorithm)
DGCA
DNxHD codec
DPCM
DTS (sound system)
DV
DVD
Dasher
Data compression
Data compression symmetry
Data deduplication
Data differencing
Delta encoding
Dictionary coder
Differential compression
Digidesign#Sound Designer File Formats
Digital Item#File Format
Digital Picture Exchange
Digital camera
Dirac (codec)
Direct Stream Transfer#DST
"Source coding" redirects here. For the term in computer programming, see Source code.
In computer science and information theory, data compression or source coding is the process of encoding information using fewer bits (or other information-bearing units) than an unencoded representation would use, through use of specific encoding schemes.
In computing, data deduplication is a specialized data compression technique for eliminating coarse-grained redundant data, typically to improve storage utilization.
Compression is useful because it helps reduce the consumption of expensive resources, such as hard disk space or transmission bandwidth. On the downside, compressed data must be decompressed to be used, and this extra processing may be detrimental to some applications. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is being decompressed (the alternative of decompressing the video in full before watching it may be inconvenient, and requires storage space for the decompressed video). The design of data compression schemes therefore involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (if using a lossy compression scheme), and the computational resources required to compress and decompress the data.
Contents
1 Lossless versus lossy compression
2 Example algorithms and applications
2.1 Lossy
2.2 Lossless
3 Theory
3.1 Machine learning
3.2 Data differencing
4 See also
4.1 Data compression topics
4.2 Compression algorithms
4.2.1 Lossless data compression
4.2.2 Lossy data compression
4.2.3 Example implementations
4.3 Corpora
5 References
6 External links
Lossless versus lossy compression
Main articles: Lossless data compression and lossy data compression
Lossless compression algorithms usually exploit statistical redundancy in such a way as to represent the sender's data more concisely without error. Lossless compression is possible because most real-world data has statistical redundancy. For example, in English text, the letter 'e' is much more common than the letter 'z', and the probability that the letter 'q' will be followed by the letter 'z' is very small.
Another kind of compression, called lossy data compression or perceptual coding, is possible if some loss of fidelity is acceptable. Generally, lossy data compression is guided by research on how people perceive the data in question. For example, the human eye is more sensitive to subtle variations in luminance than it is to variations in color. JPEG image compression works in part by "rounding off" some of this less-important information. Lossy data compression provides a way to obtain the best fidelity for a given amount of compression. In some cases, transparent (unnoticeable) compression is desired; in other cases, fidelity is sacrificed to reduce the amount of data as much as possible.
Lossless compression schemes are reversible so that the original data can be reconstructed, while lossy schemes accept some loss of data in order to achieve higher compression.
However, lossless data compression algorithms will always fail to compress some files; indeed, any compression algorithm will necessarily fail to compress any data containing no discernible patterns. Attempts to compress data that has been compressed already will therefore usually result in an expansion, as will attempts to compress all but the most trivially encrypted data.
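The failure on pattern-free input can be observed directly. The following is a minimal sketch, assuming Python's standard zlib module (a DEFLATE implementation) as the compressor; the sample sentence, the input size and the random seed are arbitrary choices made for illustration.

```python
import random
import zlib

# Highly redundant input: one sentence repeated many times.
redundant = b"the quick brown fox jumps over the lazy dog. " * 200

# Pattern-free input of the same size: pseudo-random bytes from a fixed
# seed (the seed is an arbitrary choice, used only for repeatability).
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(redundant)))

packed_redundant = zlib.compress(redundant)
packed_noise = zlib.compress(noise)

print(len(redundant), "->", len(packed_redundant))  # far smaller than the input
print(len(noise), "->", len(packed_noise))          # slightly larger than the input
```

The random input grows slightly because the compressor finds nothing to exploit and still has to add its own framing bytes.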
In practice, lossy data compression also reaches a point where compressing again does not work, although an extremely lossy algorithm, such as one that always removes the last byte of a file, will keep shrinking a file until it is empty.
An example of lossless vs. lossy compression is the following string:
25.888888888
This string can be compressed as:
25.[9]8
Interpreted as "twenty five point 9 eights", the original string is perfectly recreated, just written in a smaller form. In a lossy system, using
26
instead, the exact original data is lost in exchange for a shorter representation.
Example algorithms and applications
The above is a very simple example of run-length encoding, wherein large runs of consecutive identical data values are replaced by a simple code with the data value and length of the run. This is an example of lossless data compression. Lossless compression is often used to optimize disk space on office computers, or to make better use of the connection bandwidth in a computer network. For symbolic data such as spreadsheets, text, executable programs, etc., losslessness is essential because changing even a single bit cannot be tolerated (except in some limited cases).
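Run-length encoding of the string 25.888888888 can be sketched as follows. This is a minimal illustration, not a production format: the function names rle_encode and rle_decode are hypothetical, and the output is a list of (value, run length) pairs rather than the bracket notation used above.

```python
from itertools import groupby

def rle_encode(text):
    """Replace each run of identical characters with (character, run length)."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs):
    """Expand (character, run length) pairs back into the original string."""
    return "".join(ch * count for ch, count in pairs)

encoded = rle_encode("25.888888888")
print(encoded)              # [('2', 1), ('5', 1), ('.', 1), ('8', 9)]
print(rle_decode(encoded))  # 25.888888888
```

Because decoding restores the input exactly, the scheme is lossless. It only saves space when the input actually contains long runs.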
For visual and audio data, some loss of quality can be tolerated without losing the essential nature of the data. By taking advantage of the limitations of the human sensory system, a great deal of space can be saved while producing an output which is nearly indistinguishable from the original. These lossy data compression methods typically offer a three-way tradeoff between compression speed, compressed data size and quality loss.
Lossy
Lossy image compression is used in digital cameras, to increase storage capacities with minimal degradation of picture quality. Similarly, DVDs use the lossy MPEG-2 Video codec for video compression.
In lossy audio compression, methods of psychoacoustics are used to remove non-audible (or less audible) components of the signal. Compression of human speech is often performed with even more specialized techniques, so that "speech compression" or "voice coding" is sometimes distinguished as a separate discipline from "audio compression". Different audio and speech compression standards are listed under audio codecs. Voice compression is used, for example, in Internet telephony, while audio compression is used for CD ripping and is decoded by audio players.
Lossless
The Lempel-Ziv (LZ) compression methods are among the most popular algorithms for lossless storage. DEFLATE is a variation on LZ that is optimized for decompression speed and compression ratio, so compression can be slow. DEFLATE is used in PKZIP, gzip and PNG. LZW (Lempel-Ziv-Welch) is used in GIF images. Also noteworthy are the LZR (LZ-Renau) methods, which serve as the basis of the Zip method. LZ methods use a table-based compression model where table entries are substituted for repeated strings of data. For most LZ methods, this table is generated dynamically from earlier data in the input. The table itself is often Huffman encoded (e.g. SHRI, LZX). A current LZ-based coding scheme that performs well is LZX, used in Microsoft's CAB format.
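The idea of a table built dynamically from earlier input can be illustrated with LZW. The following is a simplified sketch under stated assumptions: it emits a plain list of integer codes, lets the table grow without limit, and omits the variable-width code packing and table resets that real formats such as GIF use. The function names are hypothetical.

```python
def lzw_encode(data):
    """Encode bytes as a list of integer codes using a growing string table."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate            # keep extending the known string
        else:
            codes.append(table[current])   # emit code for longest known string
            table[candidate] = len(table)  # learn the new, longer string
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

def lzw_decode(codes):
    """Rebuild the same table while decoding, so no table is transmitted."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Code refers to the entry being defined in this very step.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)

sample = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(sample)
print(len(sample), "bytes ->", len(codes), "codes")
```

The decoder never receives the table; it reconstructs the table from the codes it has already seen, which is what "generated dynamically from earlier data in the input" means in practice.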
The very best compressors use probabilistic models, in which predictions are coupled to an algorithm called arithmetic coding. Arithmetic coding, invented by Jorma Rissanen, and turned into a practical method by Witten, Neal, and Cleary, achieves superior compression to the better-known Huffman algorithm, and lends itself especially well to adaptive data compression tasks where the predictions are strongly context-dependent. Arithmetic coding is used in the bilevel image-compression standard JBIG, and the document-compression standard DjVu. The text entry system, Dasher, is an inverse-arithmetic-coder.
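The coupling of a probability model to arithmetic coding can be sketched with exact rational arithmetic. This is a toy illustration, not the practical fixed-precision method referred to above: it assumes a static, hypothetical model (the probs dictionary), represents the whole message as one fraction, and passes the message length to the decoder separately. All function names are hypothetical.

```python
from fractions import Fraction

def build_intervals(probs):
    """Give each symbol its own [start, end) slice of [0, 1), sized by probability."""
    intervals = {}
    start = Fraction(0)
    for symbol, p in probs.items():
        intervals[symbol] = (start, start + p)
        start += p
    return intervals

def arith_encode(message, probs):
    """Narrow [0, 1) once per symbol; return the final interval as (low, width)."""
    intervals = build_intervals(probs)
    low, width = Fraction(0), Fraction(1)
    for symbol in message:
        s_low, s_high = intervals[symbol]
        low = low + width * s_low
        width = width * (s_high - s_low)
    return low, width  # any number in [low, low + width) identifies the message

def arith_decode(value, length, probs):
    """Recover `length` symbols by locating `value` in successive sub-intervals."""
    intervals = build_intervals(probs)
    out = []
    for _ in range(length):
        for symbol, (s_low, s_high) in intervals.items():
            if s_low <= value < s_high:
                out.append(symbol)
                value = (value - s_low) / (s_high - s_low)  # rescale to [0, 1)
                break
    return "".join(out)

# Hypothetical model: 'a' is three times as likely as 'b'.
probs = {"a": Fraction(3, 4), "b": Fraction(1, 4)}
low, width = arith_encode("aaab", probs)
print(low, width)                    # 81/256 27/256
print(arith_decode(low, 4, probs))   # aaab
```

The final width equals the product of the symbol probabilities, so likely messages end in wide intervals that can be identified with few bits. This is why better predictions translate directly into better compression.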
Theory
The theoretical background of compression is provided by information theory (which is closely related to algorithmic information theory) for lossless compression, and by rate–distortion theory for lossy compression. These fields of study were essentially created by Claude Shannon, who published fundamental papers on the topic in the late 1940s and early 1950s. Cryptography and coding theory are also closely related. The idea of data compression is deeply connected with statistical inference.
Many lossless data compression systems can be viewed in terms of a four-stage model. Lossy data compression systems typically include even more stages, including, for example, prediction, frequency transformation, and quantization.
Machine learning
See also: Machine learning
There is a close connection between machine learning and compression: a system that predicts the posterior probabilities of a sequence given its entire history can be used for optimal data compression (by using arithmetic coding on the output distribution), while an optimal compressor can be used for prediction (by finding the symbol that compresses best, given the previous history). This equivalence has been used as justification for data compression as a benchmark for "general intelligence".[1]
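The link from prediction to compression can be made concrete by measuring how many bits an ideal entropy coder would need when driven by a predictor. The sketch below assumes a deliberately simple predictor (adaptive symbol counts with add-one smoothing, chosen only for brevity); the function name is hypothetical, and the result is a theoretical code length, not an actual encoded output.

```python
import math
from collections import Counter

def ideal_code_length(message, alphabet):
    """Sum -log2(p) over the message, where p is the predictor's probability
    for each symbol given everything seen before it."""
    counts = Counter({symbol: 1 for symbol in alphabet})  # add-one prior
    total = len(alphabet)
    bits = 0.0
    for symbol in message:
        bits += -math.log2(counts[symbol] / total)  # cost of this symbol
        counts[symbol] += 1                         # learn from what was seen
        total += 1
    return bits

print(ideal_code_length("aaaaaaaa", "ab"))  # about 3.17 bits
print(ideal_code_length("abbabaab", "ab"))  # larger: harder to predict
```

The predictable string costs about 3.17 bits instead of the 8 bits needed at one bit per symbol, because the model's predictions improve as it sees more of the sequence.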
Data differencing
Main article: Data differencing
Data compression can be seen as a special case of data differencing.[2][3] Data differencing consists of producing a difference given a source and a target, with patching producing a target given a source and a difference. Data compression consists of producing a compressed file given a target, and decompression consists of producing a target given only a compressed file. Thus, one can consider data compression as data differencing with empty source data, the compressed file corresponding to a "difference from nothing". This is the same as considering absolute entropy (corresponding to data compression) as a special case of relative entropy (corresponding to data differencing) with no initial data.
When one wishes to emphasize the connection, one may use the term differential compression to refer to data differencing.
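The view of compression as differencing against an empty source can be sketched with zlib's preset-dictionary feature. This is an illustration under stated assumptions, not a real differencing format such as the one in RFC 3284: the source is simply handed to the compressor as a dictionary, only the last 32 KB of it can be referenced, and the names make_delta and apply_delta are hypothetical.

```python
import random
import zlib

def make_delta(source, target):
    """Encode target relative to source; an empty source is plain compression."""
    if source:
        comp = zlib.compressobj(zdict=source)
    else:
        comp = zlib.compressobj()
    return comp.compress(target) + comp.flush()

def apply_delta(source, delta):
    """Patch: rebuild target from source and delta."""
    if source:
        decomp = zlib.decompressobj(zdict=source)
    else:
        decomp = zlib.decompressobj()
    return decomp.decompress(delta) + decomp.flush()

# Hypothetical data: a pattern-free source and a target with one small edit.
rng = random.Random(0)
source = bytes(rng.getrandbits(8) for _ in range(2000))
target = source[:1000] + b"EDIT" + source[1000:]

delta = make_delta(source, target)      # difference from the source
compressed = make_delta(b"", target)    # "difference from nothing"

print(len(target), len(compressed), len(delta))
```

With the source available, the delta is far smaller than the stand-alone compressed file, since most of the target can be expressed as references into the source.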
See also
Data compression topics
Algorithmic complexity theory
Information entropy
Self-extracting archive
Image compression
Speech coding
Video compression
Multimedia compression
Minimum description length
Minimum message length (two-part lossless compression designed for inference)
List of archive formats
Comparison of file archivers
List of Unix programs
Free file format
HTTP compression
Magic compression algorithm
Data compression symmetry
Dyadic distribution
Compression algorithms
Lossless data compression
Data deduplication
run-length encoding
dictionary coders
LZ77 & LZ78
LZW
Statistical Lempel Ziv
Burrows-Wheeler transform
prediction by partial matching (also known as PPM)
context mixing
Dynamic Markov Compression (DMC)
entropy encoding
Huffman coding (simple entropy coding; commonly used as the final stage of compression)
Adaptive Huffman coding
Shannon-Fano coding
arithmetic coding (more advanced)
range encoding (same as arithmetic coding, but looked at in a slightly different way)
Golomb coding (simple entropy coding for infinite input data with a geometric distribution)
universal codes (entropy coding for infinite input data with an arbitrary distribution)
Elias gamma coding
Fibonacci coding
Slepian-Wolf coding (SWC) (lossless distributed source coding (DSC))
Lossy data compression
discrete cosine transform
fractal compression
fractal transform
wavelet compression
vector quantization
linear predictive coding
Modulo-N code for correlated data
A-law Compander
Mu-law Compander
Wyner-Ziv coding (WZC) (lossy Distributed source coding (DSC))
Example implementations
DEFLATE (a combination of LZ77 and Huffman coding) – used by ZIP, gzip and PNG files
LZMA used by 7-Zip
LZO (very fast LZ variation, speed oriented)
LZX (an LZ77 family compression algorithm)
liblzg (a minimal LZ77 based compression library)
Unix compress utility (the .Z file format), and GIF use LZW
Unix pack utility (the .z file format) used Huffman coding
bzip2 (a combination of the Burrows-Wheeler transform and Huffman coding)
PAQ (very high compression based on context mixing, but extremely slow; ranks near the top of compression-ratio benchmarks)
JPEG (image compression using optional chroma downsampling, discrete cosine transform, quantization, then Huffman coding)
MPEG (audio and video compression standards family in wide use, using DCT and motion-compensated prediction for video)
MP3 (a part of the MPEG-1 standard for sound and music compression, using subbanding and MDCT, perceptual modeling, quantization, and Huffman coding)
AAC (part of the MPEG-2 and MPEG-4 audio coding specifications, using MDCT, perceptual modeling, quantization, and Huffman coding)
Vorbis (MDCT-based audio codec comparable to AAC, designed with a focus on avoiding patent encumbrance)
JPEG 2000 (image compression using wavelets, then quantization, then entropy coding)
TTA (uses linear predictive coding for lossless audio compression)
FLAC (linear predictive coding for lossless audio compression)
Corpora
Data collections, commonly used for comparing compression algorithms.
Canterbury Corpus
Calgary Corpus
References
1. Rationale for a Large Text Compression Benchmark
2. RFC 3284
3. Korn, D.G.; Vo, K.P. (1995), "Vdelta: Differencing and Compression", in B. Krishnamurthy (ed.), Practical Reusable Unix Software, John Wiley & Sons
External links
Introduction to Data Compression by Guy E Blelloch from CMU