
Introduction to Data Compression

Last Updated : 27 Jul, 2021

In this article, we will give an overview of data compression, illustrate the idea with a small coding example, and introduce the concept of entropy. Let’s discuss each one by one.

Overview :
Data compression is an important area of research. It deals with the art and science of storing information in a compact form. One would have noticed that many compression packages are used to compress files. Compression reduces the cost of storage, increases the speed of data transfer, and reduces transmission cost. Compression is achieved by removing redundancy, that is, the repetition of unnecessary data. Coding redundancy refers to the redundant data caused by suboptimal coding techniques.

Method illustration :

  • To illustrate this method, let’s assume that there are six symbols, and a binary code is used to assign a unique address to each of these symbols, as shown in the following table.
  • A fixed-length binary code requires at least three bits to encode six symbols. It can also be observed that the binary codes 110 and 111 are not used at all. This clearly shows that the binary code is not efficient, and hence a more efficient code is required to assign a unique address to each symbol.
    Symbols      W1    W2    W3    W4    W5    W6
    Probability  0.3   0.3   0.1   0.1   0.08  0.02
    Binary code  000   001   010   011   100   101
  • An efficient code is one that uses the minimum number of bits to represent the information. The disadvantage of the binary code above is that it is a fixed-length code; a Huffman code is better, as it is a variable-length code that assigns shorter codewords to more probable symbols.
  • Coding techniques are related to the concepts of entropy and information content, which are studied under the subject of information theory. Information theory treats the uncertainty present in a message as its information content. The information content of a symbol with probability pi is given as
                                 log2(1/pi) = -log2(pi) bits.
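The formula above can be checked directly for the symbols in the table. The following is a minimal sketch in Python, using the probabilities given; note how the rarer symbols carry more bits of information:

```python
import math

# Probabilities of the six symbols from the table above
probs = {"W1": 0.3, "W2": 0.3, "W3": 0.1, "W4": 0.1, "W5": 0.08, "W6": 0.02}

for symbol, p in probs.items():
    info = -math.log2(p)  # information content in bits: -log2(p_i)
    print(f"{symbol} (p = {p}): {info:.3f} bits")
```

For example, W1 (p = 0.3) carries about 1.74 bits of information, while the rare W6 (p = 0.02) carries about 5.64 bits, which is why a variable-length code can beat the fixed 3-bit code.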

Entropy :

  • Entropy is defined as a measure of the average information content, i.e. the uncertainty, present in a source. It is given as follows:
                                    H = -∑ pi log2 pi
  • Entropy is a non-negative quantity and specifies the minimum average number of bits per symbol necessary to encode the information. Thus, coding redundancy is given as the difference between the average number of bits used by a code and the entropy.
coding redundancy = Average number of bits - Entropy
  • By removing redundancy, any information can be stored in a compact manner. This is the basis of data compression.
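The quantities above can be worked through for the table's symbols. The sketch below computes the entropy, the coding redundancy of the fixed 3-bit code, and (as one possible "efficient code" of the kind the text mentions) a Huffman code's average length. Note the table's probabilities as printed sum to 0.90 rather than 1.0, so they are normalized here; that normalization is this sketch's assumption, not part of the original table:

```python
import heapq
import math

# Symbol probabilities from the table above; they sum to 0.90 as printed,
# so we normalize them to a proper distribution (an assumption of this sketch).
raw = {"W1": 0.3, "W2": 0.3, "W3": 0.1, "W4": 0.1, "W5": 0.08, "W6": 0.02}
total = sum(raw.values())
probs = {s: p / total for s, p in raw.items()}

# Entropy: H = -sum(p_i * log2(p_i)), the minimum average bits per symbol.
entropy = -sum(p * math.log2(p) for p in probs.values())

# Coding redundancy of the fixed 3-bit binary code from the table.
fixed_redundancy = 3 - entropy

# Build a Huffman code: repeatedly merge the two least probable groups,
# prefixing '0' to one group's codewords and '1' to the other's.
heap = [(p, [s]) for s, p in probs.items()]
heapq.heapify(heap)
codes = {s: "" for s in probs}
while len(heap) > 1:
    p1, group1 = heapq.heappop(heap)
    p2, group2 = heapq.heappop(heap)
    for s in group1:
        codes[s] = "0" + codes[s]
    for s in group2:
        codes[s] = "1" + codes[s]
    heapq.heappush(heap, (p1 + p2, group1 + group2))

avg_len = sum(probs[s] * len(codes[s]) for s in probs)

print(f"Entropy:                {entropy:.3f} bits/symbol")
print(f"Fixed-code redundancy:  {fixed_redundancy:.3f} bits/symbol")
print(f"Huffman average length: {avg_len:.3f} bits/symbol")
print(f"Huffman redundancy:     {avg_len - entropy:.3f} bits/symbol")
```

With these (normalized) probabilities the entropy works out to roughly 2.19 bits/symbol, so the fixed 3-bit code carries about 0.81 bits/symbol of coding redundancy, while the Huffman code averages about 2.33 bits/symbol, leaving only about 0.14 bits/symbol of redundancy.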
