# Data Compression - Systematisation

## ---- Coding ----

#### by T.Strutz


[Figure 2]

Coding comprises all techniques that aim to reduce the redundancy in a signal. The amount of redundancy in a signal depends on the distribution of the signal values and on the dependencies between them. Uniformly distributed white noise, for instance, contains no redundancy. Coding techniques are reversible: the operations can be inverted without loss of digital information, so the reconstructed signal is identical to the original. Coding is based on the information content $I(s_i)$ of the symbols $s_i$:

$$I(s_i) = \log_2\frac{1}{p_i} \quad [\text{bit}]$$

with $p_i$ as the probability of occurrence of symbol $s_i$. Typically, the mean information content over all symbols is of more interest. It is called the entropy $H$. The entropy gives the minimum (generally fractional) number of bits needed on average to transmit one symbol:

$$H = \sum_i p_i \cdot \log_2\frac{1}{p_i} = -\sum_i p_i \cdot \log_2(p_i) \quad [\text{bit/symbol}]$$
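The two formulas can be checked numerically. The following is a minimal sketch in Python; the function names and the example string are illustrative assumptions, and the probabilities are simply estimated from relative frequencies.

```python
import math
from collections import Counter


def information_content(p):
    """Information content I = log2(1/p) in bits for a symbol with probability p."""
    return math.log2(1.0 / p)


def entropy(probabilities):
    """Entropy H = sum_i p_i * log2(1/p_i) in bits per symbol (terms with p_i = 0 are skipped)."""
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)


def empirical_probabilities(signal):
    """Estimate p_i as the relative frequency of each symbol in the sequence."""
    counts = Counter(signal)
    n = len(signal)
    return {symbol: count / n for symbol, count in counts.items()}


# Hypothetical example signal: a occurs 4 times, b twice, c and d once each.
probs = empirical_probabilities("aaaabbcd")
h_skewed = entropy(probs.values())

# Four equally likely symbols: no redundancy, H reaches its maximum log2(4).
h_uniform = entropy([0.25] * 4)

print(f"I(d)      = {information_content(probs['d']):.3f} bit")
print(f"H skewed  = {h_skewed:.3f} bit/symbol")
print(f"H uniform = {h_uniform:.3f} bit/symbol")
```

For the skewed distribution the entropy is 1.75 bit/symbol, below the 2 bit/symbol of the uniform case; the difference reflects the redundancy that coding can remove.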

However, this theoretical limit holds only if the symbols of the alphabet are statistically independent of each other. If statistical dependencies exist between them, the signal can be compressed to a lower bitrate. Accordingly, coding techniques fall into two sub-categories.
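To illustrate why dependencies allow a lower bitrate, the sketch below compares the entropy computed from symbol frequencies alone with the conditional entropy given the preceding symbol. The first-order (previous-symbol) context model and the strictly alternating test signal are assumptions chosen for this example, not something prescribed by the text.

```python
import math
from collections import Counter


def zeroth_order_entropy(signal):
    """Entropy in bit/symbol based on symbol frequencies only (independence assumed)."""
    counts = Counter(signal)
    n = len(signal)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def conditional_entropy(signal):
    """Estimate H(X_n | X_{n-1}) in bit/symbol from the adjacent symbol pairs."""
    pair_counts = Counter(zip(signal, signal[1:]))
    context_counts = Counter(signal[:-1])
    n_pairs = len(signal) - 1
    h = 0.0
    for (prev, _cur), c in pair_counts.items():
        p_pair = c / n_pairs                    # joint probability p(prev, cur)
        p_cur_given_prev = c / context_counts[prev]  # conditional probability p(cur | prev)
        h += p_pair * math.log2(1.0 / p_cur_given_prev)
    return h


# Hypothetical signal: strictly alternating symbols, i.e. fully dependent.
signal = "ab" * 50
h0 = zeroth_order_entropy(signal)
h1 = conditional_entropy(signal)

print(f"H without context        = {h0:.3f} bit/symbol")
print(f"H given previous symbol  = {h1:.3f} bit/symbol")
```

Both symbols occur equally often, so the frequency-based entropy is 1 bit/symbol, yet each symbol is fully determined by its predecessor and the conditional entropy drops to 0.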

If no dependencies exist, or if possible correlations are not taken into account, we speak of entropy coding (see figure 2). The goal of entropy coding is to achieve a bitrate as close as possible to the entropy $H$.
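As one possible illustration of entropy coding, the sketch below builds a Huffman code and compares its mean code length with the entropy. Huffman coding is picked here only as a well-known example; the text above does not name a specific method, and the two probability tables are hypothetical.

```python
import heapq
import math


def entropy(probabilities):
    """Entropy in bit/symbol of a {symbol: probability} table."""
    return sum(p * math.log2(1.0 / p) for p in probabilities.values() if p > 0)


def huffman_code_lengths(probabilities):
    """Return {symbol: code length in bits} of a Huffman code for {symbol: probability}."""
    if len(probabilities) == 1:
        return {s: 1 for s in probabilities}
    # Heap entries: (probability, unique tie-breaker, symbols contained in the subtree).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probabilities}
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees; every symbol inside gains one bit.
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, tie, syms1 + syms2))
        tie += 1
    return lengths


def mean_code_length(probabilities, lengths):
    """Average number of bits per symbol for the given code lengths."""
    return sum(p * lengths[s] for s, p in probabilities.items())


dyadic = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
skewed = {"a": 0.7, "b": 0.2, "c": 0.1}

for table in (dyadic, skewed):
    lengths = huffman_code_lengths(table)
    print(f"H = {entropy(table):.4f}, "
          f"mean length = {mean_code_length(table, lengths):.4f}, lengths = {lengths}")
```

When all probabilities are powers of 1/2, the mean code length equals the entropy. Otherwise a code with whole-bit lengths stays somewhat above it, which is why the entropy is a target to approach rather than one that is always reached.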

Otherwise, if we expect correlation between symbols ($p(s_i|s_j) \neq p(s_i)$), we apply precoding algorithms, which exploit the dependencies between symbols to improve the compression ratio. Typically, symbols of the original alphabet are mapped to symbols of another alphabet with the aim of lowering the product of the entropy $H$ and the number $N$ of symbols that have to be transmitted. Generally, the entropy of the new alphabet is higher than that of the original, but $N$ decreases by more than enough to compensate.
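The trade-off between H and N can be made concrete with a small experiment. The sketch below uses run-length encoding as an assumed example of a precoding step (the text above does not single out a method) and treats each (symbol, run length) pair as one symbol of the new alphabet. The test signal is hypothetical, and the estimate H · N ignores any cost of describing the new alphabet to the decoder.

```python
import math
from collections import Counter
from itertools import groupby


def entropy(symbols):
    """Entropy in bit/symbol estimated from the relative symbol frequencies."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def run_length_encode(signal):
    """Map a sequence to (symbol, run length) pairs; each pair is one new symbol."""
    return [(s, len(list(run))) for s, run in groupby(signal)]


def run_length_decode(pairs):
    """Invert run_length_encode for string signals."""
    return "".join(s * length for s, length in pairs)


# Hypothetical signal with long runs, i.e. strong dependencies between neighbours.
signal = ("a" * 8 + "b" + "a" * 4 + "bb") * 2
encoded = run_length_encode(signal)

h_orig, n_orig = entropy(signal), len(signal)
h_new, n_new = entropy(encoded), len(encoded)

print(f"original: H = {h_orig:.3f}, N = {n_orig}, H*N = {h_orig * n_orig:.1f} bit")
print(f"precoded: H = {h_new:.3f}, N = {n_new}, H*N = {h_new * n_new:.1f} bit")
```

In this example the entropy rises from about 0.72 to 2 bit/symbol, while N falls from 30 to 8, so the product H · N drops from roughly 21.7 to 16 bit. The mapping remains reversible, as `run_length_decode(encoded)` returns the original signal.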

Strutz / 17.02.2003 