This is arguably the whole raison d'être for digital video, since the camera-to-screen process starts and finishes with analogue signals. Compression allows the video image to be transmitted far more efficiently than is possible with analogue, so that we can get more quality, use less media or, more typically, a combination of the two.
Essentially, compression allows the picture information to be selectively discarded with minimal effect on the apparent image quality. Simplistically, it does this in four stages:
1) The colour mapping is manipulated to minimise the number of samples required.
2) The remaining image sampling is 'transformed' to make it easier to discard data with minimal loss of quality. This is known as compression in the spatial domain.
3) Redundant information in the temporal domain is removed.
4) Various tricks are used to apply lossless compression to the remaining data.
The actual routines used are peculiar to each compression format, known as a 'codec'. If we take the example of the HDV codec, we can illustrate these steps in more detail.
1) The video signal is formed in the camera as red, green and blue channels, but we then matrix the signals to create luminance (Y) and colour difference (Cb, Cr) signals. The number of colour samples can then be reduced to a quarter of the luminance samples, because we are less sensitive to colour resolution. Overall, we also reduce the sample count by using a squashed (anamorphic) version of the image, whereby 1440 luminance samples describe a 1920-wide image.
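As a minimal sketch of this stage (not the camera's actual processing: the Rec. 709 luma coefficients are the standard ones for HD, but a real camera filters the chroma before decimating it rather than simply dropping samples):

```python
import numpy as np

def rgb_to_ycbcr_420(rgb):
    """Matrix an RGB frame (H x W x 3, floats 0..1) into Y, Cb, Cr,
    then subsample the colour planes 2:1 both ways (4:2:0).
    Uses the Rec. 709 luma coefficients standard for HD video."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.2126 * r + 0.7152 * g + 0.0722 * b   # luminance
    cb = (b - y) / 1.8556                       # blue colour difference
    cr = (r - y) / 1.5748                       # red colour difference
    # Keep every luminance sample, but only one colour sample per 2x2
    # block: a quarter of the chroma data, which the eye barely notices.
    # (Real encoders low-pass filter before decimating like this.)
    cb = cb[::2, ::2]
    cr = cr[::2, ::2]
    return y, cb, cr

frame = np.random.rand(1080, 1440, 3)  # one anamorphic 1440x1080 HDV frame
y, cb, cr = rgb_to_ycbcr_420(frame)
print(y.shape, cb.shape, cr.shape)     # (1080, 1440) (540, 720) (540, 720)
```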
2) The image is captured (by the CCD) as a sample for each pixel making up the picture. In the HDV format (and most other video codecs) we perform a conversion to the DCT (Discrete Cosine Transform) version of the image. This is a mathematical trick to allow easier subsequent information loss with minimal quality change. The DCT image is formed as a series of digital parameters describing 8x8 pixel blocks (grouped into 16x16 'macroblocks' for the motion tracking described later), and we can manipulate these parameters to lose bits with little apparent effect on the picture quality.
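A toy sketch of the idea using SciPy's DCT routines; real encoders use fixed-point integer transforms and perceptually weighted quantisation tables rather than the single crude divisor assumed here:

```python
import numpy as np
from scipy.fft import dctn, idctn

def compress_block(block, q=16):
    """Forward DCT of one 8x8 block, then coarse quantisation.
    Natural images pack most of their energy into a few low-frequency
    coefficients; rounding the rest to zero costs little visually."""
    coeffs = dctn(block, norm='ortho')
    return np.round(coeffs / q)          # many coefficients become 0

def decompress_block(coeffs, q=16):
    """Inverse: rescale the surviving coefficients and transform back."""
    return idctn(coeffs * q, norm='ortho')

# A smooth gradient block, typical of the easy-to-compress case:
block = np.add.outer(np.arange(8), np.arange(8)) * 8.0
q = compress_block(block)
print(f"{np.count_nonzero(q)} of 64 coefficients survive quantisation")
recon = decompress_block(q)
print(f"max pixel error: {np.abs(recon - block).max():.1f}")
```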
3) Camera images have a great deal of picture content in common from frame to frame. It is possible to exploit this redundancy to minimise the required storage by compressing over a Group of Pictures (GOP) sequence, which for HDV is 12 frames (25fps) or 15 frames (30fps).
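Purely as an illustration (the exact pattern is encoder-dependent), a 12-frame MPEG-2 GOP commonly interleaves predicted (P) and bidirectional (B) frames like this, with only the I frame coded in full:

```python
# One illustrative 12-frame GOP, as commonly used in MPEG-2 at 25fps.
gop = ['I', 'B', 'B', 'P', 'B', 'B', 'P', 'B', 'B', 'P', 'B', 'B']
# P and B frames store motion vectors plus a (much smaller) error
# image, as described at the end of this article.
print(' '.join(gop))
```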
4) The final compressed digital signal is recorded to tape using techniques such as run-length encoding to store the data efficiently with extra lossless compression.
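A toy sketch of run-length encoding; the real tape format combines this with variable-length (Huffman-style) codes, but the principle is the same:

```python
def run_length_encode(values):
    """Collapse runs of repeated values into (value, count) pairs.
    Quantised DCT blocks are full of long runs of zeros, so this is
    cheap lossless compression on top of the lossy stages."""
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out

print(run_length_encode([5, 0, 0, 0, 0, 0, 3, 0, 0, 1]))
# [[5, 1], [0, 5], [3, 1], [0, 2], [1, 1]]
```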
The end result is that the HDV signal compresses about 1180Mb/s of original image data down to 25Mb/s on tape, an overall reduction of about 47:1. In practice the initial sample reduction means the actual digital compression is only about 18:1, but that is still very impressive considering the good subjective quality.
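The exact sampling assumptions behind the quoted figures aren't stated, but a back-of-envelope check (assuming 8-bit samples at 25fps) lands in the same ballpark:

```python
# Back-of-envelope check of the quoted rates (assumed: 8-bit samples, 25fps).
full_rgb   = 1920 * 1080 * 3 * 8 * 25 / 1e6    # ~1244 Mb/s, close to the ~1180 quoted
subsampled = 1440 * 1080 * 1.5 * 8 * 25 / 1e6  # Y plus quarter-rate Cb, Cr: ~467 Mb/s
tape_rate  = 25                                # HDV tape rate in Mb/s
print(f"overall reduction  : {full_rgb / tape_rate:.0f}:1")    # ~50:1 vs the ~47:1 quoted
print(f"digital compression: {subsampled / tape_rate:.1f}:1")  # ~18.7:1, the 'about 18:1'
```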
For the real geeks amongst you, here is how we actually reduce the data in the temporal domain over the designated GOP.
We first of all take a reference 'I' frame. In the camera we then track the movement of groups of pixels from frame to frame across the GOP, forming 'vectors' which describe these movements. We then use these vectors to synthesise the frames that follow the I frame in the GOP. Each of these synthetic frames is compared to the real frame to form an 'error' frame, and it is these error frames that are actually recorded, along with the vector data and, of course, the reference I frame. At the decoding stage, we can generate the same synthetic frames as the camera did and use the error-frame data to correct the image.

Clearly, the more accurate and comprehensive the vector information, the less 'error' data needs to be sent. It is this fact that principally makes later codecs like H.264 more efficient than older ones like MPEG-2. It is a little more complicated in practice, for instance to allow for bidirectional prediction. The main stumbling block to advancing codecs is the required processor power, which currently dictates purpose-designed hardware chips and therefore fairly fixed formats. The future will allow more general-purpose processing chips and therefore more freedom to choose a codec to suit, as is current multimedia practice on computers.
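For the even geekier, here is a toy sketch of that motion-vector search. It assumes an exhaustive whole-pixel search over a small window; real encoders use far cleverer search strategies, sub-pixel vectors and bidirectional prediction, so treat this as the idea rather than the implementation:

```python
import numpy as np

def estimate_motion(ref, cur, block=16, search=8):
    """For each 16x16 block of the current frame, find the offset into
    the reference (I) frame that matches it best. Returns the vectors
    and the residual ('error' frame) left after motion compensation."""
    ref, cur = ref.astype(float), cur.astype(float)
    h, w = cur.shape
    pred = np.zeros_like(cur)
    vectors = {}
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            target = cur[y:y+block, x:x+block]
            best, best_err = (0, 0), np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    sy, sx = y + dy, x + dx
                    if 0 <= sy <= h - block and 0 <= sx <= w - block:
                        cand = ref[sy:sy+block, sx:sx+block]
                        err = np.abs(cand - target).sum()  # sum of absolute differences
                        if err < best_err:
                            best, best_err = (dy, dx), err
            vectors[(y, x)] = best
            dy, dx = best
            pred[y:y+block, x:x+block] = ref[y+dy:y+dy+block, x+dx:x+dx+block]
    # Only the vectors and this (small) residual need recording:
    residual = cur - pred
    return vectors, residual

ref = np.random.rand(64, 64)
cur = np.roll(ref, 4, axis=1)          # simulate a 4-pixel horizontal pan
vectors, residual = estimate_motion(ref, cur)
print(vectors[(16, 16)])               # -> (0, -4): the pan recovered
print(f"residual energy: {np.abs(residual).sum():.1f}")
```

A decoder holding the same reference frame can rebuild the prediction from the vectors alone, then add the heavily compressible residual back to recover the original frame.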