A Q&A written by our camera engineer, Perry Mitchell, with a link to a much more detailed analysis.
There is no easy way to understand video compression; it is a very complicated subject. For those who want to know more (albeit in simplified form), there is a link here to a longer essay. For those who just want the answers to some obvious questions, here you go:
Q1 - Why do we need to have digital compression?
A1 – Because we otherwise have too much data for current media capabilities. A full frame of uncompressed HD video is 62,208,000 bits of data. At 25 frames per second that represents a data rate of about 185MB/s, which is far more than most of our common media types can manage. HDCAM (basically DigiBeta) tape provides about 18MB/s and DVCPRO-HD about 12.5MB/s. (Note these are Mbytes/s; for Mbit/s, multiply by 8.) Solid state cards are typically less than this.
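As a rough illustration, the arithmetic above can be checked in a few lines of Python (a sketch only; the 1920x1080 frame size and the 1024x1024-byte reading of 'MB' are assumptions on my part):

```python
# Sanity check of the uncompressed HD numbers quoted above
# (assumed: 1920x1080, 3 channels, 10-bit samples, 25 frames per second).

WIDTH, HEIGHT = 1920, 1080
CHANNELS = 3            # R, G, B
BITS_PER_SAMPLE = 10
FPS = 25

bits_per_frame = WIDTH * HEIGHT * CHANNELS * BITS_PER_SAMPLE
print(bits_per_frame)                        # 62208000 bits per frame

bytes_per_second = bits_per_frame * FPS / 8
print(bytes_per_second / (1024 * 1024))      # ~185 MB/s uncompressed
```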
Q2 – What is sub-sampling and how does that affect compression?
A2 - The above full frame assumes that the three analogue video signals (green, red and blue) are each digitally sampled at 10 bit accuracy. The first step in reducing the data size is to reduce the number of samples. Some video formats are happy with 8 bit accuracy. It is also common for all the video channels to be represented by a sub-sample of the native width, so that (for instance) 1920 image pixels get recorded as 1440 data samples. There is also a well-proven technique of changing the R, G, B channels to ‘colour difference’ channels (Y, Cr, Cb). We can then get away with a lower sampling rate on the colour channels (Cr, Cb) than on the luminance (Y). This is normally expressed as 4:2:2 (half colour sampling) or 4:1:1/4:2:0 (quarter colour sampling). In the latter case we have reduced the overall data size by half with little visible effect on picture quality for normal scenes. There is some disagreement as to whether sub-sampling is in itself part of compression.
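As a rough sketch of why the colour difference route saves data, the sample counts can be compared like this (the helper function and the plane sizes are purely illustrative):

```python
# Rough sketch of how chroma sub-sampling reduces the data size, assuming a
# 1920x1080 frame already converted to Y, Cr, Cb planes.

WIDTH, HEIGHT = 1920, 1080

def samples_per_frame(h_factor, v_factor):
    """Total samples: one full luma plane plus two sub-sampled chroma planes."""
    luma = WIDTH * HEIGHT
    chroma = 2 * (WIDTH // h_factor) * (HEIGHT // v_factor)
    return luma + chroma

full    = samples_per_frame(1, 1)   # 4:4:4 - no sub-sampling
half    = samples_per_frame(2, 1)   # 4:2:2 - half colour sampling
quarter = samples_per_frame(2, 2)   # 4:2:0 - quarter colour sampling

print(half / full)      # ~0.67 of the 4:4:4 data
print(quarter / full)   # 0.5 - the "reduced by half" figure quoted above
```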
Q3 – What is the basic principle of video compression?
A3 – We make use of information redundancy! Basically this means that the normal method of scanning the picture and repeating it many times a second leads to a signal that includes a lot of surplus information. The main problem is that in its raw sampled state, there is no easy way to separate the surplus information from that essential to providing a good picture. In order to make this easier, we need to change the way the picture is described. This is done with a mathematical ‘Transform’. The commonest is that used in JPEG/MPEG and many other video codecs, called the Discrete Cosine Transform or DCT. The result describes the picture in terms of frequency components, and these are far easier to manipulate in a compression algorithm (posh word for ‘method’).
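For the curious, here is a minimal sketch of the DCT step using SciPy (the 8x8 block size and the smooth test pattern are my own choices for illustration):

```python
# A 2-D Discrete Cosine Transform applied to a single 8x8 block of
# luminance values, and then inverted to show the transform itself is lossless.

import numpy as np
from scipy.fft import dctn, idctn

# An 8x8 block with a gentle horizontal gradient - a "normal" image area.
block = np.tile(np.linspace(100, 130, 8), (8, 1))

coeffs = dctn(block, norm='ortho')   # frequency-domain description of the block
print(np.round(coeffs, 1))           # energy concentrates in the low frequencies

restored = idctn(coeffs, norm='ortho')
print(np.allclose(restored, block))  # True - compression comes later, when the
                                     # coefficients are quantised and discarded
```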
Q4 – What is the visible effect of DCT compression?
A4 – Ideally there is no visible effect! It is what we call a transparent codec. In practice, there are two results that commonly occur. Firstly, the DCT process results in secondary artifacts, called ‘mosquito trails’ and similar, which are clearly visible to the viewer since they do not look ‘natural’. Secondly, the picture needs to be encoded in blocks (usually 16x16 pixel macroblocks, with the DCT itself applied to 8x8 blocks), and differences between the compression applied to each block can result in the blocks themselves becoming visible, a process sometimes called ‘quilting’. At the levels of DCT compression used in professional video formats, these effects are rarely visible.
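As a hedged illustration of where these block effects come from, the sketch below quantises the DCT coefficients of one block at two step sizes (both values invented for the example); the coarser step throws away more detail, so the reconstructed block drifts further from the original, and neighbouring blocks treated differently start to show their edges:

```python
# Quantising DCT coefficients is the lossy step that creates block artifacts.

import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
block = 128 + 20 * rng.standard_normal((8, 8))       # noisy 8x8 test block

for q_step in (4, 40):                                # fine vs coarse quantisation
    coeffs = dctn(block, norm='ortho')
    quantised = np.round(coeffs / q_step) * q_step    # discard fine detail
    restored = idctn(quantised, norm='ortho')
    print(q_step, np.max(np.abs(restored - block)))   # error grows with q_step
```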
Q5 – What is the difference between Interframe and Intraframe compression?
A5 – The above-mentioned compression method is based on a single frame, or ‘Intraframe’, compression and works in what we call the ‘spatial domain’. In practice the video signal also has a lot of information redundancy in the ‘temporal domain’, that is, over a series of frames, and we make use of this with ‘Interframe’ compression. The series of frames used is called a ‘Group of Pictures’ or GOP. To get good efficiency we can make this typically 12 or 16 frames long, and this is then known as ‘Long GOP’ compression.
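A toy sketch of what a Long GOP sequence looks like (the ‘D’ label for difference-coded frames is my own shorthand for illustration, not MPEG terminology):

```python
# One intra-coded reference frame (I) starts each group; the rest of the
# group is carried as difference information (labelled D here).

GOP_LENGTH = 12

def frame_types(total_frames):
    """Label each frame according to its position within the GOP."""
    return ['I' if i % GOP_LENGTH == 0 else 'D' for i in range(total_frames)]

print(''.join(frame_types(30)))
# IDDDDDDDDDDDIDDDDDDDDDDDIDDDDD
```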
Q6 – What is the basic method of utilising Interframe compression?
A6 – We send an initial reference frame, compressed using only Intraframe techniques; this is known as an ‘I frame’. We then send the rest of the frames in the GOP as some form of difference information from the I frame. The exact form of these difference frames changes with the actual codec used. MPEG uses a technique whereby pixel blocks in the picture are tracked to form ‘Vectors’. At the encoding end, the I frame is processed with the generated Vectors to synthesise the subsequent frames, and these are then compared with the actual frames to produce error difference frames. These, together with the I frame and the Vector information, basically make up the compressed video data. Obviously, the more accurate the Vector information, the smaller the error data and thus the less data that needs to be saved. Improving the generation of the Vector information is the main difference between the various developing MPEG schemes.
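The sketch below is a much-simplified illustration of that encode/decode round trip: a single invented motion vector, whole-frame shifts instead of per-macroblock tracking, and made-up pixel data, but it shows why the residual to be stored is far smaller than the raw frame difference:

```python
# Simplified Interframe idea: predict the next frame from the I frame plus a
# motion vector, then keep only the prediction error (residual).

import numpy as np

rng = np.random.default_rng(1)
i_frame = rng.integers(0, 255, size=(64, 64)).astype(np.int16)

# The "actual" next frame: the scene shifted 2 pixels right, plus a little noise.
motion = (0, 2)                                     # (rows, cols) true motion
next_frame = np.roll(i_frame, motion, axis=(0, 1)) + rng.integers(-2, 3, (64, 64))

# Encoder side: synthesise the prediction from the I frame and the Vector...
predicted = np.roll(i_frame, motion, axis=(0, 1))
residual = next_frame - predicted                   # small error frame to store

# ...decoder side: rebuild the frame from the I frame, the Vector and the residual.
rebuilt = np.roll(i_frame, motion, axis=(0, 1)) + residual
print(np.array_equal(rebuilt, next_frame))          # True - exact reconstruction
print(np.abs(residual).mean(), np.abs(next_frame - i_frame).mean())
# The residual is far smaller than the raw frame-to-frame difference would be.
```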
For a more detailed downloadable document Click Here