Colorspace

Format    Description                                            Used by
 4:4:4    Full RGB (or YUV)                                      most editing software
 4:2:2    (YUY2) Full Y, one UV for every 2x1 block of pixels    some capture cards
 4:2:0    (YV12) Full Y, one UV for every 2x2 block of pixels    mpeg1/2/4
 4:1:1    Full Y, one UV for every 4x1 block of pixels           NTSC DV

Colorspace refers to how many bits are allocated to the various color and intensity aspects of an image. Any color your TV or monitor displays is created from a mixture of only Red, Green, and Blue light. Dark red plus dark blue makes a dark shade of magenta. Bright red plus bright green makes bright yellow, and so on. If you take a magnifying glass to your TV screen or computer monitor, you can see how this works first-hand. When we talk about a “24-bit RGB” (or “RGB24”) format, we mean that each pixel is described by 24 bits that determine its color and brightness. In the case of RGB24, there are 8 bits for Red, 8 for Green, and 8 for Blue.

At 24 bits per pixel and hundreds of thousands of pixels in a single frame of video, these bits add up quickly. And that’s just a single frame: multiply this by 24 frames a second in a typical movie and you are looking at a whole lot of information. For a 720×480 frame (NTSC DVD resolution), the math works out to about 1 MByte per frame, or about 167 GBytes (binary gigabytes, roughly 179 billion bytes) for a 120 minute movie. That’s roughly 38 single-layer DVDs. It would benefit us to be able to reduce that amount of information. And it would benefit us more if we could reduce it without our eyes noticing it.
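The arithmetic above can be checked with a few lines of Python. This is a back-of-the-envelope sketch under stated assumptions: a 720×480 frame, 3 bytes per pixel, 24 frames per second, 120 minutes, and 4.7 GB (decimal) per single-layer DVD.

```python
# Storage estimate for uncompressed RGB24 video.
# Assumptions: 720x480 frames (NTSC DVD resolution), 24 fps, 120 minutes,
# and a single-layer DVD holding 4.7 decimal gigabytes.
WIDTH, HEIGHT = 720, 480
BYTES_PER_PIXEL = 3          # RGB24: 8 bits each for R, G and B
FPS = 24
MINUTES = 120
DVD_BYTES = 4.7e9

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
movie_bytes = frame_bytes * FPS * MINUTES * 60

print(f"per frame: {frame_bytes:,} bytes (~{frame_bytes / 2**20:.2f} MiB)")
print(f"movie: {movie_bytes / 2**30:.0f} GiB ({movie_bytes / 1e9:.0f} GB)")
print(f"single-layer DVDs: {movie_bytes / DVD_BYTES:.1f}")
```

Running it reports about 167 GiB (179 GB) for the movie, or a little over 38 DVDs.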
The first step is to convert from the familiar RGB to YUV, which contains the same amount of information, but in a different format. RGB works in Red, Green, and Blue. YUV, instead, works in Luma (the Y), and Chroma (the U and the V). A pixel’s Luma value represents its intensity or brightness while its Chroma values, together, represent its color information. Were you to remove the chroma information completely, you’d be left with a black and white version of the image.
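A minimal sketch of the RGB to YUV transform follows, assuming the BT.601 luma weights (0.299, 0.587, 0.114) and the classic analog scale factors for U and V (0.492 and 0.877). Digital formats actually store a scaled and offset variant (YCbCr), so treat the exact constants as an assumption; the function names are illustrative. It shows both claims from the paragraph: the transform loses nothing, and zeroing the chroma leaves a gray (black and white) pixel.

```python
# RGB <-> YUV using BT.601 luma weights and analog U/V scaling.
# Assumption: values are floats on a 0..255 scale; the offsets and range
# scaling of digital YCbCr are omitted for clarity.
def rgb_to_yuv(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma: weighted brightness
    u = 0.492 * (b - y)                      # blue-difference chroma
    v = 0.877 * (r - y)                      # red-difference chroma
    return y, u, v

def yuv_to_rgb(y, u, v):
    r = y + v / 0.877
    b = y + u / 0.492
    g = (y - 0.299 * r - 0.114 * b) / 0.587  # solve the luma equation for G
    return r, g, b

# Same information, different coordinates: the round trip recovers the input
# (up to floating-point error).
orig = (200.0, 120.0, 40.0)
back = yuv_to_rgb(*rgb_to_yuv(*orig))
print(back)

# Dropping chroma (U = V = 0) keeps only luma, so R = G = B: a gray pixel.
y, _, _ = rgb_to_yuv(*orig)
print(yuv_to_rgb(y, 0.0, 0.0))
```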
As it turns out, human eyes are far less sensitive to color than they are to brightness. This is the key to all the following colorspaces. Each one takes advantage of this trait of human vision and, in one way or another, discards color information. Unless you’re looking for it, your eye simply doesn’t notice it’s missing. In fact, every DVD you’ve ever watched has only a quarter of the color information it originally started with, as we’ll soon see.
The general idea is that we keep all the Luma information but start getting rid of chroma information. The following descriptions treat this like we are simply deleting chroma from some pixels and keeping it for others. This mode of thinking is sufficient for now, but it should be mentioned that what’s really going on is a form of averaging. This will be discussed in detail later.
4:4:4

The ratio-style descriptions of the different colorspaces can be confusing. You can think of them as describing part of one horizontal line of an image at a time. The numbers are a ratio of Y to U to V information.
4:4:4 means that on any particular horizontal line, every pixel has a Y, U, and V value. You could just as easily call this 1:1:1, but we multiply everything by 4 for consistency (otherwise, we’d be dealing in fractions with the other formats). A set of pixels in this format looks like:
YUV YUV YUV YUV . . .
YUV YUV YUV YUV . . .
YUV YUV YUV YUV . . .
. . . .

The YUV format is only a simple mathematical transform away from RGB, and going between the two is lossless in principle (in practice, rounding to 8-bit integers can lose a little precision). There’s usually not much use in working in YUV unless you’re going to introduce some loss, as we do in the next few colorspaces. That being the case, many editing programs (Premiere, VirtualDub, etc.) work in RGB, and few use YUV.
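The “lossless in principle” caveat can be seen with gray pixels alone. This sketch assumes studio-range 8-bit BT.601 luma (Y from 16 to 235) and uses hypothetical helper names; for a gray pixel the chroma is exactly neutral, so only Y matters. Since 256 input levels are squeezed into 220 code values, some levels must collide and cannot be recovered.

```python
# Round-tripping gray levels through 8-bit studio-range luma.
# Assumption: Y = 16 + 219 * (value / 255), rounded to an integer; for a
# gray pixel (R = G = B) the chroma is neutral, so only Y matters here.
def gray_to_y8(value):
    return round(16 + 219 * value / 255)

def y8_to_gray(y8):
    # Invert the mapping, round, and clamp to the 0..255 display range.
    return min(255, max(0, round((y8 - 16) * 255 / 219)))

lost = [v for v in range(256) if y8_to_gray(gray_to_y8(v)) != v]
print(len(lost), "of 256 gray levels do not survive the round trip")
```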
4:2:2

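Here we have removed some of the color information: half of it, in fact. For every 4 pixels, there are 4 Y values, but only 2 each of U and V. Each pair of neighboring pixels keeps both of its Y values but shares a single U and a single V; in the diagram, the U is written next to the first pixel and the V next to the second, which is also the byte order (Y0 U Y1 V) of the YUY2 format. Again, we can get away with this because human eyes simply don’t notice it. A set of pixels in this format would look like: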
YU YV YU YV . . .
YU YV YU YV . . .
YU YV YU YV . . .
. . . .

This format has 1 set of chroma samples for each 2×1 block of pixels. It is sometimes used by capture cards and higher end video cameras. The HuffYUV codec can also use this format, but aside from that, it’s really not very common.
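A minimal sketch of 4:2:2 on a single line of pixels follows, assuming simple pair averaging of the chroma rather than whatever filter a real capture card applies. `subsample_422` and `pack_yuy2` are hypothetical helper names; the packing follows the Y0 U Y1 V order of YUY2, which works out to 4 bytes per 2 pixels.

```python
# 4:2:2 on one line: every pixel keeps its Y, each horizontal pair shares
# one U and one V. Assumption: plain pair averaging of the chroma.
def subsample_422(y_row, u_row, v_row):
    assert len(y_row) == len(u_row) == len(v_row) and len(y_row) % 2 == 0
    u_half = [(u_row[i] + u_row[i + 1]) / 2 for i in range(0, len(u_row), 2)]
    v_half = [(v_row[i] + v_row[i + 1]) / 2 for i in range(0, len(v_row), 2)]
    return list(y_row), u_half, v_half   # luma untouched, chroma halved

def pack_yuy2(y_row, u_half, v_half):
    # YUY2 order for each pixel pair: Y0 U Y1 V
    out = []
    for i, (u, v) in enumerate(zip(u_half, v_half)):
        out += [y_row[2 * i], u, y_row[2 * i + 1], v]
    return out

y_row, u_half, v_half = subsample_422([10, 20, 30, 40], [1, 3, 5, 7], [2, 4, 6, 8])
packed = pack_yuy2(y_row, u_half, v_half)
print(packed)   # 8 values for 4 pixels: 2 bytes (16 bits) per pixel
```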
4:2:0

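This description can be a little confusing unless you keep in mind that it’s 4:2:0 per line: one line carries only one of the two chroma components, at half horizontal resolution, and the next line carries only the other, alternating line by line. (Strictly, the J:a:b notation describes a region J pixels wide and two lines tall, where a is the number of chroma samples in the first line and b the number of new samples in the second; b = 0 means the second line reuses the first line’s chroma. Either way, the result is one U and one V for each 2×2 block.) A set of pixels in this format would look like: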
YV Y YV Y . . .
YU Y YU Y . . .
YV Y YV Y . . .
YU Y YU Y . . .
. . . .

This format has 1 set of chroma samples for each 2×2 block of pixels. This is an extremely common format: MPEG-1, MPEG-2, and MPEG-4 all use it. Since DVDs use MPEG-2, all DVDs are also in this format. Many common video conferencing codecs use the MPEG standards, and so they too are in 4:2:0 colorspace.
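To see what one chroma sample per 2×2 block means for storage, here is a sketch of the plane sizes in a YV12 frame: a full-resolution Y plane followed by V and U planes at half width and half height (YV12 stores V before U). It assumes even dimensions and no row padding; `yv12_plane_sizes` is a hypothetical helper.

```python
# Plane sizes of a YV12 (4:2:0) frame with 8-bit samples.
# Assumptions: even width/height and no row padding (stride == width).
def yv12_plane_sizes(width, height):
    y_size = width * height
    c_size = (width // 2) * (height // 2)   # one V and one U per 2x2 block
    return {"Y": y_size, "V": c_size, "U": c_size, "total": y_size + 2 * c_size}

sizes = yv12_plane_sizes(720, 480)
print(sizes)
print(sizes["total"] / (720 * 480 * 3))   # fraction of an RGB24 frame: 0.5
```

A 4:2:0 frame therefore takes half the space of the same frame in RGB24, before any compression.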
4:1:1

This is an oddball format used mainly by North American (NTSC) DV tapes (European PAL DV uses 4:2:0, like normal people would), but since DV is a common home video format, it’s important to know. 4:1:1 has the usual 4 Y samples for every four pixels and, like 4:2:0, one complete set of chroma samples for every 4 pixels, but rather than covering a 2×2 block, that set covers a 4×1 block. A set of pixels in this format looks like:
YUV Y Y Y . . .
YUV Y Y Y . . .
YUV Y Y Y . . .
. . . .

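4:1:1, as the ratio suggests, has the same amount of information as 4:2:0; it’s only arranged differently. The end result is an odd horizontal ‘color stretching’ distortion. Most people won’t notice it unless looking for it, but 4:2:0 generally looks better, since it splits the loss evenly between the horizontal and vertical directions instead of putting all of it in the horizontal.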
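The “same amount of information” claim can be checked by counting samples. This sketch assumes 8 bits per sample and reads J:a:b as a region J pixels wide and two lines tall, with a chroma samples (of each of U and V) in the first line and b new ones in the second; `bits_per_pixel` is a hypothetical helper.

```python
# Bits per pixel for J:a:b chroma subsampling, assuming 8-bit samples.
# Region: J pixels wide, 2 lines tall; a = U (and V) samples in line 1,
# b = new U (and V) samples in line 2 (0 means line 2 reuses line 1's).
def bits_per_pixel(j, a, b, bits=8):
    luma = 2 * j                 # every pixel keeps its own Y
    chroma = 2 * (a + b)         # U and V samples across both lines
    return bits * (luma + chroma) / luma

for j, a, b in [(4, 4, 4), (4, 2, 2), (4, 2, 0), (4, 1, 1)]:
    print(f"{j}:{a}:{b} -> {bits_per_pixel(j, a, b):g} bits/pixel")
```

This gives 24, 16, 12 and 12 bits per pixel: 4:2:0 and 4:1:1 tie, and both are half of RGB24.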
The Averaging

The above examples are only approximately true. You can usually think of the grids of pixels just as you see them there and for the most part be right, but what’s actually happening is that the chroma values are being averaged. The easiest way to explain this is with an example, so we’ll look first at 4:1:1 colorspace:
Let’s start with the following block of pixels:
Red LightRed LightGreen Green

If we convert this to 4:1:1, the above grids suggest that it may look like we end up with:
Red LightGrey LightGrey Grey

…but we don’t. What really happens is that all 4 pixels take on the average of the colors, which in this case is yellow. Things aren’t quite as bad as they may sound, because only the chroma gets averaged out; the luma values are not averaged, so this particular block of pixels becomes:
Yellow LightYellow LightYellow Yellow

What we’ve done here is instead of keeping 4 UV pairs (one pair for each of our 4 pixels), we’ve taken the average of the 4 pairs and assigned that average value to all 4 pixels. There is now one pair of UV values applied across a 4×1 block.
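The example can be reproduced with a short sketch that averages the chroma of a 4×1 block while keeping each pixel’s luma. The RGB values standing in for Red, LightRed, LightGreen and Green are hypothetical choices, and the conversion uses BT.601 luma weights with analog U/V scaling, an assumption rather than what any particular DV codec does.

```python
# 4:1:1-style chroma averaging over one 4x1 block.
# Assumptions: hypothetical RGB values for the named colors; BT.601 luma
# weights with analog U/V scaling.
def rgb_to_yuv(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, 0.492 * (b - y), 0.877 * (r - y)

def yuv_to_rgb(y, u, v):
    r = y + v / 0.877
    b = y + u / 0.492
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    # Clamp to the displayable 0..255 range and round for printing.
    return tuple(round(min(255.0, max(0.0, c))) for c in (r, g, b))

def average_chroma(block):
    """Keep each pixel's Y; give every pixel the block's average U and V."""
    yuv = [rgb_to_yuv(*px) for px in block]
    u_avg = sum(u for _, u, _ in yuv) / len(yuv)
    v_avg = sum(v for _, _, v in yuv) / len(yuv)
    return [(y, u_avg, v_avg) for y, _, _ in yuv]

block = {"Red": (255, 0, 0), "LightRed": (255, 128, 128),
         "LightGreen": (128, 255, 128), "Green": (0, 255, 0)}
averaged = average_chroma(list(block.values()))
for name, pixel in zip(block, averaged):
    print(name, "->", yuv_to_rgb(*pixel))
```

With these values, every output pixel ends up with red and green nearly equal and blue well below both: shades of yellow whose brightness still follows the original luma.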
Since 4 pixels is actually very small, our eyes typically won’t notice when a tiny bit of green next to a tiny bit of red is replaced by two tiny bits of yellow, especially when their Luma values are still preserved. Most likely, our eyes would have averaged the two colors anyway. Again, take a magnifying glass to your computer monitor and look at something yellow: our eyes do this all the time. We’ve simply allowed the computer to do it first in order to save bits.
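Now as for 4:2:0, it’s a little more convoluted. In the per-line picture above, one color component is deleted on each line and the other is averaged across two pixels instead of four; the next line does the same thing with the other chroma component. In the stored format, the outcome is that each 2×2 block gets a single U and a single V, each typically an average (or similar filtered value) of the block’s four original samples. Again, since a 2×2 block of pixels is very small, it’s difficult to see the difference and so the system works well.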
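A minimal sketch of 2×2 averaging on one chroma plane (U or V) follows; the Y plane is left alone. It assumes even dimensions and a plain box filter, whereas real encoders may filter differently and position (site) the chroma samples in different places. `subsample_420` is a hypothetical helper.

```python
# 4:2:0 chroma subsampling: replace each 2x2 block of a chroma plane with
# the average of its four values. Assumptions: even dimensions, box filter.
def subsample_420(plane):
    h, w = len(plane), len(plane[0])
    assert h % 2 == 0 and w % 2 == 0
    return [[(plane[r][c] + plane[r][c + 1]
              + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]

u = [[10, 20, 30, 40],
     [30, 40, 50, 60],
     [0, 0, 100, 100],
     [0, 0, 100, 100]]
print(subsample_420(u))   # [[25.0, 45.0], [0.0, 100.0]]
```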
In general, all this averaging results in color “smearing.” In 4:2:2, colors bleed horizontally across two pixels at a time. In 4:1:1, colors bleed horizontally across four pixels at a time. And in 4:2:0, colors bleed across a 2×2 box.
What About Interlacing?

Whether a video is interlaced or not has no effect on 4:4:4 or 4:2:2. It also has no effect on 4:1:1 for that matter, but it does make things a little screwy for 4:2:0. 4:2:0 sampling, on interlaced video, is done per field and not per frame, so you end up with fields that look like this:
Top Field            Bottom Field
YV Y YV Y            ||||||||||||
||||||||||||         YV Y YV Y
YU Y YU Y            ||||||||||||
||||||||||||         YU Y YU Y

(|||||||||||| marks a line that belongs to the other field.)

This odd interlacing effect is perhaps why NTSC DV uses 4:1:1.
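A minimal sketch of field-based 4:2:0 follows, assuming a 2×2 box filter and a chroma plane stored as a list of rows in frame order; the helper names are hypothetical. Each field is subsampled on its own, so vertical averaging never mixes lines from the two fields, which an interlaced camera captures at different instants.

```python
# Field-based 4:2:0 chroma subsampling for interlaced video.
def box_2x2(plane):
    return [[(plane[r][c] + plane[r][c + 1]
              + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
             for c in range(0, len(plane[0]), 2)]
            for r in range(0, len(plane), 2)]

def subsample_420_interlaced(chroma_frame):
    assert len(chroma_frame) % 4 == 0    # each field needs an even line count
    top = chroma_frame[0::2]             # frame lines 0, 2, 4, ...
    bottom = chroma_frame[1::2]          # frame lines 1, 3, 5, ...
    return box_2x2(top), box_2x2(bottom)

frame = [[100] * 4, [0] * 4, [100] * 4, [0] * 4]   # fields differ (e.g. motion)
top, bottom = subsample_420_interlaced(frame)
print(top, bottom)      # [[100.0, 100.0]] [[0.0, 0.0]]
print(box_2x2(frame))   # frame-based would blend: [[50.0, 50.0], [50.0, 50.0]]
```

Averaging the whole frame instead would blend the two fields into a uniform 50, which is exactly the cross-field mixing the per-field approach avoids.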
Analog Sampling

Analog sampling is also hinted at by the 4:x:x ratios. A broadcast TV channel is typically about 6 MHz wide; as a simplification, treat the luma signal as using that full 6 MHz and each chroma signal about half of it, or 3 MHz. Given the Nyquist rule (sample at no less than twice the signal bandwidth), Y should be sampled at 12 MHz or more and each chroma at 6 MHz or more.
Actual sampling (as standardized in ITU-R BT.601) is typically 13.5 MHz for luma and 6.75 MHz for each chroma, which leaves a little headroom, since twice the bandwidth is only the theoretical minimum sampling frequency. This is denoted as 4:2:2. Look familiar? And now we also see why many TV capture cards produce 4:2:2 digital output: the signal they receive is the analog equivalent of it.
A lower-end sampling system might sample Y at 13.5 MHz and each chroma at 3.375 MHz. This is 4:1:1 sampling. Analog signals are sent line by line, so the chroma cannot be sampled across multiple lines the way a 4:2:0 digital image is. As such, 4:1:1 blurs the chroma a lot horizontally but doesn’t hurt it at all vertically, just as DV does in the digital world. A sampler could conceivably store each line in memory and then process the chroma vertically when the next line comes in, but apparently this isn’t worth the trouble, as it’s not done. In a digital image, on the other hand, reading the next line is trivial (it’s already in memory anyway), which is why 4:2:0 is so common for digital video.
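The relationship between sampling rates and the 4:x:x labels can be written out as arithmetic. This sketch assumes the leading 4 stands for luma sampled at 13.5 MHz, expresses each chroma rate on that same scale, and reuses the simplified 6 MHz / 3 MHz bandwidths from above; the helper names are hypothetical.

```python
# Mapping sampling rates onto the 4:x:x scale, and checking Nyquist.
# Assumptions: "4" = luma at 13.5 MHz; bandwidths are the simplified
# 6 MHz (luma) and 3 MHz (chroma) figures used in the text.
LUMA_RATE_MHZ = 13.5

def ratio_digit(chroma_rate_mhz):
    return 4 * chroma_rate_mhz / LUMA_RATE_MHZ

def meets_nyquist(rate_mhz, bandwidth_mhz):
    return rate_mhz >= 2 * bandwidth_mhz   # at least twice the bandwidth

print("4:%g:%g" % (ratio_digit(6.75), ratio_digit(6.75)))     # 4:2:2
print("4:%g:%g" % (ratio_digit(3.375), ratio_digit(3.375)))   # 4:1:1
print(meets_nyquist(13.5, 6), meets_nyquist(6.75, 3))         # True True
print(meets_nyquist(3.375, 3))                                # False
```

The last line shows why 4:1:1 blurs chroma horizontally: under these assumed bandwidths, 3.375 MHz is well below the 6 MHz needed to capture 3 MHz of chroma.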
