In order to exchange images between drivers and applications, it is necessary to have standard image data formats which both sides will interpret the same way. V4L2 includes several such formats, and this document is intended to be an unambiguous specification for the standard image data formats in V4L2.
V4L2 drivers are not limited to these formats, however. Driver-specific formats are possible. In that case the application may depend on a codec driver to convert images to one of the standard formats when needed, but the data can still be stored and retrieved in the proprietary format. For example, a device may support a proprietary compressed format. Applications can still capture and save the data in the compressed format, saving much disk space, and later use a codec device driver to convert the images to the X Window System screen format when the video is to be displayed.
Even so, ultimately, some standard formats are needed, so the V4L2 specification
would not be complete without well-defined standard formats.
The pixels are always arranged in memory from left to right, and from top to bottom. The first byte of data in the image buffer is always for the leftmost pixel of the topmost row. Following that is the pixel immediately to its right, and so on until the end of the top row of pixels. Following the rightmost pixel of the row there may be zero or more bytes of padding to guarantee that each row of pixel data has a certain alignment. Following the pad bytes, if any, is data for the leftmost pixel of the second row from the top, and so on. The last row has just as many pad bytes after it as the other rows.
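The row layout described above can be sketched as a small offset calculation. This is only an illustration: the function names and the parameters `bytes_per_pixel` and `bytes_per_line` (the full row length including pad bytes) are assumptions made for this example, not names defined by this specification.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of pixel (x, y) in a packed image buffer.
 * bytes_per_line is the length of one row INCLUDING any pad bytes;
 * bytes_per_pixel is the size of one pixel. Both names are illustrative. */
static size_t pixel_offset(size_t x, size_t y,
                           size_t bytes_per_pixel, size_t bytes_per_line)
{
    /* The pixel must lie inside the row, before the pad bytes. */
    assert((x + 1) * bytes_per_pixel <= bytes_per_line);
    return y * bytes_per_line + x * bytes_per_pixel;
}

/* Total buffer size: the last row carries the same padding as the others. */
static size_t image_size(size_t height, size_t bytes_per_line)
{
    return height * bytes_per_line;
}
```

For instance, with two bytes per pixel and rows padded to 10 bytes, pixel (1, 2) starts at byte 22.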
The formats fall into two broad categories, the RGB formats and YUV formats. The YUV formats all use the YCbCr color space used in the ITU-R601 and ITU-R656 digital video standards. There is more information about the YCbCr color space later in this document.
In V4L2, each format has an identifier of the form V4L2_PIX_FMT_XXX, defined in videodev.h.
The rest of this document describes each standard format.
To make the specifications endianness-independent, the following
diagrams show the order of the data in memory byte by byte. Each
cell of a diagram is one byte. The bytes are arranged in memory from
left to right, top to bottom. Possible pad bytes after each row are not
shown.
V4L2_PIX_FMT_RGB555,
V4L2_PIX_FMT_RGB565
p00 | q00 | p01 | q01 | p02 | q02 | p03 | q03 |
p10 | q10 | p11 | q11 | p12 | q12 | p13 | q13 |
p20 | q20 | p21 | q21 | p22 | q22 | p23 | q23 |
p30 | q30 | p31 | q31 | p32 | q32 | p33 | q33 |
Each pixel is two bytes, denoted here as p and q. For RGB
5-5-5, each pair of bytes contains five bits of red, five bits of green,
five bits of blue, and one extra bit. The value of the extra bit is undefined.
For RGB 5-6-5 there are six green bits and no extra bits. The RGB bits are
arranged in p and q like this:
RGB 5-5-5:
bit | (MSB) 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 (LSB) |
p | G2 | G1 | G0 | R4 | R3 | R2 | R1 | R0 |
q | ? | B4 | B3 | B2 | B1 | B0 | G4 | G3 |
RGB 5-6-5:
bit | (MSB) 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 (LSB) |
p | G2 | G1 | G0 | R4 | R3 | R2 | R1 | R0 |
q | B4 | B3 | B2 | B1 | B0 | G5 | G4 | G3 |
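The bit tables above can be turned into a pair of unpacking routines. The following is a minimal sketch that reads the two bytes p and q of one pixel exactly as laid out in the tables; the `struct rgb` type and the function names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative container for the unpacked components (not scaled to 8 bits). */
struct rgb { uint8_t r, g, b; };

/* px[0] is byte p, px[1] is byte q, in memory order. */
static struct rgb unpack_rgb555(const uint8_t px[2])
{
    struct rgb c;
    assert(px != NULL);
    c.r = px[0] & 0x1F;                                       /* R4..R0 in p bits 4..0 */
    c.g = (uint8_t)(((px[1] & 0x03) << 3) | (px[0] >> 5));    /* G4,G3 in q; G2..G0 in p */
    c.b = (px[1] >> 2) & 0x1F;                                /* B4..B0; bit 7 is undefined */
    return c;
}

static struct rgb unpack_rgb565(const uint8_t px[2])
{
    struct rgb c;
    assert(px != NULL);
    c.r = px[0] & 0x1F;                                       /* R4..R0 in p bits 4..0 */
    c.g = (uint8_t)(((px[1] & 0x07) << 3) | (px[0] >> 5));    /* G5..G3 in q; G2..G0 in p */
    c.b = px[1] >> 3;                                         /* B4..B0 in q bits 7..3 */
    return c;
}
```

Note that the 5-5-5 routine masks off the undefined top bit of q rather than relying on it being zero.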
V4L2_PIX_FMT_RGB555X,
V4L2_PIX_FMT_RGB565X
p00 | q00 | p01 | q01 | p02 | q02 | p03 | q03 |
p10 | q10 | p11 | q11 | p12 | q12 | p13 | q13 |
p20 | q20 | p21 | q21 | p22 | q22 | p23 | q23 |
p30 | q30 | p31 | q31 | p32 | q32 | p33 | q33 |
Each pixel is two bytes, denoted here as p and q. For RGB
5-5-5, each pair of bytes contains five bits of red, five bits of green,
five bits of blue, and one extra bit. The value of the extra bit is undefined.
For RGB 5-6-5 there are six green bits and no extra bits. RGB555X
and RGB565X are the same as RGB555 and RGB565, except that the two bytes of
each pixel are swapped. The RGB bits are arranged in p and q like this:
RGB 5-5-5, bytes swapped:
bit | (MSB) 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 (LSB) |
p | ? | B4 | B3 | B2 | B1 | B0 | G4 | G3 |
q | G2 | G1 | G0 | R4 | R3 | R2 | R1 | R0 |
RGB 5-6-5, bytes swapped:
bit | (MSB) 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 (LSB) |
p | B4 | B3 | B2 | B1 | B0 | G5 | G4 | G3 |
q | G2 | G1 | G0 | R4 | R3 | R2 | R1 | R0 |
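Because the X variants differ from the plain formats only in the order of the two bytes of each pixel, a row can be converted in either direction by swapping each byte pair. A minimal sketch follows; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of every 16-bit pixel in one row, in place.
 * Applying it to RGB565 data yields RGB565X data and vice versa
 * (likewise for RGB555 and RGB555X). width is in pixels. */
static void swap_pixel_bytes(uint8_t *row, size_t width)
{
    size_t x;
    assert(row != NULL);
    for (x = 0; x < width; x++) {
        uint8_t tmp = row[2 * x];
        row[2 * x] = row[2 * x + 1];
        row[2 * x + 1] = tmp;
    }
}
```

Pad bytes after the row are not touched, since only `width` pixels are processed.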
V4L2_PIX_FMT_BGR24
B00 | G00 | R00 | B01 | G01 | R01 | B02 | G02 | R02 | B03 | G03 | R03 |
B10 | G10 | R10 | B11 | G11 | R11 | B12 | G12 | R12 | B13 | G13 | R13 |
B20 | G20 | R20 | B21 | G21 | R21 | B22 | G22 | R22 | B23 | G23 | R23 |
B30 | G30 | R30 | B31 | G31 | R31 | B32 | G32 | R32 | B33 | G33 | R33 |
Each pixel is three bytes. B is first, then G then R.
V4L2_PIX_FMT_RGB24
R00 | G00 | B00 | R01 | G01 | B01 | R02 | G02 | B02 | R03 | G03 | B03 |
R10 | G10 | B10 | R11 | G11 | B11 | R12 | G12 | B12 | R13 | G13 | B13 |
R20 | G20 | B20 | R21 | G21 | B21 | R22 | G22 | B22 | R23 | G23 | B23 |
R30 | G30 | B30 | R31 | G31 | B31 | R32 | G32 | B32 | R33 | G33 | B33 |
Each pixel is three bytes. R is first, then G then B.
V4L2_PIX_FMT_BGR32
B00 | G00 | R00 | ? | B01 | G01 | R01 | ? | B02 | G02 | R02 | ? | B03 | G03 | R03 | ? |
B10 | G10 | R10 | ? | B11 | G11 | R11 | ? | B12 | G12 | R12 | ? | B13 | G13 | R13 | ? |
B20 | G20 | R20 | ? | B21 | G21 | R21 | ? | B22 | G22 | R22 | ? | B23 | G23 | R23 | ? |
B30 | G30 | R30 | ? | B31 | G31 | R31 | ? | B32 | G32 | R32 | ? | B33 | G33 | R33 | ? |
Each pixel is four bytes. B is first, then G then R, then an extra byte.
The value of the extra byte is undefined.
V4L2_PIX_FMT_RGB32
R00 | G00 | B00 | ? | R01 | G01 | B01 | ? | R02 | G02 | B02 | ? | R03 | G03 | B03 | ? |
R10 | G10 | B10 | ? | R11 | G11 | B11 | ? | R12 | G12 | B12 | ? | R13 | G13 | B13 | ? |
R20 | G20 | B20 | ? | R21 | G21 | B21 | ? | R22 | G22 | B22 | ? | R23 | G23 | B23 | ? |
R30 | G30 | B30 | ? | R31 | G31 | B31 | ? | R32 | G32 | B32 | ? | R33 | G33 | B33 | ? |
Each pixel is four bytes. R is first, then G then B, then an extra byte.
The value of the extra byte is undefined.
V4L2_PIX_FMT_RGB332
p00 | p01 | p02 | p03 |
p10 | p11 | p12 | p13 |
p20 | p21 | p22 | p23 |
p30 | p31 | p32 | p33 |
Each pixel is one byte. This format is intended for use with 8-bit colormap
displays. Each byte contains three bits of red, three bits of green, and
two bits of blue. The RGB bits are arranged in the bytes like this:
bit | (MSB) 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 (LSB) |
p | B1 | B0 | G2 | G1 | G0 | R2 | R1 | R0 |
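The RGB332 bit layout above can be expressed as a pack and unpack pair. This is a sketch for illustration only; the function names are hypothetical, and the components are taken as raw 3-bit and 2-bit values rather than scaled 8-bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack 3-bit red, 3-bit green and 2-bit blue into one RGB332 byte:
 * blue in bits 7..6, green in bits 5..3, red in bits 2..0. */
static uint8_t pack_rgb332(uint8_t r, uint8_t g, uint8_t b)
{
    assert(r < 8 && g < 8 && b < 4);
    return (uint8_t)((b << 6) | (g << 3) | r);
}

/* Split one RGB332 byte back into its components. */
static void unpack_rgb332(uint8_t p, uint8_t *r, uint8_t *g, uint8_t *b)
{
    assert(r != NULL && g != NULL && b != NULL);
    *r = p & 0x07;
    *g = (p >> 3) & 0x07;
    *b = p >> 6;
}
```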
To convert YCbCr to RGB, first rescale the YCbCr components to full-range Y, U and V:
Y = (255/219)(Y - 16) |
U = (127/112)(Cb - 128) |
V = (127/112)(Cr - 128) |
This gives Y as 0...255, and U and V as -127...+127. Then convert to RGB:
R = Y + 1.402V |
G = Y - 0.344U - 0.714V |
B = Y + 1.772U |
If you are writing a color space conversion routine take note: Due to image filtering, brightness controls, and other common video operations, it is normal that YCbCr values can go out of range. It is also normal for the computed R, G, or B values to be below 0 or above 255, even if YCbCr were in their legal range. It is necessary for a conversion algorithm to clamp all the result values to their legal range.
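The conversion and the clamping requirement can be sketched in floating point as follows. This is a straightforward, unoptimized illustration of the equations above, not a recommended production routine; the function names are hypothetical, and rounding to the nearest integer is a choice made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp to 0...255 and round to nearest. */
static uint8_t clamp_u8(double v)
{
    if (v < 0.0)
        return 0;
    if (v > 255.0)
        return 255;
    return (uint8_t)(v + 0.5);
}

/* Convert one YCbCr sample triple to 8-bit R, G, B (rgb[0..2]). */
static void ycbcr_to_rgb(uint8_t y, uint8_t cb, uint8_t cr, uint8_t rgb[3])
{
    double yy, u, v;
    assert(rgb != NULL);
    /* Rescale to full-range Y, U, V. */
    yy = (255.0 / 219.0) * (y - 16);
    u = (127.0 / 112.0) * (cb - 128);
    v = (127.0 / 112.0) * (cr - 128);
    /* Results may fall outside 0...255 even for legal input, so clamp. */
    rgb[0] = clamp_u8(yy + 1.402 * v);
    rgb[1] = clamp_u8(yy - 0.344 * u - 0.714 * v);
    rgb[2] = clamp_u8(yy + 1.772 * u);
}
```

An out-of-range input such as Y = 255 simply saturates to white instead of wrapping around.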
The inverse transform to convert RGB into YCbCr can be derived (how are
your linear algebra skills?) and is as follows:
Y = 0.2990R + 0.5870G + 0.1140B |
U = -0.1687R - 0.3313G + 0.5000B |
V = 0.5000R - 0.4187G - 0.0813B |
This gives Y as 0...255, and U and V as -127...+127. Then convert to YCbCr
ranges:
Y = (219/255)Y + 16 |
Cb = (112/127)U + 128 |
Cr = (112/127)V + 128 |
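The inverse direction can be sketched in the same way. The example below uses 0.587 as the green weight, which is the standard ITU-R 601 luma coefficient (the three luma weights sum to 1). The function names are hypothetical, and the final clamp is a precaution added for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp to 0...255 and round to nearest. */
static uint8_t to_u8(double v)
{
    if (v < 0.0)
        return 0;
    if (v > 255.0)
        return 255;
    return (uint8_t)(v + 0.5);
}

/* Convert 8-bit R, G, B to Y, Cb, Cr (ycc[0..2]). */
static void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b, uint8_t ycc[3])
{
    double y, u, v;
    assert(ycc != NULL);
    /* Full-range Y, U, V. */
    y = 0.2990 * r + 0.5870 * g + 0.1140 * b;
    u = -0.1687 * r - 0.3313 * g + 0.5000 * b;
    v = 0.5000 * r - 0.4187 * g - 0.0813 * b;
    /* Rescale to the YCbCr ranges: Y 16...235, Cb and Cr 16...240. */
    ycc[0] = to_u8((219.0 / 255.0) * y + 16.0);
    ycc[1] = to_u8((112.0 / 127.0) * u + 128.0);
    ycc[2] = to_u8((112.0 / 127.0) * v + 128.0);
}
```

Black maps to (16, 128, 128) and white to (235, 128, 128), the ends of the legal Y range.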
The purpose of using this color space is to separate the brightness information
(Y) from the color information (U and V or Cb and Cr). It is a property
of the human visual system that brightness information is more important,
and color information can be partially discarded with little loss of perceptual
quality. Therefore the YUV formats always use fewer Cb's and Cr's than
Y's. There is always one Y per pixel. The YUV formats differ by how much
color information is discarded, and by how the Y's, Cb's and Cr's are arranged
in memory.
V4L2_PIX_FMT_YUYV,
V4L2_PIX_FMT_UYVY,
V4L2_PIX_FMT_VYUY,
V4L2_PIX_FMT_YVYU
Y00 | Cb00 | Y01 | Cr00 | Y02 | Cb02 | Y03 | Cr02 |
Y10 | Cb10 | Y11 | Cr10 | Y12 | Cb12 | Y13 | Cr12 |
Y20 | Cb20 | Y21 | Cr20 | Y22 | Cb22 | Y23 | Cr22 |
Y30 | Cb30 | Y31 | Cr30 | Y32 | Cb32 | Y33 | Cr32 |
In these formats every four bytes encode two pixels: two Y's, one Cb and
one Cr. Each Y belongs to one of the pixels, and the Cb and Cr are shared by
both pixels. The Cb and Cr components therefore have half the horizontal
resolution of the Y component. The diagram shows V4L2_PIX_FMT_YUYV (byte
order Y-Cb-Y-Cr), which is known in the Windows environment as YUY2. The other
formats are the same except for the byte order: V4L2_PIX_FMT_UYVY is
Cb-Y-Cr-Y, V4L2_PIX_FMT_VYUY is Cr-Y-Cb-Y, and V4L2_PIX_FMT_YVYU is Y-Cr-Y-Cb.
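The sharing of one Cb and one Cr between two pixels can be illustrated with a lookup routine for the YUYV and UYVY orderings. The function names are hypothetical; `row` is assumed to point at the first byte of an image row and `x` to be a pixel index within that row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fetch Y, Cb and Cr for pixel x of a YUYV row (byte order Y-Cb-Y-Cr). */
static void yuyv_pixel(const uint8_t *row, size_t x,
                       uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    const uint8_t *group;
    assert(row != NULL && y != NULL && cb != NULL && cr != NULL);
    group = row + (x / 2) * 4;   /* four bytes cover two pixels */
    *y = group[(x % 2) * 2];     /* Y at byte 0 or byte 2 */
    *cb = group[1];              /* shared by both pixels */
    *cr = group[3];
}

/* Same for a UYVY row (byte order Cb-Y-Cr-Y). */
static void uyvy_pixel(const uint8_t *row, size_t x,
                       uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    const uint8_t *group;
    assert(row != NULL && y != NULL && cb != NULL && cr != NULL);
    group = row + (x / 2) * 4;
    *y = group[(x % 2) * 2 + 1]; /* Y at byte 1 or byte 3 */
    *cb = group[0];
    *cr = group[2];
}
```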
V4L2_PIX_FMT_Y41P
Cb00 | Y00 | Cr00 | Y01 | Cb04 | Y02 | Cr04 | Y03 | Y04 | Y05 | Y06 | Y07 |
Cb10 | Y10 | Cr10 | Y11 | Cb14 | Y12 | Cr14 | Y13 | Y14 | Y15 | Y16 | Y17 |
Cb20 | Y20 | Cr20 | Y21 | Cb24 | Y22 | Cr24 | Y23 | Y24 | Y25 | Y26 | Y27 |
Cb30 | Y30 | Cr30 | Y31 | Cb34 | Y32 | Cr34 | Y33 | Y34 | Y35 | Y36 | Y37 |
In this format each 12 bytes is eight pixels. In the twelve bytes are two
CbCr pairs and eight Y's. The first CbCr pair goes with the first four Y's,
and the second CbCr pair goes with the other four Y's. The Cb and Cr components
have one fourth the horizontal resolution of the Y component.
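Because the Y bytes of a Y41P group are not evenly spaced, a small position table makes the layout easier to follow. The sketch below reads one pixel from a row according to the diagram above; the function name and the table name `y_pos` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fetch Y, Cb and Cr for pixel x of a Y41P row.
 * Each 12-byte group holds eight pixels:
 *   Cb Y0 Cr Y1 Cb' Y2 Cr' Y3 Y4 Y5 Y6 Y7
 * The first CbCr pair serves Y0..Y3, the second serves Y4..Y7. */
static void y41p_pixel(const uint8_t *row, size_t x,
                       uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    /* Byte position of each of the eight Y's within the group. */
    static const uint8_t y_pos[8] = {1, 3, 5, 7, 8, 9, 10, 11};
    const uint8_t *group;
    size_t i, c;
    assert(row != NULL && y != NULL && cb != NULL && cr != NULL);
    group = row + (x / 8) * 12;
    i = x % 8;
    c = (i < 4) ? 0 : 4;         /* start of the CbCr pair for this pixel */
    *y = group[y_pos[i]];
    *cb = group[c];
    *cr = group[c + 2];
}
```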
V4L2_PIX_FMT_YVU420,
V4L2_PIX_FMT_YUV420
Y00 | Y01 | Y02 | Y03 |
Y10 | Y11 | Y12 | Y13 |
Y20 | Y21 | Y22 | Y23 |
Y30 | Y31 | Y32 | Y33 |
Cr00 | Cr02 |
Cr20 | Cr22 |
Cb00 | Cb02 |
Cb20 | Cb22 |
These are planar formats, as opposed to a packed format. The three components are separated into three sub-images or planes. The Y plane is first. The Y plane has one byte per pixel. For V4L2_PIX_FMT_YVU420, the Cr plane immediately follows the Y plane in memory. The Cr plane is half the width and half the height of the Y plane (and of the image). Each Cr belongs to four pixels, a two-by-two square of the image. For example, Cr00 belongs to Y00, Y01, Y10, and Y11. Following the Cr plane is the Cb plane, just like the Cr plane. V4L2_PIX_FMT_YUV420 is the same except the Cb plane comes first, then the Cr plane.
If the Y plane has pad bytes after each row, then the Cr and Cb planes
have half as many pad bytes after their rows. In other words, two Cx
rows (including padding) is exactly as long as one Y row (including
padding).
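The plane arrangement and the padding rule can be summarized as an offset calculation. The following sketch assumes an even row length and an even height; the struct, the function names, and the `cr_first` flag (nonzero for YVU420, zero for YUV420) are all hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative description of where the chroma planes sit in the buffer. */
struct planes420 {
    size_t cb_offset;  /* start of the Cb plane, in bytes from buffer start */
    size_t cr_offset;  /* start of the Cr plane */
    size_t c_stride;   /* bytes per chroma row, including padding */
    size_t total;      /* total buffer size in bytes */
};

/* y_stride is the length of one Y row including pad bytes. */
static struct planes420 layout_420(size_t y_stride, size_t height, int cr_first)
{
    struct planes420 p;
    size_t y_size, c_size;
    assert(y_stride % 2 == 0 && height % 2 == 0);
    y_size = y_stride * height;
    p.c_stride = y_stride / 2;           /* two chroma rows equal one Y row */
    c_size = p.c_stride * (height / 2);  /* chroma planes are half as tall */
    if (cr_first) {                      /* YVU420: Y, Cr, Cb */
        p.cr_offset = y_size;
        p.cb_offset = y_size + c_size;
    } else {                             /* YUV420: Y, Cb, Cr */
        p.cb_offset = y_size;
        p.cr_offset = y_size + c_size;
    }
    p.total = y_size + 2 * c_size;
    return p;
}

/* Index, within a chroma plane, of the sample shared by pixel (x, y). */
static size_t chroma_index_420(size_t x, size_t y, size_t c_stride)
{
    return (y / 2) * c_stride + x / 2;
}
```

For the 4 by 4 image in the diagram with no padding, the first chroma plane starts at byte 16, the second at byte 20, and the buffer is 24 bytes long.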
V4L2_PIX_FMT_YVU410,
V4L2_PIX_FMT_YUV410
Y00 | Y01 | Y02 | Y03 |
Y10 | Y11 | Y12 | Y13 |
Y20 | Y21 | Y22 | Y23 |
Y30 | Y31 | Y32 | Y33 |
Cr00 |
Cb00 |
These are planar formats, as opposed to packed formats. The three components are separated into three sub-images or planes. The Y plane is first. The Y plane has one byte per pixel. For V4L2_PIX_FMT_YVU410, the Cr plane immediately follows the Y plane in memory. The Cr plane is ¼ the width and ¼ the height of the Y plane (and of the image). Each Cr belongs to 16 pixels, a four-by-four square of the image. Following the Cr plane is the Cb plane, just like the Cr plane. V4L2_PIX_FMT_YUV410 is the same, except the Cb plane comes first, then the Cr plane.
If the Y plane has pad bytes after each row, then the Cr and Cb planes
have ¼ as many pad bytes after their rows. In other words, four Cx
rows (including padding) is exactly as long as one Y row (including
padding).
V4L2_PIX_FMT_YUV422P
Y00 | Y01 | Y02 | Y03 |
Y10 | Y11 | Y12 | Y13 |
Y20 | Y21 | Y22 | Y23 |
Y30 | Y31 | Y32 | Y33 |
Cb00 | Cb02 |
Cb10 | Cb12 |
Cb20 | Cb22 |
Cb30 | Cb32 |
Cr00 | Cr02 |
Cr10 | Cr12 |
Cr20 | Cr22 |
Cr30 | Cr32 |
This format is not commonly used. This is a planar version of the YUYV format. The three components are separated into three sub-images or planes. The Y plane is first. The Y plane has one byte per pixel. The Cb plane immediately follows the Y plane in memory. The Cb plane is half the width of the Y plane (and of the image). Each Cb belongs to two pixels. For example, Cb00 belongs to Y00, Y01. Following the Cb plane is the Cr plane, just like the Cb plane.
If the Y plane has pad bytes after each row, then the Cr and Cb planes
have half as many pad bytes after their rows. In other words, two Cx
rows (including padding) is exactly as long as one Y row (including
padding).
V4L2_PIX_FMT_YUV411P
Y00 | Y01 | Y02 | Y03 |
Y10 | Y11 | Y12 | Y13 |
Y20 | Y21 | Y22 | Y23 |
Y30 | Y31 | Y32 | Y33 |
Cb00 |
Cb10 |
Cb20 |
Cb30 |
Cr00 |
Cr10 |
Cr20 |
Cr30 |
This format is not commonly used. This is a planar format similar to the 422 planar format except with half as many chroma samples. The three components are separated into three sub-images or planes. The Y plane is first. The Y plane has one byte per pixel. The Cb plane immediately follows the Y plane in memory. The Cb plane is ¼ the width of the Y plane (and of the image). Each Cb belongs to four pixels, all on the same row. For example, Cb00 belongs to Y00, Y01, Y02 and Y03. Following the Cb plane is the Cr plane, just like the Cb plane.
If the Y plane has pad bytes after each row, then the Cr and Cb planes
have ¼ as many pad bytes after their rows. In other words, four Cx
rows (including padding) is exactly as long as one Y row (including
padding).
V4L2_PIX_FMT_NV12
Y00 | Y01 | Y02 | Y03 |
Y10 | Y11 | Y12 | Y13 |
Y20 | Y21 | Y22 | Y23 |
Y30 | Y31 | Y32 | Y33 |
Cb00 | Cr00 | Cb02 | Cr02 |
Cb20 | Cr20 | Cb22 | Cr22 |
This is a two-plane version of the YUV420 format. The three components are separated into two sub-images or planes. The Y plane is first. The Y plane has one byte per pixel. Immediately following that in memory is a combined CbCr plane. The CbCr plane is the same width, in bytes, as the Y plane (and of the image), but is half as tall. Each CbCr pair belongs to four pixels. For example, Cb00/Cr00 belongs to Y00, Y01, Y10, Y11.
If the Y plane has pad bytes after each row, then the CbCr plane has just as many pad bytes after its rows.
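Locating the chroma pair for a given pixel in an NV12 buffer can be sketched as follows. The function name and parameters are hypothetical; `stride` is assumed to be the length of one Y row including pad bytes, which per the text above is also the length of one CbCr row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fetch the Cb and Cr shared by pixel (x, y) of an NV12 image.
 * buf points at the start of the Y plane; height is the image height
 * in rows and is assumed to be even. */
static void nv12_chroma(const uint8_t *buf, size_t stride, size_t height,
                        size_t x, size_t y, uint8_t *cb, uint8_t *cr)
{
    const uint8_t *cbcr, *pair;
    assert(buf != NULL && cb != NULL && cr != NULL);
    assert(height % 2 == 0 && y < height);
    cbcr = buf + stride * height;                /* CbCr plane follows the Y plane */
    pair = cbcr + (y / 2) * stride + (x / 2) * 2; /* one pair per 2x2 block */
    *cb = pair[0];
    *cr = pair[1];
}
```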
V4L2_PIX_FMT_GREY
Y00 | Y01 | Y02 | Y03 |
Y10 | Y11 | Y12 | Y13 |
Y20 | Y21 | Y22 | Y23 |
Y30 | Y31 | Y32 | Y33 |
This is a greyscale (black and white) image. It is really a degenerate
YCbCr format which simply contains no Cr or Cb data. Y ranges from 16 (darkest)
to 235 (lightest).