This article describes the 8-bit YUV color formats that are recommended for video rendering in the Windows operating system. It is intended for anyone working with YUV video decoding or rendering in Windows. Numerous YUV formats are defined throughout the video industry; decoder vendors and display vendors are encouraged to support the formats described here.
This article does not address other uses of YUV color, such as still photography. The formats described in this article all use 8 bits per pixel location to encode the Y channel (also called the luma channel), and use 8 bits per sample to encode each U or V chroma sample. However, most YUV formats use fewer than 24 bits per pixel on average, because they contain fewer samples of U and V than of Y.
This article does not cover YUV formats with Y channels of more than 8 bits.
For the purposes of this article, the term U is equivalent to Cb, and the term V is equivalent to Cr. Chroma channels can have a lower sampling rate than the luma channel without any dramatic loss of perceptual quality. The following diagrams show how chroma is sampled for each of the downsampling rates.
Luma samples are represented by a cross, and chroma samples are represented by a circle. There are two common variants of 4:2:0 sampling: the MPEG-2 scheme, in which the chroma samples are aligned horizontally with the luma samples, and the MPEG-1 scheme, in which they are offset. Because the MPEG-2 scheme converts more easily to and from the other sampling rates, it is preferred in Windows and should be considered the default interpretation of 4:2:0 formats. This section describes the 8-bit YUV formats that are recommended for video rendering. These fall into several categories: 4:4:4, 4:2:2, and 4:2:0 formats.
This is a packed format, where each pixel is encoded as four consecutive bytes, arranged in the sequence shown in the following illustration. Both YUY2 and UYVY are packed formats, where each macropixel is two pixels encoded as four consecutive bytes. This results in horizontal downsampling of the chroma by a factor of two. In YUY2 format, the data can be treated as an array of unsigned char values, where the first byte contains the first Y sample, the second byte contains the first U (Cb) sample, the third byte contains the second Y sample, and the fourth byte contains the first V (Cr) sample, as shown in the following diagram.
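The YUY2 byte layout just described can be sketched as a small helper. The struct and function names here are illustrative, not part of any Windows API:

```c
#include <stddef.h>

/* One YUY2 macropixel covers two pixels in four bytes: Y0 U Y1 V. */
typedef struct {
    unsigned char y0, u, y1, v;
} Yuy2Macropixel;

/* Read macropixel i from a packed YUY2 buffer. */
Yuy2Macropixel yuy2_unpack(const unsigned char *buf, size_t i)
{
    const unsigned char *p = buf + i * 4;
    Yuy2Macropixel m = { p[0], p[1], p[2], p[3] };
    return m;
}
```

Both pixels of a macropixel share the same U and V sample, which is exactly the horizontal 2:1 chroma downsampling.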
It is expected to be an intermediate-term requirement for DirectX VA accelerators supporting video. This format is the same as the YUY2 format, except that the byte order is reversed: the chroma and luma bytes are flipped (Figure 4). Both of these YUV formats are planar formats. The chroma channels are subsampled by a factor of two in both the horizontal and vertical dimensions. All of the Y samples appear first in memory as an array of unsigned char values.
This is followed by all of the V (Cr) samples, and then all of the U (Cb) samples. The V and U planes have the same stride as the Y plane, resulting in unused areas of memory, as shown in Figure 5. The U and V planes must start on memory boundaries that are a multiple of 16 lines.
Figure 5 shows the origin of U and V for a video frame. The starting addresses of the U and V planes are calculated as follows. This format is identical to IMC1, except that the U and V planes are swapped, as shown in the following diagram. In all of these formats, the chroma channels are subsampled by a factor of two in both the horizontal and vertical dimensions. In other words, each full-stride line in the chroma area starts with a line of V samples, followed by a line of U samples that begins at the next half-stride boundary (Figure 7).
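The IMC1 starting addresses mentioned above can be sketched as follows; this assumes, as the text states, that the chroma planes share the Y stride and must begin on 16-line boundaries:

```c
#include <stddef.h>

/* Byte offset of the V plane: the first 16-line boundary after the Y plane. */
size_t imc1_v_offset(size_t height, size_t stride)
{
    return ((height + 15) & ~(size_t)15) * stride;
}

/* Byte offset of the U plane: the first 16-line boundary after the V plane,
   which occupies height/2 lines at the full Y stride. */
size_t imc1_u_offset(size_t height, size_t stride)
{
    return ((height * 3 / 2 + 15) & ~(size_t)15) * stride;
}
```

For a frame whose height is already a multiple of 16, the rounding has no effect on the V offset, but the U offset still rounds 1.5 × height up to the next 16-line boundary.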
This layout makes more efficient use of address space than IMC1. It cuts the chroma address space in half, and thus the total address space by 25 percent. The following image illustrates this process.

YUV is a color space typically used as part of a color image pipeline. It encodes a color image or video taking human perception into account, allowing reduced bandwidth for the chrominance components, thereby typically enabling transmission errors or compression artifacts to be masked more efficiently by human perception than with a "direct" RGB representation.
Color information (U and V) was added separately via a sub-carrier, so that a black-and-white receiver would still be able to receive and display a color picture transmission in the receiver's native black-and-white format.
The reason for this is that by grouping the U and V values together, the image becomes much more compressible. All of the Y samples appear first in memory as an array of unsigned char values. This array is followed immediately by all of the V (Cr) samples. The stride of the V plane is half the stride of the Y plane, and the V plane contains half as many lines as the Y plane. The V plane is followed immediately by all of the U (Cb) samples, with the same stride and number of lines as the V plane. However, there are only one-fourth as many U and V values as Y values.
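This planar layout (YV12) can be sketched as follows, assuming a tightly packed buffer with no alignment padding, so that the Y stride equals the width:

```c
#include <stddef.h>

typedef struct {
    const unsigned char *y, *v, *u;
    size_t y_stride, chroma_stride;
} Yv12Planes;

/* Locate the Y, V, and U planes in a tightly packed YV12 buffer. */
Yv12Planes yv12_planes(const unsigned char *base, size_t width, size_t height)
{
    Yv12Planes p;
    p.y_stride = width;
    p.chroma_stride = width / 2;
    p.y = base;
    p.v = base + width * height;            /* V plane follows Y */
    p.u = p.v + (width / 2) * (height / 2); /* U plane follows V */
    return p;
}
```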
The U and V values correspond to each 2 by 2 block of the image, meaning each U and V entry applies to four pixels.

When a video image is stored in memory, the memory buffer might contain extra padding bytes after each row of pixels.
The padding bytes affect how the image is stored in memory, but do not affect how the image is displayed. The stride is the number of bytes from one row of pixels in memory to the next row of pixels in memory. Stride is also called pitch. If padding bytes are present, the stride is wider than the width of the image, as shown in the following illustration.
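With the stride defined this way, the byte offset of any pixel follows directly; a minimal helper:

```c
#include <stddef.h>

/* Offset of the pixel at (x, y) when rows are 'stride' bytes apart and
   each pixel occupies 'bytes_per_pixel' bytes. */
size_t pixel_offset(size_t x, size_t y, size_t stride, size_t bytes_per_pixel)
{
    return y * stride + x * bytes_per_pixel;
}
```

Note that the row term uses the stride, not the image width; the two are equal only when there is no padding.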
Two buffers that contain video frames with equal dimensions can have two different strides. If you process a video image, you must take the stride into account. In addition, there are two ways that an image can be arranged in memory. In a top-down image, the top row of pixels in the image appears first in memory.
In a bottom-up image, the last row of pixels appears first in memory. The following illustration shows the difference between a top-down image and a bottom-up image. A bottom-up image has a negative stride, because stride is defined as the number of bytes needed to move down a row of pixels, relative to the displayed image. YUV images should always be top-down, and any image that is contained in a Direct3D surface must be top-down.
RGB images in system memory are usually bottom-up. Video transforms in particular need to handle buffers with mismatched strides, because the input buffer might not match the output buffer. For example, suppose that you want to convert a source image and write the result to a destination image. Assume that both images have the same width and height, but might not have the same pixel format or the same image stride.
The following example code shows a generalized approach for writing this kind of function. This is not a complete working example, because it abstracts many of the specific details. The general idea is to process one row at a time, iterating over each pixel in the row. Not every pixel format has a predefined structure. At the start of each row, the function stores a pointer to the row. At the end of the row, it increments the pointer by the width of the image stride, which advances the pointer to the next row.
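A sketch of such a function is shown below. A trivial identity copy stands in for the hypothetical TransformPixelValue, and for simplicity the sketch assumes one byte per pixel; a real conversion would replace both:

```c
#include <stddef.h>

typedef unsigned char BYTE;

/* Stand-in for the hypothetical per-pixel conversion. */
static void TransformPixelValue(const BYTE *src, BYTE *dst)
{
    *dst = *src;
}

/* Process one row at a time; after each row, advance each pointer by that
   image's own stride, since source and destination strides may differ
   (and may be negative for bottom-up images). */
void TransformImage(const BYTE *pSrc, long srcStride,
                    BYTE *pDst, long dstStride,
                    size_t width, size_t height)
{
    size_t x, y;
    for (y = 0; y < height; ++y) {
        for (x = 0; x < width; ++x)
            TransformPixelValue(&pSrc[x], &pDst[x]);
        pSrc += srcStride;
        pDst += dstStride;
    }
}
```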
This example calls a hypothetical function named TransformPixelValue for each pixel. This could be any function that calculates a target pixel from a source pixel. Of course, the exact details will depend on the particular task. For example, if you have a planar YUV format, you must access the chroma planes independently from the luma plane; with interlaced video, you might need to process the fields separately; and so forth. This example shows how to handle a planar YUV format.
YV12 is a planar format. In this example, the function maintains three separate pointers for the three planes in the target image. However, the basic approach is the same as the previous example.

Because an MJPG image is compressed, the Stride property of the Image is not applicable.
Each MJPG encoded image in a stream may be of differing size depending on the compression efficiency. NV12 images separate the luminance and chroma data such that all the luminance is at the beginning of the buffer, and the chroma lines follow immediately after.
Stride indicates the length of each line in bytes and should be used to determine the start location of each line of the image in memory.
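For NV12 this gives simple line-start calculations; a sketch assuming the chroma plane shares the luma stride and begins directly after the 'height' luma lines, as described above:

```c
#include <stddef.h>

/* Start offset of luma line y. */
size_t nv12_luma_line(size_t y, size_t stride)
{
    return y * stride;
}

/* Start offset of the interleaved CbCr line that covers luma line y
   (each chroma line covers two luma lines). */
size_t nv12_chroma_line(size_t y, size_t stride, size_t height)
{
    return height * stride + (y / 2) * stride;
}
```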
Chroma has half as many lines of height and half the width in pixels of the luminance. Each chroma line has the same width in bytes as a luminance line. Each pixel of BGRA32 data is four bytes; the first three bytes represent Blue, Green, and Red data. The Azure Kinect device does not natively capture in this format, and requesting images of this format requires additional computation in the API. For depth images, the unit of the data is millimeters from the origin of the camera. Each pixel of IR16 data is two bytes of little-endian unsigned data.
The value of the data represents brightness.
This format represents infrared light and is captured by the depth camera. YUY2 stores chroma and luminance data in interleaved pixels. A custom format is used in conjunction with user-created images or images packing non-standard data.
The issue is the undefined pixel column on the right of the converted patch, as shown. The question is why this is happening even though the coordinates and the dimensions of the patch have even values.
Interestingly enough, for an odd width value the issue is not present. The patch has the following bounding box: x: 0, y: 0, w, h. The behavior should be reproducible with any image.
Conversion can be done using the ppm conversion page. The following code creates an NV12 image from a BGR24 image and then converts an NV12 patch back to a BGR24 patch. If everything worked properly, the output should be identical to the source image.
Most of the algorithms used need to include neighboring pixels in the calculation of a target pixel. This can lead to problems at the edges if the image dimensions are not a multiple of some value x, where x depends on the algorithm used. With one choice of image dimensions there is an artifact on the right edge of the converted patch; with another there is not (see the right image in the screenshot comparing both procedures). The luma values are determined for each pixel. The color information, which is divided into Cb and Cr, is calculated from a 2x2 pixel block. The minimum image size would therefore be a 2 x 2 block, resulting in 6 bytes: four luma bytes plus one Cb and one Cr byte.
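That arithmetic generalizes: a tightly packed 4:2:0 frame needs one luma byte per pixel plus one Cb and one Cr byte per 2x2 block, which for the minimum 2 x 2 image gives the 6 bytes mentioned above:

```c
#include <stddef.h>

/* Bytes needed for a tightly packed 4:2:0 frame (width and height even). */
size_t yuv420_frame_size(size_t width, size_t height)
{
    return width * height + 2 * (width / 2) * (height / 2);
}
```

This is the familiar 1.5 bytes per pixel of 4:2:0 formats such as NV12 and YV12.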
So theoretically, for each output pixel the nearest input pixel is determined and used, which does not cause any alignment restrictions. An important aspect of actual implementations, however, is often performance.
In this case, performance may favor processing several pixels at a time, which can impose alignment requirements. Also do not forget the possibility of hardware-accelerated operations; these do not give any artifacts either. If you have to convert a lot of frames, I would take a look at the performance.

Hello, I have two questions.
Could anyone tell me where the problem is? They only validated it on Win7. Regarding the decoder output pin, I suggest you use the ffdshow raw filter if you want to do a simple test. It works; I tested it on XP and on Windows 7. It's a good way for you to understand how these filters work and to test them running in your Windows environment. Thanks for Petter's and Guilherme's help. But when I use the ffdshow raw filter on XP, the color of the image after decoding is a little strange.
Hi. We plan to add support for this as part of the encoder filter in the next release of the Intel Media SDK. Of course, you can always develop your own algorithm to convert between color spaces, but using Intel Media SDK VPP will allow you to take advantage of HW acceleration if your platform supports it.
This may help in case you try to develop your own conversion routine.
The second line of each pair of output lines is generally either a duplicate of the first line or is produced by averaging the samples in the first line of the pair with the samples of the first line of the next pair.
YV12: All Y samples are found first in memory as an array of unsigned char (possibly with a larger stride for memory alignment), followed immediately by all Cr samples (with half the stride of the Y lines, and half the number of lines), then followed immediately by all Cb samples in a similar fashion. NV12: A format in which all Y samples are found first in memory as an array of unsigned char with an even number of lines (possibly with a larger stride for memory alignment). This is followed immediately by an array of unsigned char containing interleaved Cb and Cr samples. If these samples are addressed as a little-endian WORD type, Cb would be in the least significant bits and Cr would be in the most significant bits, with the same total stride as the Y samples. NV12 is the preferred pixel format. NV21: The same as NV12, except that Cb and Cr samples are swapped, so that the chroma array of unsigned char would have Cr followed by Cb for each sample; if addressed as a little-endian WORD type, Cr would be in the least significant bits and Cb would be in the most significant bits.
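The little-endian WORD addressing mentioned for NV12 and NV21 can be made concrete: reading two interleaved chroma bytes as a 16-bit little-endian word puts the first stored byte in the least significant bits.

```c
#include <stdint.h>

/* For NV12 the chroma pair is stored Cb then Cr, so in a little-endian
   WORD the first byte (Cb) occupies the least significant bits; for
   NV21 the roles are reversed. */
uint8_t word_low_byte(uint16_t w)  { return (uint8_t)(w & 0xFF); }
uint8_t word_high_byte(uint16_t w) { return (uint8_t)(w >> 8); }
```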
Also, the Cb and Cr planes must fall on memory boundaries that are a multiple of 16 lines. The following code examples show calculations for the Cb and Cr planes.
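A sketch of those calculations, assuming each chroma plane must start on a 16-line boundary and shares the Y stride; the function names are illustrative:

```c
typedef unsigned char BYTE;

/* Start of the first chroma plane: round the Y height up to 16 lines. */
BYTE *first_chroma_plane(BYTE *pY, long height, long stride)
{
    return pY + ((height + 15) & ~15) * stride;
}

/* Start of the second chroma plane: round 1.5 * height up to 16 lines
   (the first chroma plane occupies height/2 lines at the full stride). */
BYTE *second_chroma_plane(BYTE *pY, long height, long stride)
{
    return pY + ((height * 3 / 2 + 15) & ~15) * stride;
}
```

Whether Cb or Cr comes first depends on the particular format, so the helpers are named by position rather than by channel.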
In other words, each full-stride line in the chrominance area starts with a line of Cr, followed by a line of Cb that starts at the next half-stride boundary. This is a more address-space-efficient format than IMC1, because it cuts the chrominance address space in half, and thus cuts the total address space by 25 percent.
This is an optionally preferred format in relation to NV12, but NV12 appears to be more popular. As described in 4:2:0 Video Pixel Formats, except that two lines of output Cb and Cr samples are produced for each actual line of Cb and Cr samples.
All Y samples are found first in memory as an array of unsigned char (possibly with a larger stride for memory alignment), followed immediately by all Cr samples (with half the stride of the Y lines, and half the number of lines), then followed immediately by all Cb samples in a similar fashion.
A format in which all Y samples are found first in memory as an array of unsigned char with an even number of lines (possibly with a larger stride for memory alignment).
The same as YV12, except that the stride of the Cb and Cr planes is the same as the stride in the Y plane.