This approach was inspired by the article Real-Time YCoCg-DXT Compression, which presents an algorithm for real-time GPU compression to DXT formats.
Standard DXT texture formats aren't very suitable for compressing general images such as game frames, because the higher contrast results in artifacts like color bleeding and color blocking. The article introduced the YCoCg-DXT format, which encodes colors in the YCoCg color space (intensity plus orange and green chrominance). It also contains the source code for real-time GPU compression and a comparison of the achieved results.
The YCoCg format is well suited to decompression on the GPU, because decoding YCoCg values back to RGB takes only a few shader instructions. However, for decoding the frame data in a video codec, a YUV-based format is better, since it allows the data to be decoded directly to the video surface without additional conversions. The best choice seemed to be YUYV with 16 bits per sample, which means there is one U and one V value per 2 horizontal samples.
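To make the YUYV layout concrete, here is a minimal Python sketch that packs a row of RGB pixels into YUYV bytes. The conversion coefficients (full-range BT.601) and the averaging of U and V over each pixel pair are assumptions made for illustration; the actual shader may use a different matrix and a different way of picking the shared chroma values.

```python
def rgb_to_yuv(r, g, b):
    """Convert 8-bit RGB to 8-bit YUV.

    Assumption: full-range BT.601 coefficients, chosen only for illustration.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0

    def clamp(x):
        return max(0, min(255, int(round(x))))

    return clamp(y), clamp(u), clamp(v)


def pack_yuyv(row):
    """Pack a row of (r, g, b) pixels (even length) as Y0 U Y1 V byte groups."""
    out = bytearray()
    for i in range(0, len(row), 2):
        y0, u0, v0 = rgb_to_yuv(*row[i])
        y1, u1, v1 = rgb_to_yuv(*row[i + 1])
        # The pixel pair shares one U and one V; averaging is an assumption.
        out += bytes((y0, (u0 + u1 + 1) // 2, y1, (v0 + v1 + 1) // 2))
    return bytes(out)
```

Two pixels become four bytes, which is where the 16 bits per sample comes from: each pixel keeps its own Y, while U and V are stored once per horizontal pair.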
The compression algorithm differs from the YCoCg-DXT one in the initial color space conversion, which goes to YUYV, and in that it encodes 4x4 YY, U and V blocks the same way the alpha component is encoded in the DXT5 format.
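For reference, the DXT5 alpha scheme stores a block of 16 single-channel values as two 8-bit endpoints followed by sixteen 3-bit palette indices, 8 bytes in total. The Python sketch below encodes and decodes one such block on the CPU. It only illustrates the alpha-style encoding of a single block; how YUYV-DXT lays out the YY, U and V blocks is not covered here, and the endpoint choice (plain min/max) and the integer rounding of the palette are simplifying assumptions.

```python
def palette(a0, a1):
    """Build the 8-entry DXT5 alpha palette for endpoints a0 and a1."""
    if a0 > a1:
        # 8-value mode: six values interpolated between the endpoints.
        return [a0, a1] + [((7 - i) * a0 + i * a1) // 7 for i in range(1, 7)]
    # 6-value mode (a0 <= a1): four interpolated values plus 0 and 255.
    return [a0, a1] + [((5 - i) * a0 + i * a1) // 5 for i in range(1, 5)] + [0, 255]


def encode_block(values):
    """Encode 16 8-bit values into 8 bytes: 2 endpoints + 16 three-bit indices."""
    assert len(values) == 16
    a0, a1 = max(values), min(values)  # simplest endpoint choice (assumption)
    pal = palette(a0, a1)
    bits = 0
    for n, v in enumerate(values):
        # Pick the palette entry closest to the value.
        idx = min(range(8), key=lambda k: abs(pal[k] - v))
        bits |= idx << (3 * n)
    return bytes((a0, a1)) + bits.to_bytes(6, "little")


def decode_block(block):
    """Decode an 8-byte block back into 16 values."""
    pal = palette(block[0], block[1])
    bits = int.from_bytes(block[2:8], "little")
    return [pal[(bits >> (3 * n)) & 7] for n in range(16)]
```

A flat block round-trips exactly, while a gradient is quantized to the eight palette levels spanning its minimum and maximum.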
The algorithm is as follows:
- Video frames are compressed to the YUYV-DXT format by a fragment shader using the render-to-texture technique, reducing the data to 1/3 of its original size
- The compressed textures are asynchronously read back to CPU memory
- The data are continuously written to disk
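The last two steps can be pictured as a producer/consumer pair: one side delivers compressed frames as they arrive from the GPU, the other appends them to the output file. The Python sketch below is a CPU-only stand-in under that assumption. The names `record` and `disk_writer` and the `max_pending` parameter are hypothetical, and the real implementation reads the textures back through the graphics API rather than from a Python iterable.

```python
import queue
import threading


def disk_writer(frames, sink):
    """Append queued frame data to sink until a None sentinel arrives."""
    while True:
        data = frames.get()
        if data is None:
            break
        sink.write(data)


def record(compressed_frames, sink, max_pending=8):
    """Feed frames to a background writer so the producer rarely has to wait."""
    frames = queue.Queue(maxsize=max_pending)
    writer = threading.Thread(target=disk_writer, args=(frames, sink))
    writer.start()
    for data in compressed_frames:
        frames.put(data)  # blocks only if the writer falls behind
    frames.put(None)  # tell the writer to finish
    writer.join()
```

The bounded queue is a design choice for the sketch: it caps memory use when the disk cannot keep up, at the cost of stalling the producer.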
Compressing on the GPU reduces the bandwidth needed between the GPU and CPU, but more importantly also the bandwidth needed for disk writes. The sustainable write speed of a SATA drive is somewhere around 55MB/s. Transferring raw 1280x720 video at 30fps takes 79.1MB/s, while the DXT-compressed video takes only 26.4MB/s. A DXT-compressed Full HD stream at 30fps takes 59.3MB/s.
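The figures above can be reproduced with a few lines of arithmetic. The sketch assumes 3 bytes per raw pixel (24-bit RGB), 1MB = 2^20 bytes, 30fps for the Full HD case, and a 1/3 compression ratio; with those assumptions the numbers match the ones quoted.

```python
MIB = 1024 * 1024  # assumption: "MB" in the text means 2**20 bytes


def raw_rate(width, height, fps, bytes_per_pixel=3):
    """Raw video data rate in MB/s, assuming 24-bit RGB frames."""
    return width * height * bytes_per_pixel * fps / MIB


def compressed_rate(width, height, fps):
    """Data rate after compression to 1/3 of the raw size."""
    return raw_rate(width, height, fps) / 3


print(f"720p raw:        {raw_rate(1280, 720, 30):.1f} MB/s")
print(f"720p compressed: {compressed_rate(1280, 720, 30):.1f} MB/s")
print(f"1080p compressed: {compressed_rate(1920, 1080, 30):.1f} MB/s")
```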
To capture the frame buffer data, the application first renders to an intermediate target. The compression shader uses this target as its input texture and renders to a uint4 target with one quarter of the width and height of the original resolution, which is then read back to CPU memory.
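As a quick check on the target size, the sketch below computes the dimensions and byte count of the compressed target for a given frame. It assumes each uint4 texel holds four 32-bit components (16 bytes) and that the frame dimensions are multiples of 4; the function name is hypothetical.

```python
def compressed_target(width, height):
    """Return (target_width, target_height, total_bytes) for the uint4 target."""
    assert width % 4 == 0 and height % 4 == 0
    tw, th = width // 4, height // 4
    # Assumption: uint4 = 4 x 32-bit unsigned integers = 16 bytes per texel,
    # so each texel carries one 4x4 block of the source frame.
    return tw, th, tw * th * 16


tw, th, size = compressed_target(1280, 720)
raw = 1280 * 720 * 3  # 24-bit RGB source frame
print(tw, th, size, raw // size)
```

For a 1280x720 frame this gives a 320x180 target of 921,600 bytes, one third of the 2,764,800 bytes of the raw RGB frame.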
The next step is decoding the captured video. To make this easy, I've written a custom video codec and video format plugin for the ffmpeg library. The format was named Yog (from YCoCg) because the encoding originally used the YCoCg format and was changed to YUYV only later.
The game produces *.yog video files that can be played back directly with ffplay or converted to another video format with the ffmpeg utility. They are also recognized by any video processing software that uses the ffmpeg or ffplay executables, or the avcodec and avformat DLLs from the suite, such as WinFF, FFe and many others.
After starting the video recording in our game, the frame rate drops by only a few fps and the game remains normally playable, unlike when recording with, for example, Fraps. The disadvantage is that this has to be integrated into the renderer path.
Quality-wise the results are quite good, as can be seen in the following screenshots:
YUYV compressed; note that it is slightly lighter because of an issue in ffmpeg that has yet to be solved.
The difference, 4X amplified
The source code and further implementation details can be found at outerra.com/video/index.html