After reading John’s excellent post over here, I got to thinking about 1080p content (or “difficult” content, really) and how we might be able to better handle it. To begin, let’s go over the numbers:
- A typical uncompressed 1080p frame takes up about 3 MB (megabytes). That’s a full-resolution Y plane plus 4:2:0-subsampled U and V planes, or 1.5 bytes per pixel.
- This means that a single second of uncompressed video (24p) takes up about 72 MB.
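For anyone who wants to check my math, here’s the arithmetic, assuming 8-bit samples and 4:2:0 chroma subsampling:

```python
# Back-of-the-envelope math for uncompressed 1080p at 24 fps,
# assuming 8-bit samples and 4:2:0 chroma subsampling.
WIDTH, HEIGHT, FPS = 1920, 1080, 24

luma = WIDTH * HEIGHT           # full-resolution Y plane: 2,073,600 bytes
chroma = 2 * (luma // 4)        # U and V planes, each subsampled 2x in both axes
frame_bytes = luma + chroma     # 3,110,400 bytes, i.e. about 3 MB per frame

per_second = frame_bytes * FPS  # 74,649,600 bytes, about 72 MB per second
print(f"{frame_bytes:,} bytes/frame, {per_second:,} bytes/sec")
```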
That’s a huge amount of bandwidth compared to the compressed video. An episode of Lost I’m watching right now is hovering around 8 Mbps, which is 1 MB/sec. That’s a compression ratio of better than 70:1, which is pretty impressive, and there are also a few possibly interesting takeaways from it:
- Streaming uncompressed 1080p content over Gigabit Ethernet is theoretically possible (about 72 MB/sec of payload against a 125 MB/sec raw link), but only just barely once you account for protocol overhead.
- Streaming uncompressed 1080p content off a typical hard drive is theoretically possible, but again, only just barely and only in best-case scenarios (sustained sequential reads with no seeking).
Most of the problems we run into playing 1080p video content have to do with the combination of two things:
- The existence of bursts of high bit-rate frame sequences (some only a second or two long, some tens of seconds).
- The fact that we do real-time decoding, so if a frame is late, we drop it.
So here are some random ideas that were running through my head as I woke up this morning:
- Decode buffer: Disk buffers smooth out arrival jitter in data coming off the disk, so why don’t we have a decode buffer to smooth out arrival jitter in the decoded frames? That way, if we run into a frame sequence where we would get behind, we can simply drain frames from the decode buffer. Since the CPU is usually ahead, we could simply decode as fast as we can and accumulate some number of pre-decoded frames in RAM. Keeping 2 seconds of decoded frames would cost about 150 MB of RAM, which is chump change these days. And even if we pre-filled the buffer before playing, the pre-roll delay would only be increased by at *most* 2 seconds. (There’s a minimal sketch of this after the list.)
- Grid computing: Harness the power of your idle computers and have them decode frames for you. The problem is the extreme bandwidth of the decoded frames, as mentioned above. But apply some high-quality MPEG-2 compression on the worker before shipping the result back, and it becomes much more manageable. Presumably each machine could decode an independent sequence between keyframes (a closed GOP). (Also sketched below.)
- Sidecar preprocessing: Since MPEG-2 is much cheaper to decode, why not simply run a preprocessor over H.264 files? Any spans higher than a specified bitrate would be pre-decoded, encoded to high-quality MPEG-2, and then stored in a “sidecar”, alongside the original media. Imagine “Planet.Earth.01.mkv” and then “Planet.Earth.01.mkv.sidecar.00:02:01-00:02:40.mpeg2” (assuming the sidecars start and end on second boundaries). The sidecar format could be anything appropriate, and it wouldn’t have to store audio or other metadata (e.g. subtitles). Best of all, they could be whacked at any point or recomputed. (A sketch of such a preprocessor follows the list as well.)
EDIT: Of course, the sidecar media could also just be lower-bit-rate H.264; that way ffmpeg wouldn’t need to be any the wiser.
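Here’s a minimal sketch of the decode-buffer idea. The decode_frame() and display() functions are hypothetical stand-ins for the real decoder and renderer; the interesting part is just the bounded queue sitting between a decode thread and a playback thread:

```python
import queue
import threading

FPS = 24
BUFFER_SECONDS = 2
FRAME_BUDGET = FPS * BUFFER_SECONDS  # 48 frames; at ~3 MB each, ~150 MB of RAM

def decode_frame(compressed):
    # Hypothetical stand-in for the real decoder.
    return compressed

def display(frame):
    # Hypothetical stand-in for the renderer; in reality this would block
    # until the frame's presentation deadline.
    pass

def decoder_thread(compressed_frames, buf):
    # Decode as fast as the CPU allows. put() blocks while the buffer is
    # full, so memory stays capped at FRAME_BUDGET decoded frames.
    for cf in compressed_frames:
        buf.put(decode_frame(cf))
    buf.put(None)  # end-of-stream sentinel

def playback_thread(buf):
    # Consume one frame per display interval. During a burst of expensive
    # frames the decoder falls behind real time, but playback just drains
    # the buffer instead of dropping frames.
    while (frame := buf.get()) is not None:
        display(frame)

buf = queue.Queue(maxsize=FRAME_BUDGET)
threading.Thread(target=decoder_thread, args=([b"frame"] * 100, buf)).start()
playback_thread(buf)
```

The only real policy question is what happens if the buffer empties anyway; at that point we’re back to today’s drop-the-late-frame behavior.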
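And a rough sketch of the grid idea. A local process pool stands in for the remote machines, and recompress_gop() is a hypothetical placeholder for the decode-then-re-encode step; the point is that closed GOPs are independent, so they can be farmed out and stitched back together in order:

```python
from concurrent.futures import ProcessPoolExecutor

def recompress_gop(gop_bytes):
    # Hypothetical placeholder: decode this keyframe-to-keyframe chunk and
    # re-encode it as high-quality MPEG-2 (or low-bit-rate H.264).
    return gop_bytes

def farm_out(gops, workers=4):
    # Each GOP starts at a keyframe, so the chunks decode independently;
    # map() preserves input order, so results stitch back together directly.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(recompress_gop, gops))
```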
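Finally, a sketch of the sidecar preprocessor, going with the EDIT above (lower-bit-rate H.264 rather than MPEG-2, so the extension differs from my example). It assumes a bitrate scan has already flagged the span; the ffmpeg flags shown are standard ones, but treat the whole thing as a sketch rather than a finished tool:

```python
import subprocess

def hms_to_seconds(t):
    h, m, s = (int(x) for x in t.split(":"))
    return h * 3600 + m * 60 + s

def write_sidecar(src, start, end, bitrate="4M"):
    # start/end are "HH:MM:SS" strings on second boundaries, per the post.
    duration = hms_to_seconds(end) - hms_to_seconds(start)
    # Per the EDIT, the sidecar is just lower-bit-rate H.264, so ffmpeg
    # (and the player) need no new codec; hence the .mkv container here.
    sidecar = f"{src}.sidecar.{start}-{end}.mkv"
    subprocess.run([
        "ffmpeg", "-y",
        "-ss", start,          # seek to the flagged high-bitrate span
        "-i", src,
        "-t", str(duration),   # encode just that span
        "-an", "-sn",          # skip audio and subtitles; the original has those
        "-c:v", "libx264",
        "-b:v", bitrate,       # a bitrate the player can comfortably decode
        sidecar,
    ], check=True)
    return sidecar
```

Calling write_sidecar("Planet.Earth.01.mkv", "00:02:01", "00:02:40") would produce the sidecar from my example above.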
Hope you’re all having a great weekend.