One of the challenges faced by many video providers is the heterogeneity of
network specifications, user requirements, and content compression performance.
A universal, fixed bitrate ladder is inadequate: it cannot ensure a high
quality of user experience without risking re-buffering or introducing
visible compression artifacts. However, a content-tailored solution, based on
exhaustively encoding each sequence across all resolutions and over a wide
quality range, is highly expensive in terms of computational, financial, and
energy costs.
Motivated by this trade-off, we propose an approach that exploits machine
learning to predict a content-optimized bitrate ladder. The method extracts
spatio-temporal features from the uncompressed content, trains
machine-learning models to predict the parameters of the rate-quality Pareto
front, and from these predictions constructs the ladder within a defined
bitrate range; a minimal sketch of the pipeline is given below. The key
benefit is a significant reduction in the number of encodes required per
sequence.
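The pipeline can be summarised by the following minimal sketch. The feature
set (SI/TI-style statistics), the logarithmic rate-quality model, the
resolution set, and the random-forest regressor are illustrative assumptions,
not the exact design used here.
\begin{verbatim}
import numpy as np
from sklearn.ensemble import RandomForestRegressor

RESOLUTIONS = [2160, 1080, 720, 540]           # assumed rung heights

def extract_features(frames):
    # Toy spatio-temporal features from uncompressed luma frames (T, H, W);
    # the exact feature set used in practice will differ.
    frames = frames.astype(np.float64)
    gy, gx = np.gradient(frames, axis=(1, 2))
    si = np.sqrt(gx**2 + gy**2).std()          # spatial activity
    ti = np.diff(frames, axis=0).std()         # temporal activity
    return np.array([si, ti, frames.mean(), frames.std()])

# One regressor per resolution predicts the parameters (a, b) of an assumed
# logarithmic rate-quality model q(r) = a*log(r) + b on the Pareto front.
models = {h: RandomForestRegressor(n_estimators=50, random_state=0)
          for h in RESOLUTIONS}

# Placeholder fit on synthetic data; in practice the targets are the
# Pareto-front parameters measured from pre-encoded training sequences.
rng = np.random.default_rng(0)
X = rng.random((64, 4))
for m in models.values():
    m.fit(X, rng.random((64, 2)))

def build_ladder(features, r_min=200.0, r_max=12000.0, n_rungs=6):
    # For each bitrate rung, choose the resolution whose predicted curve
    # yields the highest quality, i.e. the predicted Pareto-optimal one.
    params = {h: m.predict(features.reshape(1, -1))[0]
              for h, m in models.items()}
    return [(round(r),
             max(RESOLUTIONS,
                 key=lambda h: params[h][0] * np.log(r) + params[h][1]))
            for r in np.geomspace(r_min, r_max, n_rungs)]
\end{verbatim}
Calling build_ladder(extract_features(seq)) then yields (bitrate, resolution)
pairs at a small fraction of the cost of exhaustively encoding every
candidate rung.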
The presented results, based on 100 HEVC-encoded sequences, demonstrate
reductions of 89.06% and 61.46% in the number of encodes required, relative
to an exhaustive search and an interpolation-based method, respectively, at
the cost of an average Bj{\o}ntegaard Delta Rate increase of 1.78% over the
exhaustive approach. Finally, a hybrid method is introduced that selects either
the proposed or the interpolation-based method depending on the sequence
features. This results in an overall 83.83% reduction in required encodings
at the cost of an average Bj{\o}ntegaard Delta Rate difference of 1.26%.
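The hybrid selection step could be realised as a lightweight classifier over
the same content features, as in the sketch below; the classifier type and
the labelling rule are assumptions for illustration, not the exact criterion
used here.
\begin{verbatim}
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
selector = RandomForestClassifier(n_estimators=50, random_state=0)
# Placeholder fit: in practice, label 1 could mark training sequences where
# the ML-predicted ladder's BD-Rate loss against the exhaustive ladder
# exceeds a tolerance, so the interpolation-based method is preferred.
selector.fit(rng.random((64, 4)), rng.integers(0, 2, 64))

def choose_method(features):
    # Route each sequence to the cheaper ML prediction or to the
    # interpolation-based construction, based on its content features.
    use_interp = selector.predict(features.reshape(1, -1))[0] == 1
    return "interpolation" if use_interp else "ml-prediction"
\end{verbatim}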