49 research outputs found
The impact of Tiles on video coding performance: a case study on HEVC and AV1 video coding standards
Perceptually-Driven Video Coding with the Daala Video Codec
The Daala project is a royalty-free video codec that attempts to compete with
the best patent-encumbered codecs. Part of our strategy is to replace core
tools of traditional video codecs with alternative approaches, many of them
designed to take perceptual aspects into account, rather than optimizing for
simple metrics like PSNR. This paper documents some of our experiences with
these tools, which ones worked and which did not. We evaluate which tools are
easy to integrate into a more traditional codec design, and show results in the
context of the codec being developed by the Alliance for Open Media.Comment: 19 pages, Proceedings of SPIE Workshop on Applications of Digital
Image Processing (ADIP), 201
Speeding up VP9 Intra Encoder with Hierarchical Deep Learning Based Partition Prediction
In VP9 video codec, the sizes of blocks are decided during encoding by
recursively partitioning 6464 superblocks using rate-distortion
optimization (RDO). This process is computationally intensive because of the
combinatorial search space of possible partitions of a superblock. Here, we
propose a deep learning based alternative framework to predict the intra-mode
superblock partitions in the form of a four-level partition tree, using a
hierarchical fully convolutional network (H-FCN). We created a large database
of VP9 superblocks and the corresponding partitions to train an H-FCN model,
which was subsequently integrated with the VP9 encoder to reduce the intra-mode
encoding time. The experimental results establish that our approach speeds up
intra-mode encoding by 69.7% on average, at the expense of a 1.71% increase in
the Bjontegaard-Delta bitrate (BD-rate). While VP9 provides several built-in
speed levels which are designed to provide faster encoding at the expense of
decreased rate-distortion performance, we find that our model is able to
outperform the fastest recommended speed level of the reference VP9 encoder for
the good quality intra encoding configuration, in terms of both speedup and
BD-rate
Potential for added value in precipitation simulated by high-resolution nested Regional Climate Models and observations
Regional Climate Models (RCMs) constitute the most often used method to perform affordable high-resolution regional climate simulations. The key issue in the evaluation of nested regional models is to determine whether RCM simulations improve the representation of climatic statistics compared to the driving data, that is, whether RCMs add value. In this study we examine a necessary condition that some climate statistics derived from the precipitation field must satisfy in order that the RCM technique can generate some added value: we focus on whether the climate statistics of interest contain some fine spatial-scale variability that would be absent on a coarser grid. The presence and magnitude of fine-scale precipitation variance required to adequately describe a given climate statistics will then be used to quantify the potential added value (PAV) of RCMs. Our results show that the PAV of RCMs is much higher for short temporal scales (e.g., 3-hourly data) than for long temporal scales (16-day average data) due to the filtering resulting from the time-averaging process. PAV is higher in warm season compared to cold season due to the higher proportion of precipitation falling from small-scale weather systems in the warm season. In regions of complex topography, the orographic forcing induces an extra component of PAV, no matter the season or the temporal scale considered. The PAV is also estimated using high-resolution datasets based on observations allowing the evaluation of the sensitivity of changing resolution in the real climate system. The results show that RCMs tend to reproduce relatively well the PAV compared to observations although showing an overestimation of the PAV in warm season and mountainous regions
Recommended from our members
Design Space Exploration of Accelerators for Warehouse Scale Computing
With Moore’s law grinding to a halt, accelerators are one of the ways that new silicon can improve performance, and they are already a key component in modern datacenters. Accelerators are integrated circuits that implement parts of an application with the objective of higher energy efficiency compared to execution on a standard general purpose CPU. Many accelerators can target any particular workload, generally with a wide range of performance, and costs such as area or power. Exploring these design choices, called Design Space Exploration (DSE), is a crucial step in trying to find the most efficient accelerator design, the one that produces the largest reduction of the total cost of ownership.
This work aims to improve this design space exploration phase for accelerators and to avoid pitfalls in the process. This dissertation supports the thesis that early design choices – including the level of specialization – are critical for accelerator development and therefore require benchmarks reflective of production workloads. We present three studies that support this thesis. First, we show how to benchmark datacenter applications by creating a benchmark for large video sharing infrastructures. Then, we present two studies focused on accelerators for analytical query processing. The first is an analysis on the impact of Network on Chip specialization while the second analyses the impact of the level of specialization.
The first part of this dissertation introduces vbench: a video transcoding benchmark tailored to the growing video-as-a-service market. Video transcoding is not accurately represented in current computer architecture benchmarks such as SPEC or PARSEC. Despite posing a big computational burden for cloud video providers, such as YouTube and Facebook, it is not included in cloud benchmarks such as CloudSuite. Using vbench, we found that the microarchitectural profile of video transcoding is highly dependent on the input video, that SIMD extensions provide limited benefits, and that commercial hardware transcoders impose tradeoffs that are not ideal for cloud video providers. Our benchmark should spur architectural innovations for this critical workload. This work shows how to benchmark a real world warehouse scale application and the possible pitfalls in case of a mischaracterization.
When considering accelerators for the different, but no less important, application of analytical query processing, design space exploration plays a critical role. We analyzed the Q100, a class of accelerators for this application domain, using TPC-H as the reference benchmark. We found that the hardware computational blocks have to be tailored to the requirements of the application, but also the Network on Chip (NoC) can be specialized. We developed an algorithm capable of producing more effective Q100 designs by tailoring the NoC to the communication requirements of the system. Our algorithm is capable of producing designs that are Pareto optimal compared to standard NoC topologies. This shows how NoC specialization is highly effective for accelerators and it should be an integral part of design space exploration for large accelerators’ designs.
The third part of this dissertation analyzes the impact of the level of specialization, e.g. using an ASIC or Coarse Grain Reconfigurable Architecture (CGRA) implementation, on an accelerator performance. We developed a CGRA architecture capable of executing SQL query plans. We compare this architecture against Q100, an ASIC that targets the same class of workloads. Despite being less specialized, this programmable architecture shows comparable performance to the Q100 given an area and power budget. Resource usage explains this counterintuitive result, since a well programmed, homogeneous array of resources is able to more effectively harness silicon for the workload at hand. This suggests that a balanced accelerator research portfolio must include alternative programmable architectures – and their software stacks