6,188 research outputs found
Hardware acceleration architectures for MPEG-Based mobile video platforms: a brief overview
This paper presents a brief overview of past and current hardware acceleration (HwA) approaches that have been proposed for the most computationally intensive compression tools of the MPEG-4 standard. These approaches are classified based on their historical evolution and architectural approach. An analysis of both evolutionary and functional classifications is carried out in order to speculate on the possible trends of the HwA architectures to be employed in mobile video platforms
CABAC accelerator architectures for video compression in future multimedida : a survey
The demands for high quality, real-time performance and multi-format video support in consumer multimedia products are ever increasing. In particular, the future multimedia systems require efficient video coding algorithms and corresponding adaptive high-performance computational platforms. The H.264/AVC video coding algorithms provide high enough compression efficiency to be utilized in these systems, and multimedia processors are able to provide the required adaptability, but the algorithms complexity demands for more efficient computing platforms. Heterogeneous (re-)configurable systems composed of multimedia processors and hardware accelerators constitute the main part of such platforms. In this paper, we survey the hardware accelerator architectures for Context-based Adaptive Binary Arithmetic Coding (CABAC) of Main and High profiles of H.264/AVC. The purpose of the survey is to deliver a critical insight in the proposed solutions, and this way facilitate further research on accelerator architectures, architecture development methods and supporting EDA tools. The architectures are analyzed, classified and compared based on the core hardware acceleration concepts, algorithmic characteristics, video resolution support and performance parameters, and some promising design directions are discussed. The comparative analysis shows that the parallel pipeline accelerator architecture seems to be the most promising
Video Processing Acceleration using Reconfigurable Logic and Graphics Processors
A vexing question is `which architecture will prevail as the core feature of the next state of
the art video processing system?' This thesis examines the substitutive and collaborative
use of the two alternatives of the reconfigurable logic and graphics processor architectures.
A structured approach to executing architecture comparison is presented - this includes a
proposed `Three Axes of Algorithm Characterisation' scheme and a formulation of perfor-
mance drivers. The approach is an appealing platform for clearly defining the problem,
assumptions and results of a comparison. In this work it is used to resolve the advanta-
geous factors of the graphics processor and reconfigurable logic for video processing, and
the conditions determining which one is superior. The comparison results prompt the
exploration of the customisable options for the graphics processor architecture. To clearly
define the architectural design space, the graphics processor is first identifed as part of
a wider scope of homogeneous multi-processing element (HoMPE) architectures. A novel
exploration tool is described which is suited to the investigation of the customisable op-
tions of HoMPE architectures. The tool adopts a systematic exploration approach and a
high-level parameterisable system model, and is used to explore pre- and post-fabrication
customisable options for the graphics processor. A positive result of the exploration is the
proposal of a reconfigurable engine for data access (REDA) to optimise graphics processor
performance for video processing-specific memory access patterns. REDA demonstrates
the viability of the use of reconfigurable logic as collaborative `glue logic' in the graphics
processor architecture
Domain-Specific Computing Architectures and Paradigms
We live in an exciting era where artificial intelligence (AI) is fundamentally shifting the dynamics of industries and businesses around the world. AI algorithms such as deep learning (DL) have drastically advanced the state-of-the-art cognition and learning capabilities. However, the power of modern AI algorithms can only be enabled if the underlying domain-specific computing hardware can deliver orders of magnitude more performance and energy efficiency. This work focuses on this goal and explores three parts of the domain-specific computing acceleration problem; encapsulating specialized hardware and software architectures and paradigms that support the ever-growing processing demand of modern AI applications from the edge to the cloud.
This first part of this work investigates the optimizations of a sparse spatio-temporal (ST) cognitive system-on-a-chip (SoC). This design extracts ST features from videos and leverages sparse inference and kernel compression to efficiently perform action classification and motion tracking.
The second part of this work explores the significance of dataflows and reduction mechanisms for sparse deep neural network (DNN) acceleration. This design features a dynamic, look-ahead index matching unit in hardware to efficiently discover fine-grained parallelism, achieving high energy efficiency and low control complexity for a wide variety of DNN layers.
Lastly, this work expands the scope to real-time machine learning (RTML) acceleration. A new high-level architecture modeling framework is proposed. Specifically, this framework consists of a set of high-performance RTML-specific architecture design templates, and a Python-based high-level modeling and compiler tool chain for efficient cross-stack architecture design and exploration.PHDElectrical and Computer EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/162870/1/lchingen_1.pd
- …