14,047 research outputs found
Motion estimation and CABAC VLSI co-processors for real-time high-quality H.264/AVC video coding
Real-time and high-quality video coding is gaining a wide interest in the research and industrial community for different applications. H.264/AVC, a recent standard for high performance video coding, can be successfully exploited in several scenarios including digital video broadcasting, high-definition TV and DVD-based systems, which require to sustain up to tens of Mbits/s. To that purpose this paper proposes optimized architectures for H.264/AVC most critical tasks, Motion estimation and context adaptive binary arithmetic coding. Post synthesis results on sub-micron CMOS standard-cells technologies show that the proposed architectures can actually process in real-time 720 Ă 480 video sequences at 30 frames/s and grant more than 50 Mbits/s. The achieved circuit complexity and power consumption budgets are suitable for their integration in complex VLSI multimedia systems based either on AHB bus centric on-chip communication system or on novel Network-on-Chip (NoC) infrastructures for MPSoC (Multi-Processor System on Chip
Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine
Deep neural networks have achieved impressive results in computer vision and
machine learning. Unfortunately, state-of-the-art networks are extremely
compute and memory intensive which makes them unsuitable for mW-devices such as
IoT end-nodes. Aggressive quantization of these networks dramatically reduces
the computation and memory footprint. Binary-weight neural networks (BWNs)
follow this trend, pushing weight quantization to the limit. Hardware
accelerators for BWNs presented up to now have focused on core efficiency,
disregarding I/O bandwidth and system-level efficiency that are crucial for
deployment of accelerators in ultra-low power devices. We present Hyperdrive: a
BWN accelerator dramatically reducing the I/O bandwidth exploiting a novel
binary-weight streaming approach, which can be used for arbitrarily sized
convolutional neural network architecture and input resolution by exploiting
the natural scalability of the compute units both at chip-level and
system-level by arranging Hyperdrive chips systolically in a 2D mesh while
processing the entire feature map together in parallel. Hyperdrive achieves 4.3
TOp/s/W system-level efficiency (i.e., including I/Os)---3.1x higher than
state-of-the-art BWN accelerators, even if its core uses resource-intensive
FP16 arithmetic for increased robustness
YodaNN: An Architecture for Ultra-Low Power Binary-Weight CNN Acceleration
Convolutional neural networks (CNNs) have revolutionized the world of
computer vision over the last few years, pushing image classification beyond
human accuracy. The computational effort of today's CNNs requires power-hungry
parallel processors or GP-GPUs. Recent developments in CNN accelerators for
system-on-chip integration have reduced energy consumption significantly.
Unfortunately, even these highly optimized devices are above the power envelope
imposed by mobile and deeply embedded applications and face hard limitations
caused by CNN weight I/O and storage. This prevents the adoption of CNNs in
future ultra-low power Internet of Things end-nodes for near-sensor analytics.
Recent algorithmic and theoretical advancements enable competitive
classification accuracy even when limiting CNNs to binary (+1/-1) weights
during training. These new findings bring major optimization opportunities in
the arithmetic core by removing the need for expensive multiplications, as well
as reducing I/O bandwidth and storage. In this work, we present an accelerator
optimized for binary-weight CNNs that achieves 1510 GOp/s at 1.2 V on a core
area of only 1.33 MGE (Million Gate Equivalent) or 0.19 mm and with a power
dissipation of 895 {\mu}W in UMC 65 nm technology at 0.6 V. Our accelerator
significantly outperforms the state-of-the-art in terms of energy and area
efficiency achieving 61.2 TOp/s/[email protected] V and 1135 GOp/s/[email protected] V, respectively
Hardware acceleration architectures for MPEG-Based mobile video platforms: a brief overview
This paper presents a brief overview of past and current hardware acceleration (HwA) approaches that have been proposed for the most computationally intensive compression tools of the MPEG-4 standard. These approaches are classified based on their historical evolution and architectural approach. An analysis of both evolutionary and functional classifications is carried out in order to speculate on the possible trends of the HwA architectures to be employed in mobile video platforms
A review of High Performance Computing foundations for scientists
The increase of existing computational capabilities has made simulation
emerge as a third discipline of Science, lying midway between experimental and
purely theoretical branches [1, 2]. Simulation enables the evaluation of
quantities which otherwise would not be accessible, helps to improve
experiments and provides new insights on systems which are analysed [3-6].
Knowing the fundamentals of computation can be very useful for scientists, for
it can help them to improve the performance of their theoretical models and
simulations. This review includes some technical essentials that can be useful
to this end, and it is devised as a complement for researchers whose education
is focused on scientific issues and not on technological respects. In this
document we attempt to discuss the fundamentals of High Performance Computing
(HPC) [7] in a way which is easy to understand without much previous
background. We sketch the way standard computers and supercomputers work, as
well as discuss distributed computing and discuss essential aspects to take
into account when running scientific calculations in computers.Comment: 33 page
- âŠ