1,633 research outputs found
3D high definition video coding on a GPU-based heterogeneous system
H.264/MVC is a standard for supporting the sensation of 3D, based on coding from 2 (stereo) to N views. H.264/MVC adopts many coding options inherited from single-view H.264/AVC, and its complexity is even higher, mainly because more views have to be processed. In this manuscript, we aim at an efficient parallelization of the most computationally intensive video encoding module for stereo sequences, namely inter prediction, and at its collaborative execution on a heterogeneous platform. The proposal is based on an efficient dynamic load balancing algorithm and on breaking encoding dependencies. Experimental results demonstrate the proposed algorithm's ability to reduce the encoding time for different stereo high-definition sequences. Speed-up values of up to 90× were obtained when compared with the reference encoder on the same platform. Moreover, the proposed algorithm requires less energy than the sequential reference algorithm.
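The dynamic load balancing mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it only assumes that the macroblock rows of a frame are split between devices in proportion to the throughput each device showed on the previous frame. The function name `split_rows`, the device labels, and the timing values are hypothetical.

```python
def split_rows(total_rows, rows_done, seconds_taken):
    """Split `total_rows` among devices in proportion to measured throughput.

    rows_done[d] and seconds_taken[d] describe what device d processed
    during the previous frame (hypothetical measurements).
    """
    throughput = {d: rows_done[d] / seconds_taken[d] for d in rows_done}
    total = sum(throughput.values())
    # Round each share down, then hand the leftover rows to the fastest device.
    shares = {d: int(total_rows * throughput[d] / total) for d in throughput}
    fastest = max(throughput, key=throughput.get)
    shares[fastest] += total_rows - sum(shares.values())
    return shares


# A 1080-line frame is coded as 68 rows of 16x16 macroblocks (1088 / 16).
previous_rows = {"cpu": 34, "gpu": 34}
previous_time = {"cpu": 0.030, "gpu": 0.010}  # hypothetical timings, in seconds
print(split_rows(68, previous_rows, previous_time))
```

Recomputing the split for every frame lets the partition follow changes in content or device load, which is the general idea behind a dynamic (as opposed to static) balancing scheme.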
Performance evaluation of HEVC RCL applications mapped onto NoC-based embedded platforms
Today, several applications running on embedded systems have to fulfill soft or hard timing constraints. Video applications, such as the modern High Efficiency Video Coding (HEVC), most often have soft real-time constraints. However, in specific scenarios, such as robotic surgery or the coupling of satellites, harder timing constraints arise and become a huge challenge. Although implementing such applications on Networks-on-Chip (NoCs) is an alternative to reduce their algorithmic complexity and meet real-time constraints, a performance evaluation of the mapped NoC and a schedulability analysis for the given application are mandatory. In this work we evaluate the performance of the HEVC Residual Coding Loop (RCL) mapped onto a NoC-based embedded platform, considering the encoding of a single 1920x1080-pixel frame. A set of analyses exploring the combination of different NoC sizes and task mapping strategies was performed, showing, for the typical and upper-bound workload cases, the scenarios in which the application is schedulable and meets the real-time constraints.
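As a rough illustration of what a schedulability check over a task mapping involves, the sketch below sums the execution times of the tasks assigned to each core and compares the result with a deadline. It is a simplification and not the analysis used in the paper: NoC communication latency is ignored, and the task names, execution times, and the 33.3 ms deadline (one frame at 30 frames per second) are hypothetical. The CTU count uses 64x64, the largest CTU size HEVC allows.

```python
import math


def ctu_count(width, height, ctu_size=64):
    """Number of coding tree units covering a frame (partial CTUs count as one)."""
    return math.ceil(width / ctu_size) * math.ceil(height / ctu_size)


def is_schedulable(mapping, exec_time_ms, deadline_ms):
    """True if every core finishes its assigned tasks before the deadline.

    Simplification: tasks on one core run back to back, and the time spent
    moving data across the NoC is ignored.
    """
    load = {}
    for task, core in mapping.items():
        load[core] = load.get(core, 0.0) + exec_time_ms[task]
    return all(t <= deadline_ms for t in load.values())


print(ctu_count(1920, 1080))  # 30 columns x 17 rows = 510

# Hypothetical per-frame execution times for four RCL stages, in milliseconds.
times = {"transform": 10.0, "quant": 8.0, "inv_quant": 8.0, "inv_transform": 10.0}
two_cores = {"transform": 0, "quant": 0, "inv_quant": 1, "inv_transform": 1}
one_core = {task: 0 for task in times}
print(is_schedulable(two_cores, times, 33.3))  # True: each core is busy for 18 ms
print(is_schedulable(one_core, times, 33.3))   # False: one core would need 36 ms
```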
A Power-Efficient Methodology for Mapping Applications on Multi-Processor System-on-Chip Architectures
This work introduces an application mapping methodology and case study for multi-processor on-chip architectures. Starting from the description of an application in standard sequential code (e.g. in C), first the application is profiled, parallelized when possible, then its components are moved to hardware implementation when necessary to satisfy performance and power constraints. After mapping, with the use of hardware objects to handle concurrency, the application power consumption can be further optimized by a task-based scheduler for the remaining software part, without the need for operating system support. The key contributions of this work are: a methodology for high-level hardware/software partitioning that allows the designer to use the same code for both hardware and software models for simulation, providing nevertheless preliminary estimations for timing and power consumption; and a task-based scheduling algorithm that does not require operating system support. The methodology has been applied to the co-exploration of an industrial case study: an MPEG4 VGA real-time encoder.
The GPU on the simulation of cellular computing models
Membrane Computing is a discipline aiming to
abstract formal computing models, called membrane systems
or P systems, from the structure and functioning of the living
cells as well as from the cooperation of cells in tissues,
organs, and other higher order structures. This framework
provides polynomial-time solutions to NP-complete problems
by trading space for time, and its efficient simulation
poses challenges in three different aspects: the intrinsic
massive parallelism of P systems, an exponential computational
workspace, and a non-intensive floating-point nature.
In this paper, we analyze the simulation of a family of recognizer
P systems with active membranes that solves the
Satisfiability problem in linear time on different instances of
Graphics Processing Units (GPUs). For an efficient handling
of the exponential workspace created by the P systems
computation, we enable different data policies to increase
memory bandwidth and exploit data locality through tiling
and dynamic queues. Parallelism inherent to the target P
system is also managed to demonstrate that GPUs offer a
valid alternative for high-performance computing at a considerably
lower cost. Furthermore, scalability is demonstrated
on the way to the largest problem size we were able to
run, and considering the new hardware generation from
Nvidia, Fermi, for a total speed-up exceeding four orders of
magnitude when running our simulations on the Tesla S2050
server.
Funding: Agencia Regional de Ciencia y Tecnología - Murcia 00001/CS/2007; Ministerio de Ciencia e Innovación TIN2009–13192; Ministerio de Ciencia e Innovación TIN2009-14475-C04; European Commission Consolider Ingenio-2010 CSD2006-0004
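The space-for-time trade described in this abstract can be sketched in a few lines. The code below is not the authors' simulator and does not model P-system rules: it only assumes the general two-stage idea of first creating an exponential workspace with one candidate truth assignment per membrane, then checking every candidate independently. The sequential loop stands in for the parallel checking a GPU would perform, and the function name `satisfiable` and the clause encoding are illustrative choices.

```python
from itertools import product


def satisfiable(clauses, num_vars):
    """Brute-force SAT check over all 2**num_vars truth assignments.

    clauses: list of clauses; each clause is a list of non-zero integers,
    where k means "variable k is true" and -k means "variable k is false".
    """
    # Generation stage: one candidate assignment per (conceptual) membrane.
    workspace = list(product([False, True], repeat=num_vars))

    # Checking stage: candidates are independent, so they could be
    # evaluated in parallel; here they are evaluated one after another.
    def holds(assignment):
        return all(
            any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        )

    return any(holds(a) for a in workspace)


# (x1 or x2) and (not x1): satisfied by x1 = False, x2 = True.
print(satisfiable([[1, 2], [-1]], 2))
# (x1 or x2) and (not x1 or x2) and (not x2): no assignment works.
print(satisfiable([[1, 2], [-1, 2], [-2]], 2))
```

The workspace holds 2**num_vars entries, which is why memory handling (tiling, queues, bandwidth) dominates the cost of simulating such systems for larger inputs.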