Complexity/Performance Analysis of a H.264/AVC Video Encoder

Author: Abderrazek Jemai
Ahmed Chiheb Ammari
Hajer Krichene Zrida
Mohamed Abid
Publication venue: 'IntechOpen'
Publication date: 05/07/2011

IntechOpen

Implementation of fast HEVC encoder based on SIMD and data-level parallelism

Author: A Alshin
A Fuldseth
B Bross
B Jung
CC Chi
Dong-Gyu Sim
F Bossen
F Henry
G Bjontegaard
G Clare
GJ Sullivan
H Samet
I-K Kim
Intel
J Jung
J Yang
Joint Collaborative Team on Video Coding
K Chen
K Choi
K McCann
M Budagavi
N Zhang
R De Forni
RH Gweon
S Sankaraiah
T Wiegand
Tae-Jin Hwang
W-J Han
Woo-Jin Han
X Tian
Y Yuan
Yong-Jo Ahn
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Systematic analysis of the decoding delay in multiview video

Author: Cabrera Quesada Julian
Carballeira López Pablo
García Santos Narciso
Jaureguizar Núñez Fernando
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

We present a framework for the analysis of the decoding delay in multiview video coding (MVC). We show that in real-time applications, an accurate estimation of the decoding delay is essential to achieve a minimum communication latency. Unlike in single-view codecs, the complexity of the multiview prediction structure and the parallel decoding of several views require a systematic analysis of this decoding delay, which we carry out using graph theory and a model of the decoder hardware architecture. Our framework assumes a decoder implementation on general-purpose multi-core processors with multi-threading capabilities. For this hardware model, we show that frame processing times depend on the computational load of the decoder, and we provide an iterative algorithm to jointly compute frame processing times and decoding delay. Finally, we show that decoding delay analysis can be applied to design decoders with the objective of minimizing the communication latency of the MVC system.
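The graph-based view of decoding delay in the abstract above can be illustrated with a much simpler model than the one the authors describe. The sketch below assumes unlimited cores and fixed per-frame processing times (the paper instead lets processing times depend on decoder load and solves for them iteratively), so a frame finishes decoding at the latest of its arrival time and its references' finish times, plus its own processing time. The frame names, the `deps`, `arrival`, and `proc_time` mappings, and the millisecond values are all hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def finish_times(deps, arrival, proc_time):
    """Earliest decode-finish time per frame under the simplified model.

    deps maps each frame to the frames it is predicted from.
    Assumes unlimited cores and load-independent processing times.
    """
    finish = {}
    # static_order() yields every reference frame before the frames using it.
    for frame in TopologicalSorter(deps).static_order():
        refs_done = [finish[ref] for ref in deps.get(frame, ())]
        start = max([arrival[frame]] + refs_done)
        finish[frame] = start + proc_time[frame]
    return finish


# Hypothetical two-view example: view 1 is predicted from view 0.
deps = {
    "v0_t0": [],
    "v1_t0": ["v0_t0"],
    "v0_t1": ["v0_t0"],
    "v1_t1": ["v0_t1", "v1_t0"],
}
arrival = {"v0_t0": 0, "v1_t0": 0, "v0_t1": 40, "v1_t1": 40}  # ms
proc_time = {frame: 10 for frame in deps}                      # ms

finish = finish_times(deps, arrival, proc_time)
delay = {frame: finish[frame] - arrival[frame] for frame in deps}
print(delay)
```

In this toy case the dependent view's frames accumulate the delay of the base view, which is the effect a systematic analysis has to account for.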

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Real-time data acquisition, transmission and archival framework

Author: Agarwal Piyush
Publication venue: RIT Scholar Works
Publication date: 01/02/2011

Most human actions are a direct response to stimuli from the five senses. In the past few decades there has been a growing interest in capturing and storing the information that is obtained from the senses using analog and digital sensors. By storing this data it is possible to further analyze and better understand human perception. While many devices have been created for capturing and storing data, existing software and hardware architectures are aimed towards specialized devices and require expensive high-performance systems. This thesis aims to create a framework that supports capture and monitoring of a variety of sensors and can be scaled to run on low- and high-performance systems such as netbooks, laptops and desktop systems. The proposed architecture was tested using aural and visual sensors due to their availability and higher bandwidth requirements compared to other sensors. Four different portable computing devices with a varied set of hardware capabilities were used for testing. On each of the systems the same suite of tests was run to benchmark and analyze CPU, memory, network, and storage usage statistics. The results showed that on all of these platforms capturing data from multiple video, audio and other sensor sources was possible in real time. Performance was shown to scale based on several factors, the most important being the CPU architecture, network topology and data interfaces used.

RIT Scholar Works

Parallel scalability of video decoders

Author: Azevedo Arnaldo
Juurlink Ben
Meenderinck Cor
Ramírez Bellido Alejandro
Álvarez Mesa Mauricio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2009

An important question is whether emerging and future applications exhibit sufficient parallelism, in particular thread-level parallelism, to exploit the large numbers of cores future chip multiprocessors (CMPs) are expected to contain. As a case study we investigate the parallelism available in video decoders, an important application domain now and in the future. Specifically, we analyze the parallel scalability of the H.264 decoding process. First we discuss the data structures and dependencies of H.264 and show what types of parallelism they allow to be exploited. We also show that previously proposed parallelization strategies such as slice-level, frame-level, and intra-frame macroblock (MB) level parallelism are not sufficiently scalable. Based on the observation that inter-frame dependencies have a limited spatial range, we propose a new parallelization strategy, called Dynamic 3D-Wave. It allows certain MBs of consecutive frames to be decoded in parallel. Using this new strategy we analyze the limits to the available MB-level parallelism in H.264. Using real movie sequences we find a maximum MB parallelism ranging from 4000 to 7000. We also perform a case study to assess the practical value and possibilities of a highly parallelized H.264 application. The results show that H.264 exhibits sufficient parallelism to efficiently exploit the capabilities of future manycore CMPs.
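To give a feel for why intra-frame MB-level parallelism alone is limited, the following sketch counts how many macroblocks can be decoded at once in a single frame. It assumes the usual 2D-wavefront dependency pattern, in which an MB waits for its left and upper-right neighbours, so the MB at column x and row y can start in time slot x + 2y when every MB takes one slot. It covers only the intra-frame case, not the Dynamic 3D-Wave across frames, and the function name and the 120 x 68 frame size (a 1080p frame in 16 x 16 MBs) are illustrative choices rather than anything taken from the paper.

```python
from collections import Counter


def wavefront_profile(mb_cols, mb_rows):
    """MBs decodable in each time slot of an intra-frame 2D wavefront.

    Assumes unit decode time per MB and that MB (x, y) depends on its
    left and upper-right neighbours, giving start slot x + 2 * y.
    """
    slots = Counter(x + 2 * y for y in range(mb_rows) for x in range(mb_cols))
    return [slots[t] for t in range(max(slots) + 1)]


profile = wavefront_profile(120, 68)   # 1920x1088 luma samples in 16x16 MBs
peak_parallelism = max(profile)
total_slots = len(profile)
print(peak_parallelism, total_slots)
```

Under these assumptions the peak is 60 concurrent MBs over 254 time slots, and the wavefront spends many of those slots ramping up and down. That is far below the thousands of parallel MBs the abstract reports once consecutive frames are overlapped.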

UPCommons. Portal del coneixement obert de la UPC

Video Coding Performance

Author: Shahriar Akramullah
Publication venue: Apress
Publication date: 01/01/2014

Springer - Publisher Connector

Evaluation of parallel H.264 decoding strategies for the Cell Broadband Engine

Author: Chi Chi Ching
Juurlink Ben
Meenderinck Cor
Publication date: 01/01/2010

How to develop efficient and scalable parallel applications is the key challenge for emerging many-core architectures. We investigate this question by implementing and comparing two parallel H.264 decoders on the Cell architecture. It is expected that future many-cores will use a Cell-like local store memory hierarchy, rather than a non-scalable shared memory. The two implemented parallel algorithms, the Task Pool (TP) and the novel Ring-Line (RL) approach, both exploit macroblock-level parallelism. The TP implementation follows the master-slave paradigm and is very dynamic so that in theory perfect load balancing can be achieved. The RL approach is distributed and more predictable in the sense that the mapping of macroblocks to processing elements is fixed. This makes it possible to better exploit data locality, to overlap communication with computation, and to reduce communication and synchronization overhead. While TP is more scalable in theory, the actual scalability favors RL. Using 16 SPEs, RL obtains a scalability of 12x, while TP achieves only 10.3x. More importantly, the absolute performance of RL is much higher. Using 16 SPEs, RL achieves a throughput of 139.6 frames per second (fps) while TP achieves only 76.6 fps. A large part of the additional performance advantage is due to hiding the memory latency. From the results we conclude that in order to fully leverage the performance of future many-cores, a centralized master should be avoided and the mapping of tasks to cores should be predictable in order to be able to hide the memory latency.
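The figures quoted in this abstract can be put side by side with a little arithmetic. The sketch below computes parallel efficiency (speedup divided by core count) and the throughput ratio at 16 SPEs. It also derives an implied single-SPE frame rate, which holds only if each speedup is measured against that implementation's own single-SPE run; the abstract does not state the baseline, so that part is an assumption. The helper names are illustrative.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


def implied_single_core_fps(fps, speedup):
    """Frame rate on one core, ASSUMING speedup is relative to the same code on one core."""
    return fps / speedup


SPES = 16
rl = {"speedup": 12.0, "fps": 139.6}   # Ring-Line, figures from the abstract
tp = {"speedup": 10.3, "fps": 76.6}    # Task Pool, figures from the abstract

rl_eff = parallel_efficiency(rl["speedup"], SPES)
tp_eff = parallel_efficiency(tp["speedup"], SPES)
fps_ratio = rl["fps"] / tp["fps"]
rl_base = implied_single_core_fps(rl["fps"], rl["speedup"])
tp_base = implied_single_core_fps(tp["fps"], tp["speedup"])

print(f"efficiency: RL {rl_eff:.1%}, TP {tp_eff:.1%}")
print(f"throughput ratio RL/TP: {fps_ratio:.2f}")
print(f"implied single-SPE fps: RL {rl_base:.1f}, TP {tp_base:.1f}")
```

The gap in scalability (12x versus 10.3x) is modest compared with the gap in throughput, which is consistent with the abstract's remark that much of RL's advantage comes from hiding memory latency rather than from scaling alone.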

DepositOnce

A highly scalable parallel implementation of H.264

Author: Azevedo Arnaldo
Hoogerbrugge Jan
Juurlink Ben
Meenderinck Cor
Ramírez Bellido Alejandro
Terechko Andrei
Valero Cortés Mateo
Álvarez Mesa Mauricio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Developing parallel applications that can harness and efficiently use future many-core architectures is the key challenge for scalable computing systems. We contribute to this challenge by presenting a parallel implementation of H.264 that scales to a large number of cores. The algorithm exploits the fact that independent macroblocks (MBs) can be processed in parallel, but whereas a previous approach exploits only intra-frame MB-level parallelism, our algorithm exploits intra-frame as well as inter-frame MB-level parallelism. It is based on the observation that inter-frame dependencies have a limited spatial range. The algorithm has been implemented on a many-core architecture consisting of NXP TriMedia TM3270 embedded processors. This required developing a subscription mechanism, where MBs are subscribed to the kick-off lists associated with the reference MBs. Extensive simulation results show that the implementation scales very well, achieving a speedup of more than 54 on a 64-core processor, in which case the previous approach achieves a speedup of only 23. Potential drawbacks of the 3D-Wave strategy are that the memory requirements increase since there can be many frames in flight, and that the frame latency might increase. Scheduling policies to address these drawbacks are also presented. The results show that these policies combat memory and latency issues with a negligible effect on the performance scalability. Results analyzing the impact of the memory latency, L1 cache size, and the synchronization and thread management overhead are also presented. Finally, we present performance requirements for entropy (CABAC) decoding. This work was performed while the fourth author was with NXP Semiconductors.
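The subscription mechanism mentioned in this abstract can be pictured roughly as follows. This is a minimal single-threaded sketch, not the authors' TriMedia implementation: an MB whose reference MBs are not yet decoded subscribes to each missing reference's kick-off list, and when a reference completes it walks its list and releases any subscriber with no references left outstanding. The class name, method names, and MB labels are hypothetical, and a real decoder would also need synchronization between cores.

```python
from collections import defaultdict, deque


class KickOffScheduler:
    """Toy dependency tracker based on per-MB kick-off lists."""

    def __init__(self):
        self.done = set()                   # MBs already decoded
        self.pending = {}                   # MB -> count of undecoded references
        self.kick_off = defaultdict(list)   # reference MB -> subscribed MBs
        self.ready = deque()                # MBs that can be decoded now

    def submit(self, mb, references):
        """Register an MB; subscribe it to every reference not yet decoded."""
        missing = {ref for ref in references if ref not in self.done}
        if not missing:
            self.ready.append(mb)
            return
        self.pending[mb] = len(missing)
        for ref in missing:
            self.kick_off[ref].append(mb)

    def complete(self, mb):
        """Mark an MB decoded and kick off subscribers that became ready."""
        self.done.add(mb)
        for subscriber in self.kick_off.pop(mb, []):
            self.pending[subscriber] -= 1
            if self.pending[subscriber] == 0:
                del self.pending[subscriber]
                self.ready.append(subscriber)


sched = KickOffScheduler()
sched.submit("f0_mb0", [])
sched.submit("f0_mb1", [])
sched.submit("f1_mb0", ["f0_mb0", "f0_mb1"])   # inter-frame dependency
sched.complete("f0_mb0")
waiting_after_first = "f1_mb0" in sched.pending
sched.complete("f0_mb1")
print(list(sched.ready))
```

The point of the design is that a waiting MB costs nothing until one of its references finishes, so no core has to poll for readiness.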

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Fine-Grain Parallelism

Author: Archambault Erik L
Hogeboom John Forrest
Montgomery Joshua A
Pardy Christopher
Smith Christopher Eduard
Tate Brian
Publication venue: Digital WPI
Publication date: 12/03/2010

Computer hardware is at the beginning of the multi-core revolution. While hardware at the commodity level is capable of running concurrent software, most software does not take advantage of this fact because parallel software development is difficult. This project addressed potential remedies to these difficulties by investigating graphical programming and fine-grain parallelism. A prototype system taking advantage of both of these concepts was implemented and evaluated in terms of real-world applications.

DigitalCommons@WPI

Investigation of parallel programming on heterogeneous multiprocessors

Author: Espeland Håvard
Publication date: 01/01/2008

Multi-core processors have become ordinary in modern commodity computers. Computationally intensive applications, like video processing, that previously only ran on specialized hardware, are now common on home computers. However, the demand for more computing power is ever-increasing, and with the introduction of high definition video, more performance is desired. As an alternative to having multiple identical processor cores, heterogeneous multiprocessors have cores with different capabilities. This allows tasks to be processed on simple cores with specialized functionality. The simplicity furthers low power consumption, small die usage, and low price. Dealing with heterogeneous cores increases the complexity of writing programs for the architecture. The reasons for this include the different capabilities of the cores and the fact that some heterogeneous architectures do not have shared memory. Without shared memory, accessing main memory requires explicit transfers to local memory. In this thesis, we consider two architectures, the STI Cell/B.E. and Intel IXP2400, and evaluate parallelization strategies and performance for real-world problems. Our tests show promising throughput for some applications, and we propose a scheme for offloading computationally intensive parts of an existing application.

NORA - Norwegian Open Research Archives