Search CORE

296 research outputs found

A highly scalable parallel implementation of H.264

Author: Azevedo Arnaldo
Hoogerbrugge Jan
Juurlink Ben
Meenderinck Cor
Ramírez Bellido Alejandro
Terechko Andrei
Valero Cortés Mateo
Álvarez Mesa Mauricio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Developing parallel applications that can harness and efficiently use future many-core architectures is the key challenge for scalable computing systems. We contribute to this challenge by presenting a parallel implementation of H.264 that scales to a large number of cores. The algorithm exploits the fact that independent macroblocks (MBs) can be processed in parallel. Whereas a previous approach exploits only intra-frame MB-level parallelism, our algorithm (the 3D-Wave) exploits both intra-frame and inter-frame MB-level parallelism. It is based on the observation that inter-frame dependencies have a limited spatial range. The algorithm has been implemented on a many-core architecture consisting of NXP TriMedia TM3270 embedded processors. This required developing a subscription mechanism in which MBs are subscribed to the kick-off lists associated with their reference MBs. Extensive simulation results show that the implementation scales very well, achieving a speedup of more than 54 on a 64-core processor, where the previous approach achieves a speedup of only 23. Potential drawbacks of the 3D-Wave strategy are that memory requirements increase, since many frames can be in flight, and that frame latency might increase. Scheduling policies to address these drawbacks are also presented. The results show that these policies mitigate the memory and latency issues with a negligible effect on performance scalability. Results analyzing the impact of memory latency, L1 cache size, and synchronization and thread-management overhead are also presented. Finally, we present performance requirements for entropy (CABAC) decoding. This work was performed while the fourth author was with NXP Semiconductors. Peer Reviewed. Postprint (author's final draft).
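The kick-off-list idea can be illustrated with a minimal Python sketch. It is not the authors' TM3270 code: the names `build_deps` and `schedule` are hypothetical, inter-frame dependencies are modeled (as an assumption) by a fixed motion-vector reach of `reach` MBs, and subscriptions are made up front rather than at run time. Each MB keeps a counter of unresolved dependencies; when an MB finishes, it walks its kick-off list and releases every subscriber whose counter reaches zero.

```python
from collections import deque

# Hypothetical sketch of kick-off-list scheduling for H.264 macroblocks (MBs).
# An MB is identified as (frame, x, y). Not the TM3270 implementation.

def build_deps(width, height, frames, reach):
    """Map each MB to the MBs it must wait for.

    Intra-frame: left, top-left, top and top-right neighbours.
    Inter-frame (assumption): motion vectors reach at most `reach` MBs, so an MB
    waits for the bottom-right MB of its reference window in the previous frame.
    Because of the intra-frame deps, once that MB is decoded the whole window is.
    """
    deps = {}
    for f in range(frames):
        for y in range(height):
            for x in range(width):
                d = [(f, cx, cy)
                     for cx, cy in ((x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1))
                     if 0 <= cx < width and cy >= 0]
                if f > 0:
                    d.append((f - 1, min(x + reach, width - 1), min(y + reach, height - 1)))
                deps[(f, x, y)] = d
    return deps

def schedule(deps):
    """Return a decode order in which every MB follows all of its dependencies."""
    pending = {mb: len(d) for mb, d in deps.items()}  # unresolved deps per MB
    kickoff = {mb: [] for mb in deps}                  # reference MB -> subscribers
    for mb, d in deps.items():
        for ref in d:
            kickoff[ref].append(mb)   # subscribe mb to ref's kick-off list
    ready = deque(mb for mb, n in pending.items() if n == 0)
    order = []
    while ready:
        mb = ready.popleft()
        order.append(mb)              # "decode" mb; a core would run it here
        for sub in kickoff[mb]:       # finishing mb kicks off its subscribers
            pending[sub] -= 1
            if pending[sub] == 0:
                ready.append(sub)
    return order
```

For example, `schedule(build_deps(4, 3, 3, 1))` returns all 36 MBs in a dependency-respecting order that starts with `(0, 0, 0)`. On a many-core system, the `ready` queue would be drained by several cores concurrently, which this sequential sketch does not model.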

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Parallel scalability of video decoders

Author: Azevedo Arnaldo
Juurlink Ben
Meenderinck Cor
Ramírez Bellido Alejandro
Álvarez Mesa Mauricio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2009

An important question is whether emerging and future applications exhibit sufficient parallelism, in particular thread-level parallelism, to exploit the large numbers of cores that future chip multiprocessors (CMPs) are expected to contain. As a case study, we investigate the parallelism available in video decoders, an important application domain now and in the future. Specifically, we analyze the parallel scalability of the H.264 decoding process. First, we discuss the data structures and dependencies of H.264 and show which types of parallelism can be exploited. We also show that previously proposed parallelization strategies, such as slice-level, frame-level, and intra-frame macroblock (MB) level parallelism, are not sufficiently scalable. Based on the observation that inter-frame dependencies have a limited spatial range, we propose a new parallelization strategy called Dynamic 3D-Wave. It allows certain MBs of consecutive frames to be decoded in parallel. Using this new strategy, we analyze the limits of the available MB-level parallelism in H.264. Using real movie sequences, we find a maximum MB parallelism ranging from 4000 to 7000. We also perform a case study to assess the practical value and possibilities of a highly parallelized H.264 application. The results show that H.264 exhibits sufficient parallelism to efficiently exploit the capabilities of future many-core CMPs. Peer Reviewed. Postprint (published version).
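The limits analysis can be approximated by computing the earliest decode step of each MB with unlimited cores and one time unit per MB. This sketch is an assumption-laden simplification, not the paper's methodology: intra-frame dependencies are the left, top-left, top and top-right MBs, and every inter-frame dependency is assumed to lie within a hypothetical `reach` of MBs in the previous frame, whereas the Dynamic 3D-Wave uses the actual motion vectors. The function names are illustrative.

```python
from collections import Counter

# Hypothetical sketch: earliest decode step of every MB, assuming unlimited cores
# and one time unit per MB. Intra-frame deps: left, top-left, top, top-right.
# Inter-frame deps (assumption): any reference lies within `reach` MBs, so an MB
# waits for the bottom-right MB of that window in the previous frame.

def earliest_steps(width, height, frames, reach):
    t = {}
    for f in range(frames):
        for y in range(height):
            for x in range(width):
                preds = [t[(f, cx, cy)]
                         for cx, cy in ((x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1))
                         if 0 <= cx < width and cy >= 0]
                if f > 0:  # bottom-right MB of the reference window
                    preds.append(t[(f - 1, min(x + reach, width - 1), min(y + reach, height - 1))])
                t[(f, x, y)] = 1 + max(preds, default=-1)
    return t

def max_parallelism(t):
    """Largest number of MBs that share the same decode step."""
    return max(Counter(t.values()).values())
```

With a single frame this reproduces the 2D-Wave: each MB gets step x + 2y, so a 120 × 68 MB (1080p) frame gives `max_parallelism(earliest_steps(120, 68, 1, 0))` → 60, i.e. min(⌈W/2⌉, H). Adding frames with a small `reach` lets MBs of consecutive frames overlap and raises the count.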

UPCommons. Portal del coneixement obert de la UPC

Video Coding Performance

Author: Shahriar Akramullah
Publication venue: Apress
Publication date: 01/01/2014

Springer - Publisher Connector

Performance study of synthetic AER generation on CPUs for Real-Time Video based on Spikes

Author: Domínguez Morales Manuel Jesús
Iñigo Blasco P.
Jiménez Moreno Gabriel
Linares Barranco Alejandro
Publication venue: 'Test accounts'
Publication date: 01/07/2009

Address-Event-Representation (AER) is a neuromorphic interchip communication protocol that allows real-time virtual massive connectivity between huge numbers of neurons located on different chips. When building multi-chip, multi-layered AER systems, it is absolutely necessary to have a computer interface that can (a) read AER interchip traffic into the computer and visualize it on screen, and (b) convert a conventional frame-based video stream in the computer into AER and inject it at some point of the AER structure. This is necessary for testing and debugging complex AER systems. Previous work presented several software methods for converting digital frames into AER format. Those methods were not feasible for real-time conversion at the time because processor performance was insufficient. Today, multi-core processor architectures and cache hierarchies have evolved, and performance is much better than that of the Pentium 4 Mobile processors of that era. In this paper we study frame-to-AER methods for real-time video applications (40 ms per frame) using modern processor architectures, compilers, and processors oriented toward stand-alone applications (mini-PC processors).
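To make the frame-to-AER conversion concrete, here is a minimal rate-coding sketch in Python. It is an illustrative assumption, not one of the methods evaluated in the paper: each pixel emits a number of events proportional to its intensity, evenly spaced over the 40 ms frame period, and the function name `frame_to_aer`, the address encoding `y * width + x`, and the parameter `max_events` are hypothetical.

```python
# Hypothetical rate-coding frame-to-AER sketch; NOT one of the paper's methods.
def frame_to_aer(frame, max_events=8, period_us=40_000):
    """Convert a grayscale frame (rows of 0..255 ints) into a time-sorted list
    of (timestamp_us, address) events spread over one frame period.

    Assumptions: a pixel's event count is proportional to its intensity,
    events of one pixel are evenly spaced, and address = y * width + x.
    """
    width = len(frame[0])
    events = []
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            n = round(value * max_events / 255)
            for k in range(n):
                # place event k at the centre of its 1/n slot of the period
                t = int((k + 0.5) * period_us / n)
                events.append((t, y * width + x))
    events.sort()  # emit events in time order, interleaving pixels
    return events
```

For instance, `frame_to_aer([[0, 255], [128, 0]], max_events=4)` returns `[(5000, 1), (10000, 2), (15000, 1), (25000, 1), (30000, 2), (35000, 1)]`. The per-pixel loop is where the cost lies for real-time operation; one option would be to split it across cores by rows and merge the sorted streams afterwards.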

idUS. Depósito de Investigación Universidad de Sevilla

Video Encoder Implementation on Tilera's TILEPro64™ Multicore Processor

Author: Arambasic Igor
Casajús-Quirós Javier
Parera-Bermúdez José
Publication venue: 'IntechOpen'
Publication date: 16/01/2013

IntechOpen

Crossref

Fine-Grain Parallelism

Author: Archambault Erik L
Hogeboom John Forrest
Montgomery Joshua A
Pardy Christopher
Smith Christopher Eduard
Tate Brian
Publication venue: Digital WPI
Publication date: 12/03/2010

Computer hardware is at the beginning of the multi-core revolution. While commodity hardware is capable of running concurrent software, most software does not take advantage of this because parallel software development is difficult. This project addressed potential remedies to these difficulties by investigating graphical programming and fine-grain parallelism. A prototype system taking advantage of both concepts was implemented and evaluated on real-world applications.

DigitalCommons@WPI

Media gateway utilizando um GPU [Media gateway using a GPU]

Author: Portugal Ricardo
Publication venue: Universidade de Aveiro
Publication date: 01/01/2012

Master's degree in Computer and Telematics Engineering.

Repositório Institucional da Universidade de Aveiro

Investigation of parallel programming on heterogeneous multiprocessors

Author: Espeland Håvard
Publication date: 01/01/2008

Multi-core processors have become ordinary in modern commodity computers. Computationally intensive applications such as video processing, which previously ran only on specialized hardware, are now common on home computers. However, the demand for computing power is ever-increasing, and with the introduction of high-definition video, more performance is desired. As an alternative to multiple identical processor cores, heterogeneous multiprocessors have cores with different capabilities. This allows tasks to be processed on simple cores with specialized functionality. This simplicity enables low power consumption, small die area, and low cost. Dealing with heterogeneous cores increases the complexity of writing programs for the architecture. The reasons for this include the different capabilities of the cores and the fact that some heterogeneous architectures do not have shared memory; without shared memory, accessing main memory requires explicit transfers to local memory. In this thesis, we consider two architectures, the STI Cell/B.E. and the Intel IXP2400, and evaluate parallelization strategies and performance for real-world problems. Our tests show promising throughput for some applications, and we propose a scheme for offloading computationally intensive parts of an existing application.

NORA - Norwegian Open Research Archives

H.264/AVC inter prediction on accelerator-based multi-core systems

Author: Claver José M.
Fernández Escribano Gerardo
Martínez José Luis
Rodríguez Sánchez Rafael
Sánchez José L.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

The AVC video coding standard adopts variable block sizes for inter-frame coding to increase compression efficiency, among other new features. As a consequence, an AVC encoder has to employ a complex mode decision technique with high computational complexity. Several techniques aimed at accelerating the inter prediction process have been proposed in the literature in recent years. More recently, with the emergence of many-core processors and accelerators, a new way of supporting inter-frame prediction has presented itself. In this paper, we present a step forward in the implementation of an AVC inter prediction algorithm on a graphics processing unit (GPU) using the Compute Unified Device Architecture (CUDA). The results show a negligible drop in rate-distortion performance with an average time reduction of over 98.8% compared with full search and fast full search, and of over 80% compared with UMHexagonS search.
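For reference, the full-search baseline mentioned above exhaustively evaluates every candidate displacement in a search window. A minimal Python sketch using the sum of absolute differences (SAD) as the matching cost follows; the function names are illustrative, and it covers a single fixed block size only. It does not reproduce the paper's CUDA implementation, its variable-block-size mode decision, or UMHexagonS.

```python
# Minimal exhaustive (full-search) block matching with SAD; illustrative only.
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(n) for j in range(n))

def full_search(cur, ref, bx, by, n, search_range):
    """Return (best_sad, dx, dy) over every displacement within +/-search_range
    that keeps the reference block inside the frame."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if not (0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w):
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Usage: `cur` is `ref` shifted by (2, 1), so the 4x4 block at (2, 2) matches exactly.
ref = [[(x * 7 + y * 13) % 31 for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
print(full_search(cur, ref, 2, 2, 4, 2))  # (0, 2, 1)
```

Every candidate's SAD is independent of the others, which is why this search maps naturally onto the many threads of a GPU.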

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori Institucional de la Universitat Jaume I

Processamento de imagens médicas usando GPU [Medical image processing using GPUs]

Author: Fonseca Francisco Xavier dos Santos
Publication venue: Universidade de Aveiro
Publication date: 01/01/2011

Master's degree in Computer and Telematics Engineering. The CapView application uses a classification algorithm based on SVMs (Support Vector Machines) for automatic topographic segmentation of gastrointestinal tract videos obtained through capsule endoscopy. This work explores the use of graphics processors (GPUs) to parallelize the segmentation algorithm. After an optimization phase of the sequential version, two approaches were compared: (1) developing only the host code, relying on specialized GPU libraries, and (2) developing both the host and the device code. Both approaches yielded substantial gains, with speedups between 1.4 and 7 in tests on several different individual GPUs. On a cluster of 4 GPUs of the most capable model, speedups between 26.2 and 27.2 over the optimized sequential version were achieved in all tested cases. The methods developed were integrated into the CapView application, which is used routinely in hospital settings.

Repositório Institucional da Universidade de Aveiro