Partitioning Strategies for Concurrent Programming
This work presents four partitioning strategies, or patterns, useful for decomposing a serial application into multiple concurrently executing parts. These partitioning strategies augment the commonly used task- and data-parallel design patterns by recognizing that applications are spatiotemporal in nature. Data and instruction decomposition are therefore further distinguished by whether the partitioning is done in the spatial or the temporal dimension. Thus, this work describes four decomposition strategies: spatial data partitioning (SDP), temporal data partitioning (TDP), spatial instruction partitioning (SIP), and temporal instruction partitioning (TIP), and catalogs the benefits and drawbacks of each. In addition, the practical use of these strategies is demonstrated through a case study in which they are applied to implement several different parallelizations of a multicore H.264 encoder for HD video. This case study illustrates both the application of the patterns and their effects on the performance of the encoder.
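The spatial/temporal distinction for data partitioning can be illustrated with a toy sketch: spatial data partitioning splits one unit of data (a frame) into pieces processed concurrently, while temporal data partitioning keeps each unit whole but processes several units concurrently. The names (`encode_slice`, `frames`) and the workload are illustrative stand-ins, not the paper's code.

```python
# Sketch contrasting spatial vs. temporal data partitioning on a toy
# "frame" workload; encode_slice is a stand-in for real per-slice work.
from concurrent.futures import ThreadPoolExecutor

def encode_slice(slice_rows):
    # Stand-in for per-slice encoding: sum of squared pixel values.
    return sum(p * p for row in slice_rows for p in row)

def encode_frame_sdp(frame, workers=4):
    """Spatial data partitioning (SDP): split ONE frame into
    horizontal slices and encode the slices concurrently."""
    n = len(frame)
    chunk = (n + workers - 1) // workers
    slices = [frame[i:i + chunk] for i in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(encode_slice, slices))

def encode_stream_tdp(frames, workers=4):
    """Temporal data partitioning (TDP): keep each frame whole but
    encode SEVERAL frames concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_slice, frames))

frame = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(encode_frame_sdp(frame))            # 204, same as serial encoding
print(encode_stream_tdp([frame, frame]))  # [204, 204]
```

SDP trades intra-frame dependencies for parallelism inside one frame; TDP trades memory (several frames in flight) for throughput across frames, which mirrors the benefit/drawback catalog the abstract describes.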
Parallel scalability of video decoders
An important question is whether emerging and future applications exhibit sufficient parallelism, in particular thread-level parallelism, to exploit the large numbers of cores future chip multiprocessors (CMPs) are expected to contain. As a case study we investigate the parallelism available in video decoders, an important application domain now and in the future. Specifically, we analyze the parallel scalability of the H.264 decoding process. First we discuss the data structures and dependencies of H.264 and show what types of parallelism can be exploited. We also show that previously proposed parallelization strategies, such as slice-level, frame-level, and intra-frame macroblock (MB) level parallelism, are not sufficiently scalable. Based on the observation that inter-frame dependencies have a limited spatial range, we propose a new parallelization strategy, called Dynamic 3D-Wave, which allows certain MBs of consecutive frames to be decoded in parallel. Using this new strategy we analyze the limits to the available MB-level parallelism in H.264. Using real movie sequences we find a maximum MB parallelism ranging from 4000 to 7000. We also perform a case study to assess the practical value and possibilities of a highly parallelized H.264 application. The results show that H.264 exhibits sufficient parallelism to efficiently exploit the capabilities of future manycore CMPs.
A Framework for Parallelizing OWL Classification in Description Logic Reasoners
The Web Ontology Language (OWL) is a widely used knowledge representation language for describing knowledge in application domains by using classes, properties, and individuals. Ontology classification is an important and widely used service that computes a taxonomy of all classes occurring in an ontology. It can require significant amounts of runtime, but most OWL reasoners do not support any kind of parallel processing.
This thesis reports on a black-box approach to parallelizing existing description logic (DL) reasoners for the Web Ontology Language. We focus on OWL ontology classification, an important inference service supported by every major OWL/DL reasoner. To the best of our knowledge, we are the first to propose a flexible parallel framework that can be applied to existing OWL reasoners in order to speed up their classification process. Two versions of our method are discussed: (i) the first version implements a novel thread-level parallel architecture with two parallel strategies that achieves a good speedup factor with an increasing number of threads, yet does not rely on locking techniques and thus avoids possible race conditions; (ii) the second version adds a refined data structure and various parallel computing techniques for precomputation and classification, reducing the overhead of processing ontologies and making it competitive with other DL reasoners on wall-clock classification time.
To test the performance of both versions of our approach, we selected the test ontologies from a real-world repository. For the first version, we evaluated our prototype implementation on a set of selected real-world ontologies. Our experiments demonstrate very good scalability, resulting in a speedup that is linear in the number of available cores. The second version's performance is evaluated by parallelizing major OWL reasoners for concept classification. Currently, we focus mainly on comparison with two popular DL reasoners: HermiT and JFact. In comparison to the selected black-box reasoners, our results demonstrate that the wall-clock time of ontology classification can be improved by one order of magnitude for most real-world ontologies in the repository.
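The lock-free flavor of the first version can be sketched as follows: treat the reasoner as an opaque subsumption test, statically partition the candidate tests among threads, and have each thread write only to its own result list so no shared mutable state (and hence no locking) is needed. All names here (`is_subsumed_by`, `classify_parallel`, the toy hierarchy) are illustrative assumptions, not the thesis's actual API.

```python
# Minimal sketch of black-box thread-level classification: partition
# the candidate subsumption tests among threads; each thread owns its
# output list, so no locks are required.
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations

HIERARCHY = {                 # toy ontology: class -> direct parents
    "Dog": {"Mammal"}, "Cat": {"Mammal"},
    "Mammal": {"Animal"}, "Animal": set(),
}

def is_subsumed_by(sub, sup):
    """Opaque 'reasoner' call: does sub fall under sup?"""
    seen, todo = set(), [sub]
    while todo:
        c = todo.pop()
        if c == sup:
            return True
        if c not in seen:
            seen.add(c)
            todo.extend(HIERARCHY.get(c, ()))
    return False

def classify_parallel(classes, workers=4):
    pairs = list(permutations(classes, 2))
    chunk = (len(pairs) + workers - 1) // workers
    batches = [pairs[i:i + chunk] for i in range(0, len(pairs), chunk)]
    def run(batch):            # each thread writes only its own list
        return [(a, b) for a, b in batch if is_subsumed_by(a, b)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(run, batches)
    return sorted(p for part in results for p in part)

print(classify_parallel(list(HIERARCHY)))
```

A real classifier avoids testing all pairs by pruning with already-known subsumptions; the point of the sketch is only the partitioning scheme that makes locking unnecessary.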
Optimization of Video Coding Algorithms on Platforms with Multiple Parallel Processors
H.264 is the most recent and most powerful video coding standard. Compared with its predecessors, it increases the compression ratio by a factor of at least two, but at the cost of higher complexity. To reduce encoding time, several H.264 encoders use a parallel approach. In this research, our primary objective is to design an approach offering a better speedup than the one implemented in Intel's H.264 encoder, shipped as sample code in its IPP library.
We present our multi-frame, multi-slice (MTMT) parallel video encoding approach and its motion estimation modes, which offer a trade-off between speedup and loss of visual quality. The first mode, the fastest but the most quality-degrading, restricts the motion estimation search region to the boundaries of the current slice. The second mode, slower but degrading quality less than the first, widens the search region to neighbouring slices once the corresponding reference slices have been processed. The third mode, slower than the second but degrading quality even less, marks a slice as ready for encoding only when the reference slices covering its search region have been processed.
Our experiments show that the first mode of our approach achieves an average speedup about 55% higher than that obtained by Intel's approach. They also show that we obtain a speedup comparable to the state of the art without the drawback of forcing the use of B-frames. Moreover, our approach can be integrated quickly into an H.264 encoder that, like Intel's, is based on a multi-slice approach.
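The third mode's readiness rule can be sketched as a simple geometric check: a slice of the current frame is ready only once every reference-frame slice overlapping its vertical motion-search window is done. The slice height, frame height, and search range below are illustrative values, not the thesis's actual parameters.

```python
# Hypothetical sketch of the third mode's readiness rule: a slice is
# ready when all reference slices overlapping its motion-search
# window (its own rows +/- search_range) have been processed.

def slices_covering(y_top, y_bottom, slice_height):
    """Indices of the slices that overlap rows [y_top, y_bottom]."""
    return range(max(0, y_top) // slice_height, y_bottom // slice_height + 1)

def slice_ready(slice_idx, done_ref_slices, *, slice_height=64,
                frame_height=1080, search_range=32):
    y0 = slice_idx * slice_height - search_range
    y1 = min(frame_height - 1,
             (slice_idx + 1) * slice_height - 1 + search_range)
    needed = slices_covering(y0, y1, slice_height)
    return all(s in done_ref_slices for s in needed)

# Slice 2 spans rows 128..191; a +/-32-row search window widens that
# to rows 96..223, which reference slices 1..3 cover.
print(slice_ready(2, {0, 1, 2}))   # False: reference slice 3 not done
print(slice_ready(2, {1, 2, 3}))   # True
```

The first mode corresponds to skipping this check entirely (search clamped to the current slice), and the second to relaxing it per neighbouring slice, which is how the three modes trade encoding-order constraints against motion-search freedom.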