5 research outputs found

    Partitioning Strategies for Concurrent Programming

    Get PDF
    This work presents four partitioning strategies, or patterns, useful for decomposing a serial application into multiple concurrently executing parts. These partitioning strategies augment the commonly used task and data parallel design patterns by recognizing that applications are spatiotemporal in nature; data and instruction decomposition are therefore further distinguished by whether the partitioning is done in the spatial or the temporal dimension. Thus, this work describes four decomposition strategies: spatial data partitioning (SDP), temporal data partitioning (TDP), spatial instruction partitioning (SIP), and temporal instruction partitioning (TIP), while cataloging the benefits and drawbacks of each. In addition, the practical use of these strategies is demonstrated through a case study in which they are applied to implement several different parallelizations of a multicore H.264 encoder for HD video. This case study illustrates both the application of the patterns and their effects on the performance of the encoder.
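    The contrast between spatial and temporal data partitioning can be sketched in a few lines. This is a minimal illustration, not the paper's H.264 case study: `encode_block` and the toy frame data are stand-ins for real per-slice or per-frame work.

    ```python
    # Hedged sketch: two of the four partitioning strategies (SDP and TDP)
    # applied to a toy workload. encode_block is an illustrative stand-in
    # for real encoding work, not the paper's implementation.
    from concurrent.futures import ThreadPoolExecutor

    def encode_block(block):
        # Stand-in for the work done on one partition of the data.
        return sum(block)

    frames = [list(range(8)) for _ in range(4)]  # 4 toy "frames" of 8 samples

    # Spatial data partitioning (SDP): split each frame across workers.
    with ThreadPoolExecutor(max_workers=4) as pool:
        sdp_result = [list(pool.map(encode_block, [f[:4], f[4:]])) for f in frames]

    # Temporal data partitioning (TDP): assign whole frames to workers.
    with ThreadPoolExecutor(max_workers=4) as pool:
        tdp_result = list(pool.map(encode_block, frames))
    ```

    SDP parallelizes within one time step (one frame), while TDP parallelizes across time steps; the instruction-partitioning variants (SIP, TIP) split the computation rather than the data along the same two axes.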

    Parallel scalability of video decoders

    No full text
    An important question is whether emerging and future applications exhibit sufficient parallelism, in particular thread-level parallelism, to exploit the large numbers of cores future chip multiprocessors (CMPs) are expected to contain. As a case study we investigate the parallelism available in video decoders, an important application domain now and in the future. Specifically, we analyze the parallel scalability of the H.264 decoding process. First we discuss the data structures and dependencies of H.264 and show which types of parallelism they allow to be exploited. We also show that previously proposed parallelization strategies, such as slice-level, frame-level, and intra-frame macroblock (MB) level parallelism, are not sufficiently scalable. Based on the observation that inter-frame dependencies have a limited spatial range, we propose a new parallelization strategy, called Dynamic 3D-Wave, which allows certain MBs of consecutive frames to be decoded in parallel. Using this new strategy we analyze the limits to the available MB-level parallelism in H.264. Using real movie sequences we find a maximum MB parallelism ranging from 4000 to 7000. We also perform a case study to assess the practical value and possibilities of a highly parallelized H.264 application. The results show that H.264 exhibits sufficient parallelism to efficiently exploit the capabilities of future manycore CMPs.
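    The intra-frame MB-level parallelism that the Dynamic 3D-Wave extends across frames can be sketched as a 2D wavefront schedule. This is an illustrative sketch under the common assumption that an MB at (x, y) depends on its left and top-right neighbours, so MBs on the same anti-diagonal (constant x + 2y) are mutually independent; the grid size below is arbitrary.

    ```python
    # Hedged sketch of a 2D-wave schedule over a macroblock grid. Assumes
    # MB (x, y) depends on MBs (x-1, y) and (x+1, y-1), the usual H.264
    # intra-frame dependency pattern; not the paper's Dynamic 3D-Wave code.
    def wavefront_schedule(width_mbs, height_mbs):
        """Group MBs into time steps; all MBs within a step are independent."""
        steps = {}
        for y in range(height_mbs):
            for x in range(width_mbs):
                steps.setdefault(x + 2 * y, []).append((x, y))
        return [steps[t] for t in sorted(steps)]

    sched = wavefront_schedule(4, 3)          # toy 4x3 MB grid
    peak = max(len(step) for step in sched)   # peak intra-frame parallelism
    ```

    For realistic HD grids this per-frame peak stays modest, which is exactly why the 3D-Wave adds the frame dimension: overlapping the wavefronts of consecutive frames multiplies the number of simultaneously decodable MBs.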

    A Framework for Parallelizing OWL Classification in Description Logic Reasoners

    Get PDF
    The Web Ontology Language (OWL) is a widely used knowledge representation language for describing knowledge in application domains by using classes, properties, and individuals. Ontology classification is an important and widely used service that computes a taxonomy of all classes occurring in an ontology. It can require significant amounts of runtime, yet most OWL reasoners do not support any kind of parallel processing. This thesis reports on a black-box approach to parallelizing existing description logic (DL) reasoners for the Web Ontology Language. We focus on OWL ontology classification, an important inference service supported by every major OWL/DL reasoner. To the best of our knowledge, we are the first to propose a flexible parallel framework that can be applied to existing OWL reasoners in order to speed up their classification process. Two versions of our method are discussed: (i) the first implements a novel thread-level parallel architecture with two parallel strategies to achieve a good speedup factor with an increasing number of threads, while avoiding locking techniques and thus possible race conditions; (ii) the improved version implements an improved data structure and various parallel computing techniques for precomputing and classification, reducing the overhead of processing ontologies and allowing it to compete with other DL reasoners on wall-clock classification time. To test the performance of both versions, we chose the tested ontologies from a real-world repository. For the first version, we evaluated our prototype implementation with a set of selected real-world ontologies; our experiments demonstrate very good scalability, with a speedup that is linear in the number of available cores. For the second version, performance is evaluated by parallelizing major OWL reasoners for concept classification.
Currently, we mainly focus on comparison with two popular DL reasoners: HermiT and JFact. In comparison to the selected black-box reasoners, our results demonstrate that the wall-clock time of ontology classification can be improved by one order of magnitude for most real-world ontologies in the repository.
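    The black-box idea can be sketched as follows: treat subsumption testing as an opaque reasoner call and give each worker a disjoint chunk of classes, merging results only at the end so that no locking is needed. Everything here is illustrative; `is_subsumed_by` is a toy stand-in for a real DL reasoner, not the thesis's framework.

    ```python
    # Hedged sketch of lock-free parallel classification. Workers operate on
    # disjoint class chunks and return private results, so no shared mutable
    # state (and no locks) are needed during classification.
    from concurrent.futures import ThreadPoolExecutor

    # Toy ontology: each class maps to its direct superclass (None = top).
    hierarchy = {"Dog": "Mammal", "Cat": "Mammal", "Mammal": "Animal", "Animal": None}

    def is_subsumed_by(sub, sup):
        # Opaque "reasoner" call: walk the toy hierarchy upwards.
        cur = sub
        while cur is not None:
            if cur == sup:
                return True
            cur = hierarchy[cur]
        return False

    def classify_chunk(chunk, classes):
        # Each worker computes subsumers for its own classes only.
        return {c: sorted(s for s in classes if s != c and is_subsumed_by(c, s))
                for c in chunk}

    classes = list(hierarchy)
    chunks = [classes[:2], classes[2:]]
    with ThreadPoolExecutor(max_workers=2) as pool:
        taxonomy = {}
        for part in pool.map(classify_chunk, chunks, [classes] * len(chunks)):
            taxonomy.update(part)
    ```

    Because a real reasoner is only ever *called*, not modified, the same scheme can wrap any existing OWL reasoner, which is what makes the approach black-box.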

    Optimization of video coding algorithms on platforms with multiple parallel processors

    Get PDF
    H.264 is the most recent and most powerful video coding standard. Compared with its predecessors, it increases the compression ratio by a factor of at least two, but at the cost of higher complexity. To reduce encoding time, several H.264 encoders use a parallel approach. The primary goal of this research is to design an approach offering a better speedup than the one implemented in Intel's H.264 encoder, shipped as sample code in the IPP library. We present our multi-frame, multi-slice (MTMT) parallel video encoding approach and its motion estimation modes, which trade speedup against visual quality loss. The first mode, the fastest but with the largest quality degradation, restricts the motion estimation search region to within the boundaries of the current slice. The second mode, slower but degrading quality less than the first, extends the search region to neighboring slices once the corresponding reference slices have been processed. The third mode, slower than the second but degrading quality even less, marks a slice as ready for encoding only when the reference slices covering its search region have been processed. Our experiments show that the first mode of our approach delivers an average speedup about 55% higher than that obtained by Intel's approach. They also show that we obtain a speedup comparable to the state of the art without the drawback of forcing the use of B-frames. Moreover, our approach is quick to implement in any H.264 encoder that, like Intel's, is based on a multi-slice approach.
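    The readiness rule behind the third MTMT mode can be sketched as a simple dependency check. This is an illustrative sketch, not the thesis's implementation; it assumes the motion search window of a slice overlaps at most its immediate neighbor slices in the reference frame, which depends on the actual search range and slice heights.

    ```python
    # Hedged sketch of the third mode's readiness test: a slice may start
    # encoding only when the reference-frame slices covering its motion
    # search region are done. Assumes the search window reaches at most one
    # slice above and below (an illustrative simplification).
    def is_ready(slice_idx, done_ref_slices, num_slices):
        # The search window of slice i overlaps reference slices i-1..i+1,
        # clamped at the picture borders; all must already be encoded.
        neighbours = range(max(0, slice_idx - 1), min(num_slices, slice_idx + 2))
        return all(s in done_ref_slices for s in neighbours)
    ```

    The first mode corresponds to requiring nothing from the reference frame (the search never leaves the current slice), and the second mode to a weaker, opportunistic version of this check, which is why the three modes form a speed/quality spectrum.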