11 research outputs found

    Polyvalent Parallelizations for Hierarchical Block Matching Motion Estimation

    Get PDF
    Block matching motion estimation algorithms are widely used in video coding schemes. In this paper,we design an efficient hierarchical block matching motion estimation (HBMME) algorithm on a hypercube multiprocessor. Unlike systolic array designs, this solution is not tied down to specific values of algorithm parameters and thus offers increased flexibility. Moreover, the hypercube network can efficiently handle the non regular data flow of the HBMME algorithm. Our techniques nearly eliminate the occurrence of “difficult” communication patterns, namely many-to-many personalized communication, by replacing them with simple shift operations. These operations have an efficient implementation on most of interconnection networks and thus our techniques can be adapted to other networks as well. With regard to the employed multiprocessor we make no specific assumption about the amount of local memory residing in each processor. Instead, we introduce a free parameter S and assume that each processor has O(S) local memory. By doing so, we handle all the cases of modern multiprocessors, that is fine-grained, medium-grained and coarse-grained multiprocessors and thus our design is quite general

    An Experimental Analysis of Parallel Sorting Algorithms

    Full text link

    Deterministic and adaptive routing algorithms for mesh-connected computers

    Get PDF
    The two-dimensional mesh topology has been widely used in many multicomputer systems, such as the AMETEK Series 2010, Illiac IV, MPP, DAP, MasPar MP-1 and Intel Paragon. Its major advantages are its excellent scalability and simplicity. New generation multicomputer uses a switching technique called wormhole routing. The essential idea of wormhole routing is to advance a packet directly from incoming to outgoing channel without sorting it, as soon as enough information has been received in the packet header to select the outgoing channel. It has advantages of low latency and low error rate. The problems addressed by this thesis are to evaluate existing routing algorithms for the 2D mesh based on the wormhole model and to design a new routing algorithm that performs better from existing algorithms. In this thesis, the performance of both deterministic and adaptive algorithms, as functions of network size, router buffer size, packet length, is evaluated by computer simulation under different traffic model. Also, a new algorithm, called the west-north-first algorithm, is proposed and tested. It contains both characteristics of deterministic and adaptive algorithm, and hence has a better overall performance under various network traffic models. The results of this study can be applied to the design of parallel processing network system

    Privaatsust säilitavad paralleelarvutused graafiülesannete jaoks

    Get PDF
    Turvalisel mitmeosalisel arvutusel põhinevate reaalsete privaatsusrakenduste loomine on SMC-protokolli arvutusosaliste ümmarguse keerukuse tõttu keeruline. Privaatsust säilitavate tehnoloogiate uudsuse ja nende probleemidega kaasnevate suurte arvutuskulude tõttu ei ole paralleelseid privaatsust säilitavaid graafikualgoritme veel uuritud. Graafikalgoritmid on paljude arvutiteaduse rakenduste selgroog, nagu navigatsioonisüsteemid, kogukonna tuvastamine, tarneahela võrk, hüperspektraalne kujutis ja hõredad lineaarsed lahendajad. Graafikalgoritmide suurte privaatsete andmekogumite töötlemise kiirendamiseks ja kõrgetasemeliste arvutusnõuete täitmiseks on vaja privaatsust säilitavaid paralleelseid algoritme. Seetõttu esitleb käesolev lõputöö tipptasemel protokolle privaatsuse säilitamise paralleelarvutustes erinevate graafikuprobleemide jaoks, ühe allika lühima tee, kõigi paaride lühima tee, minimaalse ulatuva puu ja metsa ning algebralise tee arvutamise. Need uued protokollid on üles ehitatud kombinatoorsete ja algebraliste graafikualgoritmide põhjal lisaks SMC protokollidele. Nende protokollide koostamiseks kasutatakse ka ühe käsuga mitut andmeoperatsiooni, et vooru keerukust tõhusalt vähendada. Oleme väljapakutud protokollid juurutanud Sharemind SMC platvormil, kasutades erinevaid graafikuid ja võrgukeskkondi. Selles lõputöös kirjeldatakse uudseid paralleelprotokolle koos nendega seotud algoritmide, tulemuste, kiirendamise, hindamiste ja ulatusliku võrdlusuuringuga. Privaatsust säilitavate ühe allika lühimate teede ja minimaalse ulatusega puuprotokollide tegelike juurutuste tulemused näitavad tõhusat meetodit, mis vähendas tööaega võrreldes varasemate töödega sadu kordi. Lisaks ei ole privaatsust säilitavate kõigi paaride lühima tee protokollide hindamine ja ulatuslik võrdlusuuringud sarnased ühegi varasema tööga. Lisaks pole kunagi varem käsitletud privaatsust säilitavaid metsa ja algebralise tee arvutamise protokolle.Constructing real-world privacy applications based on secure multiparty computation is challenging due to the round complexity of the computation parties of SMC protocol. Due to the novelty of privacy-preserving technologies and the high computational costs associated with these problems, parallel privacy-preserving graph algorithms have not yet been studied. Graph algorithms are the backbone of many applications in computer science, such as navigation systems, community detection, supply chain network, hyperspectral image, and sparse linear solvers. In order to expedite the processing of large private data sets for graphs algorithms and meet high-end computational demands, privacy-preserving parallel algorithms are needed. Therefore, this Thesis presents the state-of-the-art protocols in privacy-preserving parallel computations for different graphs problems, single-source shortest path (SSSP), All-pairs shortest path (APSP), minimum spanning tree (MST) and forest (MSF), and algebraic path computation. These new protocols have been constructed based on combinatorial and algebraic graph algorithms on top of the SMC protocols. Single-instruction-multiple-data (SIMD) operations are also used to build those protocols to reduce the round complexities efficiently. We have implemented the proposed protocols on the Sharemind SMC platform using various graphs and network environments. This Thesis outlines novel parallel protocols with their related algorithms, the results, speed-up, evaluations, and extensive benchmarking. The results of the real implementations of the privacy-preserving single-source shortest paths and minimum spanning tree protocols show an efficient method that reduced the running time hundreds of times compared with previous works. Furthermore, the evaluation and extensive benchmarking of privacy-preserving All-pairs shortest path protocols are not similar to any previous work. Moreover, the privacy-preserving minimum spanning forest and algebraic path computation protocols have never been addressed before.https://www.ester.ee/record=b555865

    The art of active memory

    Get PDF

    Analyse des synchronisations dans un programme parallèle ordonnancé par vol de travail. Applications à la génération déterministe de nombres pseudo-aléatoires.

    Get PDF
    We present two contributions to the field of parallel programming.The first contribution is theoretical: we introduce SIPS analysis, a novel approach to estimate the number of synchronizations performed during the execution of a parallel algorithm.Based on the concept of logical clocks, it allows us: on one hand, to deliver new bounds for the number of synchronizations, in expectation; on the other hand, to design more efficient parallel programs by dynamic adaptation of the granularity.The second contribution is pragmatic: we present an efficient parallelization strategy for pseudorandom number generation, independent of the number of concurrent processes participating in a computation.As an alternative to the use of one sequential generator per process, we introduce a generic API called Par-R, which is designed and analyzed using SIPS.Its main characteristic is the use of a sequential generator that can perform a ``jump-ahead'' directly from one number to another on an arbitrary distance within the pseudorandom sequence.Thanks to SIPS, we show that, in expectation, within an execution scheduled by work stealing of a "very parallel" program (whose depth or critical path is subtle when compared to the work or number of operations), these operations are rare.Par-R is compared with the parallel pseudorandom number generator DotMix, written for the Cilk Plus dynamic multithreading platform.The theoretical overhead of Par-R compares favorably to DotMix's overhead, what is confirmed experimentally, while not requiring a fixed generator underneath.Nous présentons deux contributions dans le domaine de la programmation parallèle.La première est théorique : nous introduisons l'analyse SIPS, une approche nouvelle pour dénombrer le nombre d'opérations de synchronisation durant l'exécution d'un algorithme parallèle ordonnancé par vol de travail.Basée sur le concept d'horloges logiques, elle nous permet,: d'une part de donner de nouvelles majorations de coût en moyenne; d'autre part de concevoir des programmes parallèles plus efficaces par adaptation dynamique de la granularité.La seconde contribution est pragmatique: nous présentons une parallélisation générique d'algorithmes pour la génération déterministe de nombres pseudo-aléatoires, indépendamment du nombre de processus concurrents lors de l'exécution.Alternative à l'utilisation d'un générateur pseudo-aléatoire séquentiel par processus, nous introduisons une API générique, appelée Par-R qui est conçue et analysée grâce à SIPS.Sa caractéristique principale est d'exploiter un générateur séquentiel qui peut "sauter" directement d'un nombre à un autre situé à une distance arbitraire dans la séquence pseudo-aléatoire.Grâce à l'analyse SIPS, nous montrons qu'en moyenne, lors d'une exécution par vol de travail d'un programme très parallèle (dont la profondeur ou chemin critique est très petite devant le travail ou nombre d'opérations), ces opérations de saut sont rares.Par-R est comparé au générateur pseudo-aléatoire DotMix, écrit pour Cilk Plus, une extension de C/C++ pour la programmation parallèle par vol de travail.Le surcout théorique de Par-R se compare favorablement au surcoput de DotMix, ce qui apparait aussi expériemntalement.De plus, étant générique, Par-R est indépendant du générateur séquentiel sous-jacent

    Quantitative performance modeling of scientific computations and creating locality in numerical algorithms

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995.Includes bibliographical references (p. 141-150) and index.by Sivan Avraham Toledo.Ph.D

    Efficient Oblivious Parallel Sorting on the MasPar MP-1

    No full text
    We address the problem of sorting a large number N of keys on a MasPar MP-1 parallel SIMD machine of moderate size P where the processing elements (PEs) are interconnected as a toroidal mesh and have 16KB local storage each. We present a comparative study of implementations of the following deterministic oblivious sorting methods: Bitonic Sort, Odd-Even Merge Sort, and FastSort. We successfully use the guarded split&merge operation introduced by Rub. The experiments and investigations in a simple, parameterized, analytical model show that, with this operation, from a certain ratio N=P upwards both OddEven Merge Sort and FastSort become faster on average than the up to the present fastest, sophisticated implementation of Bitonic Sort by Prins. Though it is not as efficient as Odd-Even Merge Sort, FastSort is to our knowledge the first method specially tailored to the mesh architecture that can be, when implemented, competitive on average with a mesh-adaptation of Bitonic Sort for large ..

    Efficient Oblivious Parallel Sorting on the MasPar MP-1

    No full text
    We address the problem of sorting a large number N of keys on a MasPar MP-1 parallel SIMD machine of moderate size P where the processing elements (PEs) are interconnected as a toroidal mesh and have 16KB local storage each. We present a comparative study of implementations of the following deterministic oblivious sorting methods: Bitonic Sort, Odd-Even Merge Sort, and FastSort. We successfully use the guarded split&merge operation introduced by R¨ub. The experiments and investigations in a simple, parameterized, analytical model show that, with this operation, from a certain ratio N=P upwards both Odd-Even Merge Sort and FastSort become faster on average than the up to the present fastest, sophisticated implementation of Bitonic Sort by Prins. Though it is not as efficient as Odd-Even Merge Sort, FastSort is to our knowledge the first method specially tailored to the mesh architecture that can be, when implemented, competitive on average with a mesh-adaptation of Bitonic Sort for larg..
    corecore