7 research outputs found

    A Practical Hierarchical Model of Parallel Computation II: Binary Tree and FFT Algorithms

    A companion paper introduced the Hierarchical PRAM (H-PRAM) model of parallel computation, which achieves a good balance between simplicity of use and how well it reflects realistic parallel computers. In this paper, we demonstrate the use of the model by designing and analyzing various algorithms for computing the complete binary tree and the FFT/butterfly graph. By concentrating on two problems, we are able to demonstrate the results of different combinations of organizational strategies and different types of sub-models of the H-PRAM. The philosophy in algorithm design is to maximize the number of processors P that are efficiently usable with respect to an input size N, and to minimize the inefficiency when efficiency is not possible (when P is too large with respect to N). This is possible because of the H-PRAM's representation of general locality, i.e., both strict and neighborhood locality, and results in algorithms that can efficiently employ more processors (and are thus faster) than algorithms for models that represent only strict locality.
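    As a rough illustration of this design philosophy (not taken from the paper), the sketch below computes the efficiency E = T_seq / (P · T_par) of a simple binary-tree reduction under the common cost estimate T_par ≈ N/P + log₂ P; the cost formula and the sample values of N and P are assumptions made for the example.

```python
import math

def tree_reduction_time(n, p):
    """Estimated parallel time for reducing n values on p processors:
    each processor first combines its n/p local values, then a
    log2(p)-depth combining tree merges the partial results.
    This simple cost model is an assumption, not the paper's analysis."""
    return math.ceil(n / p) + math.ceil(math.log2(p))

def efficiency(n, p):
    """Efficiency E = T_seq / (p * T_par); 'efficiently usable' roughly
    means E stays bounded below by a constant as p grows."""
    t_seq = n  # a sequential reduction touches every value once
    return t_seq / (p * tree_reduction_time(n, p))

if __name__ == "__main__":
    n = 1 << 20
    for p in (16, 256, 4096, 65536):
        print(f"p={p:6d}  efficiency={efficiency(n, p):.3f}")
```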

    Optimal broadcast on parallel locality models

    In this paper, matching upper and lower bounds are proven for broadcast on general-purpose parallel computation models that exploit network locality. These models try to capture the general-purpose properties of models like the PRAM or BSP on the one hand, and to exploit the network locality of special-purpose models like meshes, hypercubes, etc., on the other. They do so by charging a cost l(|i−j|) for a communication between processors i and j, where l is a suitably chosen latency function. An upper bound of T(p) = ∑_{i=0}^{log log p} 2^i · l(p^{1/2^i}) on the runtime of a broadcast on a p-processor H-PRAM is given, for an arbitrary latency function l(k). The main contribution of the paper is a matching lower bound, which holds for all latency functions in the range from l(k) = Ω(log k / log log k) to l(k) = O(log² k). This is not a severe restriction, since for latency functions l(k) = O(log k / log^{1+ε} log k) with arbitrary ε > 0 the runtime of the algorithm matches the trivial lower bound Ω(log p), and for l(k) = Θ(log^{1+ε} k) or l(k) = Θ(k^ε) the runtime matches the other trivial lower bound Ω(l(p)). Both the upper and lower bounds also apply to other parallel locality models such as the Y-PRAM, D-BSP and E-BSP.
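    The following minimal sketch (not from the paper) simply evaluates the stated upper bound T(p) = ∑_{i=0}^{log log p} 2^i · l(p^{1/2^i}) for a few illustrative latency functions and prints it next to the two trivial lower bound quantities log p and l(p); the chosen latency functions and machine size are assumptions.

```python
import math

def broadcast_upper_bound(p, latency):
    """Evaluate T(p) = sum_{i=0}^{log log p} 2^i * l(p^(1/2^i))
    for a p-processor machine with latency function l."""
    terms = int(math.log2(math.log2(p)))  # i ranges over 0 .. log log p
    return sum((2 ** i) * latency(p ** (1.0 / 2 ** i)) for i in range(terms + 1))

# Illustrative latency functions (assumptions, not from the paper).
latencies = {
    "l(k) = log k":   lambda k: math.log2(max(k, 2)),
    "l(k) = log^2 k": lambda k: math.log2(max(k, 2)) ** 2,
    "l(k) = k^0.5":   lambda k: k ** 0.5,
}

p = 2 ** 16
for name, l in latencies.items():
    t = broadcast_upper_bound(p, l)
    print(f"{name:16s} T(p)={t:10.1f}  log p={math.log2(p):.1f}  l(p)={l(p):.1f}")
```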

    A Practical Hierarchical Model of Parallel Computation: The Model

    We introduce a model of parallel computation that retains the ideal properties of the PRAM by using it as a sub-model, while simultaneously being more reflective of realistic parallel architectures by accounting for and providing abstract control over communication and synchronization costs. The Hierarchical PRAM (H-PRAM) model controls conceptual complexity in the face of asynchrony in two ways. First, it provides the simplifying assumption of synchronization for the design of individual algorithms while allowing those algorithms to run asynchronously with respect to each other, organizing this controlled asynchrony via an implicit hierarchy relation. Second, it allows communication asynchrony to be restricted in order to obtain determinate algorithms (thus greatly simplifying proofs of correctness). It is shown that the model is reflective of a variety of existing and proposed parallel architectures, particularly ones that can support massive parallelism. Relationships to programming languages are discussed. Since the PRAM is a sub-model, PRAM algorithms can be used as sub-algorithms in algorithms for the H-PRAM; thus results that have been established with respect to the PRAM are potentially transferable to the new model. The H-PRAM can be used as a flexible tool to investigate general degrees of locality (“neighborhoods of activity”) in problems, considering communication and synchronization simultaneously. This offers the potential of obtaining algorithms that map more efficiently to architectures, and of increasing the number of processors that can efficiently be used on a problem (in comparison to a PRAM that charges for communication and synchronization). The model presents a framework in which to study the extent to which general locality can be exploited in parallel computing. A companion paper demonstrates the use of the H-PRAM via the design and analysis of various algorithms for computing the complete binary tree and the FFT/butterfly graph.
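    To make the hierarchy idea concrete, here is a small, hypothetical sketch of how one might account for the cost of a partition step: each sub-PRAM runs synchronously on its private sub-problem, the sub-PRAMs proceed asynchronously relative to one another, and the level ends with a synchronization when the partition is dissolved. The charging scheme and parameter values are assumptions for illustration, not the H-PRAM's actual cost functions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SubPram:
    """A synchronous PRAM working on a private sub-problem.
    'steps' is the number of synchronous steps it executes and
    'sync_cost' the per-step synchronization charge for its size.
    The concrete charges are illustrative assumptions."""
    processors: int
    steps: int
    sync_cost: float

    def time(self) -> float:
        return self.steps * (1 + self.sync_cost)

def partition_step_time(groups: List[SubPram], repartition_sync: float) -> float:
    """One level of the hierarchy: the sub-PRAMs run asynchronously with
    respect to each other, so the level finishes when the slowest group
    does, plus one global synchronization when the partition instruction
    ends and the groups are merged again."""
    return max(g.time() for g in groups) + repartition_sync

if __name__ == "__main__":
    groups = [SubPram(processors=64, steps=100, sync_cost=0.5),
              SubPram(processors=256, steps=80, sync_cost=1.0)]
    print(partition_step_time(groups, repartition_sync=8.0))
```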

    High-performance computing for vision

    Vision is a challenging application for high-performance computing (HPC). Many vision tasks have stringent latency and throughput requirements. Further, the vision process has a heterogeneous computational profile. Low-level vision consists of structured computations with regular data dependencies, while the subsequent, higher-level operations consist of symbolic computations with irregular data dependencies. Over the years, many approaches to high-speed vision have been pursued. VLSI hardware solutions such as ASICs and digital signal processors (DSPs) have provided good processing speeds on structured low-level vision tasks, and special-purpose systems for vision have also been designed. Currently, there is growing interest in using general-purpose parallel systems for vision problems. These systems offer advantages of higher performance, software programmability, generality, and architectural flexibility over the earlier approaches. The choice of low-cost commercial off-the-shelf (COTS) components as building blocks for these systems leads to easy upgradability and increased system life. The main focus of the paper is on effectively using COTS-based general-purpose parallel computing platforms to realize high-speed implementations of vision tasks. Due to the successful use of COTS-based systems in a variety of high-performance applications, it is attractive to consider their use for vision applications as well. However, the irregular data dependencies in vision tasks lead to large communication overheads in HPC systems. At the University of Southern California, our research efforts have been directed toward designing scalable parallel algorithms for vision tasks on HPC systems. In our approach, we use the message-passing programming model to develop portable code. Our algorithms are specified using C and MPI. In this paper, we summarize our efforts and illustrate our approach using several example vision tasks. To facilitate the analysis and development of scalable algorithms, a realistic computational model of the parallel system must be used. Several such models have been proposed in the literature. We use the General-purpose Distributed Memory (GDM) model, which is a simple but realistic model of state-of-the-art parallel machines. Using the GDM model, generic algorithmic techniques such as data remapping, overlapping of communication with computation, message packing, asynchronous execution, and communication scheduling are developed. Using these techniques, we have developed scalable algorithms for many vision tasks. For instance, a scalable algorithm for linear approximation has been developed using the asynchronous execution technique. Using this algorithm, linear feature extraction can be performed in 0.065 s on a 64-node SP-2 for a 512 × 512 image; a serial implementation takes 3.45 s for the same task. Similarly, the communication scheduling and decomposition techniques lead to a scalable algorithm for the line grouping task. We believe that such an algorithmic approach can result in the development of scalable and portable solutions for vision tasks. © 1996 IEEE.
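    The paper's algorithms are written in C with MPI; as a loose illustration of one of the generic techniques listed above (overlapping communication with computation), the following Python/mpi4py sketch posts a non-blocking halo exchange and performs interior computation while the messages are in flight. The ring exchange pattern, buffer sizes and the stand-in filtering step are assumptions, not the paper's algorithms.

```python
# Minimal sketch of overlapping communication with computation using
# non-blocking MPI calls via mpi4py. Buffer sizes, the ring pattern and
# the local filtering step are illustrative assumptions.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each process owns a block of image rows plus one halo row to exchange.
local_block = np.random.rand(64, 512)
halo_send = np.ascontiguousarray(local_block[0])
halo_recv = np.empty_like(halo_send)

left = (rank - 1) % size
right = (rank + 1) % size

# Post the non-blocking halo exchange ...
requests = [comm.Isend(halo_send, dest=left, tag=0),
            comm.Irecv(halo_recv, source=right, tag=0)]

# ... and overlap it with computation on the interior rows, which does
# not depend on the halo data.
interior_result = local_block[1:] * 0.25  # stand-in for a real filter

# Wait for the exchange to finish before touching the boundary row.
MPI.Request.Waitall(requests)
boundary_result = (local_block[0] + halo_recv) * 0.25
```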

    Engineering the performance of parallel applications


    Um estudo sobre modelos de computação paralela (A study of models of parallel computation)

    Advisor: João Carlos Setubal. Master's dissertation, Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Ciência da Computação. Resumo (translated): Models of computation are a very important tool for sound algorithm development. In general, they aim to ease designers' work by abstracting away many factors present in real machines. In parallel computing, the need for a model is acute because of the great variety of architectures. The emergence of a model of parallel computation could further boost the growth of the field, which is already considerable owing to the physical limitations of sequential computers. In this dissertation we study models of parallel computation from the standpoint of algorithm design, focusing on parallel computing derived from the von Neumann architecture. To that end, we begin by studying a set of parallel machines so that their differences become clear, choosing some of the best-known and most widespread parallel machines, such as the CM-2, Sequent Symmetry, MasPar MP-1 and CM-5, among others. After this study of machines, we turn to the models of parallel computation themselves, choosing three as a basis; these models differ markedly in simplicity and realism. The models studied are the PRAM, BSP [Val90] and LogP [CKP+93]. Many argue that we should keep using the PRAM model because, although very abstract, it greatly eases the designer's work. The BSP proposal is somewhat bolder, since Valiant tries, with his model, to influence both hardware and software in the same way the von Neumann architecture did for sequential computing. The LogP proposal, in turn, is more immediate, since it tries to address the present difficulty of designing parallel algorithms. To evaluate the models from the standpoint of algorithm design, we carried out case studies on the Fourier Transform and Gaussian Elimination problems, which allowed us to assess how easy or difficult it is to design algorithms in each model. Abstract: Models of computation are one of the most important tools in algorithm design. With these models, the work of an algorithm designer becomes easier, because they leave out many characteristics of real machines. In parallel computing there is a great need for a general model, because we have many different parallel machines. The advent of a parallel computing model could make the area grow even more than it is already growing. In this dissertation we study some parallel computing models. First we look at a representative set of parallel machines, in order to learn the differences between the architectures. Our set contains some of the most important commercial machines, such as the CM-2, Sequent Symmetry, MasPar MP-1 and CM-5. After this, we study the models themselves. The models chosen were the PRAM, BSP [Val90] and LogP [CKP+93]. Many researchers argue that the PRAM is the best model for algorithm design although it is not realistic. The proposal of the BSP model is bold, since it also seeks to influence parallel architecture design. The proposal of the LogP model, although similar to the BSP, does not require parallel machines to have synchronization mechanisms; this makes LogP the most realistic but also the most difficult model to use. We evaluate these models on the problems of the Fourier Transform and Gaussian Elimination. After this study, we present an evaluation of the three models.
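    As a small illustration of the kind of cost accounting such a comparison involves, the sketch below uses the standard BSP superstep cost w + g·h + l to build a rough estimate for row-cyclic Gaussian elimination; the decomposition and the machine parameters g and l are assumptions chosen for the example, not figures from the dissertation.

```python
def bsp_superstep_cost(local_work, h_relation, g, l):
    """Standard BSP cost of one superstep: the maximum local work, plus
    g times the maximum number of words any processor sends or receives
    (the h-relation), plus the barrier synchronization cost l."""
    return local_work + g * h_relation + l

def gaussian_elimination_cost(n, p, g, l):
    """Toy BSP estimate for row-cyclic Gaussian elimination on an n x n
    matrix with p processors: in round k, the pivot row of length ~(n - k)
    is broadcast and each processor updates its ~(n - k)/p rows.
    The constant factors are rough assumptions."""
    total = 0.0
    for k in range(n):
        row = n - k
        local_work = (row / p) * row   # update the locally owned rows
        h = row                        # receive the broadcast pivot row
        total += bsp_superstep_cost(local_work, h, g, l)
    return total

if __name__ == "__main__":
    # Illustrative machine parameters (assumptions).
    print(gaussian_elimination_cost(n=1024, p=16, g=4.0, l=100.0))
```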

    Diseño e implementación de lenguajes orientados al modelo PRAM (Design and implementation of PRAM-oriented languages)

    A language oriented to the PRAM model must be able to distinguish between shared and private variables. It also requires a synchronization model based on implicit barriers. Another indispensable element is nested parallelism, offering the possibility of combining parallelism and recursion. The PRAM model and its variants are introduced, and the two PRAM-oriented languages that meet the above conditions, Fork and ll, are discussed. The Fork language and its successor Fork95, developed in Germany, are studied, covering their syntax, semantics and different implementations. Comparisons are also drawn between the two approaches (ll and Fork). New proposals, both in the syntax of the ll language and in the run-time environment, are aimed at making the execution of the translated code more efficient.
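    Fork and Fork95 have their own C-based syntax, which is not reproduced here; the following Python sketch only mimics the three ingredients the abstract names, namely shared versus private variables, a barrier at the end of each parallel step (emulated here with thread joins), and nested parallelism combined with recursion. All names and the splitting strategy are illustrative assumptions.

```python
# Not Fork/Fork95 code: a Python sketch of shared vs. private variables,
# a per-step barrier (emulated with joins), and nested parallelism with
# recursion. All names are illustrative.
import threading

shared_total = [0]                 # a "shared" variable visible to all threads
shared_lock = threading.Lock()

def parallel_sum(values, depth=0):
    """Recursively split the work between two threads; each recursion
    level joins both halves (a stand-in for an implicit barrier) before
    the caller proceeds."""
    if len(values) <= 2 or depth >= 3:
        private_partial = sum(values)      # a "private" variable per thread
        with shared_lock:
            shared_total[0] += private_partial
        return
    mid = len(values) // 2
    left = threading.Thread(target=parallel_sum, args=(values[:mid], depth + 1))
    right = threading.Thread(target=parallel_sum, args=(values[mid:], depth + 1))
    left.start(); right.start()
    left.join(); right.join()              # barrier: wait for both halves

if __name__ == "__main__":
    parallel_sum(list(range(100)))
    print(shared_total[0])                 # 4950
```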