48 research outputs found

    Parallel Computers and Complex Systems

    Get PDF
    We present an overview of the state of the art and future trends in high-performance parallel and distributed computing, and discuss techniques for using such computers in the simulation of complex problems in computational science. The use of high-performance parallel computers can help improve our understanding of complex systems, and the converse is also true --- we can apply techniques used for the study of complex systems to improve our understanding of parallel computing. We consider parallel computing as the mapping of one complex system --- typically a model of the world --- into another complex system --- the parallel computer. We study static, dynamic, spatial, and temporal properties of both the complex systems and the map between them. The result is a better understanding of which computer architectures are good for which problems, and of software structure, the automatic partitioning of data, and the performance of parallel machines.

    The nondeterministic divide

    Get PDF
    The nondeterministic divide partitions a vector into two non-empty slices by allowing the point of division to be chosen nondeterministically. Support for high-level divide-and-conquer programming provided by the nondeterministic divide is investigated. A diva algorithm is a recursive divide-and-conquer sequential algorithm on one or more vectors of the same range, whose division point for a new pair of recursive calls is chosen nondeterministically before any computation is performed and whose recursive calls are made immediately after the choice of division point; moreover, access to vector components is permitted only during activations in which the vector parameters have unit length. The notion of a diva algorithm is formulated precisely as a diva call, a restricted call on a sequential procedure. Diva calls are proven to be intimately related to associativity. Numerous applications of diva calls are given, and strategies are described for translating a diva call into code for a variety of parallel computers. Thus diva algorithms separate logical correctness concerns from implementation concerns.
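    The following minimal Python sketch illustrates the shape of a diva call as described above; the function names, the random choice of split point, and the example reduction are illustrative assumptions rather than material from the paper.

    import random

    def diva(v, lo, hi, combine, leaf):
        # Diva-style divide and conquer over the slice v[lo:hi]:
        # the division point is chosen nondeterministically before any
        # computation, and components are read only on unit-length slices.
        # The result is well defined only when `combine` is associative.
        if hi - lo == 1:
            return leaf(v[lo])                    # unit-length slice: access allowed
        mid = random.randint(lo + 1, hi - 1)      # nondeterministic divide
        return combine(diva(v, lo, mid, combine, leaf),
                       diva(v, mid, hi, combine, leaf))

    # Example: summation is insensitive to the split point because + is associative.
    print(diva([3, 1, 4, 1, 5, 9], 0, 6, lambda a, b: a + b, lambda x: x))   # 23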

    Design and resource management of reconfigurable multiprocessors for data-parallel applications

    Get PDF
    FPGA (Field-Programmable Gate Array)-based custom reconfigurable computing machines have established themselves as low-cost and low-risk alternatives to ASIC (Application-Specific Integrated Circuit) implementations and general-purpose microprocessors for accelerating a wide range of computation-intensive applications. Most often they are Application-Specific Programmable Circuits (ASPCs), which are developer-programmable rather than user-programmable. The major disadvantages of ASPCs are minimal programmability and significant time and energy overheads caused by the hardware reconfiguration required when the problem size exceeds the available reconfigurable resources; these problems are expected to become more serious as FPGA chip sizes increase. On the other hand, dominant high-performance computing systems, such as PC clusters and SMPs (Symmetric Multiprocessors), suffer from high communication latencies and/or scalability problems.

    This research introduces low-cost, user-programmable and reconfigurable MultiProcessor-on-a-Programmable-Chip (MPoPC) systems for high-performance, low-cost computing. It also proposes a relevant resource management framework that deals with performance, power consumption, and energy issues. These semi-customized systems significantly reduce runtime device reconfiguration by employing user-programmable processing elements that are reusable for different tasks in large, complex applications. For the sake of illustration, two different types of MPoPCs with hardware FPUs (floating-point units) are designed and implemented for credible performance evaluation and modeling: the coarse-grain MIMD (Multiple-Instruction, Multiple-Data) CG-MPoPC machine based on a processor IP (Intellectual Property) core, and the mixed-mode (MIMD, SIMD or M-SIMD) variant-grain HERA (HEterogeneous Reconfigurable Architecture) machine. In addition to alleviating the above difficulties, MPoPCs can offer several performance and energy advantages over ASPCs for our data-parallel applications; they are simpler and more scalable, and require less verification time and cost. Various common computation-intensive benchmark algorithms, such as matrix-matrix multiplication (MMM) and LU factorization, are studied and their parallel solutions are shown for the two MPoPCs. The performance is evaluated with large sparse real-world matrices, primarily from power engineering. We expect even further performance gains on MPoPCs in the near future by employing ever-improving FPGAs. The innovative nature of this work has the potential to guide research in this emerging field of high-performance, low-cost reconfigurable computing.

    The largest advantage of reconfigurable logic lies in its large degree of hardware customization and reconfiguration, which allows the resources to be reused to match the computation and communication needs of applications. Therefore, a major effort in the presented design methodology for mixed-mode MPoPCs, like HERA, is devoted to effective resource management. A two-phase approach is applied. A mixed-mode weighted Task Flow Graph (w-TFG) is first constructed for any given application, where tasks are classified according to their most appropriate computing mode (e.g., SIMD or MIMD). At compile time, an architecture is customized and synthesized for the w-TFG using an Integer Linear Programming (ILP) formulation and a parameterized hardware component library. Various run-time scheduling schemes with different performance-energy objectives are proposed. A system-level energy model for HERA, based on low-level implementation data and run-time statistics, is proposed to guide performance-energy trade-off decisions. A parallel power flow analysis technique based on Newton's method is proposed and employed to verify the methodology.
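    As a rough illustration of the two-phase resource-management idea (and not the thesis's actual ILP formulation), the toy Python sketch below assigns each task in a small weighted task-flow graph a processing-element mode under a hypothetical area budget; the task names, weights, library parameters, and brute-force search are all invented for illustration.

    from itertools import product

    # (task, preferred computing mode, weight); weight ~ relative amount of work.
    W_TFG = [("mmm_block", "SIMD", 8), ("pivot_search", "MIMD", 2),
             ("lu_update", "SIMD", 6)]

    # Hypothetical PE library: area cost and per-unit-weight latency by mode.
    PE_LIB = {"SIMD": {"area": 4, "latency": 1.0},
              "MIMD": {"area": 2, "latency": 2.5}}

    AREA_BUDGET = 10   # invented reconfigurable-resource budget

    def best_assignment():
        best = None
        for modes in product(PE_LIB, repeat=len(W_TFG)):
            area = sum(PE_LIB[m]["area"] for m in modes)
            if area > AREA_BUDGET:
                continue                      # configuration does not fit the chip
            # Penalize running a task in a mode other than its preferred one.
            cost = sum(w * PE_LIB[m]["latency"] * (1.0 if m == pref else 1.5)
                       for (_, pref, w), m in zip(W_TFG, modes))
            if best is None or cost < best[0]:
                best = (cost, modes)
        return best

    print(best_assignment())   # lowest-cost mode assignment that fits the budget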

    Predictive and distributed routing balancing (PR-DRB): high speed interconnection networks

    Get PDF
    Current parallel applications running on clusters require the use of an interconnection network to perform communications among all available computing nodes. An imbalance in communications can produce network congestion, reducing throughput and increasing latency, and degrading overall system performance. On the other hand, parallel applications running on these networks possess representative stages that allow their characterization, as well as repetitive behavior that can be identified on the basis of this characterization. This work presents Predictive and Distributed Routing Balancing (PR-DRB), a new method developed to gradually control network congestion, based on path expansion, traffic distribution, and effective traffic load, in order to maintain low latency values. PR-DRB monitors message latencies on intermediate routers, makes decisions about alternative paths, and records communication pattern information encountered during congestion situations. Based on the concept of application repetitiveness, the best solutions recorded are reapplied when a saved communication pattern reappears. Traffic congestion experiments were conducted to evaluate the performance of the method, and improvements were observed.
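    The record-and-reapply idea at the heart of PR-DRB can be sketched schematically as follows; the threshold value, the function names, and the path representation are hypothetical placeholders rather than the actual implementation.

    LATENCY_THRESHOLD = 5.0   # hypothetical congestion threshold
    recorded_solutions = {}   # pattern signature -> best path set found so far

    def on_message(pattern_sig, latency, current_paths, expand_paths):
        # pattern_sig: hashable signature of the detected communication pattern.
        # expand_paths(paths): returns a gradually widened set of alternative paths.
        if latency <= LATENCY_THRESHOLD:
            return current_paths                    # no congestion: keep current paths
        if pattern_sig in recorded_solutions:
            return recorded_solutions[pattern_sig]  # repetitive phase: reapply best solution
        wider = expand_paths(current_paths)         # gradual path expansion
        recorded_solutions[pattern_sig] = wider     # record for when the pattern reappears
        return wider

    # Usage sketch: a congested phase seen twice reuses the recorded path set.
    paths = on_message("all_to_all_phase", 9.2, ["minimal"], lambda p: p + ["alt1"])
    paths = on_message("all_to_all_phase", 9.5, ["minimal"], lambda p: p + ["alt1"])
    print(paths)   # ['minimal', 'alt1'], reapplied from the recorded solution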

    Computer vision algorithms on reconfigurable logic arrays

    Full text link

    Achieving parallel performance in scientific computations

    Get PDF

    A Comparison of Load Balancing Algorithms for Parallel Computations

    Get PDF
    Three physical optimization methods are considered in this paper for load balancing parallel computations: simulated annealing, genetic algorithms, and neural networks. Some design choices and the inclusion of additional steps lead to new versions of the algorithms with different solution qualities and execution times. The performance of these versions is critically evaluated and compared for test cases with different topologies and sizes. Orthogonal recursive coordinate bisection is also included in the comparison as a typical simple deterministic method. Simulation results show that the algorithms have diverse properties, so different algorithms can be applied to different problems and requirements. For example, the annealing and genetic algorithms produce better solutions and do not show a bias towards particular problem structures, but they are slower than the neural network and recursive bisection. Preprocessing graph contraction is one of the additional steps suggested for the physical methods; it produces a significant reduction in execution time, which is necessary for their applicability to large-scale problems.
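    As a concrete reference point, a minimal Python sketch of orthogonal recursive coordinate bisection, the simple deterministic baseline mentioned above, is given below; the 2-D point representation and the power-of-two partition count are simplifying assumptions.

    def orb(points, parts):
        # points: list of (x, y) tuples; parts: power-of-two number of partitions.
        if parts == 1 or len(points) <= 1:
            return [points]
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
        ordered = sorted(points, key=lambda p: p[axis])   # split at the median
        mid = len(ordered) // 2
        return orb(ordered[:mid], parts // 2) + orb(ordered[mid:], parts // 2)

    # Four balanced partitions of a small 4x4 mesh of points.
    mesh = [(x, y) for x in range(4) for y in range(4)]
    for part in orb(mesh, 4):
        print(part)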