    The "MIND" Scalable PIM Architecture

    MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high-performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore, with multiple memory/processor nodes on each chip, and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real-time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture.
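    The core idea behind message-driven split-transaction processing is easier to see in code. The sketch below is an illustrative software model only, not the MIND hardware: a parcel carries work to the node that owns the data, the sender continues without waiting for a reply, and the node's scheduler runs queued parcels as short threads of work. All names (Node, send_parcel, run_node) are invented for the example.

```cpp
// Minimal sketch (not the MIND hardware itself): parcels carry an action to
// the node owning the data, instead of pulling data across the network.
#include <cstdio>
#include <functional>
#include <queue>
#include <vector>

struct Node {
    std::vector<int> local_memory;                 // DRAM bank local to this node
    std::queue<std::function<void(Node&)>> inbox;  // pending parcels
};

// "Split transaction": the sender enqueues the parcel and continues; a
// reply (if any) arrives later as another parcel.
void send_parcel(Node& dest, std::function<void(Node&)> action) {
    dest.inbox.push(std::move(action));
}

void run_node(Node& n) {          // message-driven scheduling loop
    while (!n.inbox.empty()) {
        auto parcel = std::move(n.inbox.front());
        n.inbox.pop();
        parcel(n);                // each parcel spawns a short thread of work
    }
}

int main() {
    Node node;
    node.local_memory = {1, 2, 3, 4};
    // Ship the computation to the data: sum in place at the owning node.
    send_parcel(node, [](Node& n) {
        int sum = 0;
        for (int v : n.local_memory) sum += v;
        std::printf("sum at node = %d\n", sum);
    });
    run_node(node);
}
```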

    Static Mapping of Functional Programs: An Example in Signal Processing

    Time adaptation for parallel applications in unbalanced time sharing environment

    Time adaptation matters for parallel jobs running on a centralized or distributed multiprocessor machine: the turnaround time of a job depends on the turnaround time of each of its processes. Dynamic load balancing in an unbalanced time-sharing environment distributes the workload evenly among the available resources, so that all processes of a single job finish at almost the same time, minimizing turnaround time and maximizing resource utilization. In this thesis we propose and implement an approach that lets parallel applications use our library to adapt in the time dimension (when running in a time-sharing environment) without changing their space allocation. The approach provides an interface between the application, monitoring information, the job scheduler, and a cost model that considers application, system, and load-balancing information. This interface allows binding of different adaptation approaches for synchronous adaptation and semi-static remapping. We also determined which job types this approach suits, and we present results from test runs on a 16-node cluster with synthetic MPI programs and a time adaptation approach, demonstrating the gain from our approach. This work extends the existing ATOP [11] work: we directly reuse its over-partitioning strategy, but unlike ATOP, applications can use our adaptation library and adapt dynamically. We also adopted the dynamic directory concept used in SCOJO [8]. Thesis (M.Sc.), University of Windsor (Canada), 2005.
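    The kind of cost-model decision described here can be sketched compactly. The following is an assumption-laden illustration, not the thesis's actual interface: it compares the projected saving from rebalancing against the one-time remap cost, and all fields and names (LoadInfo, should_remap) are invented.

```cpp
// Illustrative sketch of a cost-model test an adaptation library might apply
// before triggering a semi-static remap; thresholds and fields are assumed.
#include <cstdio>

struct LoadInfo {
    double slowest_rank_time;   // per-iteration time of the most loaded process
    double average_rank_time;   // mean per-iteration time across processes
    double remap_cost;          // one-time cost of redistributing partitions
    int    iterations_left;     // remaining iterations of the job
};

// Adapt only if the projected saving from rebalancing exceeds the remap cost.
bool should_remap(const LoadInfo& li) {
    double imbalance = li.slowest_rank_time - li.average_rank_time;
    double projected_saving = imbalance * li.iterations_left;
    return projected_saving > li.remap_cost;
}

int main() {
    LoadInfo li{0.50, 0.30, 20.0, 500};
    std::printf("remap? %s\n", should_remap(li) ? "yes" : "no");
}
```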

    Function Shipping in a Scalable Parallel Programming Model

    Increasingly, many scientific and technical applications exhibit dynamically generated parallelism or irregular data access patterns. These applications pose significant challenges to achieving scalable performance on large-scale parallel systems. This thesis explores the advantages of using function shipping as a language-level primitive to simplify writing scalable irregular and dynamic parallel applications. Function shipping avoids exposing latency by enabling users to ship data and computation together to a remote worker for execution. In the context of the Coarray Fortran 2.0 Partitioned Global Address Space language, we implement function shipping and the finish synchronization construct, which ensures global completion of a set of shipped function instances. We demonstrate the usability and performance benefits of function shipping with several benchmarks. Experiments on emerging supercomputers show that function shipping is useful and effective in achieving scalable performance with dynamic and irregular algorithms.
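    The shape of function shipping plus a finish construct can be hedged into a small example. This is not Coarray Fortran 2.0; it approximates the pattern with std::async, and the "finish" is simply the join over all shipped instances.

```cpp
// Sketch: function shipping with a finish-style join, expressed with
// std::async rather than the CAF 2.0 primitives the thesis implements.
#include <cstdio>
#include <future>
#include <vector>

int main() {
    std::vector<std::future<void>> shipped;

    // Ship data (here just `worker`) and computation together to remote
    // workers; the caller does not wait on each one individually.
    for (int worker = 0; worker < 4; ++worker) {
        shipped.push_back(std::async(std::launch::async, [worker] {
            std::printf("executing on worker %d\n", worker);
        }));
    }

    // "finish": execution passes this point only after every shipped
    // function instance has completed.
    for (auto& f : shipped) f.get();
    std::puts("finish: all shipped functions completed");
}
```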

    MGSim - Simulation tools for multi-core processor architectures

    MGSim is an open-source discrete event simulator for on-chip hardware components, developed at the University of Amsterdam. It is intended as a research and teaching vehicle for studying fine-grained hardware/software interactions on many-core and hardware-multithreaded processors. It includes support for core models with different instruction sets, a configurable multi-core interconnect, multiple configurable cache and memory models, a dedicated I/O subsystem, and comprehensive monitoring and interaction facilities. The default model configuration shipped with MGSim implements Microgrids, a many-core architecture with hardware concurrency management. MGSim is written mostly in C++ and uses object classes to represent chip components. It is optimized for architecture models that can be described as process networks.
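    At the heart of any discrete event simulator is a time-ordered event queue. The sketch below shows that core loop in miniature; it is a generic illustration, not MGSim's component model, and every name in it is invented.

```cpp
// Minimal discrete-event core loop: events fire in simulated-time order.
#include <cstdio>
#include <functional>
#include <queue>
#include <vector>

struct Event {
    long cycle;                       // simulated cycle at which it fires
    std::function<void()> fire;
};
struct Later {                        // order the priority queue by cycle
    bool operator()(const Event& a, const Event& b) const {
        return a.cycle > b.cycle;
    }
};

int main() {
    std::priority_queue<Event, std::vector<Event>, Later> queue;
    queue.push({5,  [] { std::puts("cycle 5: cache responds"); }});
    queue.push({1,  [] { std::puts("cycle 1: core issues load"); }});
    queue.push({12, [] { std::puts("cycle 12: result written back"); }});

    while (!queue.empty()) {          // advance time event by event
        Event e = queue.top();
        queue.pop();
        e.fire();
    }
}
```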

    Constraint programming on hierarchical multiprocessor systems

    The work reported in this thesis is about constraint processing in the context of hierarchical multiprocessor systems, including distributed systems. More specifically, it develops techniques and a system to help bring the power available in today's multiprocessing networked systems into the constraint processing field. Solving constraint-specified problems lends itself naturally to parallelisation, as it usually implies searching very large search spaces for a solution. Parallel constraint solving draws on the idea of dividing the search space among several workers, so the search may proceed faster, and thanks to the declarative nature of constraint programming, the parallelisation happens transparently as far as the user is concerned. However, to take full advantage of the parallel computing power available, techniques must be developed to help ensure that the workers executing the search are kept busy at all times, an issue tackled by this work. ABSTRACT (translated from Portuguese): This thesis addresses constraint programming in the context of hierarchical multiprocessor systems, including distributed systems. More specifically, it develops techniques for solving constraint satisfaction problems using parallelism. The topic is timely given the ever-wider availability of multiprocessor systems which, together with the omnipresence of computer networks, put at our disposal a computing capacity that needs to be put to use, something that has been slow to happen. This thesis develops a system that takes advantage of those resources through constraint processing. Constraint programming is a declarative paradigm in which the user does not have to worry about controlling the computation, so parallelism can be introduced transparently. Moreover, the process of searching for solutions to constraint-specified problems lends itself particularly well to parallelisation. This thesis presents an approach to parallel constraint solving that combines local parallelism, in the form of workers, with distributed parallelism, in which the actors are teams. The system built, aimed at large-scale distributed systems, is described and its results presented; it includes work distribution through work stealing, which operates locally without the victim's cooperation and remotely with the victim's cooperation, in an environment where all teams cooperate in the search for a solution.
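    The search-space splitting behind work stealing can be illustrated in a few lines. This toy sketch is not the thesis's solver: it models unexplored work as a contiguous range of candidate assignments and shows an idle thief taking half of a victim's range; synchronization is omitted, and all names are invented.

```cpp
// Toy sketch of search-space splitting for work stealing: the thief takes
// the upper half of the victim's unexplored range (no locking shown).
#include <cstdio>

struct WorkRange { int lo, hi; };        // unexplored candidate assignments

// "Local steal": take the upper half without the victim's collaboration.
WorkRange steal_half(WorkRange& victim) {
    int mid = victim.lo + (victim.hi - victim.lo) / 2;
    WorkRange stolen{mid, victim.hi};
    victim.hi = mid;                     // victim keeps the lower half
    return stolen;
}

int main() {
    WorkRange victim{0, 1000};
    WorkRange thief = steal_half(victim);
    std::printf("victim keeps [%d,%d), thief gets [%d,%d)\n",
                victim.lo, victim.hi, thief.lo, thief.hi);
}
```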

    Operating System Support for Redundant Multithreading

    Failing hardware is a fact, and trends in microprocessor design indicate that the fraction of hardware suffering from permanent and transient faults will continue to increase in future chip generations. Researchers have proposed various solutions to this issue, each with downsides: specialized hardware components make hardware more expensive to produce and consume additional energy at runtime; fault-tolerant algorithms and libraries force specific programming models on the developer; compiler-based fault tolerance requires the source code of every application to be available for recompilation. In this thesis I present ASTEROID, an operating system architecture that integrates applications with different reliability needs. ASTEROID is built on top of the L4/Fiasco.OC microkernel and extends the system with Romain, an operating system service that transparently replicates user applications. Romain supports single- and multi-threaded applications without requiring access to the application's source code. Romain replicates applications and their resources completely and thereby does not rely on hardware extensions such as ECC-protected memory. I describe how to efficiently implement replication as a form of redundant multithreading in software. I develop mechanisms to manage replica resources and to make multi-threaded programs behave deterministically for replication. I furthermore present an approach to handling applications that use shared-memory channels with other programs. My evaluation shows that Romain provides 100% error detection and more than 99.6% error correction for single-bit flips in memory and general-purpose registers. At the same time, Romain's execution time overhead stays below 14% for single-threaded applications running in triple-modular redundant mode. The last part of the thesis acknowledges that software-implemented fault tolerance methods rely on the correct functioning of a certain set of hardware and software components, the Reliable Computing Base (RCB). I introduce the concept of the RCB and discuss what constitutes the RCB of the ASTEROID system and of other fault tolerance mechanisms. I then present three case studies that evaluate approaches to protecting RCB components, aiming at a software stack that is fully protected against hardware errors.
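    The majority vote underlying triple-modular redundancy is simple to state in code. The sketch below is a generic illustration of that vote over three replica results, not Romain's actual comparison logic or API; the function and values are invented.

```cpp
// Generic TMR majority vote: any two agreeing replicas mask a single
// faulty one; three-way disagreement is detected but not correctable.
#include <cstdint>
#include <cstdio>

bool vote(uint64_t a, uint64_t b, uint64_t c, uint64_t& out) {
    if (a == b || a == c) { out = a; return true; }
    if (b == c)           { out = b; return true; }
    return false;  // all replicas disagree: detection without correction
}

int main() {
    uint64_t result;
    // Replica 2 suffered a single-bit flip; the vote masks it.
    if (vote(42, 42 ^ (1u << 3), 42, result))
        std::printf("corrected result = %llu\n",
                    (unsigned long long)result);
}
```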