Search CORE

15 research outputs found

An O(1) time complexity software barrier

Author: Carter John B.
Cheng Liqun
Publication venue: University of Utah
Publication date: 01/01/2004
Field of study

technical reportAs network latency rapidly approaches thousands of processor cycles and multiprocessors systems become larger and larger, the primary factor in determining a barrier algorithm?s performance is the number of serialized network latencies it requires. All existing barrier algorithms require at least O(log N) round trip message latencies to perform a single barrier operation on an N-node shared memory multiprocessor. In addition, existing barrier algorithms are not well tuned in terms of how they interact with modern shared memory systems, which leads to an excessive number of message exchanges to signal barrier completion. The contributions of this paper are threefold. First, we identify and quantitatively analyze the performance deficiencies of conventional barrier implementations when they are executed on real (non-idealized) hardware. Second, we propose a queue-based barrier algorithm that has effectively O(1)time complexity as measured in round trip message latencies. Third, by exploiting a hardware write-update (PUT) mechanism for signaling, we demonstrate how carefully matching the barrier implementation to the way that modern shared memory systems operate can improve performance dramatically. The resulting optimized algorithm only costs one round trip message latency to perform a barrier operation across N processors. Using a cycle-accurate execution-driven simulator of a future-generation SGI multiprocessor, we show that the proposed queue-based barrier outperforms conventional barrier implementations based on load-linked/storeconditional instructions by a factor of 5.43 (on 4 processors) to 93.96 (on 256 processors)

The University of Utah: J. Willard Marriott Digital Library

Destination-based HoL blocking elimination

Author
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

Crossref

The nondeterministic divide

Author: Charlesworth Arthur
Publication venue
Publication date
Field of study

The nondeterministic divide partitions a vector into two non-empty slices by allowing the point of division to be chosen nondeterministically. Support for high-level divide-and-conquer programming provided by the nondeterministic divide is investigated. A diva algorithm is a recursive divide-and-conquer sequential algorithm on one or more vectors of the same range, whose division point for a new pair of recursive calls is chosen nondeterministically before any computation is performed and whose recursive calls are made immediately after the choice of division point; also, access to vector components is only permitted during activations in which the vector parameters have unit length. The notion of diva algorithm is formulated precisely as a diva call, a restricted call on a sequential procedure. Diva calls are proven to be intimately related to associativity. Numerous applications of diva calls are given and strategies are described for translating a diva call into code for a variety of parallel computers. Thus diva algorithms separate logical correctness concerns from implementation concerns

NASA Technical Reports Server

Dynamic diffracting trees

Author: Della-Libera Giovanni M. (Giovanni Moises)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1997
Field of study

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997.Includes bibliographical references (p. 109-111).by Giovanni M. Della-Libera.M.Eng

DSpace@MIT

Design and simulation of an MIMD shared memory multiprocessor with interleaved instruction streams

Author: Stiemerling Thomas R.
Publication venue: The University of Edinburgh
Publication date: 01/01/1991
Field of study

Edinburgh Research Archive

On Collective Communication and Notified Read in the Global Address Space Programming Interface (GASPI)

Author: End Vanessa
Publication venue
Publication date: 14/12/2016
Field of study

Georg-August-University Göttingen

MPG.PuRe

On Design and Applications of Practical Concurrent Data Structures

Author: Walulya Ivan
Publication venue
Publication date: 01/01/2018
Field of study

The proliferation of multicore processors is having an enormous impact on software design and development. In order to exploit parallelism available in multicores, there is a need to design and implement abstractions that programmers can use for general purpose applications development. A common abstraction for coordinated access to memory is a concurrent data structure. Concurrent data structures are challenging to design and implement as they are required to be correct, scalable, and practical under various application constraints. In this thesis, we contribute to the design of efficient concurrent data structures, propose new design techniques and improvements to existing implementations. Additionally, we explore the utilization of concurrent data structures in demanding application contexts such as data stream processing.In the first part of the thesis, we focus on data structures that are difficult to parallelize due to inherent sequential bottlenecks. We present a lock-free vector design that efficiently addresses synchronization bottlenecks by utilizing the combining technique. Typical combining techniques are blocking. Our design introduces combining without sacrificing non-blocking progress guarantees. We extend the vector to present a concurrent lock-free unbounded binary heap that implements a priority queue with mutable priorities.In the second part of the thesis, we shift our focus to concurrent search data structures. In order to offer strong progress guarantee, typical implementations of non-blocking search data structures employ a "helping" mechanism. However, helping may result in performance degradation. We propose help-optimality, which expresses optimization in amortized step complexity of concurrent operations. To describe the concept, we revisit the lock-free designs of a linked-list and a binary search tree and present improved algorithms. We design the algorithms without using any language/platform specific constructs; we do not use bit-stealing or runtime type introspection of objects. Thus, our algorithms are portable. We further delve into multi-dimensional data and similarity search. We present the first lock-free multi-dimensional data structure and linearizable nearest neighbor search algorithm. Our algorithm for nearest neighbor search is generic and can be adapted to other data structures.In the last part of the thesis, we explore the utilization of concurrent data structures for deterministic stream processing. We propose solutions to two challenges prevalent in data stream processing: (1) efficient processing on cloud as well as edge devices and (2) deterministic data-parallel processing at high-throughput and low-latency. As a first step, we present a methodology for customization of streaming aggregation on low-power multicore embedded platforms. Then we introduce Viper, a communication module that can be integrated into stream processing engines for the coordination of threads analyzing data in parallel

Chalmers Research

Reactive synchronization algorithms for multiprocessors

Author: Lim Beng-Hong
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1995
Field of study

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995.Includes bibliographical references (p. 157-162).by Beng-Hong Lim.Ph.D

DSpace@MIT

Run-time thread management for large-scale distributed-memory multiprocessors

Author: Nussbaum Daniel Seth
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1993
Field of study

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1993.Includes bibliographical references (p. 214-216).by Daniel Nussbaum.Ph.D

DSpace@MIT

Efficient Q. S support for higt-performance interconnects

Author: Martínez Vicente Alejandro
Publication venue: Ediciones de la Universidad de Castilla-La Mancha
Publication date: 01/01/2008
Field of study

Las redes de interconexión son un componente clave en un gran número de sistemas. Los mecanismos de calidad de servicio (qos) son responsables de asegurar que se alcanza un cierto rendimiento en la red. Las soluciones tradicionales para ofrecer qos en redes de interconexión de altas prestaciones normalmente se basan en arquitecturas complejas. El principal objetivo de esta tesis es investigar si podemos ofrecer mecanismos eficientes de qos. Nuestro propósito es alcanzar un soporte completo de qos con el mínimo de recursos. Para ello, se identifican redundancias en los mecanismos propuestos de qos y son eliminados sin afectar al rendimiento. Esta tesis consta de tres partes. En la primera comenzamos con las propuestas tradicionales de qos a nivel de clase de tráfico. En la segunda parte, proponemos como adaptar los mecanismos de qos basados en deadlines para redes de interconexión de altas prestaciones. Por último, también investigamos la interacción de los mecanismos de qos con el control de congestión

Universidad de Castilla-La Mancha: Repositorio Universitario Institucional de Recursos Abiertos (RUIdeRA)