100 research outputs found

    Shared memory with hidden latency on a family of mesh-like networks

    Get PDF

    Parallel computing in combinatorial optimization

    Get PDF

    Efficient parallel computation on multiprocessors with optical interconnection networks

    Get PDF
    This dissertation studies optical interconnection networks, their architecture, address schemes, and computation and communication capabilities. We focus on a simple but powerful optical interconnection network model - the Linear Array with Reconfigurable pipelined Bus System (LARPBS). We extend the LARPBS model to a simplified higher dimensional LAPRBS and provide a set of basic computation operations. We then study the following two groups of parallel computation problems on both one dimensional LARPBS\u27s as well as multi-dimensional LARPBS\u27s: parallel comparison problems, including sorting, merging, and selection; Boolean matrix multiplication, transitive closure and their applications to connected component problems. We implement an optimal sorting algorithm on an n-processor LARPBS. With this optimal sorting algorithm at disposal, we study the sorting problem for higher dimensional LARPBS\u27s and obtain the following results: • An optimal basic Columnsort algorithm on a 2D LARPBS. • Two optimal two-way merge sort algorithms on a 2D LARPBS. • An optimal multi-way merge sorting algorithm on a 2D LARPBS. • An optimal generalized column sort algorithm on a 2D LARPBS. • An optimal generalized column sort algorithm on a 3D LARPBS. • An optimal 5-phase sorting algorithm on a 3D LARPBS. Results for selection problems are as follows: • A constant time maximum-finding algorithm on an LARPBS. • An optimal maximum-finding algorithm on an LARPBS. • An O((log log n)2) time parallel selection algorithm on an LARPBS. • An O(k(log log n)2) time parallel multi-selection algorithm on an LARPBS. While studying the computation and communication properties of the LARPBS model, we find Boolean matrix multiplication and its applications to the graph are another set of problem that can be solved efficiently on the LARPBS. Following is a list of results we have obtained in this area. • A constant time Boolean matrix multiplication algorithm. • An O(log n)-time transitive closure algorithm. • An O(log n)-time connected components algorithm. • An O(log n)-time strongly connected components algorithm. The results provided in this dissertation show the strong computation and communication power of optical interconnection networks

    Progress Report : 1991 - 1994

    Get PDF

    Approaches to parallel performance prediction

    Get PDF

    Efficient Execution of Sequential Instructions Streams by Physical Machines

    Get PDF
    Any computational model which relies on a physical system is likely to be subject to the fact that information density and speed have intrinsic, ultimate limits. The RAM model, and in particular the underlying assumption that memory accesses can be carried out in time independent from memory size itself, is not physically implementable. This work has developed in the field of limiting technology machines, in which it is somewhat provocatively assumed that technology has achieved the physical limits. The ultimate goal for this is to tackle the problem of the intrinsic latencies of physical systems by encouraging scalable organizations for processors and memories. An algorithmic study is presented, which depicts the implementation of high concurrency programs for SP and SPE, sequential machine models able to compute direct-flow programs in optimal time. Then, a novel pieplined, hierarchical memory organization is presented, with optimal latency and bandwidth for a physical system. In order to both take full advantage of the memory capabilities and exploit the available instruction level parallelism of the code to be executed, a novel processor model is developed. Particular care is put in devising an efficient information flow within the processor itself. Both designs are extremely scalable, as they are based on fixed capacity and fixed size nodes, which are connected as a multidimensional array. Performance analysis on the resulting machine design has led to the discovery that latencies internal to the processor can be the dominating source of complexity in instruction flow execution, which adds to the effects of processor-memory interaction. A characterization of instruction flows is then developed, which is based on the topology induced by instruction dependences

    Fast Parallel Algorithms for Basic Problems

    Get PDF
    Parallel processing is one of the most active research areas these days. We are interested in one aspect of parallel processing, i.e. the design and analysis of parallel algorithms. Here, we focus on non-numerical parallel algorithms for basic combinatorial problems, such as data structures, selection, searching, merging and sorting. The purposes of studying these types of problems are to obtain basic building blocks which will be useful in solving complex problems, and to develop fundamental algorithmic techniques. In this thesis, we study the following problems: priority queues, multiple search and multiple selection, and reconstruction of a binary tree from its traversals. The research on priority queue was motivated by its various applications. The purpose of studying multiple search and multiple selection is to explore the relationships between four of the most fundamental problems in algorithm design, that is, selection, searching, merging and sorting; while our parallel solutions can be used as subroutines in algorithms for other problems. The research on the last problem, reconstruction of a binary tree from its traversals, was stimulated by a challenge proposed in a recent paper by Berkman et al. ( Highly Parallelizable Problems, STOC 89) to design doubly logarithmic time optimal parallel algorithms because a remarkably small number of such parallel algorithms exist

    Hardware/Software Co-design for Multicore Architectures

    Get PDF
    Siirretty Doriast

    A framework for evaluating the impact of communication on performance in large-scale distributed urban simulations

    Get PDF
    A primary motivation for employing distributed simulation is to enable the execution of large-scale simulation workloads that cannot be handled by the resources of a single stand-alone computing node. To make execution possible, the workload is distributed among multiple computing nodes connected to one another via a communication network. The execution of a distributed simulation involves alternating phases of computation and communication to coordinate the co-operating nodes and ensure correctness of the resulting simulation outputs. Reliably estimating the execution performance of a distributed simulation can be difficult due to non-deterministic execution paths involved in alternating computation and communication operations. However, performance estimates are useful as a guide for the simulation time that can be expected when using a given set of computing resources. Performance estimates can support decisions to commit time and resources to running distributed simulations, especially where significant amounts of funds or computing resources are necessary. Various performance estimation approaches are employed in the distributed computing literature, including the influential Bulk Synchronous Parallel (BSP) and LogP models. Different approaches make various assumptions that render them more suitable for some applications than for others. Actual performance depends on characteristics inherent to each distributed simulation application. An important aspect of these individual characteristics is the dynamic relationship between the communication and computation phases of the distributed simulation application. This work develops a framework for estimating the performance of distributed simulation applications, focusing mainly on aspects relevant to the dynamic relationship between communication and computation during distributed simulation execution. The framework proposes a meta-simulation approach based on the Multi-Agent Simulation (MAS) paradigm. Using the approach proposed by the framework, meta-simulations can be developed to investigate the performance of specific distributed simulation applications. The proposed approach enables the ability to compare various what-if scenarios. This ability is useful for comparing the effects of various parameters and strategies such as the number of computing nodes, the communication strategy, and the workload-distribution strategy. The proposed meta-simulation approach can also aid a search for optimal parameters and strategies for specific distributed simulation applications. The framework is demonstrated by implementing a meta-simulation which is based on case studies from the Urban Simulation domain
    • …
    corecore