298 research outputs found

    Sublogarithmic deterministic selection on arrays with a reconfigurable optical bus

    Get PDF
    The linear array with a reconfigurable pipelined bus system (LARPBS) is a newly introduced parallel computational model, where processors are connected by a reconfigurable optical bus. In this paper, we show that the selection problem can be solved on the LARPBS model deterministically in O((loglogN)2/ log log log N) time. To our best knowledge, this is the best deterministic selection algorithm on any model with a reconfigurable optical bus.Yijie Han, Yi Pan and Hong She

    Simulating a Pipelined Reconfigurable Mesh on a Linear Array with a Reconfigurable Pipelined Bus System

    Get PDF
    Due to the unidirectional nature of propagation and predictable delays, optically pipelined buses have been gaining more attention. There have been many models proposed over time that use reconfigurable optically pipelined buses. The reconfigurable nature of the models makes them capable of changing their component’s functionalities and structure that connects the components at every step of computation. There are both one dimensional as well as k –dimensional models that have been proposed in the literature. Though equivalence between various one dimensional models and equivalence between different two dimensional models had been established, so far there has not been any attempt to explore the relationship between a one dimensional model and a two dimensional model. In the proposed research work it is shown that a move from one to two or more dimensions does not cause any increase in the volume of communication between the processors as they communicate in a pipelined manner on the same optical bus. When moving from two dimensions to one dimension, the challenge is to map the processors so that those belonging to a two-dimensional bus segment are contiguous and in the same order on the one-dimensional model. This does not increase any increase in communication overhead as the processors instead of communicating on two dimensional buses now communicate on a linear one dimensional bus structure. To explore the relationship between one dimensional and two dimensional models a commonly used model Linear Array with a Reconfigurable Pipelined Bus System (LARPBS) and its two dimensional counterpart Pipelined Reconfigurable Mesh (PR-Mesh) are chosen Here an attempt has been made to present a simulation of a two dimensional PR-Mesh on a one dimensional LARPBS to establish complexity of the models with respect to one another, and to determine the efficiency with which the LARPBS can simulate the PR-Mesh

    Scaling Simulations of Reconfigurable Meshes.

    Get PDF
    This dissertation deals with reconfigurable bus-based models, a new type of parallel machine that uses dynamically alterable connections between processors to allow efficient communication and to perform fast computations. We focus this work on the Reconfigurable Mesh (R-Mesh), one of the most widely studied reconfigurable models. We study the ability of the R-Mesh to adapt an algorithm instance of an arbitrary size to run on a given smaller model size without significant loss of efficiency. A scaling simulation achieves this adaptation, and the simulation overhead expresses the efficiency of the simulation. We construct a scaling simulation for the Fusing-Restricted Reconfigurable Mesh (FR-Mesh), an important restriction of the R-Mesh. The overhead of this simulation depends only on the simulating machine size and not on the simulated machine size. The results of this scaling simulation extend to a variety of concurrent write rules and also translate to an improved scaling simulation of the R-Mesh itself. We present a bus linearization procedure that transforms an arbitrary non-linear bus configuration of an R-Mesh into an equivalent acyclic linear bus configuration implementable on an Linear Reconfigurable Mesh (LR-Mesh), a weaker version of the R-Mesh. This procedure gives the algorithm designer the liberty of using buses of arbitrary shape, while automatically translating the algorithm to run on a simpler platform. We illustrate our bus linearization method through two important applications. The first leads to a faster scaling simulation of the R-Mesh. The second application adapts algorithms designed for R-Meshes to run on models with pipelined optical buses. We also present a simulation of a Directional Reconfigurable Mesh (DR-Mesh) on an LR-Mesh. This simulation has a much better efficiency compared to previous work. In addition to the LR-Mesh, this simulation also runs on models that use pipelined optical buses

    Efficient parallel computation on multiprocessors with optical interconnection networks

    Get PDF
    This dissertation studies optical interconnection networks, their architecture, address schemes, and computation and communication capabilities. We focus on a simple but powerful optical interconnection network model - the Linear Array with Reconfigurable pipelined Bus System (LARPBS). We extend the LARPBS model to a simplified higher dimensional LAPRBS and provide a set of basic computation operations. We then study the following two groups of parallel computation problems on both one dimensional LARPBS\u27s as well as multi-dimensional LARPBS\u27s: parallel comparison problems, including sorting, merging, and selection; Boolean matrix multiplication, transitive closure and their applications to connected component problems. We implement an optimal sorting algorithm on an n-processor LARPBS. With this optimal sorting algorithm at disposal, we study the sorting problem for higher dimensional LARPBS\u27s and obtain the following results: • An optimal basic Columnsort algorithm on a 2D LARPBS. • Two optimal two-way merge sort algorithms on a 2D LARPBS. • An optimal multi-way merge sorting algorithm on a 2D LARPBS. • An optimal generalized column sort algorithm on a 2D LARPBS. • An optimal generalized column sort algorithm on a 3D LARPBS. • An optimal 5-phase sorting algorithm on a 3D LARPBS. Results for selection problems are as follows: • A constant time maximum-finding algorithm on an LARPBS. • An optimal maximum-finding algorithm on an LARPBS. • An O((log log n)2) time parallel selection algorithm on an LARPBS. • An O(k(log log n)2) time parallel multi-selection algorithm on an LARPBS. While studying the computation and communication properties of the LARPBS model, we find Boolean matrix multiplication and its applications to the graph are another set of problem that can be solved efficiently on the LARPBS. Following is a list of results we have obtained in this area. • A constant time Boolean matrix multiplication algorithm. • An O(log n)-time transitive closure algorithm. • An O(log n)-time connected components algorithm. • An O(log n)-time strongly connected components algorithm. The results provided in this dissertation show the strong computation and communication power of optical interconnection networks

    Design and Analysis of Optical Interconnection Networks for Parallel Computation.

    Get PDF
    In this doctoral research, we propose several novel protocols and topologies for the interconnection of massively parallel processors. These new technologies achieve considerable improvements in system performance and structure simplicity. Currently, synchronous protocols are used in optical TDM buses. The major disadvantage of a synchronous protocol is the waste of packet slots. To offset this inherent drawback of synchronous TDM, a pipelined asynchronous TDM optical bus is proposed. The simulation results show that the performance of the proposed bus is significantly better than that of known pipelined synchronous TDM optical buses. Practically, the computation power of the plain TDM protocol is limited. Various extensions must be added to the system. In this research, a new pipelined optical TDM bus for implementing a linear array parallel computer architecture is proposed. The switches on the receiving segment of the bus can be dynamically controlled, which make the system highly reconfigurable. To build large and scalable systems, we need new network architectures that are suitable for optical interconnections. A new kind of reconfigurable bus called segmented bus is introduced to achieve reduced structure simplicity and increased concurrency. We show that parallel architectures based on segmented buses are versatile by showing that it can simulate parallel communication patterns supported by a wide variety of networks with small slowdown factors. New kinds of interconnection networks, the hypernetworks, have been proposed recently. Compared with point-to-point networks, they allow for increased resource-sharing and communication bandwidth utilization, and they are especially suitable for optical interconnects. One way to derive a hypernetwork is by finding the dual of a point-to-point network. Hypercube Q\sb{n}, where n is the dimension, is a very popular point-to-point network. It is interesting to construct hypernetworks from the dual Q\sbsp{n}{*} of hypercube of Q\sb{n}. In this research, the properties of Q\sbsp{n}{*} are investigated and a set of fundamental data communication algorithms for Q\sbsp{n}{*} are presented. The results indicate that the Q\sbsp{n}{*} hypernetwork is a useful and promising interconnection structure for high-performance parallel and distributed computing systems

    Simulations and Algorithms on Reconfigurable Meshes With Pipelined Optical Buses.

    Get PDF
    Recently, many models using reconfigurable optically pipelined buses have been proposed in the literature. A system with an optically pipelined bus uses optical waveguides, with unidirectional propagation and predictable delays, instead of electrical buses to transfer information among processors. These two properties enable synchronized concurrent access to an optical bus in a pipelined fashion. Combined with the abilities of the bus structure to broadcast and multicast, this architecture suits many communication-intensive applications. We establish the equivalence of three such one-dimensional optical models, namely the LARPBS, LPB, and POB. This implies an automatic translation of algorithms (without loss of speed or efficiency) among these models. In particular, since the LPB is the same as an LARPBS without the ability to segment its buses, their equivalence establishes reconfigurable delays (rather than segmenting ability) as the key to the power of optically pipelined models. We also present simulations for a number of two-dimensional optical models and establish that they possess the same complexity, so that any of these models can simulate a step of one of the other models in constant time with a polynomial increase in size. Specifically, we determine the complexity of three two-dimensional optical models (the PR-Mesh, APPBS, and AROB) to be the same as the well known LR-Mesh and the cycle-free LR-Mesh. We develop algorithms for the LARPBS and PR-Mesh that are more efficient than existing algorithms in part by exploiting the pipelining, segmenting, and multicasting characteristics of these models. We also consider the implications of certain physical constraints placed on the system by restricting the distance over which two processors are able to communicate. All algorithms developed for these models assume that a healthy system is available. We present some fundamental algorithms that are able to tolerate up to N/2 faults on an N-processor LARPBS. We then extend these results to apply to other algorithms in the areas of image processing and matrix operations

    Scheduling and reconfiguration of interconnection network switches

    Get PDF
    Interconnection networks are important parts of modern computing systems, facilitating communication between a system\u27s components. Switches connecting various nodes of an interconnection network serve to move data in the network. The switch\u27s delay and throughput impact the overall performance of the network and thus the system. Scheduling efficient movement of data through a switch and configuring the switch to realize a schedule are the main themes of this research. We consider various interconnection network switches including (i) crossbar-based switches, (ii) circuit-switched tree switches, and (iii) fat-tree switches. For crossbar-based input-queued switches, a recent result established that logarithmic packet delay is possible. However, this result assumes that packet transmission time through the switch is no less than schedule-generation time. We prove that without this assumption (as is the case in practice) packet delay becomes linear. We also report results of simulations that bear out our result for practical switch sizes and indicate that a fast scheduling algorithm reduces not only packet delay but also buffer size. We also propose a fast mesh-of-trees based distributed switch scheduling (maximal-matching based) algorithm that has polylog complexity. A circuit-switched tree (CST) can serve as an interconnect structure for various computing architectures and models such as the self-reconfigurable gate array and the reconfigurable mesh. A CST is a tree structure with source and destination processing elements as leaves and switches as internal nodes. We design several scheduling and configuration algorithms that distributedly partition a given set of communications into non-conflicting subsets and then establish switch settings and paths on the CST corresponding to the communications. A fat-tree is another widely used interconnection structure in many of today\u27s high-performance clusters. We embed a reconfigurable mesh inside a fat-tree switch to generate efficient connections. We present an R-Mesh-based algorithm for a fat-tree switch that creates buses connecting input and output ports corresponding to various communications using that switch

    Domain specific high performance reconfigurable architecture for a communication platform

    Get PDF

    Performance Improvement of SIMD Processor for High-Speed end Devices in IoT Operation Based on Reversible Logic with Hybrid Adder Configuration

    Get PDF
    The reversible logic function is gaining significant consideration as a style for the logic design by implementing modern Nano and quantum computing with minimal impact on physical entropy. Recent advances in reversible logic allow for computer design applications using advanced quantum computer algorithms. In the literature, significant contributions have been made towards reversible logic gate structures and arithmetic units. However, there are many attempts to dictate the design of Single Instruction-Multiple Data (SIMD) processors. In this research work, a novel programmable reversible logic gate design is verified and a reversible processor design suggests its implementation of SIMD processor. Then, implementing the ripple-carry, carry-select and Kogge-Stone carry look-ahead adders using reversible logic and the performance is compared. The proposed reversible logic-based architecture has a minimum fan out with binary tree structure and minimum logic depth. The simulation result of the proposed design is obtained from Xilinx 14.5 software. From the simulated result, the computational path net delay for 16 × 16 reversible logic with Kogge Stone Adder is 17.247 ns. Compared with 16-bit Kogge Stone Adder, the reversible logic-based 16-bit Kogge Stone Adder gives low power and low time delay. By looking at the speed, energy and area parameters, including fast applications in which two smaller delay and low power adders are required, the effectiveness, including the proper area use of the hybrid adder recommended by it is evaluated

    Efficient Algorithms for a Mesh-Connected Computer with Additional Global Bandwidth

    Full text link
    This thesis shows that adding additional global bandwidths to a mesh-connected computer can greatly improve the performance. The goal of this project is to design algorithms for mesh-connected computers augmented with limited global bandwidth, so that we can further enhance our understanding of the parallel/serial nature of the problems on evolving parallel architectures. We do this by first solving several problems associated with fundamental data movement, then summarize ways to resolve different situations one may observe in data movement in parallel computing. This can help us to understand whether the problem is easily parallelizable on different parallel models. We give efficient algorithms to solve several fundamental problems, which include sorting, counting, fast Fourier transform, finding a minimum spanning tree, finding a convex hull, etc. We show that adding a small amount of global bandwidth makes a practical design that combines aspects of mesh and fully connected models to achieve the benefits of each. Most of the algorithms are optimal. For future work, we believe that algorithms with peak-power constrains can make our model well adapted to the recent architectures in high performance computing.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/150001/1/anyujie_1.pd
    corecore