178 research outputs found

    Scaling Simulations of Reconfigurable Meshes.

    Get PDF
    This dissertation deals with reconfigurable bus-based models, a new type of parallel machine that uses dynamically alterable connections between processors to allow efficient communication and to perform fast computations. We focus this work on the Reconfigurable Mesh (R-Mesh), one of the most widely studied reconfigurable models. We study the ability of the R-Mesh to adapt an algorithm instance of an arbitrary size to run on a given smaller model size without significant loss of efficiency. A scaling simulation achieves this adaptation, and the simulation overhead expresses the efficiency of the simulation. We construct a scaling simulation for the Fusing-Restricted Reconfigurable Mesh (FR-Mesh), an important restriction of the R-Mesh. The overhead of this simulation depends only on the simulating machine size and not on the simulated machine size. The results of this scaling simulation extend to a variety of concurrent write rules and also translate to an improved scaling simulation of the R-Mesh itself. We present a bus linearization procedure that transforms an arbitrary non-linear bus configuration of an R-Mesh into an equivalent acyclic linear bus configuration implementable on an Linear Reconfigurable Mesh (LR-Mesh), a weaker version of the R-Mesh. This procedure gives the algorithm designer the liberty of using buses of arbitrary shape, while automatically translating the algorithm to run on a simpler platform. We illustrate our bus linearization method through two important applications. The first leads to a faster scaling simulation of the R-Mesh. The second application adapts algorithms designed for R-Meshes to run on models with pipelined optical buses. We also present a simulation of a Directional Reconfigurable Mesh (DR-Mesh) on an LR-Mesh. This simulation has a much better efficiency compared to previous work. In addition to the LR-Mesh, this simulation also runs on models that use pipelined optical buses

    Active Self-Assembly of Algorithmic Shapes and Patterns in Polylogarithmic Time

    Get PDF
    We describe a computational model for studying the complexity of self-assembled structures with active molecular components. Our model captures notions of growth and movement ubiquitous in biological systems. The model is inspired by biology's fantastic ability to assemble biomolecules that form systems with complicated structure and dynamics, from molecular motors that walk on rigid tracks and proteins that dynamically alter the structure of the cell during mitosis, to embryonic development where large-scale complicated organisms efficiently grow from a single cell. Using this active self-assembly model, we show how to efficiently self-assemble shapes and patterns from simple monomers. For example, we show how to grow a line of monomers in time and number of monomer states that is merely logarithmic in the length of the line. Our main results show how to grow arbitrary connected two-dimensional geometric shapes and patterns in expected time that is polylogarithmic in the size of the shape, plus roughly the time required to run a Turing machine deciding whether or not a given pixel is in the shape. We do this while keeping the number of monomer types logarithmic in shape size, plus those monomers required by the Kolmogorov complexity of the shape or pattern. This work thus highlights the efficiency advantages of active self-assembly over passive self-assembly and motivates experimental effort to construct general-purpose active molecular self-assembly systems

    On Fault Resilient Network-on-Chip for Many Core Systems

    Get PDF
    Rapid scaling of transistor gate sizes has increased the density of on-chip integration and paved the way for heterogeneous many-core systems-on-chip, significantly improving the speed of on-chip processing. The design of the interconnection network of these complex systems is a challenging one and the network-on-chip (NoC) is now the accepted scalable and bandwidth efficient interconnect for multi-processor systems on-chip (MPSoCs). However, the performance enhancements of technology scaling come at the cost of reliability as on-chip components particularly the network-on-chip become increasingly prone to faults. In this thesis, we focus on approaches to deal with the errors caused by such faults. The results of these approaches are obtained not only via time-consuming cycle-accurate simulations but also by analytical approaches, allowing for faster and accurate evaluations, especially for larger networks. Redundancy is the general approach to deal with faults, the mode of which varies according to the type of fault. For the NoC, there exists a classification of faults into transient, intermittent and permanent faults. Transient faults appear randomly for a few cycles and may be caused by the radiation of particles. Intermittent faults are similar to transient faults, however, differing in the fact that they occur repeatedly at the same location, eventually leading to a permanent fault. Permanent faults by definition are caused by wires and transistors being permanently short or open. Generally, spatial redundancy or the use of redundant components is used for dealing with permanent faults. Temporal redundancy deals with failures by re-execution or by retransmission of data while information redundancy adds redundant information to the data packets allowing for error detection and correction. Temporal and information redundancy methods are useful when dealing with transient and intermittent faults. In this dissertation, we begin with permanent faults in NoC in the form of faulty links and routers. Our approach for spatial redundancy adds redundant links in the diagonal direction to the standard rectangular mesh topology resulting in the hexagonal and octagonal NoCs. In addition to redundant links, adaptive routing must be used to bypass faulty components. We develop novel fault-tolerant deadlock-free adaptive routing algorithms for these topologies based on the turn model without the use of virtual channels. Our results show that the hexagonal and octagonal NoCs can tolerate all 2-router and 3-router faults, respectively, while the mesh has been shown to tolerate all 1-router faults. To simplify the restricted-turn selection process for achieving deadlock freedom, we devised an approach based on the channel dependency matrix instead of the state-of-the-art Duato's method of observing the channel dependency graph for cycles. The approach is general and can be used for the turn selection process for any regular topology. We further use algebraic manipulations of the channel dependency matrix to analytically assess the fault resilience of the adaptive routing algorithms when affected by permanent faults. We present and validate this method for the 2D mesh and hexagonal NoC topologies achieving very high accuracy with a maximum error of 1%. The approach is very general and allows for faster evaluations as compared to the generally used cycle-accurate simulations. In comparison, existing works usually assume a limited number of faults to be able to analytically assess the network reliability. We apply the approach to evaluate the fault resilience of larger NoCs demonstrating the usefulness of the approach especially compared to cycle-accurate simulations. Finally, we concentrate on temporal and information redundancy techniques to deal with transient and intermittent faults in the router resulting in the dropping and hence loss of packets. Temporal redundancy is applied in the form of ARQ and retransmission of lost packets. Information redundancy is applied by the generation and transmission of redundant linear combinations of packets known as random linear network coding. We develop an analytic model for flexible evaluation of these approaches to determine the network performance parameters such as residual error rates and increased network load. The analytic model allows to evaluate larger NoCs and different topologies and to investigate the advantage of network coding compared to uncoded transmissions. We further extend the work with a small insight to the problem of secure communication over the NoC. Assuming large heterogeneous MPSoCs with components from third parties, the communication is subject to active attacks in the form of packet modification and drops in the NoC routers. Devising approaches to resolve these issues, we again formulate analytic models for their flexible and accurate evaluations, with a maximum estimation error of 7%

    Approaches to the implementation of binary relation inference network.

    Get PDF
    by C.W. Tong.Thesis (M.Phil.)--Chinese University of Hong Kong, 1994.Includes bibliographical references (leaves 96-98).Chapter 1 --- Introduction --- p.1Chapter 1.1 --- The Availability of Parallel Processing Machines --- p.2Chapter 1.1.1 --- Neural Networks --- p.5Chapter 1.2 --- Parallel Processing in the Continuous-Time Domain --- p.6Chapter 1.3 --- Binary Relation Inference Network --- p.10Chapter 2 --- Binary Relation Inference Network --- p.12Chapter 2.1 --- Binary Relation Inference Network --- p.12Chapter 2.1.1 --- Network Structure --- p.14Chapter 2.2 --- Shortest Path Problem --- p.17Chapter 2.2.1 --- Problem Statement --- p.17Chapter 2.2.2 --- A Binary Relation Inference Network Solution --- p.18Chapter 3 --- A Binary Relation Inference Network Prototype --- p.21Chapter 3.1 --- The Prototype --- p.22Chapter 3.1.1 --- The Network --- p.22Chapter 3.1.2 --- Computational Element --- p.22Chapter 3.1.3 --- Network Response Time --- p.27Chapter 3.2 --- Improving Response --- p.29Chapter 3.2.1 --- Removing Feedback --- p.29Chapter 3.2.2 --- Selecting Minimum with Diodes --- p.30Chapter 3.3 --- Speeding Up the Network Response --- p.33Chapter 3.4 --- Conclusion --- p.35Chapter 4 --- VLSI Building Blocks --- p.36Chapter 4.1 --- The Site --- p.37Chapter 4.2 --- The Unit --- p.40Chapter 4.2.1 --- A Minimum Finding Circuit --- p.40Chapter 4.2.2 --- A Tri-state Comparator --- p.44Chapter 4.3 --- The Computational Element --- p.45Chapter 4.3.1 --- Network Performances --- p.46Chapter 4.4 --- Discussion --- p.47Chapter 5 --- A VLSI Chip --- p.48Chapter 5.1 --- Spatial Configuration --- p.49Chapter 5.2 --- Layout --- p.50Chapter 5.2.1 --- Computational Elements --- p.50Chapter 5.2.2 --- The Network --- p.52Chapter 5.2.3 --- I/O Requirements --- p.53Chapter 5.2.4 --- Optional Modules --- p.53Chapter 5.3 --- A Scalable Design --- p.54Chapter 6 --- The Inverse Shortest Paths Problem --- p.57Chapter 6.1 --- Problem Statement --- p.59Chapter 6.2 --- The Embedded Approach --- p.63Chapter 6.2.1 --- The Formulation --- p.63Chapter 6.2.2 --- The Algorithm --- p.65Chapter 6.3 --- Implementation Results --- p.66Chapter 6.4 --- Other Implementations --- p.67Chapter 6.4.1 --- Sequential Machine --- p.67Chapter 6.4.2 --- Parallel Machine --- p.68Chapter 6.5 --- Discussion --- p.68Chapter 7 --- Closed Semiring Optimization Circuits --- p.71Chapter 7.1 --- Transitive Closure Problem --- p.72Chapter 7.1.1 --- Problem Statement --- p.72Chapter 7.1.2 --- Inference Network Solutions --- p.73Chapter 7.2 --- Closed Semirings --- p.76Chapter 7.3 --- Closed Semirings and the Binary Relation Inference Network --- p.79Chapter 7.3.1 --- Minimum Spanning Tree --- p.80Chapter 7.3.2 --- VLSI Implementation --- p.84Chapter 7.4 --- Conclusion --- p.86Chapter 8 --- Conclusions --- p.87Chapter 8.1 --- Summary of Achievements --- p.87Chapter 8.2 --- Future Work --- p.89Chapter 8.2.1 --- VLSI Fabrication --- p.89Chapter 8.2.2 --- Network Robustness --- p.90Chapter 8.2.3 --- Inference Network Applications --- p.91Chapter 8.2.4 --- Architecture for the Bellman-Ford Algorithm --- p.91Bibliography --- p.92Appendices --- p.99Chapter A --- Detailed Schematic --- p.99Chapter A.1 --- Schematic of the Inference Network Structures --- p.99Chapter A.1.1 --- Unit with Self-Feedback --- p.99Chapter A.1.2 --- Unit with Self-Feedback Removed --- p.100Chapter A.1.3 --- Unit with a Compact Minimizer --- p.100Chapter A.1.4 --- Network Modules --- p.100Chapter A.2 --- Inference Network Interface Circuits --- p.100Chapter B --- Circuit Simulation and Layout Tools --- p.107Chapter B.1 --- Circuit Simulation --- p.107Chapter B.2 --- VLSI Circuit Design --- p.110Chapter B.3 --- VLSI Circuit Layout --- p.111Chapter C --- The Conjugate-Gradient Descent Algorithm --- p.113Chapter D --- Shortest Path Problem on MasPar --- p.11

    Systolic Array Implementations With Reduced Compute Time.

    Get PDF
    The goal of the research is the establishment of a formal methodology to develop computational structures more suitable for the changing nature of real-time signal processing and control applications. A major effort is devoted to the following question: Given a systolic array designed to execute a particular algorithm, what other algorithms can be executed on the same array? One approach for answering this question is based on a general model of array operations using graph-theoretic techniques. As a result, a systematic procedure is introduced that models array operations as a function of the compute cycle. As a consequence of the analysis, the dissertation develops the concept of fast algorithm realizations. This concept characterizes specific realizations that can be evaluated in a reduced number of cycles. It restricts the operations to remain in the same class but with reduced execution time. The concept takes advantage of the data dependencies of the algorithm at hand. This feature allows the modification of existing structures by reordering the input data. Applications of the principle allows optimum time band and triangular matrix product on arrays designed for dense matrices. A second approach for analyzing the families of algorithms implementable in an array, is based on the concept of array time constrained operation. The principle uses the number of compute cycle as an additional degree of freedom to expand the class of transformations generated by a single array. A mathematical approach, based on concepts from multilinear algebra, is introduced to model the recursive transformations implemented in linear arrays at each compute cycle. The proposed representation is general enough to encompass a large class of signal processing and control applications. A complete analytical model of the linear maps implementable by the array at each compute cycle is developed. The proposed methodology results in arrays that are more adaptable to the changing nature of operations. Lessons learned from analyzing existing arrays are used to design smart arrays for special algorithm realizations. Applications of the methodology include the design of flexible time structures and the ability to decompose a full size array into subarrays implementing smaller size problems

    RDF: Un modèle de calcul flot de données reconfigurable

    Get PDF
    Dataflow Models of Computation (MoCs) are widely used in embedded systems, including multimedia processing, digital signal processing, telecommunications, and automatic control. In a dataflow MoC, an application is specified as a graph of actors connected by FIFO channels. One of the first and most popular dataflow MoCs, Synchronous Dataflow (SDF), provides static analyses to guarantee boundedness and liveness, which are key properties for embedded systems. However, SDF and most of its variants lacks the capability to express the dynamism needed by modern streaming applications. In particular, the applications mentioned above have a strong need for reconfigurability to accommodate changes in the input data, the control objectives, or the environment. We address this need by proposing a new MoC called Reconfigurable Dataflow (RDF). RDF extends SDF with transformation rules that specify how and when the topology and actors of the graph may be reconfigured. Starting from an initial RDF graph and a set of transformation rules, an arbitrary number of new RDF graphs can be generated at runtime. A key feature of RDF is that it can be statically analyzed to guarantee that all possible graphs generated at runtime will be consistent and live. We introduce the RDF MoC, describe its associated static analyses, and present its implementation and some experimental results.Les modèles de calcul (MoCs) flot de données synchrones sont très utilisés dans les systèmes embarqués et les applications multimédia, de traitement du signal, de télécommunication et de contrôle automatique. Dans ce style de modèle, une application est spécifiée par un graphe d’acteurs connectés par des liens FIFO de communication. Un des MoCs les plus connus, SDF (pour Synchronous Dataflow), permet des analyses statiques qui garantissent l’exécution en mémoire bornée et l’absence d’interblocage, propriétés clés pour les systèmes embarqués. Néanmoins, SDF (et la plupart de ses variantes) ne permet pas d’exprimer la dynamicité requise par les applications embarquées modernes. En particulier, ces applications ont souvent besoin de se reconfigurer pour s’adapter aux changements (par ex., de débit ou de qualité) du flot d’entrée, des objectifs de contrôle ou de l’environnement. Afin de répondre à ce besoin, nous proposons RDF (pour Reconfigurable DataFlow) un MoC qui étend SDF avec des règles de transformations spécifiant comment la topologie du graphe flot de données peut être reconfiguré dynamiquement. En considérant un graphe SDF initial et un ensemble de règles de transformation, un nombre arbitraire de nouveaux graphes peuvent être produits. La principale qualité de RDF est qu’il peut être analysé statiquement pour garantir que tous les graphes générés dynamiquement s’exécuteront en mémoire bornée et sans interblocage. Nous présentons le modèle RDF, les analyses statiques associées, sa mise en oeuvre et quelques expérimentations

    Efficient parallel processing with optical interconnections

    Get PDF
    With the advances in VLSI technology, it is now possible to build chips which can each contain thousands of processors. The efficiency of such chips in executing parallel algorithms heavily depends on the interconnection topology of the processors. It is not possible to build a fully interconnected network of processors with constant fan-in/fan-out using electrical interconnections. Free space optics is a remedy to this limitation. Qualities exclusive to the optical medium are its ability to be directed for propagation in free space and the property that optical channels can cross in space without any interference. In this thesis, we present an electro-optical interconnected architecture named Optical Reconfigurable Mesh (ORM). It is based on an existing optical model of computation. There are two layers in the architecture. The processing layer is a reconfigurable mesh and the deflecting layer contains optical devices to deflect light beams. ORM provides three types of communication mechanisms. The first is for arbitrary planar connections among sets of locally connected processors using the reconfigurable mesh. The second is for arbitrary connections among N of the processors using the electrical buses on the processing layer and N2 fixed passive deflecting units on the deflection layer. The third is for arbitrary connections among any of the N2 processors using the N2 mechanically reconfigurable deflectors in the deflection layer. The third type of communication mechanisms is significantly slower than the other two. Therefore, it is desirable to avoid reconfiguring this type of communication during the execution of the algorithms. Instead, the optical reconfiguration can be done before the execution of each algorithm begins. Determining a right configuration that would be suitable for the entire configuration of a task execution is studied in this thesis. The basic data movements for each of the mechanisms are studied. Finally, to show the power of ORM, we use all three types of communication mechanisms in the first O(logN) time algorithm for finding the convex hulls of all figures in an N x N binary image presented in this thesis

    Multiple Biolgical Sequence Alignment: Scoring Functions, Algorithms, and Evaluations

    Get PDF
    Aligning multiple biological sequences such as protein sequences or DNA/RNA sequences is a fundamental task in bioinformatics and sequence analysis. These alignments may contain invaluable information that scientists need to predict the sequences\u27 structures, determine the evolutionary relationships between them, or discover drug-like compounds that can bind to the sequences. Unfortunately, multiple sequence alignment (MSA) is NP-Complete. In addition, the lack of a reliable scoring method makes it very hard to align the sequences reliably and to evaluate the alignment outcomes. In this dissertation, we have designed a new scoring method for use in multiple sequence alignment. Our scoring method encapsulates stereo-chemical properties of sequence residues and their substitution probabilities into a tree-structure scoring scheme. This new technique provides a reliable scoring scheme with low computational complexity. In addition to the new scoring scheme, we have designed an overlapping sequence clustering algorithm to use in our new three multiple sequence alignment algorithms. One of our alignment algorithms uses a dynamic weighted guidance tree to perform multiple sequence alignment in progressive fashion. The use of dynamic weighted tree allows errors in the early alignment stages to be corrected in the subsequence stages. Other two algorithms utilize sequence knowledge-bases and sequence consistency to produce biological meaningful sequence alignments. To improve the speed of the multiple sequence alignment, we have developed a parallel algorithm that can be deployed on reconfigurable computer models. Analytically, our parallel algorithm is the fastest progressive multiple sequence alignment algorithm

    In-Memory Computing Using Formal Methods and Paths-Based Logic

    Get PDF
    The continued scaling of the CMOS device has been largely responsible for the increase in computational power and consequent technological progress over the last few decades. However, the end of Dennard scaling has interrupted this era of sustained exponential growth in computing performance. Indeed, we are quickly reaching an impasse in the form of limitations in the lithographic processes used to fabricate CMOS processes and, even more dire, we are beginning to face fundamental physical phenomena, such as quantum tunneling, that are pervasive at the nanometer scale. Such phenomena manifests itself in prohibitively high leakage currents and process variations, leading to inaccurate computations. As a result, there has been a surge of interest in computing architectures that can replace the traditional CMOS transistor-based methods. This thesis is a thorough investigation of how computations can be performed on one such architecture, called a crossbar. The methods proposed in this document apply to any crossbar consisting of two-terminal connective devices. First, we demonstrate how paths of electric current between two wires can be used as design primitives in a crossbar. We then leverage principles from the field of formal methods, in particular the area of bounded model checking, to automate the synthesis of crossbar designs for computing arithmetic operations. We demonstrate that our approach yields circuits that are state-of-the-art in terms of the number of operations required to perform a computation. Finally, we look at the benefits of using a 3D crossbar for computation; that is, a crossbar consisting of multiple layers of interconnects. A novel 3D crossbar computing paradigm is proposed for solving the Boolean matrix multiplication and transitive closure problems and we show how this paradigm can be utilized, with small modifications, in the XPoint crossbar memory architecture that was recently announced by Intel
    • …
    corecore