51 research outputs found

    Diastolic arrays : throughput-driven reconfigurable computing

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.Includes bibliographical references (leaves 67-70).In this thesis, we propose a new reconfigurable computer substrate: diastolic arrays. Diastolic arrays are arrays of processing elements that communicate exclusively through First-In First-Out (FIFO) queues, and provide hardware support to guarantee bandwidth and buffer space for all data transfers. FIFO control implies that a module idles if its input FIFOs are empty, and stalls if its output FIFOs are full. The timing of data transfers between processing elements in diastolic arrays is therefore significantly more relaxed than in systolic arrays or pipelines. All specified data transfers are statically routed, and the routing problem to maximize average throughput can be optimally or near-optimally solved in polynomial time by formulating it as a maximum concurrent multicommodity flow problem and using linear programming. We show that the architecture of diastolic arrays enables efficient synthesis from high-level specifications of communicating finite state machines, providing a high-performance, off-the-shelf computer substrate that can be easily programmed.by Myong Hyon Cho.S.M

    Segment Routing: a Comprehensive Survey of Research Activities, Standardization Efforts and Implementation Results

    Full text link
    Fixed and mobile telecom operators, enterprise network operators and cloud providers strive to face the challenging demands coming from the evolution of IP networks (e.g. huge bandwidth requirements, integration of billions of devices and millions of services in the cloud). Proposed in the early 2010s, Segment Routing (SR) architecture helps face these challenging demands, and it is currently being adopted and deployed. SR architecture is based on the concept of source routing and has interesting scalability properties, as it dramatically reduces the amount of state information to be configured in the core nodes to support complex services. SR architecture was first implemented with the MPLS dataplane and then, quite recently, with the IPv6 dataplane (SRv6). IPv6 SR architecture (SRv6) has been extended from the simple steering of packets across nodes to a general network programming approach, making it very suitable for use cases such as Service Function Chaining and Network Function Virtualization. In this paper we present a tutorial and a comprehensive survey on SR technology, analyzing standardization efforts, patents, research activities and implementation results. We start with an introduction on the motivations for Segment Routing and an overview of its evolution and standardization. Then, we provide a tutorial on Segment Routing technology, with a focus on the novel SRv6 solution. We discuss the standardization efforts and the patents providing details on the most important documents and mentioning other ongoing activities. We then thoroughly analyze research activities according to a taxonomy. We have identified 8 main categories during our analysis of the current state of play: Monitoring, Traffic Engineering, Failure Recovery, Centrally Controlled Architectures, Path Encoding, Network Programming, Performance Evaluation and Miscellaneous...Comment: SUBMITTED TO IEEE COMMUNICATIONS SURVEYS & TUTORIAL

    A Multiple-objective ILP based Global Routing Approach for VLSI ASIC Design

    Get PDF
    A VLSI chip can today contain hundreds of millions transistors and is expected to contain more than 1 billion transistors in the next decade. In order to handle this rapid growth in integration technology, the design procedure is therefore divided into a sequence of design steps. Circuit layout is the design step in which a physical realization of a circuit is obtained from its functional description. Global routing is one of the key subproblems of the circuit layout which involves finding an approximate path for the wires connecting the elements of the circuit without violating resource constraints. The global routing problem is NP-hard, therefore, heuristics capable of producing high quality routes with little computational effort are required as we move into the Deep Sub-Micron (DSM) regime. In this thesis, different approaches for global routing problem are first reviewed. The advantages and disadvantages of these approaches are also summarized. According to this literature review, several mathematical programming based global routing models are fully investigated. Quality of solution obtained by these models are then compared with traditional Maze routing technique. The experimental results show that the proposed model can optimize several global routing objectives simultaneously and effectively. Also, it is easy to incorporate new objectives into the proposed global routing model. To speedup the computation time of the proposed ILP based global router, several hierarchical methods are combined with the flat ILP based global routing approach. The experimental results indicate that the bottom-up global routing method can reduce the computation time effectively with a slight increase of maximum routing density. In addition to wire area, routability, and vias, performance and low power are also important goals in global routing, especially in deep submicron designs. Previous efforts that focused on power optimization for global routing are hindered by excessively long run times or the routing of a subset of the nets. Accordingly, a power efficient multi-pin global routing technique (PIRT) is proposed in this thesis. This integer linear programming based techniques strives to find a power efficient global routing solution. The results indicate that an average power savings as high as 32\% for the 130-nm technology can be achieved with no impact on the maximum chip frequency

    Application-aware deadlock-free oblivious routing

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.Cataloged from PDF version of thesis.Includes bibliographical references (p. 67-71).Systems that can be integrated on a single silicon die have become larger and increasingly complex, and wire designs as communication mechanisms for these systems on chip (SoC) have shown to be a limiting factor in their performance. As an approach to remove the limitation of communication and to overcome wire delays, interconnection networks or Network-on-Chip (NoC) architectures have emerged. NoC architectures enable faster data communication between components and are more scalable. In designing NoC systems, there are three key issues; the topology, which directly depends on packaging technology and manufacturing costs, dictates the throughput and latency bounds of the network; the flit control protocol, which establishes how the network resources are allocated to packets exchanged between components; and finally, the routing algorithm, which aims at optimizing network performance for some topology and flow control protocol by selecting appropriate paths for those packets. Since the routing algorithm sits on top of the other layers of design, it is critical that routing is done in a matter that makes good usage of the resources of the network. Two main approaches to routing, oblivious and adaptive, have been followed in creating routing algorithms for these systems. Each approach has its pros and cons; oblivious routing, as opposite to adaptive routing, uses no network state information in determining routes at the cost of lower performance on certain applications, but has been widely used because of its simpler hardware requirements.(cont.) This thesis examines oblivious routing schemes for NoC architectures. It introduces various non-minimal, oblivious routing algorithms that globally allocate network bandwidth for a given application when estimated bandwidths for data transfers are provided, while ensuring deadlock freedom with no significant additional hardware. The work presents and evaluates these oblivious routing algorithms which attempt to minimize the maximum channel load (MCL) across all network links in an effort to maximize application throughput. Simulation results from popular synthetic benchmarks and concrete applications, such as an H.264 decoder, show that it is possible to achieve better performance than traditional deterministic and oblivious routing schemes.by Michel A. Kinsy.S.M

    Topological optimization of hybrid quantum key distribution networks

    Full text link
    With the growing complexity of quantum key distribution (QKD) network structures, aforehand topology design is of great significance to support a large-number of nodes over a large-spatial area. However, the exclusivity of quantum channels, the limitation of key generation capabilities, the variety of QKD protocols and the necessity of untrusted-relay selection, make the optimal topology design a very complicated task. In this research, a hybrid QKD network is studied for the first time from the perspective of topology, by analyzing the topological differences of various QKD protocols. In addition, to make full use of hybrid networking, an analytical model for optimal topology calculation is proposed, to reach the goal of best secure communication service by optimizing the deployment of various QKD devices and the selection of untrusted-relays under a given cost limit. Plentiful simulation results show that hybrid networking and untrusted-relay selection can bring great performance advantages, and then the universality and effectiveness of the proposed analytical model are verified.Comment: 12 pages, 4 figure

    Logic perturbation based circuit partitioning and optimum FPGA switch-box designs.

    Get PDF
    Cheung Chak Chung.Thesis (M.Phil.)--Chinese University of Hong Kong, 2001.Includes bibliographical references (leaves 101-114).Abstracts in English and Chinese.Abstract --- p.iAcknowledgments --- p.iiiVita --- p.vTable of Contents --- p.viList of Figures --- p.xList of Tables --- p.xivChapter 1 --- Introduction --- p.1Chapter 1.1 --- Motivation --- p.1Chapter 1.2 --- Aims and Contribution --- p.4Chapter 1.3 --- Thesis Overview --- p.5Chapter 2 --- VLSI Design Cycle --- p.6Chapter 2.1 --- Logic Synthesis --- p.7Chapter 2.1.1 --- Logic Minimization --- p.8Chapter 2.1.2 --- Technology Mapping --- p.8Chapter 2.1.3 --- Testability --- p.8Chapter 2.2 --- Physical Design Synthesis --- p.8Chapter 2.2.1 --- Partitioning --- p.9Chapter 2.2.2 --- Floorplanning & Placement --- p.10Chapter 2.2.3 --- Routing --- p.11Chapter 2.2.4 --- "Compaction, Extraction & Verification" --- p.12Chapter 2.2.5 --- Physical Design of FPGAs --- p.12Chapter 3 --- Alternative Wiring --- p.13Chapter 3.1 --- Introduction --- p.13Chapter 3.2 --- Notation and Definitions --- p.15Chapter 3.3 --- Application of Rewiring --- p.17Chapter 3.3.1 --- Logic Optimization --- p.17Chapter 3.3.2 --- Timing Optimization --- p.17Chapter 3.3.3 --- Circuit Partitioning and Routing --- p.18Chapter 3.4 --- Logic Optimization Analysis --- p.19Chapter 3.4.1 --- Global Flow Optimization --- p.19Chapter 3.4.2 --- OBDD Representation --- p.20Chapter 3.4.3 --- Automatic Test Pattern Generation (ATPG) --- p.22Chapter 3.4.4 --- Graph Based Alternative Wiring (GBAW) --- p.23Chapter 3.5 --- Augmented GBAW --- p.26Chapter 3.6 --- Logic Optimization by using GBAW --- p.28Chapter 3.7 --- Conclusions --- p.31Chapter 4 --- Multi-way Partitioning using Rewiring Techniques --- p.33Chapter 4.1 --- Introduction --- p.33Chapter 4.2 --- Circuit Partitioning Algorithm Analysis --- p.38Chapter 4.2.1 --- The Kernighan-Lin (KL) Algorithm --- p.39Chapter 4.2.2 --- The Fiduccia-Mattheyses (FM) Algorithm --- p.42Chapter 4.2.3 --- Geometric Representation Algorithm --- p.46Chapter 4.2.4 --- The Multi-level Partitioning Algorithm --- p.49Chapter 4.2.5 --- Hypergraph METIS - hMETIS --- p.51Chapter 4.3 --- The GBAW Partitioning Algorithm --- p.53Chapter 4.4 --- Experimental Results --- p.56Chapter 4.5 --- Conclusions --- p.58Chapter 5 --- Optimum FPGA Switch-Box Designs - HUSB --- p.62Chapter 5.1 --- Introduction --- p.62Chapter 5.2 --- Background and Definitions --- p.65Chapter 5.2.1 --- Routing Architectures --- p.65Chapter 5.2.2 --- Global Routing --- p.67Chapter 5.2.3 --- Detailed Routing --- p.67Chapter 5.3 --- FPGA Router Comparison --- p.69Chapter 5.3.1 --- CGE --- p.69Chapter 5.3.2 --- SEGA --- p.70Chapter 5.3.3 --- TRACER --- p.71Chapter 5.3.4 --- VPR --- p.72Chapter 5.4 --- Switch Box Design --- p.73Chapter 5.4.1 --- Disjoint type switch box (XC4000-type) --- p.73Chapter 5.4.2 --- Anti-symmetric switch box --- p.74Chapter 5.4.3 --- Universal Switch box --- p.74Chapter 5.4.4 --- Switch box Analysis --- p.75Chapter 5.5 --- Terminology --- p.77Chapter 5.6 --- "Hyper-universal (4, W)-design analysis" --- p.82Chapter 5.6.1 --- "H3 is an optimum (4, 3)-design" --- p.84Chapter 5.6.2 --- "H4 is an optimum (4,4)-design" --- p.88Chapter 5.6.3 --- "Hi is a hyper-universal (4, i)-design for i = 5,6,7" --- p.90Chapter 5.7 --- Experimental Results --- p.92Chapter 5.8 --- Conclusions --- p.95Chapter 6 --- Conclusions --- p.99Chapter 6.1 --- Thesis Summary --- p.99Chapter 6.2 --- Future work --- p.100Chapter 6.2.1 --- Alternative Wiring --- p.100Chapter 6.2.2 --- Partitioning Quality --- p.100Chapter 6.2.3 --- Routing Devices Studies --- p.100Bibliography --- p.101Chapter A --- 5xpl - Berkeley Logic Interchange Format (BLIF) --- p.115Chapter B --- Proof of some 2-local patterns --- p.122Chapter C --- Illustrations of FM algorithm --- p.124Chapter D --- HUSB Structures --- p.127Chapter E --- Primitive minimal 4-way global routing Structures --- p.13

    Techniques de routage pseudo-aléatoire pour une application micro-électronique

    Get PDF
    Résumé La problématique de routage est très actuelle. On en trouve des applications dans les GPS, les prévisions de trafic routier, mais aussi pour le prototypage sur FPGA, la fabrication de puces électroniques ou le trafic TCP/IP sur Internet. On trouve des publications sur le sujet depuis plusieurs dizaines d'années, mais on observe actuellement une recrudescence confirmant l'actualité, l'importance et la complexité de ce problème. Cette thèse concerne le routage et ses ressources pour une application dans un nouveau type de système micro-électronique, nommé le WaferBoardTM . Son noyau consiste en un circuit électronique intégré à l'échelle d'une tranche de silicium (wafer). Peu d'applications commerciales de la micro-électronique ont exploité ce niveau d'intégration. Ce système de prototypage rapide vise à réduire d'un ou deux ordres de grandeur le temps de développement de systèmes électroniques. Il nécessite un ensemble d'outils logiciel de support, dont un outil de routage très rapide, capable de produire des solutions valables en des temps de l'ordre de la minute, et de certaines fonctionnalités spécifiques, l'équilibrage de délai ou le reroutage à la volée, au sein d'une netlist déjà routée. La problématique de routage pour cette application peut être imagée comme suit. Étant donné un réseau routier régulier (les routes d’Amériques du Nord en version cartésienne par exemple) et 100,000 voitures au départ lundi à 8h a.m. dans tout le pays avec des sources et destinations très variées; calculer les chemins pour toutes les voitures de telle sorte qu'aucune ne prenne la même route dans la journée. Il est 7h59 a.m, vous avez 1 minute, et des ponts sont inaccessibles pour travaux, en voici la liste. Cet exemple simpliste donne une idée des ordres de grandeurs de la problématique de routage que l'on cherche à résoudre pour cette application. Un algorithme de routage prend en paramètres deux structures de données : un graphe (ou réseau d'interconnexions) constitué de n\oe{}uds (sommets) et d'arcsUn arc relie deux sommets du graphe, et une netlistDans ce contexte, un netlist réfère à une liste d'interconnexions entre composants, liste de n\oe{}uds électriques dont les points de départ et d'arrivée sont positionnés géographiquement. Ainsi, au lieu de voitures, il s'agit de router des signaux électriques dont les points de départ et d'arrivée sont dictés par la position des broches des composants placés sur le système de prototypage. Un réseau régulier maillé mufti-dimensionnel (plus généralement appelé « réseau d'interconnexions ») sert de réseau routier dont certaines routes sont défectueuses, des ponts inaccessibles. En effet, le réseau d'interconnexions est un circuit électronique intégré à l'échelle d'une tranche de silicium complète, ce qui implique la présence de défectuosités au sein de chaque circuit fabriqué. Contrairement aux circuits électroniques classiques, où chacun est testé et les défectueux écartés, une intégration à l'échelle de la tranche demande de fortes redondances au sein du circuit pour minimiser le taux de rejets. Pour l'application du WaferBoard, un certain nombre d'éléments du réseau d'interconnexions seront fort probablement défectueux sur chaque circuit produit; l'algorithme de routage se doit de prendre en compte ces éléments très particuliers. Cette contrainte ne se retrouve pas dans les applications plus classiques des routeurs que l'on retrouve dans les PCB, circuits FPGA ou circuits VLSI. D'autres contraintes s'appliquent à ce projet particulier : la latence induite par la technologie est environ un ordre de grandeur plus importante que celle dans les circuits sur PCB, ce qui impose un routage orienté vers sa réduction.----------Abstract The routing problem is very actual. Applications are found in GPS, road traffic forecast, but also for prototyping on FPGA, or TCP/IP traffic on the Internet. Publications on the subject have existed for several decades, but new publications keep appearing, confirming the importance and complexity of the problem. This thesis deals with routing and the resources it requires for a new category of micro-electronic applications, called the WaferBoard. It is an electronic circuit integrated at the wafer scale. Few commercial applications of micro-electronics have exploited this level of integration. This rapid prototyping system aims at reducing by one or two orders of magnitude the development time of digital circuits. It requires a very fast routing tool, capable of producing viable solutions in a few minutes, with dedicated functionality such as balancing delays and rerouting on the fly parts of a netlist. The routing problem for this application can be pictured as follows. Given a regular road network of the size of north america, if 100.000 cars were to start Monday 8 a.m. across the continent with a wide variety of sources and destinations; the challenge is to compute paths for all cars so none of them take the same route that day. It is 7:59 am, you have 1 minute, and some bridges are under road work: here is the list. This simplistic example gives an idea of the orders of magnitude of the problem that need to be solved for this application. A routing algorithm takes as input: a graph (or interconnection network) made of nodes and edges, and a netlst, a list of electrical nodes with starting and ending points physically placed. Therefore, instead of cars, the problem consists of routing electrical signals with points of departure and arrival dictated by the pin position of components placed on the prototyping system. A regular, multi-dimensional mesh (also called "interconnection network") serves as a road network, which contains defective roads and inaccessible bridges. Indeed, the interconnection network is an electronic circuit integrated across a full wafer, implying the presence of defects within each manufactured circuit. Unlike conventional electronic circuits, where each is tested and defective ones are set apart, wafer scale integrated applications require lots of redundancy in the circuit to minimize the rejection rate. In the WaferBoard, a number of elements of the interconnection network will be defective in each circuit; the routing algorithm must take into account these very specific elements. This constraint is not found in the classic applications of routers found in PCB, FPGA or VLSI circuits. Other restrictions apply to this particular project: the latency induced by the technology is about one order of magnitude greater than that in the circuits of PCBs, which requires a routing oriented towards computation time reduction. This constraint partly explains the network architecture used. Within the WaferIC, the shortest distance is not necessarily the one that offers the smallest latency. This property of the network complexifies the routing problem. Balancing delays within a group of arbitrary size nets is a necessary feature of the routing algorithm, and the difficulty is amplified by the computation time limit. Indeed, the interest of the application is to reduce the time for a user to test a circuit: the time of setup is extremely short, and estimated at a few minutes only

    Power Management for Deep Submicron Microprocessors

    Get PDF
    As VLSI technology scales, the enhanced performance of smaller transistors comes at the expense of increased power consumption. In addition to the dynamic power consumed by the circuits there is a tremendous increase in the leakage power consumption which is further exacerbated by the increasing operating temperatures. The total power consumption of modern processors is distributed between the processor core, memory and interconnects. In this research two novel power management techniques are presented targeting the functional units and the global interconnects. First, since most leakage control schemes for processor functional units are based on circuit level techniques, such schemes inherently lack information about the operational profile of higher-level components of the system. This is a barrier to the pivotal task of predicting standby time. Without this prediction, it is extremely difficult to assess the value of any leakage control scheme. Consequently, a methodology that can predict the standby time is highly beneficial in bridging the gap between the information available at the application level and the circuit implementations. In this work, a novel Dynamic Sleep Signal Generator (DSSG) is presented. It utilizes the usage traces extracted from cycle accurate simulations of benchmark programs to predict the long standby periods associated with the various functional units. The DSSG bases its decisions on the current and previous standby state of the functional units to accurately predict the length of the next standby period. The DSSG presents an alternative to Static Sleep Signal Generation (SSSG) based on static counters that trigger the generation of the sleep signal when the functional units idle for a prespecified number of cycles. The test results of the DSSG are obtained by the use of a modified RISC superscalar processor, implemented by SimpleScalar, the most widely accepted open source vehicle for architectural analysis. In addition, the results are further verified by a Simultaneous Multithreading simulator implemented by SMTSIM. Leakage saving results shows an increase of up to 146% in leakage savings using the DSSG versus the SSSG, with an accuracy of 60-80% for predicting long standby periods. Second, chip designers in their effort to achieve timing closure, have focused on achieving the lowest possible interconnect delay through buffer insertion and routing techniques. This approach, though, taxes the power budget of modern ICs, especially those intended for wireless applications. Also, in order to achieve more functionality, die sizes are constantly increasing. This trend is leading to an increase in the average global interconnect length which, in turn, requires more buffers to achieve timing closure. Unconstrained buffering is bound to adversely affect the overall chip performance, if the power consumption is added as a major performance metric. In fact, the number of global interconnect buffers is expected to reach hundreds of thousands to achieve an appropriate timing closure. To mitigate the impact of the power consumed by the interconnect buffers, a power-efficient multi-pin routing technique is proposed in this research. The problem is based on a graph representation of the routing possibilities, including buffer insertion and identifying the least power path between the interconnect source and set of sinks. The novel multi-pin routing technique is tested by applying it to the ISPD and IBM benchmarks to verify the accuracy, complexity, and solution quality. Results obtained indicate that an average power savings as high as 32% for the 130-nm technology is achieved with no impact on the maximum chip frequency

    Initial detailed routing algorithms

    Get PDF
    In this work, we present a study of the problem of routing in the context of the VLSI physical synthesis flow. We study the fundamental routing algorithms such as maze routing, A*, and Steiner tree-based algorithms, as well as some global routing algorithms, namely FastRoute 4.0 and BoxRouter 2.0. We dissect some of the major state of the art initial detailed routing tools, such as RegularRoute, TritonRoute, SmartDR and Dr.CU 2.0. We also propose an initial detailed routing flow, and present an implementation of the proposed routing flow, with a track assignment technique that models the problem as an instance of the maximum independent weighted set (MWIS) and utilizes integer linear programming (ILP) as a solver. The implementation of the proposed initial detailed routing flow also includes an implementation of multiple-source and multiple-target A* for terminal andnet connection with adjustable rules and weights. Finally, we also present a study of the results obtained by the implementation of the proposed initial detailed routing flow and a comparison with the ISPD 2019 contest winners, considering the ISPD 2019 and benchmark suite and evaluation tools.Neste trabalho, apresentamos um estudo do problema de roteamento no contexto do fluxo de síntese física de circuitos integrados VLSI. Nós estudamos algoritmos de roteamento fundamentais como roteamento de labirinto, A* e baseados em árvores de Steiner, além de alguns algoritmos de roteamento global como FastRoute 4.0 e BoxRouter 2.0. Nós dissecamos alguns dos principais trabalhos de roteamento detalhado inicial do estado da arte, como RegularRoute, TritonRoute, SmartDR e Dr.CU 2.0. Também propomos um fluxo de roteamento detalhado inicial, e apresentamos uma implementação do fluxo de roteametno proposto, com uma técnica de assinalamento de trilhas que modela o problema como uma instância do problema do conjunto independente de peso máximo e usa programação linear inteira como um resolvedor. A implementação do fluxo de rotemaento detalhado inicial proposto também inclui uma implementação de um A* com múltiplas fontes e múltiplos destinos para conexão de terminais e redes, com regras e pesos ajustáveis. Por fim, nós apresentamos um estudo dos resultados obtidos pela implementação do fluxo de roteamento detalhado inicial proposto e comparamos com os vencedores do ISPD 2019 contest considerando a suíte de teste e ferramentas de avaliação do ISPD 2019
    • …
    corecore