570 research outputs found

    Orthogonal Greedy Coupling - A New Optimization Approach to 2-D FPGA Routing


    Lexicographic path searches for FPGA routing

    This dissertation reports on studies of the application of lexicographic graph searches to solve problems in FPGA detailed routing. Our contributions include the derivation of iteration limits for scalar implementations of negotiated congestion for standard floating point types and the identification of pathological cases for path choice. In the study of the routability-driven detailed FPGA routing problem, we show universal detailed routability is NP-complete based on a related proof by Lee and Wong. We describe the design of a lexicographic composition operator of totally-ordered monoids as path cost metrics and show its optimality under an adapted A* search. Our new router, CornNC, based on lexicographic composition of congestion and wirelength, established a new minimum track count for the FPGA Place and Route Challenge. For the problem of long-path timing-driven FPGA detailed routing, we show that long-path budgeted detailed routability is NP-complete by reduction to universal detailed routability. We generalise the lexicographic composition to any finite length and verify its optimality under A* search. The timing budget solution of Ghiasi et al. is used to solve the long-path timing budget problem for FPGA connections. Our delay-clamped spiral lexicographic composition design, SpiralRoute, ensures connection-based budgets are always met, and thus achieves timing closure whenever it successfully routes. For 113 test routing instances derived from standard benchmarks, SpiralRoute found 13 routable instances with timing closure that were unroutable by a scalar negotiated congestion router and achieved timing closure in another 27 cases when the scalar router did not, at the expense of increased runtime. We also study techniques to improve SpiralRoute runtimes, including a data structure of a trie augmented by data stacks for minimum element retrieval, and the technique of step tomonoid elimination in reducing the retrieval depth in a trie of stacks structure.
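To make the idea of a lexicographic path cost concrete, here is a minimal sketch, not the dissertation's CornNC or SpiralRoute code. It assumes each edge carries a hypothetical `(congestion, wirelength)` pair, adds pairs componentwise, and compares them lexicographically (Python's native tuple ordering). The search is A* with a zero heuristic, i.e. Dijkstra; the function name and graph encoding are illustrative assumptions.

```python
import heapq


def lex_shortest_path(graph, source, target):
    """Shortest path under lexicographically ordered (congestion, wirelength) costs.

    graph maps a node to a list of (neighbour, (congestion, wirelength)) edges.
    All cost components are assumed non-negative, so Dijkstra's argument applies.
    """
    best = {source: (0, 0)}
    heap = [((0, 0), source, [source])]
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return cost, path
        if cost > best.get(node, cost):
            continue  # stale heap entry: a cheaper cost was already recorded
        for nxt, (cong, wl) in graph.get(node, []):
            # Componentwise addition; tuple comparison gives the lexicographic order.
            new_cost = (cost[0] + cong, cost[1] + wl)
            if nxt not in best or new_cost < best[nxt]:
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, path + [nxt]))
    return None  # target unreachable
```

With this ordering, a long path with zero congestion beats a short path with any congestion, because wirelength only breaks ties on the first component.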

    Some results on FPGAs, file transfers, and factorizations of graphs.

    by Pan Jiao Feng. Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. Includes bibliographical references (leaves 89-93). Abstract also in Chinese.
    Abstract
    Acknowledgments
    List of Tables
    List of Figures
    Chapter 1. Introduction
        1.1 Graph definitions
        1.2 The S box graph
        1.3 The file transfer graph
        1.4 (g, f)-factor and (g, f)-factorization
        1.5 Thesis contributions
        1.6 Organization of the thesis
    Chapter 2. On the Optimal Four-way Switch Box Routing Structures of FPGA Greedy Routing Architectures
        2.1 Introduction
            2.1.1 FPGA model and S box model
            2.1.2 FPGA routing
            2.1.3 Problem formulation
        2.2 Definitions and terminology
            2.2.1 General terminology
            2.2.2 Graph definitions
            2.2.3 The S box graph
        2.3 Properties of the S box graph and side-to-side graphs
            2.3.1 On the properties of the S box graph
            2.3.2 The properties of side-to-side graphs
        2.4 Conversion of the four-way FPGA routing problem
            2.4.1 Conversion of the S box model
            2.4.2 Conversion of the DAAA model
            2.4.3 Conversion of the DADA model
            2.4.4 Conversion of the DDDA model
        2.5 Low bounds of routing switches
            2.5.1 The lower bound of the DAAA model
            2.5.2 The lower bound of the DADA model
            2.5.3 The lower bound of the DDDA model
        2.6 Optimal structure of one-side predetermined four-way FPGA routing
        2.7 Optimal structures of two-side and three-side predetermined four-way FPGA routing
            2.7.1 Optimal structure of two-side predetermined four-way FPGA routing
            2.7.2 Optimal structure of three-side predetermined four-way FPGA routing
        2.8 Conclusion
        Appendix
    Chapter 3. Application of (0, f)-Factorization on the Scheduling of File Transfers
        3.1 Introduction
            3.1.1 (0,f)-factorization
            3.1.2 File transfer model and its graph
            3.1.3 Previous results
            3.1.4 Our results and outline of the chapter
        3.2 NP-completeness
        3.3 Some lemmas
        3.4 Bounds of file transfer graphs
        3.5 Comparison
        3.6 Conclusion
    Chapter 4. Decomposition Graphs into (g,f)-Factors
        4.1 Introduction
            4.1.1 (g,f)-factors and (g,f)-factorizations
            4.1.2 Previous work
            4.1.3 Our results
        4.2 Proof of Theorem 2
        4.3 Proof of Theorem 3
        4.4 Proof of Theorem 4
        4.5 Related previous results
        4.6 Conclusion
    Chapter 5. Conclusion
        5.1 About graph-based approaches
        5.2 FPGA routing
        5.3 The scheduling of file transfer
    Bibliography
    Vita

    A Comprehensive Workflow for General-Purpose Neural Modeling with Highly Configurable Neuromorphic Hardware Systems

    In this paper we present a methodological framework that meets novel requirements emerging from upcoming types of accelerated and highly configurable neuromorphic hardware systems. We describe in detail a device with 45 million programmable and dynamic synapses that is currently under development, and we sketch the conceptual challenges that arise from taking this platform into operation. More specifically, we aim at the establishment of this neuromorphic system as a flexible and neuroscientifically valuable modeling tool that can be used by non-hardware-experts. We consider various functional aspects to be crucial for this purpose, and we introduce a consistent workflow with detailed descriptions of all involved modules that implement the suggested steps: the integration of the hardware interface into the simulator-independent model description language PyNN; a fully automated translation between the PyNN domain and appropriate hardware configurations; an executable specification of the future neuromorphic system that can be seamlessly integrated into this biology-to-hardware mapping process as a test bench for all software layers and possible hardware design modifications; an evaluation scheme that deploys models from a dedicated benchmark library, compares the results generated by virtual or prototype hardware devices with reference software simulations and analyzes the differences. The integration of these components into one hardware-software workflow provides an ecosystem for ongoing preparative studies that support the hardware design process and represents the basis for the maturity of the model-to-hardware mapping software. The functionality and flexibility of the latter are demonstrated with a variety of experimental results.

    High-Performance Architecture for Binary-Tree-Based Finite State Machines

    A binary-tree-based finite state machine (BT-FSM) is a state machine with a 1-bit input signal whose state transition graph is a binary tree. BT-FSMs are useful in application areas where searching in a binary tree is required, such as computer networks, compression, automatic control, or cryptography. This paper presents a new architecture for implementing BT-FSMs which is based on the finite virtual state machine (FVSM) model. The proposed architecture has been compared with the general FVSM and conventional approaches by using both synthetic test benches and very large BT-FSMs obtained from a real application. In synthetic test benches, the average speed improvement of the proposed architecture with respect to the best results of the other approaches is 41% (in some cases the speed more than doubles). In the case of the real application, the average speed improvement is 155%.
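As a small software illustration of the BT-FSM definition above (not the paper's hardware architecture), the sketch below models each state as a binary-tree node and uses one input bit per step to pick the child. The class and function names are hypothetical, and stopping at a leaf is an assumption made only to keep the example short.

```python
class BTState:
    """One BT-FSM state: a binary-tree node with a successor per input bit."""

    def __init__(self, label, zero=None, one=None):
        self.label = label
        self.zero = zero  # next state when the input bit is 0
        self.one = one    # next state when the input bit is 1


def run_bt_fsm(root, bits):
    """Consume the 1-bit inputs in order and return the label of the final state."""
    state = root
    for bit in bits:
        nxt = state.one if bit else state.zero
        if nxt is None:
            break  # assumption: a missing child ends the walk at the current state
        state = nxt
    return state.label


# A three-leaf tree: 0 -> "A", 10 -> "B", 11 -> "C".
tree = BTState("root", zero=BTState("A"), one=BTState("n1", BTState("B"), BTState("C")))
```

Calling `run_bt_fsm(tree, [1, 0])` walks root, `n1`, then the leaf labelled `"B"`, which is the binary-tree search the abstract refers to.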

    Statistical approach to NoC design

    Chip multiprocessors (CMPs) combine increasingly many general-purpose processor cores on a single chip. These cores run several tasks with unpredictable communication needs, resulting in uncertain and often-changing traffic patterns. This unpredictability leads network-on-chip (NoC) designers to plan for the worst-case traffic patterns, and significantly over-provision link capacities. In this paper, we provide NoC designers with an alternative statistical approach. We first present the traffic-load distribution plots (T-Plots), illustrating how much capacity over-provisioning is needed to service 90%, 99%, or 100% of all traffic patterns. We prove that in the general case, plotting T-Plots is #P-complete, and therefore extremely complex. We then show how to determine the exact mean and variance of the traffic load on any edge, and use these to provide Gaussian-based models for the T-Plots, as well as guaranteed performance bounds. Finally, we use T-Plots to reduce the network power consumption by providing an efficient capacity allocation algorithm with predictable performance guarantees. © 2008 IEEE
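One way to read the Gaussian-based model mentioned above: if the load on a link is treated as normally distributed with a known mean and variance, the capacity needed to cover a given fraction of traffic patterns is the corresponding quantile. The sketch below is an illustrative assumption, not the paper's algorithm; the function name and parameters are hypothetical.

```python
from statistics import NormalDist


def gaussian_capacity(mean_load, var_load, coverage):
    """Link capacity covering `coverage` of patterns under a Gaussian load model.

    coverage must lie strictly between 0 and 1: a Gaussian has unbounded tails,
    so 100% coverage cannot be read off this model.
    """
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be strictly between 0 and 1")
    # The requested quantile of N(mean_load, var_load) is the capacity estimate.
    return NormalDist(mean_load, var_load ** 0.5).inv_cdf(coverage)
```

For example, comparing `gaussian_capacity(10.0, 4.0, 0.90)` with `gaussian_capacity(10.0, 4.0, 0.99)` shows how much extra capacity the last few percent of traffic patterns cost under this model.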

    Statistical approach to networks-on-chip

    Chip multiprocessors (CMPs) combine increasingly many general-purpose processor cores on a single chip. These cores run several tasks with unpredictable communication needs, resulting in uncertain and often-changing traffic patterns. This unpredictability leads network-on-chip (NoC) designers to plan for the worst case traffic patterns, and significantly overprovision link capacities. In this paper, we provide NoC designers with an alternative statistical approach. We first present the traffic-load distribution plots (T-Plots), illustrating how much capacity overprovisioning is needed to service 90, 99, or 100 percent of all traffic patterns. We prove that in the general case, plotting T-Plots is #P-complete, and therefore extremely complex. We then show how to determine the exact mean and variance of the traffic load on any edge, and use these to provide Gaussian-based models for the T-Plots, as well as guaranteed performance bounds. We also explain how to practically approximate T-Plots using random-walk-based methods. Finally, we use T-Plots to reduce the network power consumption by providing an efficient capacity allocation algorithm with predictable performance guarantees. © 2006 IEEE

    A Modular Approach to Adaptive Reactive Streaming Systems

    The latest generations of FPGA devices offer large resource counts that provide the headroom to implement large-scale and complex systems. However, there are increasing challenges for the designer, not just because of pure size and complexity, but also in harnessing effectively the flexibility and programmability of the FPGA. A central issue is the need to integrate modules from diverse sources to promote modular design and reuse. Further, the capability to perform dynamic partial reconfiguration (DPR) of FPGA devices means that implemented systems can be made reconfigurable, allowing components to be changed during operation. However, use of DPR typically requires low-level planning of the system implementation, adding to the design challenge. This dissertation presents ReShape: a high-level approach for designing systems by interconnecting modules, which gives a ‘plug and play’ look and feel to the designer, is supported by tools that carry out implementation and verification functions, and is carried through to support system reconfiguration during operation. The emphasis is on the inter-module connections and abstracting the communication patterns that are typical between modules – for example, the streaming of data that is common in many FPGA-based systems, or the reading and writing of data to and from memory modules. ShapeUp is also presented as the static precursor to ReShape. In both, the details of wiring and signaling are hidden from view, via metadata associated with individual modules. ReShape allows system reconfiguration at the module level, by supporting type checking of replacement modules and by managing the overall system implementation, via metadata associated with its FPGA floorplan. The methodology and tools have been implemented in a prototype for a broad domain-specific setting – networking systems – and have been validated on real telecommunications design projects.