
    OPTIMIZING LARGE COMBINATIONAL NETWORKS FOR K-LUT BASED FPGA MAPPING

    Optimizing by partitioning is a central problem in VLSI design automation, addressing a circuit's manufacturability. Circuit partitioning has multiple applications in VLSI design. One of the most common is dividing combinational circuits (usually large ones) that will not fit on a single package among a number of packages. Partitioning is of practical importance for k-LUT based FPGA circuit implementation. This work presents a multilevel, multi-resource partitioning algorithm for partitioning large combinational circuits in order to make efficient use of existing, commercially available FPGA packages.
    Keywords: two-way partitioning, multi-way partitioning, recursive partitioning, flat partitioning, critical path, cutting cones, bottom-up clusters, top-down min-cut
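    For illustration, a minimal Python sketch of the recursive two-way (min-cut style) partitioning idea the abstract refers to. The netlist representation, the BFS-based growth heuristic, and the per-package LUT budget are assumptions made here for the example, not the paper's algorithm.

        from collections import deque

        def bisect(netlist, nodes):
            # Grow one half by breadth-first search from an arbitrary seed
            # until roughly half of the gates have been absorbed.
            target = len(nodes) // 2
            half, frontier = set(), deque([next(iter(nodes))])
            while frontier and len(half) < target:
                v = frontier.popleft()
                if v in half:
                    continue
                half.add(v)
                frontier.extend(u for u in netlist.get(v, ()) if u in nodes and u not in half)
            return half, nodes - half

        def cut_size(netlist, part_a, part_b):
            # Number of connections whose driver and sink end up in different packages.
            return sum(1 for v, fanout in netlist.items() for u in fanout
                       if (v in part_a and u in part_b) or (v in part_b and u in part_a))

        def recursive_partition(netlist, nodes, max_luts):
            # Split recursively until every block fits the per-package LUT budget.
            if len(nodes) <= max_luts:
                return [nodes]
            a, b = bisect(netlist, nodes)
            return recursive_partition(netlist, a, max_luts) + recursive_partition(netlist, b, max_luts)

        # Example: the netlist maps each gate to the gates it drives.
        # blocks = recursive_partition({"g1": ["g2"], "g2": ["g3"], "g3": []},
        #                              {"g1", "g2", "g3"}, max_luts=2)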

    Timing Aware Partitioning for Multi-FPGA based Logic Simulation using Top-down Selective Flattening

    In order to accelerate logic simulation, it is highly beneficial to simulate the circuit design on FPGA hardware. However, limited hardware resources on FPGAs prevent large designs from being implemented on a single FPGA. Hence there is a need to partition the design and simulate it on a multi-FPGA platform. In contrast to existing FPGA-based post-synthesis partitioning approaches, which first completely flatten the circuit and then possibly perform bottom-up clustering, we perform a selective top-down flattening and thereby avoid the potential netlist blowup. This also allows us to preserve the design hierarchy to guide the partitioning and to make subsequent debugging easier. Our approach analyzes the hierarchical design and selectively flattens instances using two metrics based on slack. The resulting partially flattened netlist is converted to a hypergraph, partitioned using a public domain partitioner (hMetis), and converted back into a plurality of FPGA netlists, one for each FPGA of the FPGA-based accelerated logic simulation platform. We compare our approach with a partitioning approach that operates on a completely flattened netlist. Static timing analysis was performed for both approaches; over 15 examples from the OpenCores project, our approach yields a 52% logic simulation speedup and about 0.74x runtime for the entire flow, compared to the completely flat approach. The entire tool chain of our approach is automated in an end-to-end flow spanning hierarchy extraction, selective flattening, partitioning, and netlist reconstruction. Compared to an existing method which also performs slack-based partitioning of a hierarchical netlist, we obtain a 35% simulation speedup.
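    A minimal sketch of the hypergraph step described above: each net of the (partially flattened) netlist becomes a hyperedge over the cells it connects, the file is handed to hMetis, and the resulting partition labels are read back. The netlist representation, the file names, and the use of the shmetis command-line wrapper are assumptions made for this example, not the paper's tool chain.

        import subprocess

        def write_hmetis(nets, path):
            # nets: dict mapping net name -> list of cell names on that net.
            cells = sorted({c for pins in nets.values() for c in pins})
            vid = {c: i + 1 for i, c in enumerate(cells)}            # hMetis vertices are 1-based
            with open(path, "w") as f:
                f.write(f"{len(nets)} {len(cells)}\n")               # header: #hyperedges #vertices
                for pins in nets.values():
                    f.write(" ".join(str(vid[c]) for c in pins) + "\n")
            return vid

        def partition(nets, nparts=2, ubfactor=5, hgr="design.hgr"):
            vid = write_hmetis(nets, hgr)
            subprocess.run(["shmetis", hgr, str(nparts), str(ubfactor)], check=True)
            with open(f"{hgr}.part.{nparts}") as f:                  # one partition id per vertex
                labels = [int(line) for line in f]
            return {cell: labels[i - 1] for cell, i in vid.items()}  # cell -> FPGA index

        # Example: two nets over four cells, split across two FPGAs.
        # assignment = partition({"n1": ["u1", "u2"], "n2": ["u2", "u3", "u4"]}, nparts=2)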

    Advances in Functional Decomposition: Theory and Applications

    Functional decomposition aims at finding efficient representations for Boolean functions. It is used in many applications, including multi-level logic synthesis, formal verification, and testing. This dissertation presents novel heuristic algorithms for functional decomposition. These algorithms take advantage of suitable representations of the Boolean functions in order to be efficient. The first two algorithms compute simple-disjoint and disjoint-support decompositions. They are based on representing the target function by a Reduced Ordered Binary Decision Diagram (BDD). Unlike other BDD-based algorithms, the presented ones can deal with larger target functions and produce more decompositions without requiring expensive manipulations of the representation, particularly BDD reordering. The third algorithm also finds disjoint-support decompositions, but it is based on a technique which integrates circuit graph analysis and BDD-based decomposition. The combination of the two approaches results in an algorithm which is more robust than a purely BDD-based one, and that improves both the quality of the results and the running time. The fourth algorithm uses circuit graph analysis to obtain non-disjoint decompositions. We show that the problem of computing non-disjoint decompositions can be reduced to the problem of computing multiple-vertex dominators. We also prove that multiple-vertex dominators can be found in polynomial time. This result is important because there is no known polynomial time algorithm for computing all non-disjoint decompositions of a Boolean function. The fifth algorithm provides an efficient means to decompose a function at the circuit graph level, by using information derived from a BDD representation. This is done without the expensive circuit re-synthesis normally associated with BDD-based decomposition approaches. Finally, we present two publications that resulted from the many detours we have taken along the winding path of our research.
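    As a small illustration of what a simple disjoint-support decomposition means, the following truth-table sketch applies the classical Ashenhurst criterion: f(A, B) = h(g(A), B) exists for a chosen bound set A iff the decomposition chart indexed by assignments to A has at most two distinct rows. This brute-force check is for illustration only; the dissertation's algorithms operate on BDDs and circuit graphs rather than truth tables.

        from itertools import product

        def has_simple_disjoint_decomposition(f, n, bound):
            # f: callable taking n Booleans; bound: variable indices forming the bound set A.
            free = [i for i in range(n) if i not in bound]
            rows = set()
            for a in product([False, True], repeat=len(bound)):
                row = []
                for b in product([False, True], repeat=len(free)):
                    x = [None] * n
                    for i, v in zip(bound, a):
                        x[i] = v
                    for i, v in zip(free, b):
                        x[i] = v
                    row.append(f(*x))
                rows.add(tuple(row))
            return len(rows) <= 2        # at most two distinct rows: a single-output g(A) suffices

        # Example: f = (x0 AND x1) OR x2 decomposes as h(g, x2) with g = x0 AND x1.
        # has_simple_disjoint_decomposition(lambda x0, x1, x2: (x0 and x1) or x2, 3, [0, 1])  # True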

    ASIC Technology Migrations: A Design Guide for First Pass Success

    This thesis presents a study of Application Specific Integrated Circuit (ASIC) technology migrations. An overview of the design flow methodology used for completing an ASIC design from concept to silicon is presented. The design flow is then augmented with special considerations specifically for ASIC technology migrations. An ASIC technology migration design example, using the special considerations, is presented. Finally, a summary is presented with considerations regarding future work.

    Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC'10 - May 17-19, 2010, Karlsruhe, Germany. (KIT Scientific Reports; 7551)

    ReCoSoC is intended to be an annual meeting to expose and discuss gathered expertise as well as state-of-the-art research around SoC-related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow's challenges in the multibillion-transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability.

    High integrity hardware-software codesign

    Programmable logic devices (PLDs) are increasing in complexity and speed, and are being used as important components in safety-critical systems. Methods for developing high-integrity software for these systems are well known, but this is not true for programmable logic. We propose a process for developing a system incorporating software and PLDs, suitable for safety-critical systems of the highest levels of integrity. This process incorporates the use of Synchronous Receptive Process Theory as a semantic basis for specifying and proving properties of programs executing on PLDs, and extends the use of SPARK Ada from a programming language for safety-critical systems software to cover the interface between software and programmable logic. We have validated this approach through the specification and development of a substantial safety-critical system incorporating both software and programmable logic components, and the development of tools to support this work. This enables us to claim that the methods demonstrated are not only feasible but also scale up to realistic system sizes, allowing development of such safety-critical software-hardware systems to the levels required by current system safety standards.

    Control Software for Reconfigurable Coprocessors

    On-line data processing at the ATLAS general purpose particle detector, which is currently under construction at Geneva, generates demands on computing power that are difficult to satisfy with commodity CPU-based computers. One of the most demanding applications is the recognition of particle tracks that originate from B-quark decays. However, this and many other applications can benefit from parallel execution on field programmable gate arrays (FPGAs). After the demonstration of accelerated track recognition with large FPGA-based custom computers, the development of FPGA-based coprocessors started in the late 1990s. Applications of FPGA coprocessors are usually partitioned between the host and the tightly coupled coprocessor. The objective of the research that I present in this thesis was the development of software that mediates applications' access to FPGA coprocessors. I used a software process based on iterative prototyping to cope with the expected changing requirements. Also, I used a strict bottom-up design to create classes that model devices on the coprocessors. Using these low-level classes, I developed tools which were used for bootstrapping, debugging, and firmware update of the coprocessors during their development and maintenance. Measurements show that the software overhead introduced by object-oriented programming and software layering is small. The software support for six different coprocessors was partitioned into corresponding independent packages, which reuse a set of packages that provide common and basic functions. The steady evolution and use of the software during more than four years shows that the software is maintainable, adaptable, and usable.
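    A minimal sketch, not the thesis software, of the bottom-up layering described above: a low-level class models a device on the coprocessor behind a bus-access abstraction, and a maintenance tool such as a firmware updater is built only from that device class. The register offsets, the bus interface, and all class names are illustrative assumptions.

        class Bus:
            # Stands in for the host's access path to the coprocessor (e.g. a PCI BAR);
            # a dictionary replaces real memory-mapped I/O in this sketch.
            def __init__(self):
                self.mem = {}
            def read(self, addr):
                return self.mem.get(addr, 0)
            def write(self, addr, value):
                self.mem[addr] = value & 0xFFFFFFFF

        class FlashDevice:
            # Low-level class modelling an on-board configuration flash.
            CTRL, DATA = 0x00, 0x04                 # hypothetical register offsets
            def __init__(self, bus, base):
                self.bus, self.base = bus, base
            def program_word(self, offset, word):
                self.bus.write(self.base + self.DATA, word)
                self.bus.write(self.base + self.CTRL, 0x1 | (offset << 8))   # hypothetical 'program' command

        def update_firmware(flash, image):
            # Tool layer: built only from the device class below it, never from the bus directly.
            for offset, word in enumerate(image):
                flash.program_word(offset, word)

        # update_firmware(FlashDevice(Bus(), base=0x1000), [0xDEADBEEF, 0x0BADF00D])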

    Custom optimization algorithms for efficient hardware implementation

    This work focuses on real-time optimal decision making with applications in advanced control systems. These computationally intensive schemes, which involve the repeated solution of (convex) optimization problems within a sampling interval, require more efficient computational methods than currently available for extending their application to highly dynamical systems and setups with resource-constrained embedded computing platforms. A range of techniques are proposed to exploit synergies between digital hardware, numerical analysis and algorithm design. These techniques build on top of parameterisable hardware code generation tools that generate VHDL code describing custom computing architectures for interior-point methods and a range of first-order constrained optimization methods. Since memory limitations are often important in embedded implementations, we develop a custom storage scheme for KKT matrices arising in interior-point methods for control, which reduces memory requirements significantly and prevents I/O bandwidth limitations from affecting the performance in our implementations. To take advantage of the trend towards parallel computing architectures and to exploit the special characteristics of our custom architectures, we propose several high-level parallel optimal control schemes that can reduce computation time. A novel optimization formulation was devised for reducing the computational effort in solving certain problems independent of the computing platform used. In order to be able to solve optimization problems in fixed-point arithmetic, which is significantly more resource-efficient than floating-point, tailored linear algebra algorithms were developed for solving the linear systems that form the computational bottleneck in many optimization methods. These methods come with guarantees for reliable operation. We also provide finite-precision error analysis for fixed-point implementations of first-order methods that can be used to minimize the use of resources while meeting accuracy specifications. The suggested techniques are demonstrated on several practical examples, including a hardware-in-the-loop setup for optimization-based control of a large airliner.
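    As an illustration of the class of first-order constrained optimization methods mentioned above, the following sketch runs projected gradient descent on a box-constrained quadratic program, the prototypical structure of a model predictive control subproblem. The matrices, bounds, and fixed step-size rule are assumptions chosen for the example and are unrelated to the thesis' hardware implementations.

        import numpy as np

        def projected_gradient(H, f, lo, hi, iters=100):
            # min 0.5*u'Hu + f'u  subject to  lo <= u <= hi  (H symmetric positive definite).
            L = np.linalg.norm(H, 2)                 # Lipschitz constant of the gradient
            u = np.clip(np.zeros_like(f), lo, hi)
            for _ in range(iters):
                grad = H @ u + f
                u = np.clip(u - grad / L, lo, hi)    # gradient step followed by projection onto the box
            return u

        # H = np.array([[2.0, 0.5], [0.5, 1.0]]); f = np.array([-1.0, 1.0])
        # u_opt = projected_gradient(H, f, lo=np.full(2, -0.5), hi=np.full(2, 0.5))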