3,203 research outputs found

    A design flow for performance planning : new paradigms for iteration free synthesis

    Get PDF
    In conventional design, higher levels of synthesis produce a netlist, from which layout synthesis builds a mask specification for manufacturing. Timing anal ysis is built into a feedback loop to detect timing violations which are then used to update specifications to synthesis. Such iteration is undesirable, and for very high performance designs, infeasible. The problem is likely to become much worse with future generations of technology. To achieve a non-iterative design flow, early synthesis stages should use wire planning to distribute delays over the functional elements and interconnect, and layout synthesis should use its degrees of freedom to realize those delays

    Physical Design and Clock Tree Synthesis Methods For A 8-Bit Processor

    Get PDF
    Now days a number of processors are available with a lot kind of feature from different industries. A processor with similar kind of architecture of the current processors only missing the memory stuffs like the RAM and ROM has been designed here with the help of Verilog style of coding. This processor contains architecturally the program counter, instruction register, ALU, ALU latch, General Purpose Registers, control state module, flag registers and the core module containing all the modules. And a test module is designed for testing the processor. After the design of the processor with successful functionality, the processor is synthesized with 180nm technology. The synthesis is performed with the data path optimization like the selection of proper adders and multipliers for timing optimization in the data path while the ALU operations are performed. During synthesis how to take care of the worst negative slack (WNS), how to include the clock gating cells, how to define the cost and path groups etc. have been covered. After the proper synthesis we get the proper net list and the synthesized constraint file for carrying out the physical design. In physical design the steps like floor-planning, partitioning, placement, legalization of the placement, clock tree synthesis, and routing etc. have been performed. At all the stages the static timing analysis is performed for the timing meet of the design for better performance in terms of timing or frequency. Each steps of physical design are discussed with special effort towards the concepts behind the step. Out of all the steps of physical design the clock tree synthesis is performed with some improvement in the performance of the clock tree by creating a symmetrical clock tree and maintaining more common clock paths. A special algorithm has been framed for creating a symmetrical clock tree and thereby making the power consumption of the clock tree low

    High-performance and Low-power Clock Network Synthesis in the Presence of Variation.

    Full text link
    Semiconductor technology scaling requires continuous evolution of all aspects of physical design of integrated circuits. Among the major design steps, clock-network synthesis has been greatly affected by technology scaling, rendering existing methodologies inadequate. Clock routing was previously sufficient for smaller ICs, but design difficulty and structural complexity have greatly increased as interconnect delay and clock frequency increased in the 1990s. Since a clock network directly influences IC performance and often consumes a substantial portion of total power, both academia and industry developed synthesis methodologies to achieve low skew, low power and robustness from PVT variations. Nevertheless, clock network synthesis under tight constraints is currently the least automated step in physical design and requires significant manual intervention, undermining turn-around-time. The need for multi-objective optimization over a large parameter space and the increasing impact of process variation make clock network synthesis particularly challenging. Our work identifies new objectives, constraints and concerns in the clock-network synthesis for systems-on-chips and microprocessors. To address them, we generate novel clock-network structures and propose changes in traditional physical-design flows. We develop new modeling techniques and algorithms for clock power optimization subject to tight skew constraints in the presence of process variations. In particular, we offer SPICE-accurate optimizations of clock networks, coordinated to reduce nominal skew below 5 ps, satisfy slew constraints and trade-off skew, insertion delay and power, while tolerating variations. To broaden the scope of clock-network-synthesis optimizations, we propose new techniques and a methodology to reduce dynamic power consumption by 6.8%-11.6% for large IC designs with macro blocks by integrating clock network synthesis within global placement. We also present a novel non-tree topology that is 2.3x more power-efficient than mesh structures. We fuse several clock trees to create large-scale redundancy in a clock network to bridge the gap between tree-like and mesh-like topologies. Integrated optimization techniques for high-quality clock networks described in this dissertation strong empirical results in experiments with recent industry-released benchmarks in the presence of process variation. Our software implementations were recognized with the first-place awards at the ISPD 2009 and ISPD 2010 Clock-Network Synthesis Contests organized by IBM Research and Intel Research.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/89711/1/ejdjsy_1.pd

    The S2 VLBI Correlator: A Correlator for Space VLBI and Geodetic Signal Processing

    Get PDF
    We describe the design of a correlator system for ground and space-based VLBI. The correlator contains unique signal processing functions: flexible LO frequency switching for bandwidth synthesis; 1 ms dump intervals, multi-rate digital signal-processing techniques to allow correlation of signals at different sample rates; and a digital filter for very high resolution cross-power spectra. It also includes autocorrelation, tone extraction, pulsar gating, signal-statistics accumulation.Comment: 44 pages, 13 figure

    Accelerating FPGA routing through algorithmic enhancements and connection-aware parallelization

    No full text
    Routing is a crucial step in Field Programmable Gate Array (FPGA) physical design, as it determines the routes of signals in the circuit, which impacts the design implementation quality significantly. It can be very time-consuming to successfully route all the signals of large circuits that utilize many FPGA resources. Attempts have been made to shorten the routing runtime for efficient design exploration while expecting high-quality implementations. In this work, we elaborate on the connection-based routing strategy and algorithmic enhancements to improve the serial FPGA routing. We also explore a recursive partitioning-based parallelization technique to further accelerate the routing process. To exploit more parallelism by a finer granularity in both spatial partitioning and routing, a connection-aware routing bounding box model is proposed for the source-sink connections of the nets. It is built upon the location information of each connection’s source, sink, and the geometric center of the net that the connection belongs to, different from the existing net-based routing bounding box that covers all the pins of the entire net. We present that the proposed connection-aware routing bounding box is more beneficial for parallel routing than the existing net-based routing bounding box. The quality and runtime of the serial and multi-threaded routers are compared to the router in VPR 7.0.7. The large heterogeneous Titan23 designs that are targeted to a detailed representation of the Stratix IV FPGA are used for benchmarking. With eight threads, the parallel router using the connection-aware routing bounding box model reaches a speedup of 6.1× over the serial router in VPR 7.0.7, which is 1.24× faster than the one using the existing net-based routing bounding box model, while reducing the total wire-length by 10% and the critical path delay by 7%

    Power Reductions with Energy Recovery Using Resonant Topologies

    Get PDF
    The problem of power densities in system-on-chips (SoCs) and processors has become more exacerbated recently, resulting in high cooling costs and reliability issues. One of the largest components of power consumption is the low skew clock distribution network (CDN), driving large load capacitance. This can consume as much as 70% of the total dynamic power that is lost as heat, needing elaborate sensing and cooling mechanisms. To mitigate this, resonant clocking has been utilized in several applications over the past decade. An improved energy recovering reconfigurable generalized series resonance (GSR) solution with all the critical support circuitry is developed in this work. This LC resonant clock driver is shown to save about 50% driver power (\u3e40% overall), on a 22nm process node and has 50% less skew than a non-resonant driver at 2GHz. It can operate down to 0.2GHz to support other energy savings techniques like dynamic voltage and frequency scaling (DVFS). As an example, GSR can be configured for the simpler pulse series resonance (PSR) operation to enable further power saving for double data rate (DDR) applications, by using de-skewing latches instead of flip-flop banks. A PSR based subsystem for 40% savings in clocking power with 40% driver active area reduction xii is demonstrated. This new resonant driver generates tracking pulses at each transition of clock for dual edge operation across DVFS. PSR clocking is designed to drive explicit-pulsed latches with negative setup time. Simulations using 45nm IBM/PTM device and interconnect technology models, clocking 1024 flip-flops show the reductions, compared to non-resonant clocking. DVFS range from 2GHz/1.3V to 200MHz/0.5V is obtained. The PSR frequency is set \u3e3× the clock rate, needing only 1/10th the inductance of prior-art LC resonance schemes. The skew reductions are achieved without needing to increase the interconnect widths owing to negative set-up times. Applications in data circuits are shown as well with a 90nm example. Parallel resonant and split-driver non-resonant configurations as well are derived from GSR. Tradeoffs in timing performance versus power, based on theoretical analysis, are compared for the first time and verified. This enables synthesis of an optimal topology for a given application from the GSR

    From FPGA to ASIC: A RISC-V processor experience

    Get PDF
    This work document a correct design flow using these tools in the Lagarto RISC- V Processor and the RTL design considerations that must be taken into account, to move from a design for FPGA to design for ASIC
    corecore