
    Transistor-Level Synthesis of Pipeline Analog-to-Digital Converters Using a Design-Space Reduction Algorithm

    A novel transistor-level synthesis procedure for pipeline ADCs is presented. This procedure directly maps high-level converter specifications onto transistor sizes and biasing conditions. It is based on the combination of behavioral models for performance evaluation, optimization routines to minimize the power and area consumption of the circuit solution, and an algorithm to efficiently constrain the converter design space. This algorithm avoids the cost of lengthy bottom-up verifications and speeds up the synthesis task. The approach is herein demonstrated via the design of a 0.13 μm CMOS 10 bits@60 MS/s pipeline ADC with an energy consumption per conversion of only 0.54 pJ@1 MHz, making it one of the most energy-efficient 10-bit video-rate pipeline ADCs reported to date. The computational cost of this design is only 25 min of CPU time, including the evaluation of 13 different pipeline architectures potentially feasible for the targeted specifications. The optimum design derived from the synthesis procedure has been fine-tuned to support PVT variations, laid out together with other auxiliary blocks, and fabricated. The experimental results show a power consumption of 23 mW@1.2 V and an effective resolution of 9.47-bit@1 MHz. Bearing in mind that no specific power reduction strategy has been applied, these results confirm the reliability of the proposed approach.
    Ministerio de Ciencia e Innovación TEC2009-08447; Junta de Andalucía TIC-0281
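
    As a quick sanity check on the headline numbers, the reported figures are consistent with the standard Walden figure of merit, FoM = P / (2^ENOB * fs). The short Python sketch below simply plugs in the values quoted above; it is illustrative only.

        # Walden figure of merit: energy per conversion step
        P = 23e-3      # measured power: 23 mW at 1.2 V
        ENOB = 9.47    # effective number of bits at 1 MHz input
        fs = 60e6      # sampling rate: 60 MS/s
        fom = P / (2**ENOB * fs)
        print(f"{fom * 1e12:.2f} pJ/step")  # ~0.54 pJ, as reported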

    Exact and heuristic allocation of multi-kernel applications to multi-FPGA platforms

    FPGA-based accelerators have demonstrated high energy efficiency compared to GPUs and CPUs. However, single-FPGA designs may not achieve sufficient task parallelism. In this work, we optimize the mapping of high-performance multi-kernel applications, like Convolutional Neural Networks, to multi-FPGA platforms. First, we formulate the system-level optimization problem, choosing within a huge design space the parallelism and number of compute units for each kernel in the pipeline. Then we solve it by combining Geometric Programming, which produces the optimum-performance solution under resource and DRAM bandwidth constraints, with a heuristic allocator that places the compute units on the FPGA cluster.
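
    The abstract does not give the exact formulation, but a minimal sketch of such a throughput-maximizing Geometric Program, written with cvxpy, might look as follows. The kernel workloads w, per-unit resource costs r, and budget R are hypothetical placeholders.

        # Choose compute units n[k] per kernel to minimize the pipeline
        # initiation interval t (i.e., maximize throughput) under a shared
        # resource budget. All constraints are posynomial, so this is a GP.
        import cvxpy as cp

        w = [4.0, 9.0, 2.0]   # hypothetical work per kernel
        r = [1.0, 2.0, 1.5]   # hypothetical resource cost per compute unit
        R = 10.0              # total resource budget

        n = cp.Variable(3, pos=True)   # compute units per kernel (relaxed)
        t = cp.Variable(pos=True)      # initiation interval = 1/throughput

        constraints = [w[k] / n[k] <= t for k in range(3)]
        constraints += [r[0]*n[0] + r[1]*n[1] + r[2]*n[2] <= R]

        cp.Problem(cp.Minimize(t), constraints).solve(gp=True)
        print(n.value, t.value)  # round n to integers in a real flow

    In the paper, the integrality of the compute units and their placement on physical FPGAs are handled by the heuristic allocator; the relax-and-round step above is only a stand-in for that.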

    Electrical-level synthesis of pipeline ADCs

    This paper presents a design tool for the synthesis of pipeline ADCs which is able to optimally map high-level converter specifications, such as the required effective resolution, onto electrical-level parameters, i.e., transistor sizes and biasing conditions. It is based on the combination of a behavioural simulator for performance evaluation, accurate models of the converter components, and an optimization algorithm to minimize the power and area consumption of the circuit solution. The design procedure is herein demonstrated with the complete design of a 0.13 μm CMOS 10 bits@60 MS/s pipeline ADC, which consumes only 11.3 mW from a 1.2 V supply voltage. A close agreement between behavioural- and electrical-level simulations is obtained, with a deviation of only 0.2 bit in the measured ENOB.
    Ministerio de Educación y Ciencia TEC2006-03022; Junta de Andalucía TIC-0281
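
    For reference, ENOB follows from the measured SNDR via the standard relation ENOB = (SNDR - 1.76 dB) / 6.02. The snippet below illustrates it with a hypothetical SNDR value.

        # ENOB from measured SNDR (dB); the 58.77 dB input is hypothetical
        def enob(sndr_db: float) -> float:
            return (sndr_db - 1.76) / 6.02

        print(round(enob(58.77), 2))  # ~9.47 bits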

    Simulation-based high-level synthesis of Nyquist-rate data converters using MATLAB/SIMULINK

    This paper presents a toolbox for the simulation, optimization and high-level synthesis of Nyquist-rate Analog-to-Digital (A/D) and Digital-to-Analog (D/A) Converters in MATLAB®. The embedded simulator uses SIMULINK® C-coded S-functions to model all required subcircuits, including their main error mechanisms. This approach speeds up the simulation CPU time by up to two orders of magnitude compared with previous approaches based on SIMULINK® elementary blocks. Moreover, S-functions are better suited to implementing a more detailed description of the circuit. For all subcircuits, the accuracy of the behavioral models has been verified by electrical simulation using HSPICE. For synthesis purposes, the simulator is used for performance evaluation and combined with a hybrid optimizer for design parameter selection. The optimizer combines an adaptive statistical optimization algorithm inspired by simulated annealing with a design-oriented formulation of the cost function. It has been integrated in the MATLAB/SIMULINK® platform by using the MATLAB® engine library, so that the optimization core runs in the background while MATLAB® acts as a computation engine. The implementation on the MATLAB® platform brings numerous advantages in terms of signal processing, high flexibility for tool expansion, and simulation together with other electronic subsystems. Additionally, the presented toolbox comprises a friendly graphical user interface that allows the designer to browse through all steps of the simulation, synthesis and post-processing of results. In order to illustrate the capabilities of the toolbox, a 0.13 μm CMOS 12 bit@80 MS/s analog front-end for broadband power line communications, made up of a pipeline ADC and a current-steering DAC, is synthesized and sized at high level. Different experiments show the effectiveness of the proposed methodology.
    Ministerio de Ciencia y Tecnología TIC2003-02355; RAICONI
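
    The abstract does not spell out the optimizer's internals; a bare-bones simulated-annealing loop of the general kind it describes is sketched below in Python. The cost function, neighborhood move and cooling schedule are placeholders, not the toolbox's own.

        # Minimal simulated-annealing skeleton for design-parameter selection
        import math, random

        def anneal(x0, cost, neighbor, t0=1.0, alpha=0.95, iters=5000):
            x, fx = x0, cost(x0)
            best, fbest = x, fx
            t = t0
            for _ in range(iters):
                y = neighbor(x)
                fy = cost(y)
                # always accept downhill moves; accept uphill moves with
                # Boltzmann probability exp(-(fy - fx) / t)
                if fy < fx or random.random() < math.exp((fx - fy) / t):
                    x, fx = y, fy
                    if fx < fbest:
                        best, fbest = x, fx
                t *= alpha  # geometric cooling
            return best, fbest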

    A hierarchical mathematical model for automatic pipelining and allocation using elastic systems

    The advent of FPGA-based accelerators has encouraged the use of high-level synthesis (HLS) for rapid prototyping and design space exploration. In this context, design optimization at the behavioral level becomes a critical task for the delivery of high-quality solutions. Time elasticity opens a new avenue of optimizations that can be applied after HLS and before logic synthesis, enabling new sequential transformations that expand beyond classical retiming and enlarge the register-transfer level (RTL) exploration space. This paper proposes a mathematical model for RTL transformations that exploits elasticity to select the best implementation for each functional unit and to add pipeline registers to increase performance. Two simple examples are used to validate the effectiveness and potential benefits of the model.
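
    The model itself is a formal mathematical program; purely as a stand-in for the implementation-selection part of that trade-off (it ignores register insertion), a brute-force toy version with hypothetical delay/area numbers could read:

        # Each functional unit offers several (delay_ns, area) variants; pick
        # one per unit so the slowest unit meets the clock period at minimum
        # total area. Exhaustive search, fine for a handful of units.
        from itertools import product

        variants = {
            "mul": [(5.0, 100), (3.0, 180)],  # hypothetical variants
            "add": [(2.0, 20), (1.2, 35)],
            "div": [(9.0, 150), (6.0, 300)],
        }
        period = 6.0  # target clock period, ns

        best = None
        for choice in product(*variants.values()):
            if max(d for d, _ in choice) <= period:
                area = sum(a for _, a in choice)
                if best is None or area < best[0]:
                    best = (area, dict(zip(variants, choice)))
        print(best)  # (total area, chosen variant per unit)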

    Using fast and accurate simulation to explore hardware/software trade-offs in the multi-core era

    Writing well-performing parallel programs is challenging in the multi-core processor era. In addition to achieving good per-thread performance, which in itself is a balancing act between instruction-level parallelism, pipeline effects and good memory performance, multi-threaded programs complicate matters even further. These programs require synchronization, and are affected by the interactions between threads through sharing of both processor resources and the cache hierarchy. At the Intel ExaScience Lab, we are developing an architectural simulator called Sniper for simulating future exascale-era multi-core processors. Its goal is twofold: Sniper should assist hardware designers in making design decisions, while simultaneously providing software designers with a tool to gain insight into the behavior of their algorithms and allow for optimization. By taking architectural features into account, our simulator can provide more insight into parallel programs than what can be obtained from existing performance analysis tools. This unique combination of hardware simulator and software performance analysis tool makes Sniper a useful tool for simultaneous exploration of the hardware and software design space for future high-performance multi-core systems.

    Automated CNN pipeline generation for heterogeneous architectures

    Heterogeneity is a vital feature of emerging processor chip design. Asymmetric multicore clusters, such as a high-performance cluster combined with a power-efficient cluster, are common in modern edge devices. One example is Intel's Alder Lake, featuring Golden Cove high-performance cores and Gracemont power-efficient cores. Chiplet-based technology allows cores to be organized into multi-chip modules, thus housing a large number of cores in a processor. Interposer-based packaging has enabled embedding High Bandwidth Memory (HBM) on chip, reducing the transmission latency and energy consumption of the chiplet-to-chiplet interconnect. For instance, Intel's XeHPC Ponte Vecchio package integrates a multi-chip GPU organization along with HBM modules. Since new devices feature heterogeneity at the level of cores, memory and on-chip interconnect, it has become important to steer optimization at the application level in order to leverage the heterogeneous, high-performing and power-efficient features of the underlying computing platforms.

    An important high-performance application paradigm is Convolutional Neural Networks (CNNs). CNNs are widely used in many practical applications, and pipelined parallel implementations of CNNs are favored for inference on edge devices. In this Licentiate thesis we present a novel scheme for the automatic scheduling of CNN pipelines on heterogeneous devices. A pipeline schedule is a configuration that specifies the depth of the pipeline, the grouping of CNN layers into pipeline stages, and the mapping of pipeline stages onto computing units. We utilize simple compile-time hints, which consist of workload information for the individual CNN layers and performance hints for the computing units.

    The proposed approach provides a near-optimal solution for a throughput-maximizing pipeline. We model the problem as design space exploration and develop time-efficient navigation of this space through heuristics extracted from knowledge of the CNN structure and the underlying computing platform. The proposed search scheme converges quickly, utilizes real-time performance measurements as fitness values, and scales when used with larger networks and computing platforms. Since the scheme utilizes online performance measurements, one of the challenges is to avoid expensive configurations during online tuning; the results demonstrate that, on average, ~80% of the tested configurations are sub-optimal solutions. Another challenge is to reduce convergence time: the experiments show that the proposed approach is 35x faster than stochastic optimization algorithms. Although the design space is large and complex, we show that the proposed scheme explores only ~0.1% of the total design space in the case of large CNNs (50+ layers) and still arrives at a near-optimal solution.
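
    To make the notion of a pipeline schedule concrete, the toy search below groups contiguous CNN layers into stages and maps each stage to a computing unit so that the bottleneck stage time is minimized. It illustrates the problem being solved, not the thesis's actual heuristic; the workloads and unit speeds are hypothetical hints.

        # Exhaustively split a layer chain into contiguous stages and assign
        # each stage to a distinct unit; throughput = 1 / slowest stage.
        from itertools import combinations, permutations

        work = [4, 8, 3, 6, 2]    # per-layer workload hints
        speed = [1.0, 2.0, 1.5]   # relative speed of each computing unit
        n, units = len(work), len(speed)

        best = None
        for cuts in combinations(range(1, n), units - 1):  # stage bounds
            bounds = (0, *cuts, n)
            stages = [sum(work[bounds[i]:bounds[i+1]]) for i in range(units)]
            for perm in permutations(range(units)):        # stage -> unit
                slowest = max(stages[i] / speed[perm[i]] for i in range(units))
                if best is None or slowest < best[0]:
                    best = (slowest, bounds, perm)
        print(best)  # (bottleneck time, stage boundaries, unit assignment)

    The thesis's scheme replaces this exhaustive enumeration with heuristic navigation guided by online performance measurements, which is what keeps it tractable for 50+ layer networks.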

    A framework for automatically generating optimized digital designs from C-language loops

    Reconfigurable computing has the potential to provide significant performance increases for a number of computing applications. However, realizing these benefits requires digital design experience and knowledge of hardware description languages (HDLs). While a number of tools have focused on translating high-level languages (HLLs) to HDLs, these tools do not always create optimized digital designs that are competitive with hand-coded solutions. This work describes an automatic optimization in the C-to-HDL transformation that reorganizes operations between pipeline stages in order to reduce critical path lengths. The effects of this optimization are examined on the MD5, SHA-1, and Smith-Waterman algorithms. Results show that the optimization yields performance gains of 13%-37% and that the automatically generated implementations perform comparably to hand-coded implementations.
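
    A simplified picture of the rebalancing idea, with hypothetical operation delays: slide the boundary between two adjacent pipeline stages so that the longer stage delay, which sets the achievable clock period, is minimized.

        # Each stage is a chain of operations with additive delays; move the
        # stage boundary to minimize the slower stage (the critical path).
        ops = [2, 3, 1, 4, 2, 5]  # hypothetical per-operation delays

        def best_split(delays):
            total, left = sum(delays), 0
            best_cut, best_cp = 1, float("inf")
            for cut in range(1, len(delays)):
                left += delays[cut - 1]
                cp = max(left, total - left)
                if cp < best_cp:
                    best_cut, best_cp = cut, cp
            return best_cut, best_cp

        cut, cp = best_split(ops)
        print(ops[:cut], ops[cut:], "critical path =", cp)  # cp = 10 here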