99,646 research outputs found

    AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

    Full text link
    CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads. Nonetheless, this advantage is often overshadowed by the poor programmability of FPGAs whose programming is conventionally a RTL design practice. Although recent advances in high-level synthesis (HLS) significantly improve the FPGA programmability, it still leaves programmers facing the challenge of identifying the optimal design configuration in a tremendous design space. This paper aims to address this challenge and pave the path from software programs towards high-quality FPGA accelerators. Specifically, we first propose the composable, parallel and pipeline (CPP) microarchitecture as a template of accelerator designs. Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels, and more importantly, drastically reduce the design space. Also, we introduce an analytical model to capture the performance and resource trade-offs among different design configurations of the CPP microarchitecture, which lays the foundation for fast design space exploration. On top of the CPP microarchitecture and its analytical model, we develop the AutoAccel framework to make the entire accelerator generation automated. AutoAccel accepts a software program as an input and performs a series of code transformations based on the result of the analytical-model-based design space exploration to construct the desired CPP microarchitecture. Our experiments show that the AutoAccel-generated accelerators outperform their corresponding software implementations by an average of 72x for a broad class of computation kernels

    Getting a Grip on the Transverse Motion in a Zeeman Decelerator

    Full text link
    Zeeman deceleration is an experimental technique in which inhomogeneous, time-dependent magnetic fields generated inside an array of solenoid coils are used to manipulate the velocity of a supersonic beam. A 12-stage Zeeman decelerator has been built and characterized using hydrogen atoms as a test system. The instrument has several original features including the possibility to replace each deceleration coil individually. In this article, we give a detailed description of the experimental setup, and illustrate its performance. We demonstrate that the overall acceptance in a Zeeman decelerator can be significantly increased with only minor changes to the setup itself. This is achieved by applying a rather low, anti-parallel magnetic field in one of the solenoid coils that forms a temporally varying quadrupole field, and improves particle confinement in the transverse direction. The results are reproduced by three-dimensional numerical particle trajectory simulations thus allowing for a rigorous analysis of the experimental data. The findings suggest the use of a modified coil configuration to improve transverse focusing during the deceleration process.Comment: accepted by J. Chem. Phy

    Efficient Quantum Pseudorandomness

    Get PDF
    Randomness is both a useful way to model natural systems and a useful tool for engineered systems, e.g. in computation, communication and control. Fully random transformations require exponential time for either classical or quantum systems, but in many case pseudorandom operations can emulate certain properties of truly random ones. Indeed in the classical realm there is by now a well-developed theory of such pseudorandom operations. However the construction of such objects turns out to be much harder in the quantum case. Here we show that random quantum circuits are a powerful source of quantum pseudorandomness. This gives the for the first time a polynomialtime construction of quantum unitary designs, which can replace fully random operations in most applications, and shows that generic quantum dynamics cannot be distinguished from truly random processes. We discuss applications of our result to quantum information science, cryptography and to understanding self-equilibration of closed quantum dynamics.Comment: 6 pages, 1 figure. Short version of http://arxiv.org/abs/1208.069

    CMOS design of chaotic oscillators using state variables: a monolithic Chua's circuit

    Get PDF
    This paper presents design considerations for monolithic implementation of piecewise-linear (PWL) dynamic systems in CMOS technology. Starting from a review of available CMOS circuit primitives and their respective merits and drawbacks, the paper proposes a synthesis approach for PWL dynamic systems, based on state-variable methods, and identifies the associated analog operators. The GmC approach, combining quasi-linear VCCS's, PWL VCCS's, and capacitors is then explored regarding the implementation of these operators. CMOS basic building blocks for the realization of the quasi-linear VCCS's and PWL VCCS's are presented and applied to design a Chua's circuit IC. The influence of GmC parasitics on the performance of dynamic PWL systems is illustrated through this example. Measured chaotic attractors from a Chua's circuit prototype are given. The prototype has been fabricated in a 2.4- mu m double-poly n-well CMOS technology, and occupies 0.35 mm/sup 2/, with a power consumption of 1.6 mW for a +or-2.5-V symmetric supply. Measurements show bifurcation toward a double-scroll Chua's attractor by changing a bias current

    Reducing branch delay to zero in pipelined processors

    Get PDF
    A mechanism to reduce the cost of branches in pipelined processors is described and evaluated. It is based on the use of multiple prefetch, early computation of the target address, delayed branch, and parallel execution of branches. The implementation of this mechanism using a branch target instruction memory is described. An analytical model of the performance of this implementation makes it possible to measure the efficiency of the mechanism with a very low computational cost. The model is used to determine the size of cache lines that maximizes the processor performance, to compare the performance of the mechanism with that of other schemes, and to analyze the performance of the mechanism with two alternative cache organizations.Peer ReviewedPostprint (published version

    Synthesis and Optimization of Reversible Circuits - A Survey

    Full text link
    Reversible logic circuits have been historically motivated by theoretical research in low-power electronics as well as practical improvement of bit-manipulation transforms in cryptography and computer graphics. Recently, reversible circuits have attracted interest as components of quantum algorithms, as well as in photonic and nano-computing technologies where some switching devices offer no signal gain. Research in generating reversible logic distinguishes between circuit synthesis, post-synthesis optimization, and technology mapping. In this survey, we review algorithmic paradigms --- search-based, cycle-based, transformation-based, and BDD-based --- as well as specific algorithms for reversible synthesis, both exact and heuristic. We conclude the survey by outlining key open challenges in synthesis of reversible and quantum logic, as well as most common misconceptions.Comment: 34 pages, 15 figures, 2 table

    Integrated Transversal Equalizers in High-Speed Fiber-Optic Systems

    Get PDF
    Intersymbol interference (ISI) caused by intermodal dispersion in multimode fibers is the major limiting factor in the achievable data rate or transmission distance in high-speed multimode fiber-optic links for local area networks applications. Compared with optical-domain and other electrical-domain dispersion compensation methods, equalization with transversal filters based on distributed circuit techniques presents a cost-effective and low-power solution. The design of integrated distributed transversal equalizers is described in detail with focus on delay lines and gain stages. This seven-tap distributed transversal equalizer prototype has been implemented in a commercial 0.18-µm SiGe BiCMOS process for 10-Gb/s multimode fiber-optic links. A seven-tap distributed transversal equalizer reduces the ISI of a 10-Gb/s signal after 800 m of 50-µm multimode fiber from 5 to 1.38 dB, and improves the bit-error rate from about 10^-5 to less than 10^-12
    corecore