99,646 research outputs found
AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture
CPU-FPGA heterogeneous architectures are attracting ever-increasing attention
in an attempt to advance computational capabilities and energy efficiency in
today's datacenters. These architectures provide programmers with the ability
to reprogram the FPGAs for flexible acceleration of many workloads.
Nonetheless, this advantage is often overshadowed by the poor programmability
of FPGAs whose programming is conventionally a RTL design practice. Although
recent advances in high-level synthesis (HLS) significantly improve the FPGA
programmability, it still leaves programmers facing the challenge of
identifying the optimal design configuration in a tremendous design space.
This paper aims to address this challenge and pave the path from software
programs towards high-quality FPGA accelerators. Specifically, we first propose
the composable, parallel and pipeline (CPP) microarchitecture as a template of
accelerator designs. Such a well-defined template is able to support efficient
accelerator designs for a broad class of computation kernels, and more
importantly, drastically reduce the design space. Also, we introduce an
analytical model to capture the performance and resource trade-offs among
different design configurations of the CPP microarchitecture, which lays the
foundation for fast design space exploration. On top of the CPP
microarchitecture and its analytical model, we develop the AutoAccel framework
to make the entire accelerator generation automated. AutoAccel accepts a
software program as an input and performs a series of code transformations
based on the result of the analytical-model-based design space exploration to
construct the desired CPP microarchitecture. Our experiments show that the
AutoAccel-generated accelerators outperform their corresponding software
implementations by an average of 72x for a broad class of computation kernels
Getting a Grip on the Transverse Motion in a Zeeman Decelerator
Zeeman deceleration is an experimental technique in which inhomogeneous,
time-dependent magnetic fields generated inside an array of solenoid coils are
used to manipulate the velocity of a supersonic beam. A 12-stage Zeeman
decelerator has been built and characterized using hydrogen atoms as a test
system. The instrument has several original features including the possibility
to replace each deceleration coil individually. In this article, we give a
detailed description of the experimental setup, and illustrate its performance.
We demonstrate that the overall acceptance in a Zeeman decelerator can be
significantly increased with only minor changes to the setup itself. This is
achieved by applying a rather low, anti-parallel magnetic field in one of the
solenoid coils that forms a temporally varying quadrupole field, and improves
particle confinement in the transverse direction. The results are reproduced by
three-dimensional numerical particle trajectory simulations thus allowing for a
rigorous analysis of the experimental data. The findings suggest the use of a
modified coil configuration to improve transverse focusing during the
deceleration process.Comment: accepted by J. Chem. Phy
Efficient Quantum Pseudorandomness
Randomness is both a useful way to model natural systems and a useful tool
for engineered systems, e.g. in computation, communication and control. Fully
random transformations require exponential time for either classical or quantum
systems, but in many case pseudorandom operations can emulate certain
properties of truly random ones. Indeed in the classical realm there is by now
a well-developed theory of such pseudorandom operations. However the
construction of such objects turns out to be much harder in the quantum case.
Here we show that random quantum circuits are a powerful source of quantum
pseudorandomness. This gives the for the first time a polynomialtime
construction of quantum unitary designs, which can replace fully random
operations in most applications, and shows that generic quantum dynamics cannot
be distinguished from truly random processes. We discuss applications of our
result to quantum information science, cryptography and to understanding
self-equilibration of closed quantum dynamics.Comment: 6 pages, 1 figure. Short version of http://arxiv.org/abs/1208.069
CMOS design of chaotic oscillators using state variables: a monolithic Chua's circuit
This paper presents design considerations for monolithic implementation of piecewise-linear (PWL) dynamic systems in CMOS technology. Starting from a review of available CMOS circuit primitives and their respective merits and drawbacks, the paper proposes a synthesis approach for PWL dynamic systems, based on state-variable methods, and identifies the associated analog operators. The GmC approach, combining quasi-linear VCCS's, PWL VCCS's, and capacitors is then explored regarding the implementation of these operators. CMOS basic building blocks for the realization of the quasi-linear VCCS's and PWL VCCS's are presented and applied to design a Chua's circuit IC. The influence of GmC parasitics on the performance of dynamic PWL systems is illustrated through this example. Measured chaotic attractors from a Chua's circuit prototype are given. The prototype has been fabricated in a 2.4- mu m double-poly n-well CMOS technology, and occupies 0.35 mm/sup 2/, with a power consumption of 1.6 mW for a +or-2.5-V symmetric supply. Measurements show bifurcation toward a double-scroll Chua's attractor by changing a bias current
Reducing branch delay to zero in pipelined processors
A mechanism to reduce the cost of branches in pipelined processors is described and evaluated. It is based on the use of multiple prefetch, early computation of the target address, delayed branch, and parallel execution of branches. The implementation of this mechanism using a branch target instruction memory is described. An analytical model of the performance of this implementation makes it possible to measure the efficiency of the mechanism with a very low computational cost. The model is used to determine the size of cache lines that maximizes the processor performance, to compare the performance of the mechanism with that of other schemes, and to analyze the performance of the mechanism with two alternative cache organizations.Peer ReviewedPostprint (published version
Synthesis and Optimization of Reversible Circuits - A Survey
Reversible logic circuits have been historically motivated by theoretical
research in low-power electronics as well as practical improvement of
bit-manipulation transforms in cryptography and computer graphics. Recently,
reversible circuits have attracted interest as components of quantum
algorithms, as well as in photonic and nano-computing technologies where some
switching devices offer no signal gain. Research in generating reversible logic
distinguishes between circuit synthesis, post-synthesis optimization, and
technology mapping. In this survey, we review algorithmic paradigms ---
search-based, cycle-based, transformation-based, and BDD-based --- as well as
specific algorithms for reversible synthesis, both exact and heuristic. We
conclude the survey by outlining key open challenges in synthesis of reversible
and quantum logic, as well as most common misconceptions.Comment: 34 pages, 15 figures, 2 table
Integrated Transversal Equalizers in High-Speed Fiber-Optic Systems
Intersymbol interference (ISI) caused by intermodal dispersion in multimode fibers is the major limiting factor in the achievable data rate or transmission distance in high-speed multimode fiber-optic links for local area networks applications. Compared with optical-domain and other electrical-domain dispersion compensation methods, equalization with transversal filters based on distributed circuit techniques presents a cost-effective and low-power solution. The design of integrated distributed transversal equalizers is described in detail with focus on delay lines and gain stages. This seven-tap distributed transversal equalizer prototype has been implemented in a commercial 0.18-µm SiGe BiCMOS process for 10-Gb/s multimode fiber-optic links. A seven-tap distributed transversal equalizer reduces the ISI of a 10-Gb/s signal after 800 m of 50-µm multimode fiber from 5 to 1.38 dB, and improves the bit-error rate from about 10^-5 to less than 10^-12
- …