1,575 research outputs found
Beam-beam simulation code BBSIM for particle accelerators
A highly efficient, fully parallelized, six-dimensional tracking model for
simulating interactions of colliding hadron beams in high energy ring colliders
and simulating schemes for mitigating their effects is described. The model
uses the weak-strong approximation for calculating the head-on interactions
when the test beam has lower intensity than the other beam, a look-up table for
the efficient calculation of long-range beam-beam forces, and a self-consistent
Poisson solver when both beams have comparable intensities. A performance test
of the model in a parallel environment is presented. The code is used to
calculate beam emittance and beam loss in the Tevatron at Fermilab and compared
with measurements. We also present results from the studies of two schemes
proposed to compensate the beam-beam interactions: a) the compensation of
long-range interactions in the Relativistic Heavy Ion Collider (RHIC) at
Brookhaven and the Large Hadron Collider (LHC) at CERN with a current-carrying
wire, b) the use of a low energy electron beam to compensate the head-on
interactions in RHIC
Recommended from our members
Ultra-Low Leakage, Energy-Efficient Digital Integrated Circuit and System Design
The advances of the complementary metal-oxide-semiconductor (CMOS) technology manufacturing and design over the years have enabled a diverse range of applications across the power consumption, performance, and area (PPA) spectra. Many of the recent and prospective applications rely on the availability of energy-autonomous, miniaturized systems, i.e., ultra-low power (ULP) VLSI systems, which are generally characterized by extreme resource limitations. Some examples of applications are wireless sensing platforms, body-area sensor networks (BASN), biomedical and implantable devices, wearables, hearables, and monitors. Within the context of such applications, the key requirements are long lifetime and miniaturized size (sub-/millimeter-scale). In order to enable both requirements, energy-efficiency is the key metric. It allows for extended battery lifetime and operation with the energy that can be harvested from the environment, and it limits the size (volume) of the energy sources utilized to power these systems.
Ultra-low voltage (ULV) operation is a key technique in which the VDD of circuits is reduced from nominal to near or below the threshold voltage of the transistor. It is a powerful knob that has been largely exploited by designers in order to achieve ultra-low power consumption and high energy-efficiency in CMOS. Existing ULP VLSI systems typically operate at a lower supply voltage thereby reducing their energy consumption by one to two orders of magnitude in order to enable the aforementioned applications.
While supply voltage scaling is a promising measure for achieving low power and reducing energy consumption, it brings up several challenges. One critical issue is the leakage energy dissipated by the devices, which is magnified in portion to the total energy consumption at ULV. The reason is that, as VDD scales from nominal to near-threshold and sub-threshold, transistors become increasingly slower and they accumulate more leakage (i.e., static) power over longer cycle times. This energy waste accounts for a significant portion of the system's total energy consumption, offsets the gains provided by voltage scaling, defines the minimum energy per operation, and poses a practical limit for the system's energy-efficiency.
This thesis presents selected research works on ultra-low leakage, energy-efficient digital integrated circuit design. More specifically, it describes novel and key techniques for minimizing the energy waste of idle/underutilized and always-on hardware. The main goal of such techniques is to push the envelope of energy-efficiency in energy-autonomous, miniaturized VLSI systems. Such techniques are applied to key building blocks of emerging mobile and embedded computing devices resulting in state-of-the-art energy-efficiencies
GCC-Plugin for Automated Accelerator Generation and Integration on Hybrid FPGA-SoCs
In recent years, architectures combining a reconfigurable fabric and a
general purpose processor on a single chip became increasingly popular. Such
hybrid architectures allow extending embedded software with application
specific hardware accelerators to improve performance and/or energy efficiency.
Aiding system designers and programmers at handling the complexity of the
required process of hardware/software (HW/SW) partitioning is an important
issue. Current methods are often restricted, either to bare-metal systems, to
subsets of mainstream programming languages, or require special coding
guidelines, e.g., via annotations. These restrictions still represent a high
entry barrier for the wider community of programmers that new hybrid
architectures are intended for. In this paper we revisit HW/SW partitioning and
present a seamless programming flow for unrestricted, legacy C code. It
consists of a retargetable GCC plugin that automatically identifies code
sections for hardware acceleration and generates code accordingly. The proposed
workflow was evaluated on the Xilinx Zynq platform using unmodified code from
an embedded benchmark suite.Comment: Presented at Second International Workshop on FPGAs for Software
Programmers (FSP 2015) (arXiv:1508.06320
Benchmarking CPUs and GPUs on embedded platforms for software receiver usage
Smartphones containing multi-core central processing units (CPUs) and powerful many-core graphics processing units (GPUs) bring supercomputing technology into your pocket (or into our embedded devices). This can be exploited to produce power-efficient, customized receivers with flexible correlation schemes and more advanced positioning techniques. For example, promising techniques such as the Direct Position Estimation paradigm or usage of tracking solutions based on particle filtering, seem to be very appealing in challenging environments but are likewise computationally quite demanding. This article sheds some light onto recent embedded processor developments, benchmarks Fast Fourier Transform (FFT) and correlation algorithms on representative embedded platforms and relates the results to the use in GNSS software radios. The use of embedded CPUs for signal tracking seems to be straight forward, but more research is required to fully achieve the nominal peak performance of an embedded GPU for FFT computation. Also the electrical power consumption is measured in certain load levels.Peer ReviewedPostprint (published version
A Construction Kit for Efficient Low Power Neural Network Accelerator Designs
Implementing embedded neural network processing at the edge requires
efficient hardware acceleration that couples high computational performance
with low power consumption. Driven by the rapid evolution of network
architectures and their algorithmic features, accelerator designs are
constantly updated and improved. To evaluate and compare hardware design
choices, designers can refer to a myriad of accelerator implementations in the
literature. Surveys provide an overview of these works but are often limited to
system-level and benchmark-specific performance metrics, making it difficult to
quantitatively compare the individual effect of each utilized optimization
technique. This complicates the evaluation of optimizations for new accelerator
designs, slowing-down the research progress. This work provides a survey of
neural network accelerator optimization approaches that have been used in
recent works and reports their individual effects on edge processing
performance. It presents the list of optimizations and their quantitative
effects as a construction kit, allowing to assess the design choices for each
building block separately. Reported optimizations range from up to 10'000x
memory savings to 33x energy reductions, providing chip designers an overview
of design choices for implementing efficient low power neural network
accelerators
Are coarse-grained overlays ready for general purpose application acceleration on FPGAs?
Combining processors with hardware accelerators has become a norm with systems-on-chip (SoCs) ever present in modern compute devices. Heterogeneous programmable system on chip platforms sometimes referred to as hybrid FPGAs, tightly couple general purpose processors with high performance reconfigurable fabrics, providing a more flexible alternative. We can now think of a software application with hardware accelerated portions that are reconfigured at runtime. While such ideas have been explored in the past, modern hybrid FPGAs are the first commercial platforms to enable this move to a more software oriented view, where reconfiguration enables hardware resources to be shared by multiple tasks in a bigger application. However, while the rapidly increasing logic density and more capable hard resources found in modern hybrid FPGA devices should make them widely deployable, they remain constrained within specialist application domains. This is due to both design productivity issues and a lack of suitable hardware abstraction to eliminate the need for working with platform-specific details, as server and desktop virtualization has done in a more general sense. To allow mainstream adoption of FPGA based accelerators in general purpose computing, there is a need to virtualize FPGAs and make them more accessible to application developers who are accustomed to software API abstractions and fast development cycles. In this paper, we discuss the role of overlay architectures in enabling general purpose FPGA application acceleration
- …