2,631 research outputs found
ATLAS Upgrade Instrumentation in the US
Planned upgrades of the LHC over the next decade should allow the machine to
operate at a center of mass energy of 14 TeV with instantaneous luminosities in
the range 5--7e34 cm^-2 s^-1. With these parameters, ATLAS could collect 3,000
fb^-1 of data in approximately 10 years. However, the conditions under which
this data would be acquired are much harsher than those currently encountered
at the LHC. For example, the number of proton-proton interactions per bunch
crossing will rise from the level of 20--30 per 50 ns crossing observed in 2012
to 140--200 every 25 ns. In order to deepen our understanding of the newly
discovered Higgs boson and to extend our searches for physics beyond that new
particle, the ATLAS detector, trigger, and readout will have to undergo
significant upgrades. In this whitepaper we describe R&D necessary for ATLAS to
continue to run effectively at the highest luminosities foreseen from the LHC.
Emphasis is placed on those R&D efforts in which US institutions are playing a
leading role.Comment: Snowmass contributed paper, 24 pages, 12 figure
Performance of the LHCb vertex locator
The Vertex Locator (VELO) is a silicon microstrip detector that surrounds the proton-proton interaction region in the LHCb experiment. The performance of the detector during the first years of its physics operation is reviewed. The system is operated in vacuum, uses a bi-phase CO2 cooling system, and the sensors are moved to 7 mm from the LHC beam for physics data taking. The performance and stability of these characteristic features of the detector are described, and details of the material budget are given. The calibration of the timing and the data processing algorithms that are implemented in FPGAs are described. The system performance is fully characterised. The sensors have a signal to noise ratio of approximately 20 and a best hit resolution of 4 μm is achieved at the optimal track angle. The typical detector occupancy for minimum bias events in standard operating conditions in 2011 is around 0.5%, and the detector has less than 1% of faulty strips. The proximity of the detector to the beam means that the inner regions of the n+-on-n sensors have undergone space-charge sign inversion due to radiation damage. The VELO performance parameters that drive the experiment's physics sensitivity are also given. The track finding efficiency of the VELO is typically above 98% and the modules have been aligned to a precision of 1 μm for translations in the plane transverse to the beam. A primary vertex resolution of 13 μm in the transverse plane and 71 μm along the beam axis is achieved for vertices with 25 tracks. An impact parameter resolution of less than 35 μm is achieved for particles with transverse momentum greater than 1 GeV/c
A Comprehensive Workflow for General-Purpose Neural Modeling with Highly Configurable Neuromorphic Hardware Systems
In this paper we present a methodological framework that meets novel
requirements emerging from upcoming types of accelerated and highly
configurable neuromorphic hardware systems. We describe in detail a device with
45 million programmable and dynamic synapses that is currently under
development, and we sketch the conceptual challenges that arise from taking
this platform into operation. More specifically, we aim at the establishment of
this neuromorphic system as a flexible and neuroscientifically valuable
modeling tool that can be used by non-hardware-experts. We consider various
functional aspects to be crucial for this purpose, and we introduce a
consistent workflow with detailed descriptions of all involved modules that
implement the suggested steps: The integration of the hardware interface into
the simulator-independent model description language PyNN; a fully automated
translation between the PyNN domain and appropriate hardware configurations; an
executable specification of the future neuromorphic system that can be
seamlessly integrated into this biology-to-hardware mapping process as a test
bench for all software layers and possible hardware design modifications; an
evaluation scheme that deploys models from a dedicated benchmark library,
compares the results generated by virtual or prototype hardware devices with
reference software simulations and analyzes the differences. The integration of
these components into one hardware-software workflow provides an ecosystem for
ongoing preparative studies that support the hardware design process and
represents the basis for the maturity of the model-to-hardware mapping
software. The functionality and flexibility of the latter is proven with a
variety of experimental results
Fast fluorescence lifetime imaging and sensing via deep learning
Error on title page – year of award is 2023.Fluorescence lifetime imaging microscopy (FLIM) has become a valuable tool in diverse disciplines. This thesis presents deep learning (DL) approaches to addressing two major challenges in FLIM: slow and complex data analysis and the high photon budget for precisely quantifying the fluorescence lifetimes. DL's ability to extract high-dimensional features from data has revolutionized optical and biomedical imaging analysis. This thesis contributes several novel DL FLIM algorithms that significantly expand FLIM's scope.
Firstly, a hardware-friendly pixel-wise DL algorithm is proposed for fast FLIM data analysis. The algorithm has a simple architecture yet can effectively resolve multi-exponential decay models. The calculation speed and accuracy outperform conventional methods significantly.
Secondly, a DL algorithm is proposed to improve FLIM image spatial resolution, obtaining high-resolution (HR) fluorescence lifetime images from low-resolution (LR) images. A computational framework is developed to generate large-scale semi-synthetic FLIM datasets to address the challenge of the lack of sufficient high-quality FLIM datasets. This algorithm offers a practical approach to obtaining HR FLIM images quickly for FLIM systems.
Thirdly, a DL algorithm is developed to analyze FLIM images with only a few photons per pixel, named Few-Photon Fluorescence Lifetime Imaging (FPFLI) algorithm. FPFLI uses spatial correlation and intensity information to robustly estimate the fluorescence lifetime images, pushing this photon budget to a record-low level of only a few photons per pixel.
Finally, a time-resolved flow cytometry (TRFC) system is developed by integrating an advanced CMOS single-photon avalanche diode (SPAD) array and a DL processor. The SPAD array, using a parallel light detection scheme, shows an excellent photon-counting throughput. A quantized convolutional neural network (QCNN) algorithm is designed and implemented on a field-programmable gate array as an embedded processor. The processor resolves fluorescence lifetimes against disturbing noise, showing unparalleled high accuracy, fast analysis speed, and low power consumption.Fluorescence lifetime imaging microscopy (FLIM) has become a valuable tool in diverse disciplines. This thesis presents deep learning (DL) approaches to addressing two major challenges in FLIM: slow and complex data analysis and the high photon budget for precisely quantifying the fluorescence lifetimes. DL's ability to extract high-dimensional features from data has revolutionized optical and biomedical imaging analysis. This thesis contributes several novel DL FLIM algorithms that significantly expand FLIM's scope.
Firstly, a hardware-friendly pixel-wise DL algorithm is proposed for fast FLIM data analysis. The algorithm has a simple architecture yet can effectively resolve multi-exponential decay models. The calculation speed and accuracy outperform conventional methods significantly.
Secondly, a DL algorithm is proposed to improve FLIM image spatial resolution, obtaining high-resolution (HR) fluorescence lifetime images from low-resolution (LR) images. A computational framework is developed to generate large-scale semi-synthetic FLIM datasets to address the challenge of the lack of sufficient high-quality FLIM datasets. This algorithm offers a practical approach to obtaining HR FLIM images quickly for FLIM systems.
Thirdly, a DL algorithm is developed to analyze FLIM images with only a few photons per pixel, named Few-Photon Fluorescence Lifetime Imaging (FPFLI) algorithm. FPFLI uses spatial correlation and intensity information to robustly estimate the fluorescence lifetime images, pushing this photon budget to a record-low level of only a few photons per pixel.
Finally, a time-resolved flow cytometry (TRFC) system is developed by integrating an advanced CMOS single-photon avalanche diode (SPAD) array and a DL processor. The SPAD array, using a parallel light detection scheme, shows an excellent photon-counting throughput. A quantized convolutional neural network (QCNN) algorithm is designed and implemented on a field-programmable gate array as an embedded processor. The processor resolves fluorescence lifetimes against disturbing noise, showing unparalleled high accuracy, fast analysis speed, and low power consumption
Belle II Technical Design Report
The Belle detector at the KEKB electron-positron collider has collected
almost 1 billion Y(4S) events in its decade of operation. Super-KEKB, an
upgrade of KEKB is under construction, to increase the luminosity by two orders
of magnitude during a three-year shutdown, with an ultimate goal of 8E35 /cm^2
/s luminosity. To exploit the increased luminosity, an upgrade of the Belle
detector has been proposed. A new international collaboration Belle-II, is
being formed. The Technical Design Report presents physics motivation, basic
methods of the accelerator upgrade, as well as key improvements of the
detector.Comment: Edited by: Z. Dole\v{z}al and S. Un
FPGA Acceleration of Communication-Bound Streaming Applications: Architecture Modeling and a 3D Image Compositing Case Study
Reconfigurable computers usually provide a limited
number of different memory resources, such as host
memory, external memory, and on-chip memory with different
capacities and communication characteristics. A key
challenge for achieving high-performance with reconfigurable
accelerators is the efficient utilization of the available memory
resources. A detailed knowledge of the memories' parameters
is key for generating an optimized communication layout. In this paper, we discuss a benchmarking environment
for generating such a characterization. The environment is
built on IMORC, our architectural template and on-chip
network for creating reconfigurable accelerators. We provide
a characterization of the memory resources available on the
XtremeData XD1000 reconfigurable computer. Based on this
data, we present as a case study the implementation of a 3D
image compositing accelerator that is able to double the frame rate
of a parallel renderer
An FPGA implementation of an investigative many-core processor, Fynbos : in support of a Fortran autoparallelising software pipeline
Includes bibliographical references.In light of the power, memory, ILP, and utilisation walls facing the computing industry, this work examines the hypothetical many-core approach to finding greater compute performance and efficiency. In order to achieve greater efficiency in an environment in which Moore’s law continues but TDP has been capped, a means of deriving performance from dark and dim silicon is needed. The many-core hypothesis is one approach to exploiting these available transistors efficiently. As understood in this work, it involves trading in hardware control complexity for hundreds to thousands of parallel simple processing elements, and operating at a clock speed sufficiently low as to allow the efficiency gains of near threshold voltage operation. Performance is there- fore dependant on exploiting a new degree of fine-grained parallelism such as is currently only found in GPGPUs, but in a manner that is not as restrictive in application domain range. While removing the complex control hardware of traditional CPUs provides space for more arithmetic hardware, a basic level of control is still required. For a number of reasons this work chooses to replace this control largely with static scheduling. This pushes the burden of control primarily to the software and specifically the compiler, rather not to the programmer or to an application specific means of control simplification. An existing legacy tool chain capable of autoparallelising sequential Fortran code to the degree of parallelism necessary for many-core exists. This work implements a many-core architecture to match it. Prototyping the design on an FPGA, it is possible to examine the real world performance of the compiler-architecture system to a greater degree than simulation only would allow. Comparing theoretical peak performance and real performance in a case study application, the system is found to be more efficient than any other reviewed, but to also significantly under perform relative to current competing architectures. This failing is apportioned to taking the need for simple hardware too far, and an inability to implement static scheduling mitigating tactics due to lack of support for such in the compiler
Achieving High Speed CFD simulations: Optimization, Parallelization, and FPGA Acceleration for the unstructured DLR TAU Code
Today, large scale parallel simulations are fundamental tools to handle complex problems. The number of processors in current computation platforms has been recently increased and therefore it is necessary to optimize the application performance and to enhance the scalability of massively-parallel systems. In addition, new heterogeneous architectures, combining conventional processors with specific hardware, like FPGAs, to accelerate the most time consuming functions are considered as a strong alternative to boost the performance.
In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of the code efficiency is addressed through three key activities: Optimization, parallelization and hardware acceleration. At first, a profiling analysis of the most time-consuming processes of the Reynolds Averaged Navier Stokes flow solver on a three-dimensional unstructured mesh is performed. Then, a study of the code scalability with new partitioning algorithms are tested to show the most suitable partitioning algorithms for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented
- …