FPGA-accelerated machine learning inference as a service for particle physics computing
New heterogeneous computing paradigms on dedicated hardware with increased
parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting
solutions with large potential gains. The growing applications of machine
learning algorithms in particle physics for simulation, reconstruction, and
analysis are naturally deployed on such platforms. We demonstrate that the
acceleration of machine learning inference as a web service represents a
heterogeneous computing solution for particle physics experiments that
potentially requires minimal modification to the current computing model. As
examples, we retrain the ResNet-50 convolutional neural network to demonstrate
state-of-the-art performance for top quark jet tagging at the LHC and apply a
ResNet-50 model with transfer learning for neutrino event classification. Using
Project Brainwave by Microsoft to accelerate the ResNet-50 image classification
model, we achieve average inference times of 60 (10) milliseconds with our
experimental physics software framework using Brainwave as a cloud (edge or
on-premises) service, representing an improvement by a factor of approximately
30 (175) in model inference latency over traditional CPU inference in current
experimental hardware. A single FPGA service accessed by many CPUs achieves a
throughput of 600--700 inferences per second using an image batch of one,
comparable to large batch-size GPU throughput and significantly better than
small batch-size GPU throughput. Deployed as an edge or cloud service for the
particle physics computing model, coprocessor accelerators can have a higher
duty cycle and are potentially much more cost-effective.
Comment: 16 pages, 14 figures, 2 tables
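As a rough consistency check on the figures quoted in this abstract, the CPU baseline latency implied by each speedup factor can be back-computed. The sketch below assumes the quoted factor is simply the ratio of CPU latency to service latency; the function and variable names are illustrative and do not come from the paper.

```python
def implied_cpu_latency_ms(service_latency_ms: float, speedup: float) -> float:
    """CPU latency implied by a service latency and a speedup factor.

    Assumption: speedup = cpu_latency / service_latency.
    """
    return service_latency_ms * speedup


# Figures taken from the abstract: 60 ms at a factor of ~30 (cloud),
# 10 ms at a factor of ~175 (edge or on-premises).
cloud_baseline_ms = implied_cpu_latency_ms(60, 30)
edge_baseline_ms = implied_cpu_latency_ms(10, 175)

print(cloud_baseline_ms, edge_baseline_ms)
```

Under that assumption both cases point to a CPU baseline of roughly 1.75 to 1.8 seconds per inference, so the two quoted factors are mutually consistent.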
Path-tracing Monte Carlo Library for 3D Radiative Transfer in Highly Resolved Cloudy Atmospheres
Interactions between clouds and radiation are at the root of many
difficulties in numerically predicting future weather and climate and in
retrieving the state of the atmosphere from remote sensing observations. The
large range of issues related to these interactions, and in particular to
three-dimensional interactions, motivated the development of accurate radiative
tools able to compute all types of radiative metrics, from monochromatic, local
and directional observables, to integrated energetic quantities. Continuing
this community effort, we propose here an open-source library for
general use in Monte Carlo algorithms. This library is devoted to the
acceleration of path-tracing in complex data, typically high-resolution
large-domain grounds and clouds. The main algorithmic advances embedded in the
library are those related to the construction and traversal of hierarchical
grids accelerating the tracing of paths through heterogeneous fields in
null-collision (maximum cross-section) algorithms. We show that with these
hierarchical grids, the computing time is only weakly sensitive to the
refinement of the volumetric data. The library is tested with a rendering
algorithm that produces synthetic images of cloud radiances. Two other examples
are given as illustrations: one analyses the transmission of solar radiation
under a cloud together with its sensitivity to an optical parameter, and the
other assesses a parametrization of 3D radiative effects of clouds.
Comment: Submitted to JAMES, revised and submitted again (this is v2)
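The null-collision (maximum cross-section) idea mentioned in this abstract can be sketched in one dimension. This is a minimal illustration, not the library's implementation: it assumes a single global bound `sigma_max` over the whole domain, whereas the abstract describes hierarchical grids, which would presumably supply tighter local bounds. All function names here are hypothetical.

```python
import math
import random


def sample_collision_position(sigma, sigma_max, rng, x0=0.0, x_max=math.inf):
    """Return the position of the first real collision, or None on escape.

    sigma: callable giving the true extinction coefficient at position x.
    sigma_max: upper bound with sigma(x) <= sigma_max everywhere on the path.
    """
    x = x0
    while True:
        # Tentative free flight sampled in the fictitious homogeneous medium.
        x += -math.log(1.0 - rng.random()) / sigma_max
        if x >= x_max:
            return None  # the path left the domain without colliding
        # Accept as a real collision with probability sigma(x) / sigma_max;
        # otherwise it is a null collision and the flight continues.
        if rng.random() < sigma(x) / sigma_max:
            return x


def estimate_transmission(sigma, sigma_max, depth, n_paths, seed=0):
    """Monte Carlo estimate of the fraction of paths crossing [0, depth]."""
    rng = random.Random(seed)
    escaped = sum(
        sample_collision_position(sigma, sigma_max, rng, x_max=depth) is None
        for _ in range(n_paths)
    )
    return escaped / n_paths


def two_layer(x):
    """Illustrative heterogeneous field: thin layer, then thick layer."""
    return 0.5 if x < 1.0 else 2.0


# Optical depth over [0, 2] is 0.5 + 2.0 = 2.5, so the estimate can be
# compared against the analytic transmission exp(-2.5).
print(estimate_transmission(two_layer, 2.0, 2.0, 20000), math.exp(-2.5))
```

The cost of each path grows with the number of rejected (null) collisions, which is why a bound that is loose over large parts of the domain is expensive and why tighter, spatially local bounds are attractive.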
HSTREAM: A directive-based language extension for heterogeneous stream computing
Big data streaming applications require utilization of heterogeneous parallel
computing systems, which may comprise multiple multi-core CPUs and many-core
accelerating devices such as NVIDIA GPUs and Intel Xeon Phis. Programming such
systems requires advanced knowledge of several hardware architectures and
device-specific programming models, including OpenMP and CUDA. In this paper,
we present HSTREAM, a compiler directive-based language extension to support
programming stream computing applications for heterogeneous parallel computing
systems. The HSTREAM source-to-source compiler aims to increase programming
productivity by enabling programmers to annotate the parallel regions for
heterogeneous execution and by generating target-specific code. The HSTREAM runtime
automatically distributes the workload across CPUs and accelerating devices. We
demonstrate the usefulness of the HSTREAM language extension with various
applications from the STREAM benchmark. Experimental evaluation results show
that HSTREAM can keep the same programming simplicity as OpenMP, and the
generated code can deliver performance beyond what CPUs-only and GPUs-only
executions can deliver.
Comment: Preprint, 21st IEEE International Conference on Computational Science
and Engineering (CSE 2018)
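The workload distribution described in this abstract can be illustrated with a STREAM-style triad (`a[i] = b[i] + scalar * c[i]`) split across workers. This sketch does not use HSTREAM directives or its runtime, whose syntax is not given here. It assumes a simple static split proportional to per-worker weights, and it uses Python threads as stand-ins for CPUs and accelerators. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(n, weights):
    """Partition range(n) into contiguous chunks proportional to weights."""
    total = sum(weights)
    bounds = [0]
    acc = 0
    for w in weights[:-1]:
        acc += w
        bounds.append(round(n * acc / total))
    bounds.append(n)
    return list(zip(bounds[:-1], bounds[1:]))


def triad_chunk(a, b, c, scalar, start, stop):
    """Compute the triad kernel on the half-open index range [start, stop)."""
    for i in range(start, stop):
        a[i] = b[i] + scalar * c[i]


def distributed_triad(b, c, scalar, weights):
    """Run the triad with one worker per weight, each on its own chunk."""
    a = [0.0] * len(b)
    with ThreadPoolExecutor(max_workers=len(weights)) as pool:
        futures = [
            pool.submit(triad_chunk, a, b, c, scalar, lo, hi)
            for lo, hi in split_ranges(len(b), weights)
        ]
        for future in futures:
            future.result()  # re-raise any worker exception
    return a


# A worker with weight 3 (standing in for an accelerator) receives three
# times as many elements as a worker with weight 1 (standing in for a CPU).
result = distributed_triad([1.0] * 8, [2.0] * 8, 3.0, weights=[1, 3])
print(result)
```

The chunks are disjoint, so the workers never write to the same element and no locking is needed. A real runtime would also have to account for data transfers to and from each device, which this sketch ignores.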