Glider: A GPU Library Driver for Improved System Security
Legacy device drivers implement both device resource management and
isolation. This results in a large code base with a wide high-level interface,
making the driver vulnerable to security attacks. This is particularly
problematic for increasingly popular accelerators like GPUs that have large,
complex drivers. We solve this problem with library drivers, a new driver
architecture. A library driver implements resource management as an untrusted
library in the application process address space, and implements isolation as a
kernel module that is smaller and has a narrower lower-level interface (i.e.,
closer to hardware) than a legacy driver. We articulate a set of device and
platform hardware properties that are required to retrofit a legacy driver into
a library driver. To demonstrate the feasibility and superiority of library
drivers, we present Glider, a library driver implementation for two GPUs of
popular brands, Radeon and Intel. Glider reduces the TCB size and attack
surface by about 35% and 84% respectively for a Radeon HD 6450 GPU and by about
38% and 90% respectively for an Intel Ivy Bridge GPU. Moreover, it incurs no
performance cost. Indeed, Glider outperforms a legacy driver for applications
requiring intensive interactions with the device driver, such as applications
using the OpenGL immediate mode API.
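The split the abstract describes, untrusted resource management in the application process and a small trusted module that only enforces isolation, can be illustrated with a toy model. This is a sketch in Python with hypothetical names, not Glider's actual interfaces:

```python
# Toy model of the library-driver split: the trusted part exposes only a
# narrow, hardware-level isolation interface, while all resource-management
# policy lives in an untrusted per-process library. Class and method names
# are illustrative, not taken from Glider.

class TrustedIsolationModule:
    """Small 'kernel' component: checks ownership, nothing else."""
    def __init__(self, total_pages):
        self.owner = {}               # page index -> owning process id
        self.total_pages = total_pages

    def map_page(self, pid, page):
        # Only isolation is enforced here; no allocation policy.
        if not (0 <= page < self.total_pages):
            raise ValueError("no such page")
        if self.owner.get(page, pid) != pid:
            raise PermissionError("page owned by another process")
        self.owner[page] = pid

class UntrustedLibraryDriver:
    """Per-process library: implements resource-management policy."""
    def __init__(self, pid, kernel):
        self.pid, self.kernel = pid, kernel
        self.next_page = 0

    def alloc(self):
        # The allocation policy is untrusted; a buggy or malicious library
        # can only affect pages its own process is allowed to map.
        page = self.next_page
        self.next_page += 1
        self.kernel.map_page(self.pid, page)
        return page
```

In this model, a second process that tries to map a page owned by the first is rejected by the small trusted component alone, which is the property that lets the bulk of the driver move out of the TCB.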
The AXIOM software layers
The AXIOM project aims at developing a heterogeneous computing board (SMP-FPGA). This entry explains the software layers developed in the AXIOM project; OmpSs provides an easy way to execute heterogeneous codes on multiple cores. People and objects will soon share the same digital network for information exchange, in what has been called the age of cyber-physical systems. The general expectation is that people and systems will interact in real time. This puts pressure on systems design to support increasing demands on computational power while keeping a low power envelope. Additionally, modular scaling and easy programmability are also important to ensure these systems become widespread. This whole set of expectations imposes scientific and technological challenges that need to be properly addressed. The AXIOM project (Agile, eXtensible, fast I/O Module) will research new hardware/software architectures for cyber-physical systems to meet such expectations. The technical approach aims at solving fundamental problems to enable easy programmability of heterogeneous multi-core, multi-board systems. AXIOM proposes the use of the task-based OmpSs programming model, leveraging low-level communication interfaces provided by the hardware. Modular scalability will be possible thanks to a fast interconnect embedded into each module. To this aim, an innovative ARM- and FPGA-based board will be designed, with enhanced capabilities for interfacing with the physical world. Its effectiveness will be demonstrated with key scenarios such as Smart Video-Surveillance and Smart Living/Home (domotics).
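The core idea of the task-based model the abstract mentions is that tasks declare which data they read and write, and a runtime derives the execution order from those dependences. The following is a minimal sketch of that dataflow idea in plain Python; OmpSs itself is a pragma-based model for C/C++/Fortran, and nothing here is its real API:

```python
# Minimal dataflow scheduler: each task names its inputs ("in") and
# outputs ("out"); a task becomes runnable once all its inputs exist,
# so independent tasks could run on any core (or FPGA accelerator).

def run_dataflow(tasks):
    """tasks: list of (fn, in_names, out_names); returns dict of values."""
    ready_data, pending = {}, list(tasks)
    while pending:
        progressed = False
        for t in list(pending):
            fn, ins, outs = t
            if all(name in ready_data for name in ins):
                results = fn(*(ready_data[n] for n in ins))
                for name, val in zip(outs, results):
                    ready_data[name] = val
                pending.remove(t)
                progressed = True
        if not progressed:
            raise RuntimeError("dependency cycle or missing producer")
    return ready_data

# "a" and "b" are produced independently; "c" consumes both. The ordering
# is discovered from the declared dependences alone, even though the
# consumer task is listed first.
values = run_dataflow([
    (lambda x, y: (x + y,), ["a", "b"], ["c"]),
    (lambda: (2,),          [],         ["a"]),
    (lambda: (3,),          [],         ["b"]),
])
# values["c"] == 5
```

The design point this illustrates is why such models ease programmability of heterogeneous boards: the programmer annotates data usage, and placement and ordering decisions are left to the runtime.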
Introduction to Machine Protection
Protection of accelerator equipment is as old as accelerator technology and
was for many years related to high-power equipment. Examples are the protection
of powering equipment from overheating (magnets, power converters, high-current
cables), of superconducting magnets from damage after a quench and of
klystrons. The protection of equipment from beam accidents is more recent,
although one paper discussed beam-induced damage for the SLAC linac
(Stanford Linear Accelerator Center) as early as 1967. It is related
to the increasing beam power of high-power proton accelerators, to the emission
of synchrotron light by electron-positron accelerators and to the increase of
energy stored in the beam. Designing a machine protection system requires an
excellent understanding of accelerator physics and operation to anticipate
possible failures that could lead to damage. Machine protection includes beam
and equipment monitoring, a system to safely stop beam operation (e.g. dumping
the beam or stopping the beam at low energy) and an interlock system providing
the glue between these systems. The most recent accelerator, LHC, will operate
with about 3 x 10^14 protons per beam, corresponding to an energy stored in
each beam of 360 MJ. This energy can cause massive damage to accelerator
equipment in case of uncontrolled beam loss, and a single accident damaging
vital parts of the accelerator could interrupt operation for years. This
lecture will provide an overview of the requirements for protection of
accelerator equipment and introduces various protection systems. Examples are
mainly from LHC and ESS.
Comment: 20 pages, contribution to the 2014 Joint International Accelerator
School: Beam Loss and Accelerator Protection, Newport Beach, CA, USA, 5-14
Nov 2014. arXiv admin note: text overlap with arXiv:1601.0520
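The 360 MJ figure quoted above can be checked with back-of-the-envelope arithmetic, assuming nominal LHC parameters (2808 bunches of about 1.15e11 protons each, at 7 TeV per proton; these parameter values are assumptions, not stated in the abstract):

```python
# Stored beam energy = (number of protons) x (energy per proton in joules).
EV_TO_J = 1.602176634e-19                  # elementary charge: 1 eV in J

protons_per_beam = 2808 * 1.15e11          # ~3.2e14 protons (nominal fill)
energy_per_proton_J = 7e12 * EV_TO_J       # 7 TeV converted to joules
stored_energy_MJ = protons_per_beam * energy_per_proton_J / 1e6
# stored_energy_MJ is roughly 360 MJ, matching the figure in the text
```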
Planning the Future of U.S. Particle Physics (Snowmass 2013): Chapter 6: Accelerator Capabilities
These reports present the results of the 2013 Community Summer Study of the
APS Division of Particles and Fields ("Snowmass 2013") on the future program of
particle physics in the U.S. Chapter 6, on Accelerator Capabilities, discusses
the future progress of accelerator technology, including issues for high-energy
hadron and lepton colliders, high-intensity beams, electron-ion colliders, and
necessary R&D for future accelerator technologies.
Comment: 26 pages
Radiation-Induced Error Criticality in Modern HPC Parallel Accelerators
In this paper, we evaluate the error criticality of radiation-induced errors on modern High-Performance Computing (HPC) accelerators (Intel Xeon Phi and NVIDIA K40) through a dedicated set of metrics. We show that, as far as imprecise computing is concerned, simple mismatch detection is not sufficient to evaluate and compare the radiation sensitivity of HPC devices and algorithms. Our analysis quantifies and qualifies radiation effects on applications' output by correlating the number of corrupted elements with their spatial locality. We also provide the dataset-wise mean relative error to evaluate the magnitude of radiation-induced errors.
We apply the selected metrics to experimental results obtained in various radiation test campaigns, totaling more than 400 hours of beam time per device. The amount of data we gathered allows us to evaluate the error criticality of a representative set of algorithms from HPC suites. Additionally, based on the characteristics of the tested algorithms, we draw generic reliability conclusions for broader classes of codes. We show that arithmetic operations are less critical for the K40, while the Xeon Phi is more reliable when executing particle interactions solved through Finite Difference Methods. Finally, iterative stencil operations seem the most reliable on both architectures.
This work was supported by the STIC-AmSud/CAPES scientific cooperation program under the EnergySFE research project grant 99999.007556/2015-02, EU H2020 Programme, and MCTI/RNP-Brazil under the HPC4E Project, grant agreement n° 689772. Tested K40 boards were donated thanks to Steve Keckler, Timothy Tsai, and Siva Hari from NVIDIA.
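The dataset-wise mean relative error mentioned in the abstract can be sketched as follows; this is a generic reimplementation of the metric as described, not the authors' code:

```python
# Mean relative error over the corrupted elements of an output dataset:
# for every element that differs from the golden (expected) value, take
# |observed - expected| / |expected|, then average. A small eps guards
# against division by an expected value of zero.

def mean_relative_error(expected, observed, eps=1e-12):
    """Average relative error over corrupted elements; 0.0 if none differ."""
    errs = [abs(o - e) / max(abs(e), eps)
            for e, o in zip(expected, observed) if o != e]
    return sum(errs) / len(errs) if errs else 0.0

# Two corrupted elements: one 10% off, one 50% off.
golden   = [1.0, 2.0, 4.0, 8.0]
radiated = [1.0, 2.2, 4.0, 12.0]
# mean_relative_error(golden, radiated) is approximately 0.3
```

Combined with the count and spatial locality of corrupted elements, a magnitude metric like this is what lets mismatches be ranked by criticality rather than merely detected.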