GPU-based Real-time Triggering in the NA62 Experiment
Over the last few years the GPGPU (General-Purpose computing on Graphics
Processing Units) paradigm represented a remarkable development in the world of
computing. Computing for High-Energy Physics is no exception: several works
have demonstrated the effectiveness of the integration of GPU-based systems in
high level trigger of different experiments. On the other hand the use of GPUs
in the low level trigger systems, characterized by stringent real-time
constraints, such as tight time budget and high throughput, poses several
challenges. In this paper we focus on the low level trigger in the CERN NA62
experiment, investigating the use of real-time computing on GPUs in this
synchronous system. Our approach aimed at harvesting the GPU computing power to
build in real-time refined physics-related trigger primitives for the RICH
detector, as knowledge of the Cerenkov ring parameters allows us to build
stringent conditions for data selection at trigger level. Latencies of all
components of the trigger chain have been analyzed, pointing out that
networking is the most critical one. To keep the latency of the data transfer task
under control, we devised NaNet, an FPGA-based PCIe Network Interface Card
(NIC) with GPUDirect capabilities. For the processing task, we developed
specific multiple ring trigger algorithms to leverage the parallel architecture
of GPUs and increase the processing throughput to keep up with the high event
rate. Results obtained during the first months of the 2016 NA62 run are presented
and discussed.
A scheduling theory framework for GPU tasks efficient execution
Concurrent execution of tasks on GPUs can reduce the computation time of a
workload by overlapping data transfer and execution commands. However, it is
difficult to implement an efficient run-time scheduler that minimizes the
workload makespan, as many execution orderings must be evaluated. In this
paper, we employ scheduling theory to build a model that takes into account the
device capabilities, workload characteristics, constraints and objective
functions. In our model, GPU task scheduling is reformulated as a flow shop
scheduling problem, which allows us to apply and compare well-known methods
already developed in the operations research field. In addition, we develop a
new heuristic, specifically focused on executing GPU commands, that achieves
better scheduling results than previous techniques. Finally, a comprehensive
evaluation, showing the suitability and robustness of this new approach, is
conducted on three different NVIDIA architectures (Kepler, Maxwell and Pascal).
Project TIN2016- 0920R, Universidad de Málaga (Campus de Excelencia Internacional Andalucía Tech) and the NVIDIA Corporation donation program.
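The flow shop view of GPU task scheduling described in the abstract above can be illustrated with a minimal sketch. It assumes a simplified two-stage model (one host-to-device transfer followed by one kernel per task, with transfers and kernels each serialized on their own engine) and applies Johnson's rule, a classical operations research method for the two-machine flow shop. This is not the heuristic proposed in the paper, and the `Task` tuple, function names and sample timings are hypothetical.

```python
from typing import List, Tuple

# Hypothetical task record: (name, transfer_time, kernel_time), arbitrary units.
Task = Tuple[str, float, float]


def johnson_order(tasks: List[Task]) -> List[Task]:
    """Order tasks with Johnson's rule for a two-machine flow shop."""
    # Tasks whose transfer is no longer than their kernel run first,
    # sorted by increasing transfer time.
    front = sorted((t for t in tasks if t[1] <= t[2]), key=lambda t: t[1])
    # The remaining tasks run last, sorted by decreasing kernel time.
    back = sorted((t for t in tasks if t[1] > t[2]),
                  key=lambda t: t[2], reverse=True)
    return front + back


def makespan(order: List[Task]) -> float:
    """Completion time of the last kernel when transfers overlap kernels."""
    transfer_done = 0.0  # time at which the copy engine becomes free
    kernel_done = 0.0    # time at which the compute engine becomes free
    for _, transfer, kernel in order:
        transfer_done += transfer
        # A kernel starts once its data has arrived and the previous kernel ended.
        kernel_done = max(kernel_done, transfer_done) + kernel
    return kernel_done


if __name__ == "__main__":
    sample = [("a", 3, 2), ("b", 1, 4), ("c", 2, 2)]  # illustrative timings
    print("submission order:", makespan(sample))
    print("Johnson order:   ", makespan(johnson_order(sample)))
```

For the three sample tasks, the submission order finishes after 11 time units while the Johnson order (b, c, a) finishes after 9. A model with a third stage for device-to-host transfers, as real GPU workloads typically need, is no longer solved optimally by this rule in general.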
Intra-node Memory Safe GPU Co-Scheduling
GPUs in High-Performance Computing systems remain under-utilised due to the unavailability of schedulers that can safely schedule multiple applications to share the same GPU. This paper aims to improve GPU utilisation by proposing a framework, which we refer to as schedGPU, that facilitates intra-node GPU co-scheduling so that a GPU can be safely shared among multiple applications by taking memory constraints into account. Two approaches are explored: a client-server approach and a shared memory approach. The shared memory approach is more suitable because of its lower overheads. Four policies are proposed in schedGPU to handle applications that are waiting to access the GPU, two of which account for priorities. The feasibility of schedGPU is validated on three real-world applications. The key observation is that a performance gain is achieved. For single applications, a gain of over 10 times, as measured by GPU utilisation and GPU memory utilisation, is obtained. For workloads comprising multiple applications, a speed-up of up to 5x in the total execution time is noted. Moreover, the average GPU utilisation and average GPU memory utilisation are increased by 5 and 12 times, respectively.
This work was funded by Generalitat Valenciana under grant PROMETEO/2017/77.
Reaño González, C.; Silla Jiménez, F.; Nikolopoulos, DS.; Varghese, B. (2018). Intra-node Memory Safe GPU Co-Scheduling. IEEE Transactions on Parallel and Distributed Systems. 29(5):1089-1102. https://doi.org/10.1109/TPDS.2017.2784428
A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems
Recent technological advances have greatly improved the performance and
features of embedded systems. With the number of mobile devices alone now
nearly equal to the population of Earth, embedded systems have truly
become ubiquitous. These trends, however, have also made the task of managing
their power consumption extremely challenging. In recent years, several
techniques have been proposed to address this issue. In this paper, we survey
the techniques for managing power consumption of embedded systems. We discuss
the need for power management and classify the techniques by
several important parameters to highlight their similarities and differences.
This paper is intended to help researchers and application developers
gain insight into the workings of power management techniques and design
even more efficient high-performance embedded systems in the future.
CoreTSAR: Task Scheduling for Accelerator-aware Runtimes
Heterogeneous supercomputers that incorporate computational accelerators
such as GPUs are increasingly popular due to their high
peak performance, energy efficiency and comparatively low cost.
Unfortunately, the programming models and frameworks designed
to extract performance from all computational units still lack the
flexibility of their CPU-only counterparts. Accelerated OpenMP
improves this situation by supporting natural migration of OpenMP
code from CPUs to a GPU. However, these implementations currently
lose one of OpenMP’s best features, its flexibility: typical
OpenMP applications can run on any number of CPUs. GPU implementations
do not transparently employ multiple GPUs on a node
or a mix of GPUs and CPUs. To address these shortcomings, we
present CoreTSAR, our runtime library for dynamically scheduling
tasks across heterogeneous resources, and propose straightforward
extensions that incorporate this functionality into Accelerated
OpenMP. We show that our approach can provide nearly linear
speedup to four GPUs over only using CPUs or one GPU while
increasing the overall flexibility of Accelerated OpenMP.
Parallel ADMM for robust quadratic optimal resource allocation problems
An alternating direction method of multipliers (ADMM) solver is described for
optimal resource allocation problems with separable convex quadratic costs and
constraints and linear coupling constraints. We describe a parallel
implementation of the solver on a graphics processing unit (GPU) using a
bespoke quartic function minimizer. An application to robust optimal energy
management in hybrid electric vehicles is described, and the results of
numerical simulations comparing the computation times of the parallel GPU
implementation with those of an equivalent serial implementation are presented.
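As a rough illustration of why ADMM suits parallel hardware for this kind of problem, the sketch below solves a much simpler instance than the one in the abstract: separable quadratic costs, box bounds on each variable, and a single linear coupling constraint on the sum of the variables. The paper's quadratic constraints and quartic minimizer are not reproduced; the function name `admm_allocate`, its parameters and the sample data are all assumptions. The point to notice is that the x-update is independent per element, which is the part a GPU would run with one thread per variable.

```python
from typing import List, Sequence


def admm_allocate(q: Sequence[float], c: Sequence[float],
                  lo: Sequence[float], hi: Sequence[float],
                  total: float, rho: float = 1.0,
                  iterations: int = 500) -> List[float]:
    """Minimize sum(0.5*q[i]*x[i]**2 + c[i]*x[i]) subject to
    lo[i] <= x[i] <= hi[i] and sum(x) == total, using scaled-form ADMM.
    Assumes every q[i] > 0 and that the problem is feasible."""
    n = len(q)
    z = [total / n] * n  # copy of x constrained to satisfy the coupling
    u = [0.0] * n        # scaled dual variable
    x = list(z)
    for _ in range(iterations):
        # x-update: each element is a 1-D quadratic, minimized in closed
        # form and clipped to its bounds (independent, hence parallelizable).
        x = [min(max((rho * (z[i] - u[i]) - c[i]) / (q[i] + rho), lo[i]), hi[i])
             for i in range(n)]
        # z-update: project x + u onto the hyperplane sum(z) == total.
        v = [x[i] + u[i] for i in range(n)]
        shift = (sum(v) - total) / n
        z = [vi - shift for vi in v]
        # Dual update: accumulate the disagreement between x and z.
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return x


if __name__ == "__main__":
    # Illustrative data: three units sharing a total demand of 7.
    print(admm_allocate(q=[1, 2, 4], c=[0, 0, 0],
                        lo=[0, 0, 0], hi=[10, 10, 10], total=7))
```

With the sample data the iterates approach the allocation (4, 2, 1), where the marginal costs q[i]*x[i] are equal across units. The returned x satisfies the bounds exactly and the sum constraint only up to the convergence tolerance, so a production solver would add a residual-based stopping test instead of a fixed iteration count.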