16 research outputs found

Parallel Triplet Finding for Particle Track Reconstruction. [Mit einer ausführlichen deutschen Zusammenfassung]

Author: Funke Daniel
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2013

A GPU-based real time trigger for rare kaon decays at NA62

Author: BS Blanchard, GS Fishman, JV Jones, K Keller, R Austin, S Chiesa
Publication venue: Pisa University Press
Publication date: 22/10/2013

Abstract This thesis reports a study of a new real-time trigger for the NA62 experiment based on Graphics Processing Units (GPUs). The NA62 experiment was devised to study with unprecedented precision the ultra-rare decay K+ → π+ ν anti-ν, a process mediated by Flavour-Changing Neutral Currents (FCNC) whose exceptional theoretical cleanliness provides a unique probe to test the Standard Model. The use of a high-rate kaon beam will result in an event rate of about 15 MHz, so high that it will be impossible to store data on disk without an efficient selection. The experiment therefore devised three trigger levels, which reduce the data rate fed to the readout PC farm to ∼10 kHz. For this thesis I developed an online trigger algorithm that uses data fed by the RICH (Ring Imaging CHerenkov counter) detector in real time to reject the dominant background K+ → π+ π0 on the basis of kinematical constraints. As a starting point for the development of this algorithm, I verified the feasibility of such a trigger through Monte Carlo simulations. I measured the reconstruction resolution, achieved by the RICH detector alone, of the kinematical variables used for the event selection. After that, I analysed the background rejection power and the signal efficiency of several kinematical constraints, and I designed an actual trigger algorithm. The need to run the algorithm in real time, with a maximum latency of 1 ms per event, drove the choice of exploiting the parallel computing power of GPUs. I therefore developed a parallelized algorithm that can fit up to 4 Cherenkov rings per event. Moreover, a large number of events are processed concurrently. No parallelized and seedless multi-ring fitting algorithm existed before. The developed algorithm consists of a pattern recognition stage, which assigns the hits to up to 4 ring candidates, and a robust single-ring fit routine. 
The program was tested on GPUs, and its performance and execution latency proved to be compatible with the requirements. This work proves that alternative trigger designs are possible for the NA62 experiment, and represents a starting point for the introduction of flexible GPU-based real-time triggers in High Energy Physics. Summary My thesis is a study of a real-time trigger algorithm based on GPUs (Graphics Processing Units) for the NA62 experiment. NA62 is an experiment designed to measure precisely the ultra-rare decay K+ → π+ ν anti-ν, a channel mediated by flavour-changing neutral currents that is extremely sensitive to the possible presence of new physics. The high rate of detected events, of the order of 15 MHz, will not allow data to be stored on disk unless it is moderated by strict selection criteria. Trigger levels are therefore needed to reduce the rate of saved events to about ten kHz. The algorithm developed relies on the RICH (Ring Imaging CHerenkov counter) detector. The primitive information sent by the RICH is evaluated in real time to produce a trigger decision based mainly on kinematic considerations. In a first phase I verified, through Monte Carlo simulation, the feasibility and significance of this project. I first measured the resolution of the reconstruction of some kinematic quantities obtained using the RICH detector alone, since a real-time first-level trigger will not be able to correlate data supplied by different detectors. I then studied how well the signal could be separated from the background, measuring the rejection efficiency and the signal acceptance as some selection parameters were varied. 
Given the need to run the program in real time, with a maximum latency of 1 ms per event, it was decided to exploit the parallel computing power of GPUs (highly parallel graphics processors). An algorithm was therefore developed that can execute simultaneously not only the instructions for different events, but also the fits of up to 4 different Cherenkov rings belonging to the same event. No parallel, seedless algorithm of this kind existed in the literature. The implemented algorithm consists of two parts: an initial pattern-recognition part, which extracts the number of rings present in the matrix and identifies the hits belonging to each of them, and a part that fits the individual circles. The program was tested on GPUs, and its efficiency and execution times are compatible with the requirements. This work opens up the possibility of implementing alternative, flexible triggers for NA62 and is a first prototype example of the real-time use of GPUs
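The abstract describes a single-ring fit stage but does not say which fitting method the thesis uses. As a rough illustration only, the sketch below fits one circle to a set of hit coordinates with the standard algebraic least-squares (Kåsa) method. This is a plain CPU version, it is not robust against outlier hits, and the name `fit_ring` and the `(x, y)` hit format are assumptions made for this example.

```python
import math

def fit_ring(hits):
    """Algebraic (Kasa) least-squares circle fit.

    hits: list of (x, y) hit positions (hypothetical input format).
    Returns (xc, yc, r): fitted centre and radius.
    """
    n = len(hits)
    if n < 3:
        raise ValueError("need at least 3 hits to fit a circle")

    # Work in centroid coordinates so the linear system reduces to 2x2.
    mx = sum(x for x, _ in hits) / n
    my = sum(y for _, y in hits) / n

    suu = svv = suv = suuu = svvv = suvv = svuu = 0.0
    for x, y in hits:
        u, v = x - mx, y - my
        suu += u * u
        svv += v * v
        suv += u * v
        suuu += u * u * u
        svvv += v * v * v
        suvv += u * v * v
        svuu += v * u * u

    # Normal equations for the centre (uc, vc), solved with Cramer's rule.
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:  # absolute threshold; assumes coordinates of order 1
        raise ValueError("hits are (nearly) collinear")
    rhs_u = 0.5 * (suuu + suvv)
    rhs_v = 0.5 * (svvv + svuu)
    uc = (rhs_u * svv - rhs_v * suv) / det
    vc = (suu * rhs_v - suv * rhs_u) / det

    r = math.sqrt(uc * uc + vc * vc + (suu + svv) / n)
    return uc + mx, vc + my, r
```

Because the fit only needs a handful of running sums per ring, each ring (and each event) can in principle be handled independently, which is the kind of structure that maps naturally onto parallel hardware.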

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

A GPU-based real time trigger for rare kaon decays at NA62

Author: Graverini Elena
Publication venue: Pisa University Press
Publication date: 22/10/2013

Sharing GPUs for Real-Time Autonomous-Driving Systems

Author: Yang Ming
Publication venue: University of North Carolina at Chapel Hill Graduate School
Publication date: 01/01/2020

Autonomous vehicles at mass-market scales are on the horizon. Cameras are the least expensive among common sensor types and can preserve features such as color and texture that other sensors cannot. Therefore, realizing full autonomy in vehicles at a reasonable cost is expected to entail computer-vision techniques. These computer-vision applications require massive parallelism provided by the underlying shared accelerators, such as graphics processing units, or GPUs, to function “in real time.” However, when computer-vision researchers and GPU vendors refer to “real time,” they usually mean “real fast”; in contrast, certifiable automotive systems must be “real time” in the sense of being predictable. This dissertation addresses the challenging problem of how GPUs can be shared predictably and efficiently for real-time autonomous-driving systems. We tackle this challenge in four steps. First, we investigate NVIDIA GPUs with respect to scheduling, synchronization, and execution. We conduct an extensive set of experiments to infer NVIDIA GPU scheduling rules, which are unfortunately undisclosed by NVIDIA and are beyond access owing to their closed-source software stack. We also expose a list of pitfalls pertaining to CPU-GPU synchronization that can result in unbounded response times of GPU-using applications. Lastly, we examine a fundamental trade-off for designing real-time tasks under different execution options. Overall, our investigation provides an essential understanding of NVIDIA GPUs, allowing us to further model and analyze GPU tasks. Second, we develop a new model and conduct schedulability analysis for GPU tasks. We extend the well-studied sporadic task model with additional parameters that characterize the parallel execution of GPU tasks. We show that NVIDIA scheduling rules are subject to fundamental capacity loss, which implies a necessary total utilization bound. 
We derive response-time bounds for GPU task systems that satisfy our schedulability conditions. Third, we address an industrial challenge of supplying the throughput performance of computer-vision frameworks to support adequate coverage and redundancy offered by an array of cameras. We re-think the design of convolutional neural network (CNN) software to better utilize hardware resources and achieve increased throughput (number of simultaneous camera streams) without any appreciable increase in per-frame latency (camera to CNN output) or reduction of per-stream accuracy. Fourth, we apply our analysis to a finer-grained graph scheduling of a computer-vision standard, OpenVX, which explicitly targets embedded and real-time systems. We evaluate both the analytical and empirical real-time performance of our approach. Doctor of Philosophy
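The abstract mentions the sporadic task model and a necessary total utilization bound without giving either the extra GPU parameters or the bound itself. The sketch below shows only the textbook part: each task has a worst-case execution time and a minimum release separation, and a task set fails a necessary condition if its total utilization exceeds some bound. The class name, field names, and the `bound` argument are hypothetical, and the dissertation's actual GPU-specific bound is not reproduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SporadicTask:
    wcet_ms: float    # worst-case execution time of one job
    period_ms: float  # minimum separation between consecutive job releases

    @property
    def utilization(self) -> float:
        # Long-run fraction of one processing unit this task may demand.
        return self.wcet_ms / self.period_ms

def total_utilization(tasks):
    """Sum of per-task utilizations."""
    return sum(t.utilization for t in tasks)

def passes_utilization_bound(tasks, bound):
    """Necessary (not sufficient) check: total utilization <= bound.

    `bound` is a placeholder supplied by the caller; the value that
    follows from NVIDIA's scheduling rules is not stated in the abstract.
    """
    return total_utilization(tasks) <= bound
```

A task set that fails this check cannot be schedulable under the chosen bound; one that passes still needs a full response-time analysis.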

Carolina Digital Repository

Meteorological modelling on the ICL distributed array processor and other parallel computers

Author: Carver Glenn Derek
Publication venue: The University of Edinburgh
Publication date: 01/01/1990