7 research outputs found
Search by triplet: An efficient local track reconstruction algorithm for parallel architectures
Millions of particles are collided every second at the LHCb detector placed inside the Large Hadron Collider at CERN. The particles produced as a result of these collisions pass through various detecting devices which will produce a combined raw data rate of up to 40 Tbps by 2021. These data will be fed through a data acquisition system which reconstructs individual particles and filters the collision events in real time. This process will occur in a heterogeneous farm employing exclusively off-the-shelf CPU and GPU hardware, in a two stage process known as High Level Trigger. The reconstruction of charged particle trajectories in physics detectors, also referred to as track reconstruction or tracking, determines the position, charge and momentum of particles as they pass through detectors. The Vertex Locator subdetector (VELO) is the closest such detector to the beamline, placed outside of the region where the LHCb magnet produces a sizable magnetic field. It is used to reconstruct straight particle trajectories which serve as seeds for reconstruction of other subdetectors and to locate collision vertices. The VELO subdetector will detect up to 109 particles every second, which need to be reconstructed in real time in the High Level Trigger. We present Search by triplet, an efficient track reconstruction algorithm. Our algorithm is designed to run efficiently across parallel architectures. We extend on previous work and explain the algorithm evolution since its inception. We show the scaling of our algorithm under various situations, and analyse its amortized time in terms of complexity for each of its constituent parts and profile its performance. Our algorithm is the current state-of-the-art in VELO track reconstruction on SIMT architectures, and we qualify its improvements over previous results
GeantV: Results from the prototype of concurrent vector particle transport simulation in HEP
Full detector simulation was among the largest CPU consumer in all CERN
experiment software stacks for the first two runs of the Large Hadron Collider
(LHC). In the early 2010's, the projections were that simulation demands would
scale linearly with luminosity increase, compensated only partially by an
increase of computing resources. The extension of fast simulation approaches to
more use cases, covering a larger fraction of the simulation budget, is only
part of the solution due to intrinsic precision limitations. The remainder
corresponds to speeding-up the simulation software by several factors, which is
out of reach using simple optimizations on the current code base. In this
context, the GeantV R&D project was launched, aiming to redesign the legacy
particle transport codes in order to make them benefit from fine-grained
parallelism features such as vectorization, but also from increased code and
data locality. This paper presents extensively the results and achievements of
this R&D, as well as the conclusions and lessons learnt from the beta
prototype.Comment: 34 pages, 26 figures, 24 table
A high-performance portable abstract interface for explicit SIMD vectorization
This work establishes a scalable, easy to use and efficient approach for exploiting SIMD capabilities of modern CPUs, without the need for extensive knowledge of architecture specific instruction sets. We provide a description of a new API, known as UME::SIMD, which provides a flexible, portable, type-oriented abstraction for SIMD instruction set architectures. Requirements for such libraries are analysed based on existing, as well as proposed future solutions. A software architecture that achieves these requirements is explained, and its performance evaluated. Finally we discuss how the API fits into the existing, and future software ecosystem
A high-performance portable abstract interface for explicit SIMD vectorization
This work establishes a scalable, easy to use and efficient approach for exploiting SIMD capabilities of modern CPUs, without the need for extensive knowledge of architecture specific instruction sets. We provide a description of a new API, known as UME::SIMD, which provides a flexible, portable, type-oriented abstraction for SIMD instruction set architectures. Requirements for such libraries are analysed based on existing, as well as proposed future solutions. A software architecture that achieves these requirements is explained, and its performance evaluated. Finally we discuss how the API fits into the existing, and future software ecosystem
Optimization of high-throughput real-time processes in physics reconstruction
La presente tesis se ha desarrollado en colaboración entre
la Universidad de Sevilla y la Organización Europea para la
Investigación Nuclear, CERN.
El detector LHCb es uno de los cuatro grandes detectores
situados en el Gran Colisionador de Hadrones, LHC. En LHCb,
se colisionan partÃculas a altas energÃas para comprender la
diferencia existente entre la materia y la antimateria. Debido a la
cantidad ingente de datos generada por el detector, es necesario
realizar un filtrado de datos en tiempo real, fundamentado en
los conocimientos actuales recogidos en el Modelo Estándar de
fÃsica de partÃculas. El filtrado, también conocido como High
Level Trigger, deberá procesar un throughput de 40 Tb/s de datos,
y realizar un filtrado de aproximadamente 1 000:1, reduciendo
el throughput a unos 40 Gb/s de salida, que se almacenan para
posterior análisis.
El proceso del High Level Trigger se subdivide a su vez en
dos etapas: High Level Trigger 1 (HLT1) y High Level Trigger
2 (HLT2). El HLT1 transcurre en tiempo real, y realiza una reducción de datos de aproximadamente 30:1. El HLT1 consiste
en una serie de procesos software que reconstruyen lo que ha
sucedido en la colisión de partÃculas. En la reconstrucción del
HLT1 únicamente se analizan las trayectorias de las partÃculas
producidas fruto de la colisión, en un problema conocido como
reconstrucción de trazas, para dictaminar el interés de las colisiones.
Por contra, el proceso HLT2 es más fino, requiriendo más
tiempo en realizarse y reconstruyendo todos los subdetectores
que componen LHCb.
Hacia 2020, el detector LHCb, asà como todos los componentes
del sistema de adquisici´on de datos, serán actualizados acorde
a los últimos desarrollos técnicos. Como parte del sistema
de adquisición de datos, los servidores que procesan HLT1 y
HLT2 también sufrirán una actualización. Al mismo tiempo, el
acelerador LHC será también actualizado, de manera que la
cantidad de datos generada en cada cruce de grupo de partÃculas
aumentare en aproxidamente 5 veces la actual. Debido a
las actualizaciones tanto del acelerador como del detector, se
prevé que la cantidad de datos que deberá procesar el HLT en
su totalidad sea unas 40 veces mayor a la actual.
La previsión de la escalabilidad del software actual a 2020
subestim´ó los recursos necesarios para hacer frente al incremento
en throughput. Esto produjo que se pusiera en marcha un
estudio de todos los algoritmos tanto del HLT1 como del HLT2,
asà como una actualización del código a nuevos estándares, para
mejorar su rendimiento y ser capaz de procesar la cantidad de
datos esperada.
En esta tesis, se exploran varios algoritmos de la reconstrucción de LHCb. El problema de reconstrucción de trazas se analiza
en profundidad y se proponen nuevos algoritmos para su
resolución. Ya que los problemas analizados exhiben un paralelismo
masivo, estos algoritmos se implementan en lenguajes
especializados para tarjetas gráficas modernas (GPUs), dada su
arquitectura inherentemente paralela. En este trabajo se dise Ëœnan
dos algoritmos de reconstrucción de trazas. Además, se diseñan
adicionalmente cuatro algoritmos de decodificación y un algoritmo
de clustering, problemas también encontrados en el HLT1.
Por otra parte, se diseña un algoritmo para el filtrado de Kalman,
que puede ser utilizado en ambas etapas.
Los algoritmos desarrollados cumplen con los requisitos esperados
por la colaboración LHCb para el año 2020. Para poder
ejecutar los algoritmos eficientemente en tarjetas gráficas, se
desarrolla un framework especializado para GPUs, que permite
la ejecución paralela de secuencias de reconstrucción en GPUs.
Combinando los algoritmos desarrollados con el framework, se
completa una secuencia de ejecución que asienta las bases para
un HLT1 ejecutable en GPU.
Durante la investigación llevada a cabo en esta tesis, y gracias
a los desarrollos arriba mencionados y a la colaboración de
un pequeño equipo de personas coordinado por el autor, se
completa un HLT1 ejecutable en GPUs. El rendimiento obtenido
en GPUs, producto de esta tesis, permite hacer frente al reto de
ejecutar una secuencia de reconstrucción en tiempo real, bajo
las condiciones actualizadas de LHCb previstas para 2020. As´ı
mismo, se completa por primera vez para cualquier experimento
del LHC un High Level Trigger que se ejecuta únicamente en
GPUs. Finalmente, se detallan varias posibles configuraciones
para incluir tarjetas gr´aficas en el sistema de adquisición de
datos de LHCb.The current thesis has been developed in collaboration between
Universidad de Sevilla and the European Organization for Nuclear
Research, CERN.
The LHCb detector is one of four big detectors placed alongside
the Large Hadron Collider, LHC. In LHCb, particles are
collided at high energies in order to understand the difference
between matter and antimatter. Due to the massive quantity
of data generated by the detector, it is necessary to filter data
in real-time. The filtering, also known as High Level Trigger,
processes a throughput of 40 Tb/s of data and performs a selection
of approximately 1 000:1. The throughput is thus reduced
to roughly 40 Gb/s of data output, which is then stored for
posterior analysis.
The High Level Trigger process is subdivided into two stages:
High Level Trigger 1 (HLT1) and High Level Trigger 2 (HLT2).
HLT1 occurs in real-time, and yields a reduction of data of approximately
30:1. HLT1 consists in a series of software processes
that reconstruct particle collisions. The HLT1 reconstruction only
analyzes the trajectories of particles produced at the collision,
solving a problem known as track reconstruction, that determines
whether the collision data is kept or discarded. In contrast,
HLT2 is a finer process, which requires more time to execute
and reconstructs all subdetectors composing LHCb.
Towards 2020, the LHCb detector and all the components
composing the data acquisition system will be upgraded. As
part of the data acquisition system, the servers that process
HLT1 and HLT2 will also be upgraded. In addition, the LHC
accelerator will also be updated, increasing the data generated in
every bunch crossing by roughly 5 times. Due to the accelerator
and detector upgrades, the amount of data that the HLT will
require to process is expected to increase by 40 times.
The foreseen scalability of the software through 2020 underestimated
the required resources to face the increase in data
throughput. As a consequence, studies of all algorithms composing
HLT1 and HLT2 and code modernizations were carried
out, in order to obtain a better performance and increase the
processing capability of the foreseen hardware resources in the
upgrade.
In this thesis, several algorithms of the LHCb recontruction
are explored. The track reconstruction problem is analyzed
in depth, and new algorithms are proposed. Since the analyzed
problems are massively parallel, these algorithms are implemented
in specialized languages for modern graphics cards
(GPUs), due to their inherently parallel architecture. From this
work stem two algorithm designs. Furthermore, four additional
decoding algorithms and a clustering algorithms have been designed
and implemented, which are also part of HLT1. Apart
from that, an parallel Kalman filter algorithm has been designed
and implemented, which can be used in both HLT stages.
The developed algorithms satisfy the requirements of the
LHCb collaboration for the LHCb upgrade. In order to execute
the algorithms efficiently on GPUs, a software framework specialized
for GPUs is developed, which allows executing GPU
reconstruction sequences in parallel. Combining the developed
algorithms with the framework, an execution sequence is completed
as the foundations of a GPU HLT1.
During the research carried out in this thesis, the aforementioned
developments and a small group of collaborators coordinated
by the author lead to the completion of a full GPU
HLT1 sequence. The performance obtained on GPUs allows
executing a reconstruction sequence in real-time, under LHCb
upgrade conditions. The developed GPU HLT1 constitutes the
first GPU high level trigger ever developed for an LHC experiment.
Finally, various possible realizations of the GPU HLT1 to
integrate in a production GPU-equipped data acquisition system
are detailed