23 research outputs found

    Performance Improvements of EventIndex Distributed System at CERN

    Get PDF
    El trabajo de esta tesis se enmarca dentro del proyecto EventIndex del experimento ATLAS, un gran detector de partı́culas del LHC (Gran Colisionador de Hadrones) en el CERN. El objetivo del proyecto es catalogar todas las colisiones de partı́culas, o eventos, registrados en el detector ATLAS y también simulados a lo largo de sus años de funcionamiento. Con este catálogo se pueden caracterizar los datos a nivel de evento para su búsqueda y localización por parte de los usuarios finales. También se pueden realizar comprobaciones en la cadena de registro y reprocesado de los datos, para comprobar su corrección y optimizar futuros procesos. Debido al incremento en las tasas y volumen de datos esperados en el Run 3 (2022-2025) y el HL-LHC (finales de la década del 2020), se requiere un sistema escalable y que simplifique implementaciones anteriores. En esta tesis se presentan las contribuciones al proyecto en las áreas de recolección de datos distribuida, almacenamiento de cantidades masivas de datos y acceso a los mismos. Una pequeña cantidad de información (metadatos) por evento es indexada en el CERN (Tier-0), y de forma distribuida en el grid en todos los centros de computación que forman parte del experimento ATLAS (10 Tier-1, y del orden de 70 Tier-2). En esta tesis se presenta un nuevo modelo de recolección de datos en el grid basado en un object store como almacenamiento temporal, y con selección dinámica de datos para su ingestión en el almacén de datos final. También se presentan las contribuciones a una nueva solución en un único y gran almacén de datos basado en tecnologı́as de macrodatos (Big Data) como HBase/Phoenix, capaz de sostener las tasas y volumen de ingestión de datos requeridos, y que simplifica y soluciona los problemas de las anteriores soluciones hı́bridas. Finalmente, se presenta un marco de computación y herramientas basadas en Spark para el acceso a los datos y la resolución de cargas de trabajo analı́ticas que acceden a grandes cantidades de datos, como el cálculo del solapado (overlaps) entre eventos de distintos datasets, o el cálculo de eventos duplicados.The work presented in this thesis is framed in the context of the EventIndex project of the ATLAS experiment, a big particle detector of the LHC (Large Hadron Collider) at CERN. The objective of the project is to catalog all the particle collisions, or events, recorded at the ATLAS detector and also simulated over the duration of the experiment. With this catalog, data can be characterized at event granularity, important for searching and locating events by the end users. Other automatic checkings can be done in the data reprocessing chain, in order to assure its correctness and optimize future processings. Due to the rise in the production rates and total volume of the data expected for Run 3 (2022-2025) and the HL-LHC (end of the 2020 decade), a scalable system is required also to simplify previous implementations. In this thesis we present the contributions to the project in the areas of distributed data collection, storage of massive volumes of data and access to them. A small quantity of information (metadata) by event is collected from CERN (Tier-0), and distributedly worldwide in the grid in all the computing centers part of the ATLAS Experiment (10 Tier-1, and around 70 Tier-2). We present a new pull model for data collection in the grid with an object store as a temporary store, from where the data can be dynamically retrieved to be ingested at the final backend. We also present the contributions to a big data store using HBase/Phoenix, able to sustain the required data rates and total volume of data, and that simplifies the limitations of the previous hybrid solutions. Finally, we present a computing framework and tools using Spark for the data access, and solving the analytic use cases that access large amounts of data, such as overlaps or duplicate events detection

    Computable Analysis and Game Theory: From Foundations to Applications

    Get PDF
    This body of research showcases several facets of the intersection between computer science and game theory. On the foundational side, we explore the obstructions to the computability of Nash equilibria in the setting of computable analysis. In particular, we study the Weihrauch degree of the problem of finding a Nash equilibrium for a multiplayer game in normal form. We conclude that the Weihrauch degree Nash for multiplayer games lies between AoUC∗[0,1] and AoUC⋄[0,1] (Theorem 5.3). As a slight detour, we also explore the demarcation between computable and non-computable computational problems pertaining to the verification of machine learning. We demonstrate that many verification questions are computable without the need to specify a machine learning framework (Section 7.2). As well as looking into the theory of learners, robustness and sparisty of training data. On the application side, we study the use of Hypergames in Cybersecurity. We look into cybersecurity AND/OR attack graphs and how we could turn them into a hypergame (8.1). Hyper Nash equilibria is not an ideal solution for these games, however, we propose a regret-minimisation based solution concept. In Section 8.2, we survey the area of Hypergames and their connection to cybersecurity, showing that even if there is a small overlap, the reach is limited. We suggest new research directions such as adaptive games, generalisation and transferability (Section 8.3)

    FPGAs in Bioinformatics: Implementation and Evaluation of Common Bioinformatics Algorithms in Reconfigurable Logic

    Get PDF
    Life. Much effort is taken to grant humanity a little insight in this fascinating and complex but fundamental topic. In order to understand the relations and to derive consequences humans have begun to sequence their genomes, i.e. to determine their DNA sequences to infer information, e.g. related to genetic diseases. The process of DNA sequencing as well as subsequent analysis presents a computational challenge for recent computing systems due to the large amounts of data alone. Runtimes of more than one day for analysis of simple datasets are common, even if the process is already run on a CPU cluster. This thesis shows how this general problem in the area of bioinformatics can be tackled with reconfigurable hardware, especially FPGAs. Three compute intensive problems are highlighted: sequence alignment, SNP interaction analysis and genotype imputation. In the area of sequence alignment the software BLASTp for protein database searches is exemplarily presented, implemented and evaluated.SNP interaction analysis is presented with three applications performing an exhaustive search for interactions including the corresponding statistical tests: BOOST, iLOCi and the mutual information measurement. All applications are implemented in FPGA-hardware and evaluated, resulting in an impressive speedup of more than in three orders of magnitude when compared to standard computers. The last topic of genotype imputation presents a two-step process composed of the phasing step and the actual imputation step. The focus lies on the phasing step which is targeted by the SHAPEIT2 application. SHAPEIT2 is discussed with its underlying mathematical methods in detail, and finally implemented and evaluated. A remarkable speedup of 46 is reached here as well

    FPGAs in der Bioinformatik: Implementierung und Evaluierung bekannter bioinformatischer Algorithmen in rekonfigurierbarer Logik

    Get PDF
    Life. Much effort is taken to grant humanity a little insight in this fascinating and complex but fundamental topic. In order to understand the relations and to derive consequences humans have begun to sequence their genomes, i.e. to determine their DNA sequences to infer information, e.g. related to genetic diseases. The process of DNA sequencing as well as subsequent analysis presents a computational challenge for recent computing systems due to the large amounts of data alone. Runtimes of more than one day for analysis of simple datasets are common, even if the process is already run on a CPU cluster. This thesis shows how this general problem in the area of bioinformatics can be tackled with reconfigurable hardware, especially FPGAs. Three compute intensive problems are highlighted: sequence alignment, SNP interaction analysis and genotype imputation. In the area of sequence alignment the software BLASTp for protein database searches is exemplarily presented, implemented and evaluated. SNP interaction analysis is presented with three applications performing an exhaustive search for interactions including the corresponding statistical tests: BOOST, iLOCi and the mutual information measurement. All applications are implemented in FPGA-hardware and evaluated, resulting in an impressive speedup of more than in three orders of magnitude when compared to standard computers. The last topic of genotype imputation presents a two-step process composed of the phasing step and the actual imputation step. The focus lies on the phasing step which is targeted by the SHAPEIT2 application. SHAPEIT2 is discussed with its underlying mathematical methods in detail, and finally implemented and evaluated. A remarkable speedup of 46 is reached here as well.Das Leben. Sehr viel Aufwand wird getrieben um der Menschheit einen Einblick in dieses faszinierende und komplexe, aber fundamentale Thema zu erlauben. Um Zusammenhänge zu verstehen und Folgen ableiten zu können hat der Mensch begonnen sein Genom zu sequenzieren, d.h. seine DNA zu bestimmen um daraus Informationen, z.B. in Bezug auf Erbkrankheiten folgern zu können. Der Prozess der DNA-Sequenzierung sowie die darauffolgenden Analysen sind schon allein wegen der riesigen Datenmengen eine Herausforderung für aktuelle Rechensysteme. Laufzeiten von über einen Tag für die Analyse einfacher Datensätze sind üblich, selbst wenn der Prozess bereits auf einem Computercluster ausgeführt wird. Diese Arbeit zeigt, wie dieses gängige Problem im Bereich der Bioinformatik mit rekonfigurierbarer Hardware, speziell FPGAs, angegangen werden kann. Es werden drei rechenintensive Themengebiete hervorgehoben: Sequenzalignment, SNP-Interaktionsanalyse und Genotyp-Imputation. Beispielhaft wird im Bereich des Sequenzalignments die Software BLASTp für die Suche in Proteinsequenzdatenbanken vorgestellt, implementiert und evaluiert. Die SNP-Interaktionsanalyse wird mit drei Verfahren zur vollständigen Suche von Interaktionen inklusive des dazugehörigen statistischen Tests vorgestellt: BOOST, iLOCi und die Messung der Transinformation. Alle Verfahren werden auf FPGA-Hardware implementiert und evaluiert, mit einer bestechenden Beschleunigung im dreistelligen Bereich gegenüber Standard-Rechnern. Das letzte Gebiet der Genotyp-Imputierung ist ein zweiteiliges Verfahren bestehend aus dem Phasing und der eigentlichen Imputation. Der Schwerpunkt liegt im Phasing-Schritt, der mit dem SHAPEIT2-Tool adressiert wird. SHAPEIT2 wird ausführlich mit den zugrunde liegenden mathematischen Methoden diskutiert, und schließlich implementiert und evaluiert. Auch hier wird ein beachtlicher Speedup von 46 erreicht

    Visual Processing and Latent Representations in Biological and Artificial Neural Networks

    Get PDF
    The human visual system performs the impressive task of converting light arriving at the retina into a useful representation that allows us to make sense of the visual environment. We can navigate easily in the three-dimensional world and recognize objects and their properties, even if they appear from different angles and under different lighting conditions. Artificial systems can also perform well on a variety of complex visual tasks. While they may not be as robust and versatile as their biological counterpart, they have surprising capabilities that are rapidly improving. Studying the two types of systems can help us understand what computations enable the transformation of low-level sensory data into an abstract representation. To this end, this dissertation follows three different pathways. First, we analyze aspects of human perception. The focus is on the perception in the peripheral visual field and the relation to texture perception. Our work builds on a texture model that is based on the features of a deep neural network. We start by expanding the model to the temporal domain to capture dynamic textures such as flames or water. Next, we use psychophysical methods to investigate quantitatively whether humans can distinguish natural textures from samples that were generated by a texture model. Finally, we study images that cover the entire visual field and test whether matching the local summary statistics can produce metameric images independent of the image content. Second, we compare the visual perception of humans and machines. We conduct three case studies that focus on the capabilities of artificial neural networks and the potential occurrence of biological phenomena in machine vision. We find that comparative studies are not always straightforward and propose a checklist on how to improve the robustness of the conclusions that we draw from such studies. Third, we address a fundamental discrepancy between human and machine vision. One major strength of biological vision is its robustness to changes in the appearance of image content. For example, for unusual scenarios, such as a cow on a beach, the recognition performance of humans remains high. This ability is lacking in many artificial systems. We discuss on a conceptual level how to robustly disentangle attributes that are correlated during training, and test this on a number of datasets

    Tools and Algorithms for the Construction and Analysis of Systems

    Get PDF
    This open access two-volume set constitutes the proceedings of the 27th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2021, which was held during March 27 – April 1, 2021, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021. The conference was planned to take place in Luxembourg and changed to an online format due to the COVID-19 pandemic. The total of 41 full papers presented in the proceedings was carefully reviewed and selected from 141 submissions. The volume also contains 7 tool papers; 6 Tool Demo papers, 9 SV-Comp Competition Papers. The papers are organized in topical sections as follows: Part I: Game Theory; SMT Verification; Probabilities; Timed Systems; Neural Networks; Analysis of Network Communication. Part II: Verification Techniques (not SMT); Case Studies; Proof Generation/Validation; Tool Papers; Tool Demo Papers; SV-Comp Tool Competition Papers

    Fundamental Approaches to Software Engineering

    Get PDF
    This open access book constitutes the proceedings of the 24th International Conference on Fundamental Approaches to Software Engineering, FASE 2021, which took place during March 27–April 1, 2021, and was held as part of the Joint Conferences on Theory and Practice of Software, ETAPS 2021. The conference was planned to take place in Luxembourg but changed to an online format due to the COVID-19 pandemic. The 16 full papers presented in this volume were carefully reviewed and selected from 52 submissions. The book also contains 4 Test-Comp contributions
    corecore