145 research outputs found
A variational autoencoder application for real-time anomaly detection at CMS
Despite providing invaluable data in the field of High Energy Physics, towards higher luminosity runs the Large Hadron Collider (LHC) will face challenges in discovering interesting results through conventional methods used in previous run periods.
Among the proposed approaches, the one we focus on in this thesis work – in collaboration with CERN teams, involves the use of a joint variational autoencoder (JointVAE) machine learning model, trained on known physics processes to identify anomalous events that correspond to previously unidentified physics signatures.
By doing so, this method does not rely on any specific new physics signatures and can detect anomalous events in an unsupervised manner, complementing the traditional LHC search tactics that rely on model-dependent hypothesis testing.
The algorithm produces a list of anomalous events, which experimental collaborations will examine and eventually confirm as new physics phenomena.
Furthermore, repetitive event topologies in the dataset can inspire new physics model building and experimental searches.
Implementing this algorithm in the trigger system of LHC experiments can detect previously unnoticed anomalous events, thus broadening the discovery potential of the LHC.
This thesis presents a method for implementing the JointVAE model, for real-time anomaly detection in the Compact Muon Solenoid (CMS) experiment.
Among the challenges of implementing machine learning models in fast applications, such as the trigger system of the LHC experiments, low latency and reduced resource consumption are essential.
Therefore, the JointVAE model has been studied for its implementation feasibility in Field-Programmable Gate Arrays (FPGAs), utilizing a tool based on High-Level Synthesis (HLS) named HLS4ML.
The tool, combined with the quantization of neural networks, will reduce the model size, latency, and energy consumption
Jornadas Nacionales de Investigación en Ciberseguridad: actas de las VIII Jornadas Nacionales de Investigación en ciberseguridad: Vigo, 21 a 23 de junio de 2023
Jornadas Nacionales de Investigación en Ciberseguridad (8ª. 2023. Vigo)atlanTTicAMTEGA: Axencia para a modernización tecnolóxica de GaliciaINCIBE: Instituto Nacional de Cibersegurida
Fundamentals
Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters
Energy Aware Runtime Systems for Elastic Stream Processing Platforms
Following an invariant growth in the required computational performance of processors, the multicore revolution started around 20 years ago. This revolution was mainly an answer to power dissipation constraints restricting the increase of clock frequency in single-core processors. The multicore revolution not only brought in the challenge of parallel programming, i.e. being able to develop software exploiting the entire capabilities of manycore architectures, but also the challenge of programming heterogeneous platforms. The question of “on which processing element to map a specific computational unit?”, is well known in the embedded community. With the introduction of general-purpose graphics processing units (GPGPUs), digital signal processors (DSPs) along with many-core processors on different system-on-chip platforms, heterogeneous parallel platforms are nowadays widespread over several domains, from consumer devices to media processing platforms for telecom operators. Finding mapping together with a suitable hardware architecture is a process called design-space exploration. This process is very challenging in heterogeneous many-core architectures, which promise to offer benefits in terms of energy efficiency. The main problem is the exponential explosion of space exploration. With the recent trend of increasing levels of heterogeneity in the chip, selecting the parameters to take into account when mapping software to hardware is still an open research topic in the embedded area. For example, the current Linux scheduler has poor performance when mapping tasks to computing elements available in hardware. The only metric considered is CPU workload, which as was shown in recent work does not match true performance demands from the applications. Doing so may produce an incorrect allocation of resources, resulting in a waste of energy. The origin of this research work comes from the observation that these approaches do not provide full support for the dynamic behavior of stream processing applications, especially if these behaviors are established only at runtime. This research will contribute to the general goal of developing energy-efficient solutions to design streaming applications on heterogeneous and parallel hardware platforms. Streaming applications are nowadays widely spread in the software domain. Their distinctive characiteristic is the retrieving of multiple streams of data and the need to process them in real time. The proposed work will develop new approaches to address the challenging problem of efficient runtime coordination of dynamic applications, focusing on energy and performance management.Efter en oföränderlig tillväxt i prestandakrav hos processorer, började den flerkärniga processor-revolutionen för ungefär 20 år sedan. Denna revolution skedde till största del som en lösning till begränsningar i energieffekten allt eftersom klockfrekvensen kontinuerligt höjdes i en-kärniga processorer. Den flerkärniga processor-revolutionen medförde inte enbart utmaningen gällande parallellprogrammering, m.a.o. förmågan att utveckla mjukvara som använder sig av alla delelement i de flerkärniga processorerna, men också utmaningen med programmering av heterogena plattformar. Frågeställningen ”på vilken processorelement skall en viss beräkning utföras?” är väl känt inom ramen för inbyggda datorsystem. Efter introduktionen av grafikprocessorer för allmänna beräkningar (GPGPU), signalprocesserings-processorer (DSP) samt flerkärniga processorer på olika system-on-chip plattformar, är heterogena parallella plattformar idag omfattande inom många domäner, från konsumtionsartiklar till mediaprocesseringsplattformar för telekommunikationsoperatörer. Processen att placera beräkningarna på en passande hårdvaruplattform kallas för utforskning av en designrymd (design-space exploration). Denna process är mycket utmanande för heterogena flerkärniga arkitekturer, och kan medföra fördelar när det gäller energieffektivitet. Det största problemet är att de olika valmöjligheterna i designrymden kan växa exponentiellt. Enligt den nuvarande trenden som förespår ökad heterogeniska aspekter i processorerna är utmaningen att hitta den mest passande placeringen av beräkningarna på hårdvaran ännu en forskningsfråga inom ramen för inbyggda datorsystem. Till exempel, den nuvarande schemaläggaren i Linux operativsystemet är inkapabel att hitta en effektiv placering av beräkningarna på den underliggande hårdvaran. Det enda mätsättet som används är processorns belastning vilket, som visats i tidigare forskning, inte motsvarar den verkliga prestandan i applikationen. Användning av detta mätsätt vid resursallokering resulterar i slöseri med energi. Denna forskning härstammar från observationerna att dessa tillvägagångssätt inte stöder det dynamiska beteendet hos ström-processeringsapplikationer (stream processing applications), speciellt om beteendena bara etableras vid körtid. Denna forskning kontribuerar till det allmänna målet att utveckla energieffektiva lösningar för ström-applikationer (streaming applications) på heterogena flerkärniga hårdvaruplattformar. Ström-applikationer är numera mycket vanliga i mjukvarudomän. Deras distinkta karaktär är inläsning av flertalet dataströmmar, och behov av att processera dem i realtid. Arbetet i denna forskning understöder utvecklingen av nya sätt för att lösa det utmanade problemet att effektivt koordinera dynamiska applikationer i realtid och fokus på energi- och prestandahantering
Recommended from our members
Computational Methods in Multi-Messenger Astrophysics using Gravitational Waves and High Energy Neutrinos
This dissertation seeks to describe advancements made in computational methods for multi-messenger astrophysics (MMA) using gravitational waves GW and neutrinos during Advanced LIGO (aLIGO)’s first through third observing runs (O1-O3) and, looking forward, to describe novel computational techniques suited to the challenges of both the burgeoning MMA field and high-performance computing as a whole.
The first two chapters provide an overview of MMA as it pertains to gravitational wave/high energy neutrino (GWHEN) searches, including a summary of expected astrophysical sources as well as GW, neutrino, and gamma-ray detectors used in their detection. These are followed in the third chapter by an in-depth discussion of LIGO’s timing system, particularly the diagnostic subsystem, describing both its role in MMA searches and the author’s contributions to the system itself.
The fourth chapter provides a detailed description of the Low-Latency Algorithm for Multi-messenger Astrophysics (LLAMA), the GWHEN pipeline developed by the author and used in O2 and O3. Relevant past multi-messenger searches are described first, followed by the O2 and O3 analysis methods, the pipeline’s performance, scientific results, and finally, an in-depth account of the library’s structure and functionality. In particular, the author’s high-performance multi-order coordinates (MOC) HEALPix image analysis library, HPMOC, is described. HPMOC increases performance of HEALPix image manipulations by several orders of magnitude vs. naive single-resolution approaches while presenting a simple high-level interface and should prove useful for diverse future MMA searches. The performance improvements it provides for LLAMA are also covered.
The final chapter of this dissertation builds on the approaches taken in developing HPMOC, presenting several novel methods for efficiently storing and analyzing large data sets, with applications to MMA and other data-intensive fields. A family of depth-first multi-resolution ordering of HEALPix images — DEPTH9, DEPTH19, and DEPTH40 — is defined, along with algorithms and use cases where it can improve on current approaches, including high-speed streaming calculations suitable for serverless compute or FPGAs.
For performance-constrained analyses on HEALPix data (e.g. image analysis in multi-messenger search pipelines) using SIMD processors, breadth-first data structures can provide short-circuiting calculations in a data-parallel way on compressed data; a simple compression method is described with application to further improving LLAMA performance.
A new storage scheme and associated algorithms for efficiently compressing and contracting tensors of varying sparsity is presented; these demuxed tensors (D-Tensors) have equivalent asymptotic time and space complexity to optimal representations of both dense and sparse matrices, and could be used as a universal drop-in replacement to reduce code complexity and developer effort while improving performance of existing non-optimized numerical code. Finally, the big bucket hash table (B-Table), a novel type of hash table making guarantees on data layout (vs. load factor), is described, along with optimizations it allows for (like hardware acceleration, online rebuilds, and hard realtime applications) that are not possible with existing hash table approaches. These innovations are presented in the hope that some will prove useful for improving future MMA searches and other data-intensive applications
Technologies and Applications for Big Data Value
This open access book explores cutting-edge solutions and best practices for big data and data-driven AI applications for the data-driven economy. It provides the reader with a basis for understanding how technical issues can be overcome to offer real-world solutions to major industrial areas. The book starts with an introductory chapter that provides an overview of the book by positioning the following chapters in terms of their contributions to technology frameworks which are key elements of the Big Data Value Public-Private Partnership and the upcoming Partnership on AI, Data and Robotics. The remainder of the book is then arranged in two parts. The first part “Technologies and Methods” contains horizontal contributions of technologies and methods that enable data value chains to be applied in any sector. The second part “Processes and Applications” details experience reports and lessons from using big data and data-driven approaches in processes and applications. Its chapters are co-authored with industry experts and cover domains including health, law, finance, retail, manufacturing, mobility, and smart cities. Contributions emanate from the Big Data Value Public-Private Partnership and the Big Data Value Association, which have acted as the European data community's nucleus to bring together businesses with leading researchers to harness the value of data to benefit society, business, science, and industry. The book is of interest to two primary audiences, first, undergraduate and postgraduate students and researchers in various fields, including big data, data science, data engineering, and machine learning and AI. Second, practitioners and industry experts engaged in data-driven systems, software design and deployment projects who are interested in employing these advanced methods to address real-world problems
Circuits and Systems Advances in Near Threshold Computing
Modern society is witnessing a sea change in ubiquitous computing, in which people have embraced computing systems as an indispensable part of day-to-day existence. Computation, storage, and communication abilities of smartphones, for example, have undergone monumental changes over the past decade. However, global emphasis on creating and sustaining green environments is leading to a rapid and ongoing proliferation of edge computing systems and applications. As a broad spectrum of healthcare, home, and transport applications shift to the edge of the network, near-threshold computing (NTC) is emerging as one of the promising low-power computing platforms. An NTC device sets its supply voltage close to its threshold voltage, dramatically reducing the energy consumption. Despite showing substantial promise in terms of energy efficiency, NTC is yet to see widescale commercial adoption. This is because circuits and systems operating with NTC suffer from several problems, including increased sensitivity to process variation, reliability problems, performance degradation, and security vulnerabilities, to name a few. To realize its potential, we need designs, techniques, and solutions to overcome these challenges associated with NTC circuits and systems. The readers of this book will be able to familiarize themselves with recent advances in electronics systems, focusing on near-threshold computing
Power Electronics Applications in Renewable Energy Systems
The renewable generation system is currently experiencing rapid growth in various power grids. The stability and dynamic response issues of power grids are receiving attention due to the increase in power electronics-based renewable energy. The main focus of this Special Issue is to provide solutions for power system planning and operation. Power electronics-based devices can offer new ancillary services to several industrial sectors. In order to fully include the capability of power conversion systems in the network integration of renewable generators, several studies should be carried out, including detailed studies of switching circuits, and comprehensive operating strategies for numerous devices, consisting of large-scale renewable generation clusters
Machine Learning Methods with Noisy, Incomplete or Small Datasets
In many machine learning applications, available datasets are sometimes incomplete, noisy or affected by artifacts. In supervised scenarios, it could happen that label information has low quality, which might include unbalanced training sets, noisy labels and other problems. Moreover, in practice, it is very common that available data samples are not enough to derive useful supervised or unsupervised classifiers. All these issues are commonly referred to as the low-quality data problem. This book collects novel contributions on machine learning methods for low-quality datasets, to contribute to the dissemination of new ideas to solve this challenging problem, and to provide clear examples of application in real scenarios
Inverse Dynamics Problems
The inverse dynamics problem was developed in order to provide researchers with the state of the art in inverse problems for dynamic and vibrational systems. Contrasted with a forward problem, which solves for the system output in a straightforward manner, an inverse problem searches for the system input through a procedure contaminated with errors and uncertainties. An inverse problem, with a focus on structural dynamics, determines the changes made to the system and estimates the inputs, including forces and moments, to the system, utilizing measurements of structural vibration responses only. With its complex mathematical structure and need for more reliable input estimations, the inverse problem is still a fundamental subject of research among mathematicians and engineering scientists. This book contains 11 articles that touch upon various aspects of inverse dynamic problems
- …