35 research outputs found

    pAElla: Edge-AI based Real-Time Malware Detection in Data Centers

    Full text link
    The increasing use of Internet-of-Things (IoT) devices for monitoring a wide spectrum of applications, along with the challenges of "big data" streaming support they often require for data analysis, is nowadays pushing for an increased attention to the emerging edge computing paradigm. In particular, smart approaches to manage and analyze data directly on the network edge, are more and more investigated, and Artificial Intelligence (AI) powered edge computing is envisaged to be a promising direction. In this paper, we focus on Data Centers (DCs) and Supercomputers (SCs), where a new generation of high-resolution monitoring systems is being deployed, opening new opportunities for analysis like anomaly detection and security, but introducing new challenges for handling the vast amount of data it produces. In detail, we report on a novel lightweight and scalable approach to increase the security of DCs/SCs, that involves AI-powered edge computing on high-resolution power consumption. The method -- called pAElla -- targets real-time Malware Detection (MD), it runs on an out-of-band IoT-based monitoring system for DCs/SCs, and involves Power Spectral Density of power measurements, along with AutoEncoders. Results are promising, with an F1-score close to 1, and a False Alarm and Malware Miss rate close to 0%. We compare our method with State-of-the-Art MD techniques and show that, in the context of DCs/SCs, pAElla can cover a wider range of malware, significantly outperforming SoA approaches in terms of accuracy. Moreover, we propose a methodology for online training suitable for DCs/SCs in production, and release open dataset and code

    Energy Measurements of High Performance Computing Systems: From Instrumentation to Analysis

    Get PDF
    Energy efficiency is a major criterion for computing in general and High Performance Computing in particular. When optimizing for energy efficiency, it is essential to measure the underlying metric: energy consumption. To fully leverage energy measurements, their quality needs to be well-understood. To that end, this thesis provides a rigorous evaluation of various energy measurement techniques. I demonstrate how the deliberate selection of instrumentation points, sensors, and analog processing schemes can enhance the temporal and spatial resolution while preserving a well-known accuracy. Further, I evaluate a scalable energy measurement solution for production HPC systems and address its shortcomings. Such high-resolution and large-scale measurements present challenges regarding the management of large volumes of generated metric data. I address these challenges with a scalable infrastructure for collecting, storing, and analyzing metric data. With this infrastructure, I also introduce a novel persistent storage scheme for metric time series data, which allows efficient queries for aggregate timelines. To ensure that it satisfies the demanding requirements for scalable power measurements, I conduct an extensive performance evaluation and describe a productive deployment of the infrastructure. Finally, I describe different approaches and practical examples of analyses based on energy measurement data. In particular, I focus on the combination of energy measurements and application performance traces. However, interweaving fine-grained power recordings and application events requires accurately synchronized timestamps on both sides. To overcome this obstacle, I develop a resilient and automated technique for time synchronization, which utilizes crosscorrelation of a specifically influenced power measurement signal. Ultimately, this careful combination of sophisticated energy measurements and application performance traces yields a detailed insight into application and system energy efficiency at full-scale HPC systems and down to millisecond-range regions.:1 Introduction 2 Background and Related Work 2.1 Basic Concepts of Energy Measurements 2.1.1 Basics of Metrology 2.1.2 Measuring Voltage, Current, and Power 2.1.3 Measurement Signal Conditioning and Analog-to-Digital Conversion 2.2 Power Measurements for Computing Systems 2.2.1 Measuring Compute Nodes using External Power Meters 2.2.2 Custom Solutions for Measuring Compute Node Power 2.2.3 Measurement Solutions of System Integrators 2.2.4 CPU Energy Counters 2.2.5 Using Models to Determine Energy Consumption 2.3 Processing of Power Measurement Data 2.3.1 Time Series Databases 2.3.2 Data Center Monitoring Systems 2.4 Influences on the Energy Consumption of Computing Systems 2.4.1 Processor Power Consumption Breakdown 2.4.2 Energy-Efficient Hardware Configuration 2.5 HPC Performance and Energy Analysis 2.5.1 Performance Analysis Techniques 2.5.2 HPC Performance Analysis Tools 2.5.3 Combining Application and Power Measurements 2.6 Conclusion 3 Evaluating and Improving Energy Measurements 3.1 Description of the Systems Under Test 3.2 Instrumentation Points and Measurement Sensors 3.2.1 Analog Measurement at Voltage Regulators 3.2.2 Instrumentation with Hall Effect Transducers 3.2.3 Modular Instrumentation of DC Consumers 3.2.4 Optimal Wiring for Shunt-Based Measurements 3.2.5 Node-Level Instrumentation for HPC Systems 3.3 Analog Signal Conditioning and Analog-to-Digital Conversion 3.3.1 Signal Amplification 3.3.2 Analog Filtering and Analog-To-Digital Conversion 3.3.3 Integrated Solutions for High-Resolution Measurement 3.4 Accuracy Evaluation and Calibration 3.4.1 Synthetic Workloads for Evaluating Power Measurements 3.4.2 Improving and Evaluating the Accuracy of a Single-Node Measuring System 3.4.3 Absolute Accuracy Evaluation of a Many-Node Measuring System 3.5 Evaluating Temporal Granularity and Energy Correctness 3.5.1 Measurement Signal Bandwidth at Different Instrumentation Points 3.5.2 Retaining Energy Correctness During Digital Processing 3.6 Evaluating CPU Energy Counters 3.6.1 Energy Readouts with RAPL 3.6.2 Methodology 3.6.3 RAPL on Intel Sandy Bridge-EP 3.6.4 RAPL on Intel Haswell-EP and Skylake-SP 3.7 Conclusion 4 A Scalable Infrastructure for Processing Power Measurement Data 4.1 Requirements for Power Measurement Data Processing 4.2 Concepts and Implementation of Measurement Data Management 4.2.1 Message-Based Communication between Agents 4.2.2 Protocols 4.2.3 Application Programming Interfaces 4.2.4 Efficient Metric Time Series Storage and Retrieval 4.2.5 Hierarchical Timeline Aggregation 4.3 Performance Evaluation 4.3.1 Benchmark Hardware Specifications 4.3.2 Throughput in Symmetric Configuration with Replication 4.3.3 Throughput with Many Data Sources and Single Consumers 4.3.4 Temporary Storage in Message Queues 4.3.5 Persistent Metric Time Series Request Performance 4.3.6 Performance Comparison with Contemporary Time Series Storage Solutions 4.3.7 Practical Usage of MetricQ 4.4 Conclusion 5 Energy Efficiency Analysis 5.1 General Energy Efficiency Analysis Scenarios 5.1.1 Live Visualization of Power Measurements 5.1.2 Visualization of Long-Term Measurements 5.1.3 Integration in Application Performance Traces 5.1.4 Graphical Analysis of Application Power Traces 5.2 Correlating Power Measurements with Application Events 5.2.1 Challenges for Time Synchronization of Power Measurements 5.2.2 Reliable Automatic Time Synchronization with Correlation Sequences 5.2.3 Creating a Correlation Signal on a Power Measurement Channel 5.2.4 Processing the Correlation Signal and Measured Power Values 5.2.5 Common Oversampling of the Correlation Signals at Different Rates 5.2.6 Evaluation of Correlation and Time Synchronization 5.3 Use Cases for Application Power Traces 5.3.1 Analyzing Complex Power Anomalies 5.3.2 Quantifying C-State Transitions 5.3.3 Measuring the Dynamic Power Consumption of HPC Applications 5.4 Conclusion 6 Summary and Outloo

    Passive Delay Measurement for Fidelity Monitoring of Distributed Network Emulation

    Get PDF
    International audienceEmulation has become a popular approach for the validation and evaluation of network research. It provides researchers with a contained, customizable, and scalable testing environment, which can be easily packaged and published for potential readers to reproduce their results. However, as the network components are only virtual, emulation lacks the inherent realism of physical testbeds. In light of this, monitoring specific metrics of the emulated network has been proposed as a solution to mitigate to some degree inaccuracies caused by emulation. While this is not difficult to implement in a single-machine setting (e.g. with Mininet), monitoring is limited by the lack of time synchronization in scenarios where the emulation is distributed over multiple physical machines (e.g., Distrinet). In this paper we tackle the case of packet delay monitoring, to which we propose a methodology for passively measuring one-way delays with underlying assumptions about time synchronization, and round-trip delays otherwise. For an efficient implementation of our methodology, we propose an eBPF-based packet measurement tool that performs better thancurrent packet sniffers under emulation-specific assumptions. We implement and evaluate our system in an open testbed and show that it can reach results within few microseconds of perfect accuracy and precision

    Unified Synchronized Data Acquisition Networks

    Full text link
    The permanently evolving technical area of communication technology and the presence of more and more precise sensors and detectors, enable options and solutions to challenges in science and industry. In high-energy physics, for example, it becomes possible with accurate measurements to observe particles almost at the speed of light in small-sized dimensions. Thereby, the enormous amounts of gathered data require modern high performance communication networks. Potential and efficient implementation of future readout chains will depend on new concepts and mechanisms. The main goals of this dissertation are to create new efficient synchronization mechanisms and to evolve readout systems for optimization of future sensor and detector systems. This happens in the context of the Compressed Baryonic Matter experiment, which is a part of the Facility for Antiproton and Ion Research, an international accelerator facility. It extends an accelerator complex in Darmstadt at the GSI Helmholtzzentrum für Schwerionenforschung GmbH. Initially, the challenges are specified and an analysis of the state of the art is presented. The resulting constraints and requirements influenced the design and development described within this dissertation. Subsequently, the different design and implementation tasks are discussed. Starting with the basic detector read system requirements and the definition of an efficient communication protocol. This protocol delivers all features needed for building of compact and efficient readout systems. Therefore, it is advantageous to use a single unified connection for processing all communication traffic. This means not only data, control, and synchronization messages, but also clock distribution is handled. Furthermore, all links in this system have a deterministic latency. The deterministic behavior enables establishing a synchronous network. Emerging problems were solved and the concept was successfully implemented and tested during several test beam times. In addition, the implementation and integration of this communication methodology into different network devices is described. Therefore, a generic modular approach was created. This enhances ASIC development by supporting them with proven hardware IPs, reducing design time, and risk of failure. Furthermore, this approach delivers flexibility concerning data rate and structure for the network system. Additionally, the design and prototyping for a data aggregation and concentrator ASIC is described. In conjunction with a dense electrical to optical conversion, this ASIC enables communication with flexible readout structures for the experiment and delivers the planned capacities and bandwidth. In the last part of the work, analysis and transfer of the created innovative synchronization mechanism into the area of high performance computing is discussed. Finally, a conclusion of all reached results and an outlook of possible future activities and research tasks within the Compressed Baryonic Matter experiment are presented

    Adaptive Asynchronous Control and Consistency in Distributed Data Exploration Systems

    Get PDF
    Advances in machine learning and streaming systems provide a backbone to transform vast arrays of raw data into valuable information. Leveraging distributed execution, analysis engines can process this information effectively within an iterative data exploration workflow to solve problems at unprecedented rates. However, with increased input dimensionality, a desire to simultaneously share and isolate information, as well as overlapping and dependent tasks, this process is becoming increasingly difficult to maintain. User interaction derails exploratory progress due to manual oversight on lower level tasks such as tuning parameters, adjusting filters, and monitoring queries. We identify human-in-the-loop management of data generation and distributed analysis as an inhibiting problem precluding efficient online, iterative data exploration which causes delays in knowledge discovery and decision making. The flexible and scalable systems implementing the exploration workflow require semi-autonomous methods integrated as architectural support to reduce human involvement. We, thus, argue that an abstraction layer providing adaptive asynchronous control and consistency management over a series of individual tasks coordinated to achieve a global objective can significantly improve data exploration effectiveness and efficiency. This thesis introduces methodologies which autonomously coordinate distributed execution at a lower level in order to synchronize multiple efforts as part of a common goal. We demonstrate the impact on data exploration through serverless simulation ensemble management and multi-model machine learning by showing improved performance and reduced resource utilization enabling a more productive semi-autonomous exploration workflow. We focus on the specific genres of molecular dynamics and personalized healthcare, however, the contributions are applicable to a wide variety of domains

    Nuclear Power Plants

    Get PDF
    This book covers various topics, from thermal-hydraulic analysis to the safety analysis of nuclear power plant. It does not focus only on current power plant issues. Instead, it aims to address the challenging ideas that can be implemented in and used for the development of future nuclear power plants. This book will take the readers into the world of innovative research and development of future plants. Find your interests inside this book

    Faculty Publications & Presentations, 2005-2006

    Get PDF
    corecore