8,568 research outputs found

    Learning Scheduling Algorithms for Data Processing Clusters

    Full text link
    Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems, however, use simple generalized heuristics and ignore workload characteristics, since developing and tuning a scheduling policy for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly-efficient policies automatically. Decima uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective such as minimizing average job completion time. Off-the-shelf RL techniques, however, cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for jobs' dependency graphs, design scalable RL models, and invent RL training methods for dealing with continuous stochastic job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima improves the average job completion time over hand-tuned scheduling heuristics by at least 21%, achieving up to 2x improvement during periods of high cluster load

    A Systematic Approach to Constructing Incremental Topology Control Algorithms Using Graph Transformation

    Full text link
    Communication networks form the backbone of our society. Topology control algorithms optimize the topology of such communication networks. Due to the importance of communication networks, a topology control algorithm should guarantee certain required consistency properties (e.g., connectivity of the topology), while achieving desired optimization properties (e.g., a bounded number of neighbors). Real-world topologies are dynamic (e.g., because nodes join, leave, or move within the network), which requires topology control algorithms to operate in an incremental way, i.e., based on the recently introduced modifications of a topology. Visual programming and specification languages are a proven means for specifying the structure as well as consistency and optimization properties of topologies. In this paper, we present a novel methodology, based on a visual graph transformation and graph constraint language, for developing incremental topology control algorithms that are guaranteed to fulfill a set of specified consistency and optimization constraints. More specifically, we model the possible modifications of a topology control algorithm and the environment using graph transformation rules, and we describe consistency and optimization properties using graph constraints. On this basis, we apply and extend a well-known constructive approach to derive refined graph transformation rules that preserve these graph constraints. We apply our methodology to re-engineer an established topology control algorithm, kTC, and evaluate it in a network simulation study to show the practical applicability of our approachComment: This document corresponds to the accepted manuscript of the referenced journal articl

    Machine Hyperconsciousness

    Get PDF
    Individual animal consciousness appears limited to a single giant component of interacting cognitive modules, instantiating a shifting, highly tunable, Global Workspace. Human institutions, by contrast, can support several, often many, such giant components simultaneously, although they generally function far more slowly than the minds of the individuals who compose them. Machines having multiple global workspaces -- hyperconscious machines -- should, however, be able to operate at the few hundred milliseconds characteistic of individual consciousness. Such multitasking -- machine or institutional -- while clearly limiting the phenomenon of inattentional blindness, does not eliminate it, and introduces characteristic failure modes involving the distortion of information sent between global workspaces. This suggests that machines explicitly designed along these principles, while highly efficient at certain sets of tasks, remain subject to canonical and idiosyncratic failure patterns analogous to, but more complicated than, those explored in Wallace (2006a). By contrast, institutions, facing similar challenges, are usually deeply embedded in a highly stabilizing cultural matrix of law, custom, and tradition which has evolved over many centuries. Parallel development of analogous engineering strategies, directed toward ensuring an 'ethical' device, would seem requisite to the sucessful application of any form of hyperconscious machine technology

    Combining behavioural types with security analysis

    Get PDF
    Today's software systems are highly distributed and interconnected, and they increasingly rely on communication to achieve their goals; due to their societal importance, security and trustworthiness are crucial aspects for the correctness of these systems. Behavioural types, which extend data types by describing also the structured behaviour of programs, are a widely studied approach to the enforcement of correctness properties in communicating systems. This paper offers a unified overview of proposals based on behavioural types which are aimed at the analysis of security properties

    A Cut Principle for Information Flow

    Full text link
    We view a distributed system as a graph of active locations with unidirectional channels between them, through which they pass messages. In this context, the graph structure of a system constrains the propagation of information through it. Suppose a set of channels is a cut set between an information source and a potential sink. We prove that, if there is no disclosure from the source to the cut set, then there can be no disclosure to the sink. We introduce a new formalization of partial disclosure, called *blur operators*, and show that the same cut property is preserved for disclosure to within a blur operator. This cut-blur property also implies a compositional principle, which ensures limited disclosure for a class of systems that differ only beyond the cut.Comment: 31 page

    Biologically inspired distributed machine cognition: a new formal approach to hyperparallel computation

    Get PDF
    The irresistable march toward multiple-core chip technology presents currently intractable pdrogramming challenges. High level mental processes in many animals, and their analogs for social structures, appear similarly massively parallel, and recent mathematical models addressing them may be adaptable to the multi-core programming problem

    Energy Measurements of High Performance Computing Systems: From Instrumentation to Analysis

    Get PDF
    Energy efficiency is a major criterion for computing in general and High Performance Computing in particular. When optimizing for energy efficiency, it is essential to measure the underlying metric: energy consumption. To fully leverage energy measurements, their quality needs to be well-understood. To that end, this thesis provides a rigorous evaluation of various energy measurement techniques. I demonstrate how the deliberate selection of instrumentation points, sensors, and analog processing schemes can enhance the temporal and spatial resolution while preserving a well-known accuracy. Further, I evaluate a scalable energy measurement solution for production HPC systems and address its shortcomings. Such high-resolution and large-scale measurements present challenges regarding the management of large volumes of generated metric data. I address these challenges with a scalable infrastructure for collecting, storing, and analyzing metric data. With this infrastructure, I also introduce a novel persistent storage scheme for metric time series data, which allows efficient queries for aggregate timelines. To ensure that it satisfies the demanding requirements for scalable power measurements, I conduct an extensive performance evaluation and describe a productive deployment of the infrastructure. Finally, I describe different approaches and practical examples of analyses based on energy measurement data. In particular, I focus on the combination of energy measurements and application performance traces. However, interweaving fine-grained power recordings and application events requires accurately synchronized timestamps on both sides. To overcome this obstacle, I develop a resilient and automated technique for time synchronization, which utilizes crosscorrelation of a specifically influenced power measurement signal. Ultimately, this careful combination of sophisticated energy measurements and application performance traces yields a detailed insight into application and system energy efficiency at full-scale HPC systems and down to millisecond-range regions.:1 Introduction 2 Background and Related Work 2.1 Basic Concepts of Energy Measurements 2.1.1 Basics of Metrology 2.1.2 Measuring Voltage, Current, and Power 2.1.3 Measurement Signal Conditioning and Analog-to-Digital Conversion 2.2 Power Measurements for Computing Systems 2.2.1 Measuring Compute Nodes using External Power Meters 2.2.2 Custom Solutions for Measuring Compute Node Power 2.2.3 Measurement Solutions of System Integrators 2.2.4 CPU Energy Counters 2.2.5 Using Models to Determine Energy Consumption 2.3 Processing of Power Measurement Data 2.3.1 Time Series Databases 2.3.2 Data Center Monitoring Systems 2.4 Influences on the Energy Consumption of Computing Systems 2.4.1 Processor Power Consumption Breakdown 2.4.2 Energy-Efficient Hardware Configuration 2.5 HPC Performance and Energy Analysis 2.5.1 Performance Analysis Techniques 2.5.2 HPC Performance Analysis Tools 2.5.3 Combining Application and Power Measurements 2.6 Conclusion 3 Evaluating and Improving Energy Measurements 3.1 Description of the Systems Under Test 3.2 Instrumentation Points and Measurement Sensors 3.2.1 Analog Measurement at Voltage Regulators 3.2.2 Instrumentation with Hall Effect Transducers 3.2.3 Modular Instrumentation of DC Consumers 3.2.4 Optimal Wiring for Shunt-Based Measurements 3.2.5 Node-Level Instrumentation for HPC Systems 3.3 Analog Signal Conditioning and Analog-to-Digital Conversion 3.3.1 Signal Amplification 3.3.2 Analog Filtering and Analog-To-Digital Conversion 3.3.3 Integrated Solutions for High-Resolution Measurement 3.4 Accuracy Evaluation and Calibration 3.4.1 Synthetic Workloads for Evaluating Power Measurements 3.4.2 Improving and Evaluating the Accuracy of a Single-Node Measuring System 3.4.3 Absolute Accuracy Evaluation of a Many-Node Measuring System 3.5 Evaluating Temporal Granularity and Energy Correctness 3.5.1 Measurement Signal Bandwidth at Different Instrumentation Points 3.5.2 Retaining Energy Correctness During Digital Processing 3.6 Evaluating CPU Energy Counters 3.6.1 Energy Readouts with RAPL 3.6.2 Methodology 3.6.3 RAPL on Intel Sandy Bridge-EP 3.6.4 RAPL on Intel Haswell-EP and Skylake-SP 3.7 Conclusion 4 A Scalable Infrastructure for Processing Power Measurement Data 4.1 Requirements for Power Measurement Data Processing 4.2 Concepts and Implementation of Measurement Data Management 4.2.1 Message-Based Communication between Agents 4.2.2 Protocols 4.2.3 Application Programming Interfaces 4.2.4 Efficient Metric Time Series Storage and Retrieval 4.2.5 Hierarchical Timeline Aggregation 4.3 Performance Evaluation 4.3.1 Benchmark Hardware Specifications 4.3.2 Throughput in Symmetric Configuration with Replication 4.3.3 Throughput with Many Data Sources and Single Consumers 4.3.4 Temporary Storage in Message Queues 4.3.5 Persistent Metric Time Series Request Performance 4.3.6 Performance Comparison with Contemporary Time Series Storage Solutions 4.3.7 Practical Usage of MetricQ 4.4 Conclusion 5 Energy Efficiency Analysis 5.1 General Energy Efficiency Analysis Scenarios 5.1.1 Live Visualization of Power Measurements 5.1.2 Visualization of Long-Term Measurements 5.1.3 Integration in Application Performance Traces 5.1.4 Graphical Analysis of Application Power Traces 5.2 Correlating Power Measurements with Application Events 5.2.1 Challenges for Time Synchronization of Power Measurements 5.2.2 Reliable Automatic Time Synchronization with Correlation Sequences 5.2.3 Creating a Correlation Signal on a Power Measurement Channel 5.2.4 Processing the Correlation Signal and Measured Power Values 5.2.5 Common Oversampling of the Correlation Signals at Different Rates 5.2.6 Evaluation of Correlation and Time Synchronization 5.3 Use Cases for Application Power Traces 5.3.1 Analyzing Complex Power Anomalies 5.3.2 Quantifying C-State Transitions 5.3.3 Measuring the Dynamic Power Consumption of HPC Applications 5.4 Conclusion 6 Summary and Outloo
    • …
    corecore