1,546 research outputs found

    Aqpet — An R package for air quality policy evaluation

    Evaluating the effectiveness of clean air policies is important in the cycle of air quality management, ensuring policy accountability and informing future policy-making processes. However, such evaluations are challenging because of confounding factors such as varying weather conditions and seasonal or annual changes in air quality unrelated to the policy implementation. To address this challenge, we developed 'aqpet', an R package designed to streamline the quantification of policy effects on air quality using observational data. The package includes: (1) automated machine learning to predict air pollutant levels under average weather conditions, a process termed "weather normalisation"; and (2) the augmented synthetic control method (ASCM) to quantify the actual policy impact on air pollution. 'aqpet' offers functions for data collection and preparation, building automated machine learning models, conducting weather normalisation, evaluating and explaining model performance, and causal impact analysis using ASCM. It enables fast, efficient, and interactive policy analysis for air quality management.
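    The weather normalisation step described in this abstract can be illustrated with a minimal Python sketch. It is not the 'aqpet' API: the function `weather_normalise`, the stand-in `toy_model`, and the `wind_speed` field are all hypothetical names chosen for illustration, and the sketch averages over every observed weather row rather than using whatever resampling scheme the package applies.

```python
from statistics import mean

def weather_normalise(predict, timestamps, weather_rows):
    """For each timestamp, average the model's prediction over all observed weather rows.

    Averaging over the weather leaves the part of the signal that depends on time only.
    """
    return [mean(predict(t, w) for w in weather_rows) for t in timestamps]

# Hypothetical stand-in for a trained model: a slow downward trend plus a wind effect.
def toy_model(t, weather):
    return 40.0 - 0.5 * t - 2.0 * weather["wind_speed"]

# Assumed toy data: three historical weather observations and three time steps.
weather_rows = [{"wind_speed": 1.0}, {"wind_speed": 3.0}, {"wind_speed": 5.0}]
timestamps = [0, 1, 2]

normalised = weather_normalise(toy_model, timestamps, weather_rows)
print(normalised)  # [34.0, 33.5, 33.0]
```

    In the toy output the wind effect collapses to its average, so only the time trend remains; a policy analysis would then compare such normalised series before and after the intervention.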

    Monitoring and analysis system for performance troubleshooting in data centers

    It was not long ago. On Christmas Eve 2012, a war of troubleshooting began in Amazon data centers. It started at 12:24 PM with a mistaken deletion of the state data of the Amazon Elastic Load Balancing (ELB) service, which went unnoticed at the time. The mistake first led to a local issue in which a small number of ELB service APIs were affected. In about six minutes, it evolved into a critical issue in which EC2 customers were significantly affected. One example was Netflix, which was using hundreds of Amazon ELB services and experienced an extensive streaming outage that left many customers unable to watch TV shows or movies on Christmas Eve. It took Amazon engineers 5 hours and 42 minutes to find the root cause, the mistaken deletion, and another 15 hours and 32 minutes to fully recover the ELB service. The war ended at 8:15 AM the next day and brought performance troubleshooting in data centers to the world's attention. As this Amazon ELB case shows, troubleshooting runtime performance issues is crucial in time-sensitive multi-tier cloud services because of their stringent end-to-end timing requirements, but it is also notoriously difficult and time consuming. To address the troubleshooting challenge, this dissertation proposes VScope, a flexible monitoring and analysis system for online troubleshooting in data centers. VScope provides primitive operations that data center operators can use to troubleshoot various performance issues. Each operation is essentially a series of monitoring and analysis functions executed on an overlay network. We design a novel software architecture for VScope so that the overlay networks can be generated, executed and terminated automatically and on demand. On the troubleshooting side, we design novel anomaly detection algorithms and implement them in VScope. By running these algorithms in VScope, data center operators are notified when performance anomalies occur. 
We also design a graph-based guidance approach, called VFocus, which tracks the interactions among hardware and software components in data centers. VFocus provides primitive operations with which operators can analyze these interactions to find out which components are relevant to a performance issue. VScope's capabilities and performance are evaluated on a testbed with over 1000 virtual machines (VMs). Experimental results show that the VScope runtime negligibly perturbs system and application performance and requires mere seconds to deploy monitoring and analytics functions on over 1000 nodes. This demonstrates VScope's ability to support fast operation and online queries against a comprehensive set of metrics, from the application level to the system/platform level, and a variety of representative analytics functions. When supporting algorithms with high computational complexity, VScope serves as a 'thin layer' that accounts for no more than 5% of their total latency. Further, by using VFocus, VScope can locate problematic VMs that cannot be found through application-level monitoring alone, and in one of the use cases explored in the dissertation, it operates with perturbation levels over 400% lower than those seen for brute-force and most sampling-based approaches. We also validate VFocus with real-world data center traces. The experimental results show that VFocus has a troubleshooting accuracy of 83% on average. Ph.D.

    Data Mining; A Conceptual Overview

    This tutorial provides an overview of the data mining process. It also provides a basic understanding of how to plan, evaluate and successfully refine a data mining project, particularly in terms of model building and model evaluation. Methodological considerations are discussed and illustrated. After explaining the nature of data mining and its importance in business, the tutorial describes the underlying machine learning and statistical techniques involved. It describes CRISP-DM, now used in industry as the standard technology-neutral data mining process model. The paper concludes with a major illustration of the data mining process methodology and with the unsolved problems that offer opportunities for research. The approach is both practical and conceptually sound, so that it is useful to both academics and practitioners.

    Working Notes from the 1992 AAAI Workshop on Automating Software Design. Theme: Domain Specific Software Design

    The goal of this workshop is to identify different architectural approaches to building domain-specific software design systems and to explore issues unique to domain-specific (vs. general-purpose) software design. Some general issues that cut across particular software design domains include: (1) knowledge representation, acquisition, and maintenance; (2) specialized software design techniques; and (3) user interaction and user interface.

    Machine Learning for Resource-Constrained Computing Systems

    The available resources in information-processing systems such as processors are generally constrained. These include, for example, electrical power draw, energy consumption, heat dissipation, and chip area. Optimizing the management of the available resources is therefore of paramount importance for achieving goals such as maximum performance. Resource management at the system level in particular has a large influence on performance, power, and temperature during application execution, through the (dynamic) mapping of applications to processor cores and through dynamic voltage and frequency scaling (DVFS). The main challenges in resource management are the high complexity of applications and platforms, unforeseen applications or platform configurations (not known at design time), proactive optimization, and minimizing the run-time overhead. Existing techniques based on simple heuristics or analytical models address these challenges only inadequately. For this reason, the main contribution of this dissertation is the use of machine learning (ML) for resource management. ML-based solutions make it possible to tackle these challenges by predicting the impact of potential resource management decisions, by estimating hidden (unobservable) properties of applications, or by directly learning a resource management policy. This dissertation develops several novel ML-based resource management techniques for different platforms, goals, and constraints. First, a prediction-based technique is presented for maximizing the performance of multi-core processors with a distributed last-level cache and a limited maximum temperature. 
It uses a neural network (NN) to predict the performance impact of potential application migrations between processor cores. These predictions allow the best possible migration to be determined and enable proactive management. The NN is trained so that it copes with unknown applications and different temperature limits. Second, a boosting technique is presented for maximizing the performance of homogeneous multi-core processors with a limited maximum temperature using DVFS. It is based on a novel boostability metric that combines the dependencies of performance, power, and temperature on voltage/frequency changes into a single metric. The dependencies of performance and power depend on the application and cannot be directly observed (measured) at run time. An NN is therefore used to estimate these values for unknown applications and thus cope with the complexity of the boosting optimization. Third, a technique is presented for minimizing the temperature of heterogeneous multi-core processors with quality-of-service targets. It uses imitation learning to learn an application migration policy from optimal oracle demonstrations. An NN is employed to cope with the complexity of the platform and of application behavior. NN inference is accelerated using an existing generic accelerator, a neural processing unit (NPU). The ML algorithms themselves must also be executed with limited resources. Finally, a technique is presented for resource-aware training on distributed devices that maintains a constant training throughput under rapidly changing availability of computing resources, as occurs, for example, due to contention on shared resources. 
This technique uses structured dropout, which omits random parts of the NN during training. This allows the resources required for training to be adjusted dynamically, with negligible overhead but at the cost of slower training convergence. The Pareto-optimal dropout parameters per NN layer are determined through design space exploration. The techniques are evaluated both in simulation and on real hardware and show significant improvements over the state of the art at negligible run-time overhead. In summary, this dissertation shows that ML is a key technology for optimizing the management of limited resources at the system level by addressing the associated challenges.
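    The structured dropout idea in this abstract, omitting whole parts of the network so that less computation is needed, can be sketched for a single dense layer as follows. This is a minimal illustration under stated assumptions, not the dissertation's method: the function name `dense_forward_structured_dropout`, the choice of dropping whole output units, and the `keep_fraction` parameter are hypothetical, and the per-layer Pareto-optimal parameter search is not reproduced.

```python
import random

def dense_forward_structured_dropout(weights, biases, x, keep_fraction, rng):
    """Forward pass of one dense layer that skips whole output units.

    weights[j] is the weight row of output unit j. Returns (outputs, macs),
    where macs counts the multiply-accumulate operations actually performed.
    """
    n_units = len(weights)
    n_keep = max(1, round(keep_fraction * n_units))
    kept = set(rng.sample(range(n_units), n_keep))
    scale = n_units / n_keep  # inverted-dropout scaling keeps the expected activation unchanged
    outputs, macs = [], 0
    for j in range(n_units):
        if j in kept:
            z = sum(w * xi for w, xi in zip(weights[j], x)) + biases[j]
            outputs.append(scale * z)
            macs += len(x)
        else:
            outputs.append(0.0)  # dropped unit: its row is never computed
    return outputs, macs

# Assumed toy layer with 4 output units and 2 inputs.
rng = random.Random(0)
weights = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0], [-1.0, 1.0]]
biases = [0.0, 0.0, 0.0, 0.0]
out, macs = dense_forward_structured_dropout(weights, biases, [1.0, 1.0], 0.5, rng)
print(macs)  # 4 multiply-accumulates instead of the 8 a full pass would need
```

    Because entire units are skipped rather than individual activations being zeroed after the fact, the cost of the pass shrinks with `keep_fraction`, which is what lets the training load follow the available resources.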

    ADVANCED SLA MANAGEMENT IN CLOUD COMPUTING

    The advent of high-performance technologies and the growing volume of data used by organizations have created a need to migrate from internal infrastructure to Cloud environments. The continuous development of tools, methods and techniques has expanded the understanding of the various functions, structures and processes related to Cloud Computing. However, the increase in computing power has led to the development and use of more complex models, and this complexity extends to Service Level Agreements (SLAs). The need for a high-level understanding of the SLAs established between customers and service providers in the Cloud has led to different studies on the definition and standardization of these agreements. Nowadays, cloud computing technologies are becoming more and more popular, especially for data storage. However, the processes used to determine Cloud service agreements do not consider the final customer's needs, considering only the supply capacity of the service provider. For these reasons, service agreements that meet the needs of customers should be developed in order to increase the usability of Cloud environments and to enable the discovery of new areas of application in accordance with market demand. In this context, the use of ontologies that describe the information composing each type of service, and thus enable an understanding of the agreements reached, is an approach worth considering. Moreover, the generalization and abstraction of information that can be observed in different services allows a broader vision for managing SLAs. For these reasons, this thesis aims to find innovative methods for the composition of Service Level Agreements in Cloud Computing. 
In particular, the methods presented demonstrate the convergence of several consolidated techniques in research on Cloud SLAs, using a new approach that considers new demands on the Cloud and allows control of the established agreements, in addition to effectively ensuring the application of the concept of XaaS (everything as a service). The originality of the approach is that it allows the registration, search, composition and control of services in the Cloud using the same structure. The new approach presented in this thesis makes it possible to understand the impact of new services requested by customers, giving the provider the possibility of simulating the use of the resources necessary to meet the new service requests. Starting from a conceptual framework, we demonstrate the use of our approach through examples of different real-world situations, considering the new market possibilities.