    Optimizing the Replication of Multi-Quality Web Applications Using ACO and WoLF

    This thesis presents the adaptation of Ant Colony Optimization to a new NP-hard problem involving the replication of multi-quality database-driven web applications (DAs) by a large application service provider (ASP). The ASP must assign DA replicas to its network of heterogeneous servers so that user demand is satisfied and replica update loads are minimized. The proposed algorithm, AntDA, is novel in several respects: ants traverse a bipartite graph in both directions as they construct solutions, pheromone guides traversal from one side of the bipartite graph to the other and back again, heuristic edge values change as ants construct solutions, and ants may sometimes produce infeasible solutions. Experiments show that AntDA outperforms several other solution methods, but there was room for improvement in the ants' convergence rates. Therefore, to achieve faster convergence and better solution values for larger problems, AntDA was combined with the variable-step policy hill-climbing algorithm Win or Learn Fast (WoLF). In experiments, adding this learning algorithm to AntDA yielded faster convergence while still outperforming the other solution methods.
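
    The following is a minimal, illustrative sketch (hypothetical data and simplified pheromone/heuristic rules, not the thesis's actual implementation) of how a single ant might construct one replica assignment on the bipartite DA-server graph, with heuristic values that change as server capacity is consumed and with infeasible constructions allowed:

    ```python
    import random

    # Hypothetical problem data: demand per application (DA) and capacity per server.
    demand   = {"DA1": 30, "DA2": 50, "DA3": 20}
    capacity = {"S1": 40, "S2": 40, "S3": 40}

    # Pheromone on the bipartite (DA, server) edges; the colony would update it between iterations.
    tau = {(d, s): 1.0 for d in demand for s in capacity}

    def heuristic(server, remaining):
        # Dynamic heuristic: prefer servers with more spare capacity, so edge
        # attractiveness changes while the ant is still building its solution.
        return max(remaining[server], 1e-9)

    def construct_solution(alpha=1.0, beta=2.0):
        remaining  = dict(capacity)
        unmet      = dict(demand)
        assignment = []
        while any(v > 0 for v in unmet.values()):
            da = max(unmet, key=unmet.get)          # next application: largest unmet demand
            choices = {s: (tau[(da, s)] ** alpha) * (heuristic(s, remaining) ** beta)
                       for s in capacity if remaining[s] > 0}
            if not choices:
                return assignment, False            # ants may produce infeasible solutions
            server = random.choices(list(choices), weights=list(choices.values()))[0]
            served = min(unmet[da], remaining[server])
            assignment.append((da, server, served))
            unmet[da] -= served
            remaining[server] -= served
        return assignment, True

    print(construct_solution())
    ```

    In the thesis, WoLF-style variable-step policy hill-climbing would additionally adjust the ants' policy between iterations, learning faster when losing and more slowly when winning; that part is omitted from this sketch.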

    Query optimizers based on machine learning techniques

    Integrated master's dissertation in Informatics Engineering. Query optimizers are considered one of the most relevant and sophisticated components in a database management system. However, despite currently producing nearly optimal results, optimizers rely on statistical estimates and heuristics to reduce the search space of alternative execution plans for a single query. As a result, for more complex queries, errors may grow exponentially, often translating into sub-optimal plans and less than ideal performance. Recent advances in machine learning techniques have opened new opportunities for many existing problems related to system optimization. This document proposes a solution built on top of PostgreSQL that learns to select the most efficient set of optimizer strategy settings for a particular query. Instead of depending entirely on the optimizer's estimates to compare different plans under different configurations, it relies on a greedy selection algorithm that supports several types of predictive modeling techniques, from more traditional modeling techniques to a deep learning approach. The system is evaluated experimentally with the standard TPC-H and Join Ordering Benchmark workloads to measure the costs and benefits of adding machine learning capabilities to traditional query optimizers. This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project UIDB/50014/2020.
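
    As an illustration of the kind of greedy configuration search described above, the following sketch (hypothetical helper names; the learned cost model is a stand-in) toggles individual PostgreSQL planner settings and keeps a change only when a predictive model estimates a lower runtime:

    ```python
    # Planner strategy settings that PostgreSQL exposes per session (e.g. SET enable_nestloop = off).
    SETTINGS = ["enable_nestloop", "enable_hashjoin", "enable_mergejoin", "enable_seqscan"]

    def predict_runtime(query_features, config):
        # Stand-in for a learned model (a traditional regressor or a deep network)
        # trained to map (query features, optimizer configuration) to estimated runtime.
        penalty = 0.1 * sum(not enabled for enabled in config.values())
        return sum(query_features) * (1.0 + penalty)

    def greedy_config(query_features):
        config = {s: True for s in SETTINGS}           # PostgreSQL default: all strategies enabled
        best = predict_runtime(query_features, config)
        for setting in SETTINGS:
            trial = dict(config)
            trial[setting] = not trial[setting]         # try toggling one strategy at a time
            estimate = predict_runtime(query_features, trial)
            if estimate < best:                         # keep the toggle only if predicted faster
                config, best = trial, estimate
        return config                                   # applied per query via SET commands

    print(greedy_config([0.4, 12.0, 3.0]))              # hypothetical feature vector for one query
    ```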

    Online failure prediction in air traffic control systems

    This thesis introduces a novel approach to online failure prediction for mission-critical distributed systems that is distinctive in being black-box, non-intrusive, and online. The approach combines Complex Event Processing (CEP) and Hidden Markov Models (HMM) to analyze symptoms of failures that may manifest as anomalous conditions of performance metrics identified for this purpose. The thesis presents an architecture named CASPER, based on CEP and HMM, that relies only on information sniffed from the communication network of a mission-critical system to predict anomalies that can lead to software failures. An instance of CASPER has been implemented, trained, and tuned to monitor a real Air Traffic Control (ATC) system developed by Selex ES, a Finmeccanica company. An extensive experimental evaluation of CASPER is presented. The results show (i) a very low percentage of false positives under both normal and stress conditions, and (ii) a failure prediction time long enough for the system to apply appropriate recovery procedures.
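
    As a rough illustration of the HMM side of such an architecture (the parameters below are made up; in a system like CASPER they would be learned during training), a two-state model can be filtered online over the symptom stream that the CEP stage derives from sniffed network traffic:

    ```python
    # Hypothetical two-state HMM: "safe" vs. "pre_failure"; the CEP stage would emit
    # one discrete symptom ("normal" or "anomalous") per monitoring window.
    STATES = ["safe", "pre_failure"]

    start = {"safe": 0.95, "pre_failure": 0.05}
    trans = {"safe":        {"safe": 0.98, "pre_failure": 0.02},
             "pre_failure": {"safe": 0.10, "pre_failure": 0.90}}
    emit  = {"safe":        {"normal": 0.97, "anomalous": 0.03},
             "pre_failure": {"normal": 0.30, "anomalous": 0.70}}

    def filter_step(belief, symptom):
        # One step of HMM forward filtering: predict the next state distribution,
        # then reweight by the likelihood of the observed symptom and renormalize.
        predicted = {s: sum(belief[p] * trans[p][s] for p in STATES) for s in STATES}
        unnorm = {s: predicted[s] * emit[s][symptom] for s in STATES}
        z = sum(unnorm.values())
        return {s: unnorm[s] / z for s in STATES}

    belief = dict(start)
    for symptom in ["normal", "normal", "anomalous", "anomalous"]:
        belief = filter_step(belief, symptom)
        if belief["pre_failure"] > 0.5:             # threshold tuned so recovery can start in time
            print("failure predicted; trigger recovery procedures")
    ```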

    Flexible and intelligent network programming for cloud networks

    As modern online services are evolving rapidly and involve larger amounts of data and computation than ever, the demand for cloud networks keeps growing, which also brings new challenges to network programming. Network programming is a complicated and crucial task for building high-performance cloud networks. Current network programming has two main shortcomings: (1) it is inflexible, as adding new data-plane features usually takes several years; (2) it is unintelligent, as it depends heavily on human-designed heuristic algorithms to solve production-scale problems. To overcome these shortcomings, this dissertation realizes flexible and intelligent network programming by leveraging recent developments in both hardware and software. Specifically, it presents four systems with new performance features that cannot be achieved by conventional network programming: (i) Harmonia: a replicated storage architecture that provides near-linear scalability without sacrificing consistency. By exploiting the programming flexibility of new-generation programmable switches, Harmonia checks read-write conflicts in the network to guarantee consistency and enables any replica to serve reads for objects with no pending writes, yielding near-linear scalability. (ii) RackSched: a microsecond-scale scheduler for rack-scale computers. It proposes a two-layer scheduling framework that integrates an inter-server scheduler in the top-of-rack (ToR) switch with intra-server schedulers on each server. The in-network inter-server scheduler is programmed to realize power-of-k-choices scheduling (sketched below), ensure request affinity, and track server loads accurately and efficiently. (iii) NetVRM: a network management system that supports dynamic register memory sharing in the network. It orchestrates register memory allocation among multiple concurrent network applications to maximize multiplexing benefits, achieved through three major features: a virtual register memory abstraction, a dynamic memory allocation algorithm, and a domain-specific programming language extension. (iv) NeuroPlan: automated and efficient network planning with deep reinforcement learning (RL). It leverages a two-stage hybrid approach that first uses deep RL to prune a large and complex search space and then uses an Integer Linear Programming (ILP) solver to find the final solution. Such an automated approach avoids manually designing heuristic algorithms and efficiently reduces network plan cost. We have performed theoretical analysis, built testbeds, and evaluated these systems with prototype experiments and simulations under realistic setups from production networks.
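
    As one concrete example from the list above, here is a sketch of the power-of-k-choices dispatch policy attributed to RackSched's in-network scheduler, written in plain Python rather than as a switch program and using a simple outstanding-request counter as the load signal:

    ```python
    import random

    def dispatch(server_loads, k=2):
        # Power-of-k-choices: sample k servers uniformly and send the request to the
        # least-loaded of the sample (loads tracked as outstanding-request counters).
        candidates = random.sample(range(len(server_loads)), k)
        chosen = min(candidates, key=lambda s: server_loads[s])
        server_loads[chosen] += 1
        return chosen

    def complete(server_loads, server):
        server_loads[server] -= 1                   # a request finished on this server

    loads = [0] * 8                                  # one counter per server in the rack
    for _ in range(100):
        dispatch(loads, k=2)
    print(loads)
    ```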

    Adaptive Asynchronous Control and Consistency in Distributed Data Exploration Systems

    Advances in machine learning and streaming systems provide a backbone for transforming vast arrays of raw data into valuable information. Leveraging distributed execution, analysis engines can process this information effectively within an iterative data exploration workflow to solve problems at unprecedented rates. However, with increased input dimensionality, a desire to simultaneously share and isolate information, and overlapping, dependent tasks, this process is becoming increasingly difficult to maintain. User interaction derails exploratory progress due to manual oversight of lower-level tasks such as tuning parameters, adjusting filters, and monitoring queries. We identify human-in-the-loop management of data generation and distributed analysis as an inhibiting problem that precludes efficient online, iterative data exploration and causes delays in knowledge discovery and decision making. The flexible and scalable systems implementing the exploration workflow require semi-autonomous methods, integrated as architectural support, to reduce human involvement. We thus argue that an abstraction layer providing adaptive asynchronous control and consistency management over a series of individual tasks coordinated to achieve a global objective can significantly improve data exploration effectiveness and efficiency. This thesis introduces methodologies that autonomously coordinate distributed execution at a lower level in order to synchronize multiple efforts as part of a common goal. We demonstrate the impact on data exploration through serverless simulation ensemble management and multi-model machine learning, showing improved performance and reduced resource utilization that enable a more productive semi-autonomous exploration workflow. We focus on the specific domains of molecular dynamics and personalized healthcare; however, the contributions are applicable to a wide variety of fields.
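
    A toy sketch (hypothetical task and objective, using Python's asyncio as a stand-in for a distributed runtime) of the kind of coordination layer described above: tasks run asynchronously, their results are folded into shared state, and the controller adapts the next batch of parameters without a human tuning them by hand:

    ```python
    import asyncio
    import random

    async def run_task(param):
        # Stand-in for one distributed job (a simulation run, a model-training task, ...).
        await asyncio.sleep(random.random() * 0.1)
        return param, -(param - 3.0) ** 2            # hypothetical objective value

    async def explore(rounds=5, batch=4):
        center, best = 0.0, float("-inf")
        for _ in range(rounds):
            params = [center + random.uniform(-1.0, 1.0) for _ in range(batch)]
            results = await asyncio.gather(*(run_task(p) for p in params))
            for p, score in results:                 # fold asynchronous results into shared state
                if score > best:
                    center, best = p, score          # adapt the next batch around the best point
        return center, best

    print(asyncio.run(explore()))
    ```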

    Technologies and Applications for Big Data Value

    This open access book explores cutting-edge solutions and best practices for big data and data-driven AI applications in the data-driven economy. It gives the reader a basis for understanding how technical issues can be overcome to offer real-world solutions in major industrial areas. The book starts with an introductory chapter that provides an overview by positioning the following chapters in terms of their contributions to the technology frameworks that are key elements of the Big Data Value Public-Private Partnership and the upcoming Partnership on AI, Data and Robotics. The remainder of the book is arranged in two parts. The first part, “Technologies and Methods”, contains horizontal contributions of technologies and methods that enable data value chains to be applied in any sector. The second part, “Processes and Applications”, details experience reports and lessons from using big data and data-driven approaches in processes and applications. Its chapters are co-authored with industry experts and cover domains including health, law, finance, retail, manufacturing, mobility, and smart cities. Contributions emanate from the Big Data Value Public-Private Partnership and the Big Data Value Association, which have acted as the nucleus of the European data community, bringing businesses together with leading researchers to harness the value of data for the benefit of society, business, science, and industry. The book is of interest to two primary audiences: first, undergraduate and postgraduate students and researchers in various fields, including big data, data science, data engineering, and machine learning and AI; and second, practitioners and industry experts engaged in data-driven systems, software design, and deployment projects who are interested in employing these advanced methods to address real-world problems.