920 research outputs found

    A Machine Learning-based Framework for Building Application Failure Prediction Models

    In this paper, we present the Framework for building Failure Prediction Models (F2PM), a Machine Learning-based framework for building models that predict the Remaining Time to Failure (RTTF) of applications in the presence of software anomalies. F2PM uses measurements of a number of system features to create a knowledge base, which is then used to build prediction models. F2PM is application-independent, i.e., it exploits only measurements of system-level features. Thus, it can be used in differentiated contexts, without any manual modification of, or intervention in, the running applications. To generate optimized models, F2PM can perform feature selection to identify which of the measured system features have the greatest impact on the prediction of the RTTF. This allows it to produce different models that use different sets of input features. The generated models can be compared by the user via a set of metrics produced by F2PM, related both to prediction accuracy and to model building time. We also present experimental results of a successful application of F2PM using the standard TPC-W e-commerce benchmark.
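
    The pipeline F2PM describes (collect system-level measurements into a knowledge base, select the most predictive features, train one model per feature subset, and compare candidates by accuracy and build time) can be sketched as follows. This is a minimal illustration assuming a scikit-learn-style workflow; the synthetic features, the learner, and the subset sizes are hypothetical, not F2PM's actual design.

        # Hypothetical sketch of an F2PM-style pipeline: predict Remaining Time
        # to Failure (RTTF) from system-level metrics, using feature selection
        # to build and compare models over different input-feature subsets.
        import time
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.feature_selection import SelectKBest, f_regression
        from sklearn.metrics import mean_absolute_error
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        # Synthetic knowledge base: rows are periodic measurements of system
        # features (e.g. memory usage, swap, CPU, thread count); the target is
        # the measured time until the anomaly-induced failure occurred.
        X = rng.random((1000, 8))
        y = 3600 * (1 - X[:, 0]) + 120 * rng.random(1000)  # RTTF in seconds

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        for k in (2, 4, 8):  # candidate feature-subset sizes
            sel = SelectKBest(f_regression, k=k).fit(X_tr, y_tr)
            t0 = time.perf_counter()
            model = RandomForestRegressor(random_state=0).fit(sel.transform(X_tr), y_tr)
            build_time = time.perf_counter() - t0
            mae = mean_absolute_error(y_te, model.predict(sel.transform(X_te)))
            print(f"k={k}: MAE={mae:.0f}s, build time={build_time:.2f}s")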

    Database Learning: Toward a Database that Becomes Smarter Every Time

    In today's databases, previous query answers rarely benefit the answering of future queries. For the first time, to the best of our knowledge, we change this paradigm in an approximate query processing (AQP) context. We make the following observation: the answer to each query reveals some degree of knowledge about the answer to another query, because both answers stem from the same underlying distribution that produced the entire dataset. Exploiting and refining this knowledge should allow us to answer queries more analytically, rather than by reading enormous amounts of raw data. Also, processing more queries should continuously enhance our knowledge of the underlying distribution, and hence lead to increasingly faster response times for future queries. We call this novel idea, learning from past query answers, Database Learning. We exploit the principle of maximum entropy to produce answers that are, in expectation, guaranteed to be more accurate than existing sample-based approximations. Empowered by this idea, we build a query engine on top of Spark SQL, called Verdict. We conduct extensive experiments on real-world query traces from a large customer of a major database vendor. Our results demonstrate that Verdict supports 73.7% of these queries, speeding them up by up to 23.0x for the same accuracy level compared to existing AQP systems. Comment: This manuscript is an extended report of the work published at the ACM SIGMOD conference, 2017.
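
    Verdict's actual inference solves a maximum-entropy problem over all past answers; the toy sketch below conveys only the flavor of learning from past queries, using a simpler stand-in technique (treating a past AVG answer as a prior and combining it with a fresh small-sample estimate by inverse-variance weighting). All data and sample sizes are invented.

        # Toy stand-in for "learning from past query answers" (not Verdict's
        # maximum-entropy algorithm): use a past AVG answer as a prior and
        # combine it with a fresh noisy estimate by inverse-variance weighting.
        import numpy as np

        rng = np.random.default_rng(1)
        population = rng.normal(100.0, 20.0, 1_000_000)  # hidden data

        # Past query: a large sample was read, so its answer is accurate.
        past = rng.choice(population, 50_000)
        prior_mean, prior_var = past.mean(), past.var() / past.size

        # New query: only a small sample is read, so the estimate is noisy.
        fresh = rng.choice(population, 500)
        est_mean, est_var = fresh.mean(), fresh.var() / fresh.size

        # Precision-weighted combination: the learned answer leans on
        # whichever source carries more information.
        w = (1 / prior_var) / (1 / prior_var + 1 / est_var)
        learned = w * prior_mean + (1 - w) * est_mean
        print(f"sample-only: {est_mean:.2f}  learned: {learned:.2f}  "
              f"truth: {population.mean():.2f}")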

    Performance Modeling and Resource Management for Mapreduce Applications

    Big Data analytics is increasingly performed using the MapReduce paradigm, with its open-source implementation Hadoop as a platform choice. Many applications associated with live business intelligence are written as complex data analysis programs defined by directed acyclic graphs of MapReduce jobs. An increasing number of these applications have additional requirements for completion time guarantees. The advent of cloud computing brings a competitive alternative solution for data analytic problems, while also introducing new challenges in provisioning clusters that provide the best cost-performance trade-offs. In this dissertation, we aim to develop a performance evaluation framework that enables automatic resource management for MapReduce applications to achieve different optimization goals. It consists of the following components: (1) a performance modeling framework that estimates the completion time of a given MapReduce application when executed on a Hadoop cluster, according to its input data sets, the job settings, and the amount of resources allocated for processing it; (2) a resource allocation strategy for deadline-driven MapReduce applications that automatically tailors and controls the resource allocation on a shared Hadoop cluster so that different applications achieve their (soft) deadlines; (3) a simulator-based solution to the resource provisioning problem in public cloud environments that guides users in determining the types and amount of resources they should lease from the service provider to achieve different goals; (4) an optimization strategy that automatically determines the optimal job settings within a MapReduce application for efficient execution and resource usage. We validate the accuracy, efficiency, and performance benefits of the proposed framework using a set of realistic MapReduce applications in both private cluster and public cloud environments.
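
    The first component, estimating completion time from per-stage task profiles and slot allocations, is commonly built on simple analytic makespan bounds. The sketch below illustrates that style of model under stated assumptions (n tasks greedily assigned to k slots, reduce stage modeled as starting after the map stage completes); the profiled durations and slot counts are invented, not taken from the dissertation.

        # Sketch of analytic completion-time bounds for a MapReduce job
        # (assumption: a stage of n tasks on k slots is bounded below by
        # n*avg/k and above by (n-1)*avg/k + max).

        def stage_bounds(n_tasks, avg_s, max_s, slots):
            """Lower/upper bound on the makespan of one stage, in seconds."""
            low = n_tasks * avg_s / slots
            up = (n_tasks - 1) * avg_s / slots + max_s
            return low, up

        def job_bounds(map_prof, red_prof, map_slots, red_slots):
            # Sum per-stage bounds: reduces are modeled as starting after
            # the last map task finishes.
            m_low, m_up = stage_bounds(*map_prof, map_slots)
            r_low, r_up = stage_bounds(*red_prof, red_slots)
            return m_low + r_low, m_up + r_up

        # Invented profile: 400 map tasks (avg 20 s, max 35 s) and 40 reduce
        # tasks (avg 60 s, max 90 s) on 50 map slots and 20 reduce slots.
        low, up = job_bounds((400, 20, 35), (40, 60, 90), 50, 20)
        print(f"estimated completion time: {low:.0f}s to {up:.0f}s")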

    Sensitivity and discovery potential of the proposed nEXO experiment to neutrinoless double beta decay

    The next-generation Enriched Xenon Observatory (nEXO) is a proposed experiment to search for neutrinoless double beta (0νββ) decay in 136Xe, with a target half-life sensitivity of approximately 10^28 years, using 5×10^3 kg of isotopically enriched liquid xenon in a time projection chamber. This improvement of two orders of magnitude in sensitivity over current limits is obtained by a significant increase of the 136Xe mass, the monolithic and homogeneous configuration of the active medium, and the multi-parameter measurements of the interactions enabled by the time projection chamber. The detector concept and anticipated performance are presented based upon demonstrated, realizable background rates. Comment: v2 as published.
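
    Back-of-the-envelope arithmetic shows the scale of the measurement: at the target half-life sensitivity, the entire detector would see only on the order of one decay per year. The 90% enrichment fraction below is an assumed round number for illustration, not a quoted nEXO specification.

        # Back-of-the-envelope check: how many 0νββ decays per year would
        # 5×10^3 kg of enriched xenon produce if T_1/2 = 10^28 years?
        import math

        N_A = 6.022e23           # Avogadro's number, atoms/mol
        mass_kg = 5e3            # total liquid-xenon mass
        enrichment = 0.90        # assumed 136Xe fraction (illustrative)
        molar_mass_kg = 0.136    # kg/mol for 136Xe
        half_life_yr = 1e28      # target half-life sensitivity

        n_atoms = mass_kg * enrichment / molar_mass_kg * N_A
        decays_per_yr = math.log(2) * n_atoms / half_life_yr
        print(f"{n_atoms:.2e} atoms of 136Xe -> {decays_per_yr:.1f} decays/year")
        # ~2e28 atoms -> ~1.4 decays/year: a handful of signal events, which
        # is why background rate and energy resolution drive the design.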

    Improving Key-Value Database Scalability with Lazy State Determination

    Applications keep demanding higher throughput and lower response times from database systems. Databases leverage concurrency, using both multiple computer systems (nodes) and the multiple cores available in each node, to execute multiple requests (transactions) concurrently. Executing multiple transactions concurrently requires coordination, which is ensured by the database concurrency control (CC) module. However, excessive limitation of concurrency by the CC module negatively impacts the overall performance (latency and throughput) of the database system. The performance limitations imposed by the database CC module can be addressed by exploring new hardware, or by leveraging software-based techniques such as futures and lazy evaluation of transactions. This is where Lazy State Determination (LSD) shines [43, 42]. LSD proposes a new transactional API that decreases the conflicts between concurrent transactions by enabling the use of futures in both SQL and key-value database systems. The use of futures allows LSD to better capture the application semantics and to make more informed decisions about what really constitutes a conflict. These two key insights combine to create a system that provides high throughput in high-contention scenarios. Our work builds on top of a previous LSD prototype. We identified and diagnosed its shortcomings, and devised and implemented a new prototype that addresses them. We validated our new LSD system and evaluated its behaviour and performance by comparing and contrasting it with the original prototype. Our evaluation showed that the throughput of the new LSD prototype is 3.7× to 4.9× higher, in centralized and distributed settings respectively, while also reducing latency by up to 10×. With this work, we provide an LSD-based key-value database system that has better vertical and horizontal scalability and can take advantage of systems with a higher core count or a larger number of nodes, in centralized and distributed settings, respectively.
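
    A toy sketch of the futures idea follows (the store and API below are hypothetical and far simpler than LSD's actual transactional API): execution only records deferred computations over future values, and the commit step resolves them against the live state, so a read whose concrete value is never examined during execution cannot accumulate a stale-read conflict.

        # Toy illustration of futures in a key-value transaction (not the
        # actual LSD API). Operations record functions over the future store
        # state; commit resolves them atomically against the live state.
        from threading import Lock

        class FutureKV:
            def __init__(self):
                self._data, self._lock = {}, Lock()

            def commit(self, ops):
                # Resolve every deferred write in one short critical section:
                # values are read at commit time, not during execution.
                with self._lock:
                    for key, compute in ops:
                        self._data[key] = compute(self._data)

            def get(self, key):
                with self._lock:
                    return self._data[key]

        kv = FutureKV()
        kv.commit([("stock", lambda s: 10)])              # initialize
        # "Buy one item": the new value is a future over the read of "stock";
        # executing the transaction records intent, commit does the work.
        kv.commit([("stock", lambda s: s["stock"] - 1)])
        print(kv.get("stock"))                            # -> 9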