A Machine Learning-based Framework for Building Application Failure Prediction Models
In this paper, we present the Framework for building Failure Prediction Models (F2PM), a Machine Learning-based framework for building models that predict the Remaining Time to Failure (RTTF) of applications in the presence of software anomalies. F2PM uses measurements of a number of system features to create a knowledge base, which is then used to build prediction models. F2PM is application-independent, i.e., it exploits only measurements of system-level features. Thus, it can be used in different contexts, without any manual modification of, or intervention on, the running applications. To generate optimized models, F2PM can perform feature selection to identify which of the measured system features have the greatest impact on the prediction of the RTTF. This makes it possible to produce different models that use different sets of input features. The generated models can be compared by the user via a set of metrics produced by F2PM, covering both prediction accuracy and model-building time. We also present experimental results of a successful application of F2PM, using the standard TPC-W e-commerce benchmark.
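As a concrete illustration of the kind of pipeline described above, the sketch below trains and compares RTTF regressors over candidate feature subsets, reporting the accuracy and build-time metrics used to choose among models. It is a minimal stand-in, not F2PM itself: the feature data is synthetic, and the use of scikit-learn's SelectKBest and RandomForestRegressor is an assumption made purely for illustration.

```python
# Hypothetical illustration of an F2PM-style pipeline: system-level metrics in,
# a Remaining-Time-to-Failure regressor out, with feature selection and the
# accuracy/build-time metrics used to compare candidate models.
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
# Stand-in for the knowledge base: rows are samples of system-level features
# (e.g. memory usage, swap activity, CPU load); the target is the RTTF in seconds.
X = rng.random((2000, 12))
y = 3600 * (1.0 - X[:, 0]) + 600 * rng.random(2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for k in (4, 8, 12):                       # candidate feature subsets
    selector = SelectKBest(f_regression, k=k).fit(X_tr, y_tr)
    start = time.perf_counter()
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(selector.transform(X_tr), y_tr)
    build_time = time.perf_counter() - start
    mae = mean_absolute_error(y_te, model.predict(selector.transform(X_te)))
    print(f"k={k:2d} features  MAE={mae:7.1f}s  build_time={build_time:.2f}s")
```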
Stochastic modeling for performance evaluation of database replication protocols
Performance is often the most important non-functional property for database systems and associated replication solutions. This is true at least in industrial contexts. Evaluating performance using real systems, however, is computationally demanding and costly. In many cases, choosing between several competing replication protocols poses a difficulty in ranking these protocols meaningfully: the ranking is determined not so much by the quality of the competing protocols but, instead, by the quality of the available implementations. Addressing this difficulty requires a level of abstraction in which the impact of the implementations on the comparison is reduced, or entirely eliminated. We propose a stochastic model for performance evaluation of database replication protocols, paying particular attention to: i) empirical validation of a number of assumptions used in the stochastic model, and ii) empirical validation of model accuracy for a chosen replication protocol. For the empirical validations we used the TPC-C benchmark. Our implementation of the model is based on Stochastic Activity Networks (SAN), extended by bespoke code. The model may reduce the cost of performance evaluation in comparison with empirical measurements, while keeping the accuracy of the assessment at an acceptable level.
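To make the modelling idea concrete, here is a deliberately simplified Monte Carlo stand-in for the kind of question such a model answers: the distribution of transaction response times under a replication protocol with an execution phase, a certification round, and conflict-induced retries. All rates and the conflict probability are invented, and the paper's model is built with Stochastic Activity Networks rather than direct simulation.

```python
# Illustrative Monte Carlo stand-in for a stochastic performance model of a
# replicated database: per-transaction response time = local execution +
# certification round-trip + (on conflict) one abort/retry cycle.  All rates
# and the conflict probability are invented for illustration only.
import random

random.seed(42)

def transaction_response_time(exec_rate=50.0, cert_rate=200.0, p_conflict=0.05):
    """Return one sampled response time in seconds."""
    t = random.expovariate(exec_rate) + random.expovariate(cert_rate)
    while random.random() < p_conflict:          # aborted at certification, retry
        t += random.expovariate(exec_rate) + random.expovariate(cert_rate)
    return t

samples = sorted(transaction_response_time() for _ in range(100_000))
mean = sum(samples) / len(samples)
p99 = samples[int(0.99 * len(samples))]
print(f"mean response time: {mean*1000:.2f} ms, 99th percentile: {p99*1000:.2f} ms")
```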
Database Learning: Toward a Database that Becomes Smarter Every Time
In today's databases, previous query answers rarely benefit answering future queries. For the first time, to the best of our knowledge, we change this paradigm in an approximate query processing (AQP) context. We make the following observation: the answer to each query reveals some degree of knowledge about the answer to another query, because their answers stem from the same underlying distribution that has produced the entire dataset. Exploiting and refining this knowledge should allow us to answer queries more analytically, rather than by reading enormous amounts of raw data. Also, processing more queries should continuously enhance our knowledge of the underlying distribution, and hence lead to increasingly faster response times for future queries.

We call this novel idea---learning from past query answers---Database Learning. We exploit the principle of maximum entropy to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations. Empowered by this idea, we build a query engine on top of Spark SQL, called Verdict. We conduct extensive experiments on real-world query traces from a large customer of a major database vendor. Our results demonstrate that Verdict supports 73.7% of these queries, speeding them up by up to 23.0x for the same accuracy level compared to existing AQP systems.

Comment: This manuscript is an extended report of the work published at the ACM SIGMOD conference, 2017.
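The following toy sketch conveys the "learning from past query answers" intuition in the simplest possible setting: a new query's small-sample estimate is combined with the answer a previous, larger query already produced, using an inverse-variance weighting. This only illustrates why reusing past answers can tighten new estimates; it is not Verdict's actual maximum-entropy inference, and all data and sample sizes are synthetic.

```python
# Toy illustration of reusing past query answers: the new query's sample-based
# estimate is corrected with what a past, overlapping query already told us,
# via a precision-weighted (inverse-variance) combination of the two estimates.
import numpy as np

rng = np.random.default_rng(1)
population = rng.normal(loc=100.0, scale=20.0, size=1_000_000)   # hidden data

# Past query: AVG over a large sample -> a precise answer we can reuse.
past_sample = rng.choice(population, size=50_000, replace=False)
past_mean = past_sample.mean()
past_var = past_sample.var(ddof=1) / past_sample.size

# New query over the same distribution, answered from a tiny sample.
new_sample = rng.choice(population, size=200, replace=False)
new_mean = new_sample.mean()
new_var = new_sample.var(ddof=1) / new_sample.size

# Precision-weighted combination of the two estimates.
w_past, w_new = 1.0 / past_var, 1.0 / new_var
combined = (w_past * past_mean + w_new * new_mean) / (w_past + w_new)

truth = population.mean()
print(f"sample-only error: {abs(new_mean - truth):.3f}")
print(f"with past answer:  {abs(combined - truth):.3f}")
```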
Performance Modeling and Resource Management for MapReduce Applications
Big Data analytics is increasingly performed using the MapReduce paradigm and its open-source implementation Hadoop as the platform of choice. Many applications associated with live business intelligence are written as complex data analysis programs defined by directed acyclic graphs of MapReduce jobs. An increasing number of these applications have additional requirements for completion time guarantees. The advent of cloud computing brings a competitive alternative solution for data analytics problems, while it also introduces new challenges in provisioning clusters that provide the best cost-performance trade-offs.
In this dissertation, we aim to develop a performance evaluation framework that enables automatic resource management for MapReduce applications in achieving different optimization goals. It consists of the following components: (1) a performance modeling framework that estimates the completion time of a given MapReduce application when executed on a Hadoop cluster, according to its input data sets, the job settings, and the amount of resources allocated for processing it; (2) a resource allocation strategy for deadline-driven MapReduce applications that automatically tailors and controls the resource allocation on a shared Hadoop cluster to different applications to achieve their (soft) deadlines; (3) a simulator-based solution to the resource provisioning problem in public cloud environments that guides users in determining the types and amount of resources that should be leased from the service provider for achieving different goals; (4) an optimization strategy to automatically determine the optimal job settings within a MapReduce application for efficient execution and resource usage. We validate the accuracy, efficiency, and performance benefits of the proposed framework using a set of realistic MapReduce applications in both private cluster and public cloud environments.
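As an illustration of what a completion-time estimate for component (1) can look like, the sketch below uses a simple bounds-based model of the kind common in this line of work: each phase's makespan is bounded using the number of tasks, the number of slots, and the average and maximum task durations from a job profile. The formulas and all profile numbers are an assumed example, not the dissertation's exact model.

```python
# Hedged sketch of a bounds-based MapReduce completion-time estimate: bound the
# makespan of each phase (map, reduce) from a measured job profile and the
# allocated slots, then sum the phase bounds.  Profile numbers are invented.
def phase_bounds(n_tasks, n_slots, avg, mx):
    """Lower/upper bounds on the makespan of one phase (map or reduce)."""
    lower = (n_tasks / n_slots) * avg
    upper = ((n_tasks - 1) / n_slots) * avg + mx
    return lower, upper

def job_completion_bounds(profile, map_slots, reduce_slots):
    m_lo, m_up = phase_bounds(profile["map_tasks"], map_slots,
                              profile["map_avg"], profile["map_max"])
    r_lo, r_up = phase_bounds(profile["reduce_tasks"], reduce_slots,
                              profile["reduce_avg"], profile["reduce_max"])
    return m_lo + r_lo, m_up + r_up

profile = {"map_tasks": 400, "map_avg": 18.0, "map_max": 45.0,
           "reduce_tasks": 60, "reduce_avg": 50.0, "reduce_max": 120.0}

for slots in ((40, 10), (80, 20), (160, 40)):        # candidate allocations
    lo, up = job_completion_bounds(profile, *slots)
    print(f"map/reduce slots {slots}: completion in [{lo:.0f}s, {up:.0f}s]")
```

Such bounds let a resource manager invert the question: given a (soft) deadline, search for the smallest slot allocation whose upper bound still meets it.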
Sensitivity and discovery potential of the proposed nEXO experiment to neutrinoless double beta decay
The next-generation Enriched Xenon Observatory (nEXO) is a proposed experiment to search for neutrinoless double beta (0νββ) decay in ¹³⁶Xe with a target half-life sensitivity of approximately 10²⁸ years, using about 5000 kg of isotopically enriched liquid xenon in a time projection chamber. This improvement of two orders of magnitude in sensitivity over current limits is obtained by a significant increase of the ¹³⁶Xe mass, the monolithic and homogeneous configuration of the active medium, and the multi-parameter measurements of the interactions enabled by the time projection chamber. The detector concept and anticipated performance are presented based upon demonstrated realizable background rates.

Comment: v2 as published.
Improving Key-Value Database Scalability with Lazy State Determination
Applications keep demanding higher and higher throughput and lower response times from database systems. Databases leverage concurrency, using both multiple computer systems (nodes) and the multiple cores available in each node, to execute multiple requests (transactions) concurrently.

Executing multiple transactions concurrently requires coordination, which is ensured by the database concurrency control (CC) module. However, excessive control or limitation of concurrency by the CC module negatively impacts the overall performance (latency and throughput) of the database system. The performance limitations imposed by the database CC module can be addressed by exploring new hardware, or by leveraging software-based techniques such as futures and lazy evaluation of transactions.

This is where Lazy State Determination (LSD) shines [43, 42]. LSD proposes a new transactional API that decreases the conflicts between concurrent transactions by enabling the use of futures in both SQL and key-value database systems. The use of futures allows LSD to better capture the application semantics and to make more informed decisions about what really constitutes a conflict. These two key insights combine to create a system that provides high throughput in high-contention scenarios.

Our work builds on top of a previous LSD prototype. We identified and diagnosed its shortcomings, and devised and implemented a new prototype that addresses them. We validated our new LSD system and evaluated its behaviour and performance by comparing it with the original prototype. Our evaluation showed that the throughput of the new LSD prototype is 3.7× to 4.9× higher, in centralized and distributed settings respectively, while also reducing latency by up to 10 times.
With this work, we provide an LSD-based key-value database system that has better vertical and horizontal scalability, and can take advantage of systems with higher core counts or larger numbers of nodes, in centralized and distributed settings, respectively.
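To make the futures idea concrete, here is an invented, minimal sketch of what a lazy, futures-style transactional key-value API can look like: updates are registered as functions of the value at commit time and return futures, so their effects (and hence conflicts) are determined only when the transaction commits. The class and method names are purely illustrative and are not the API of the LSD prototypes.

```python
# Invented, minimal sketch of a futures-style transactional key-value API in
# the spirit of LSD: writes are expressed as functions of the state at commit
# time and return futures, so conflict decisions are deferred until the
# transaction's effects are actually applied.  Illustrative only.
from typing import Callable, Dict, List, Tuple

class Future:
    def __init__(self):
        self._value = None
        self._resolved = False
    def resolve(self, value):
        self._value, self._resolved = value, True
    def get(self):
        assert self._resolved, "future is only resolved at commit time"
        return self._value

class LazyTxn:
    def __init__(self, store: Dict[str, int]):
        self._store = store
        self._ops: List[Tuple[str, Callable[[int], int], Future]] = []
    def update(self, key: str, fn: Callable[[int], int]) -> Future:
        """Register a lazy read-modify-write; its result is known at commit."""
        fut = Future()
        self._ops.append((key, fn, fut))
        return fut
    def commit(self):
        # Deferred operations are evaluated against the committed state, so
        # e.g. two concurrent increments of the same key need not conflict.
        for key, fn, fut in self._ops:
            new_value = fn(self._store.get(key, 0))
            self._store[key] = new_value
            fut.resolve(new_value)

store = {"stock:item42": 10}
txn = LazyTxn(store)
remaining = txn.update("stock:item42", lambda v: v - 1)   # lazy decrement
txn.commit()
print(remaining.get(), store["stock:item42"])              # 9 9
```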