210 research outputs found
How to Price Shared Optimizations in the Cloud
Data-management-as-a-service systems are increasingly being used in
collaborative settings, where multiple users access common datasets. Cloud
providers have the choice to implement various optimizations, such as indexing
or materialized views, to accelerate queries over these datasets. Each
optimization carries a cost and may benefit multiple users. This creates a
major challenge: how to select which optimizations to perform and how to share
their cost among users. The problem is especially challenging when users are
selfish and will only report their true values for different optimizations if
doing so maximizes their utility. In this paper, we present a new approach for
selecting and pricing shared optimizations by using Mechanism Design. We first
show how to apply the Shapley Value Mechanism to the simple case of selecting
and pricing additive optimizations, assuming an offline game where all users
access the service for the same time-period. Second, we extend the approach to
online scenarios where users come and go. Finally, we consider the case of
substitutive optimizations. We show analytically that our mechanisms induce
truthfulness and recover the optimization costs. We also show experimentally
that our mechanisms yield higher utility than the state-of-the-art approach
based on regret accumulation.
Comment: VLDB201
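For the additive, offline case, the Shapley-value cost split can be illustrated with a direct computation. The sketch below is illustrative only (the user names and toy cost function are assumptions, not the paper's mechanism): it enumerates every ordering of the users and charges each user their average marginal cost.

```python
import math
from itertools import permutations

def shapley_shares(players, cost):
    """Exact Shapley cost shares via all orderings.

    `cost` maps a frozenset of players to the total cost of serving
    that coalition; a player's share is their average marginal cost."""
    shares = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            shares[p] += cost(with_p) - cost(coalition)
            coalition = with_p
    n_orders = math.factorial(len(players))
    return {p: s / n_orders for p, s in shares.items()}

# Toy setting: one shared index costing 12 units benefits all three
# users equally, so symmetry dictates an equal split.
users = ["alice", "bob", "carol"]
index_cost = lambda coalition: 12.0 if coalition else 0.0
print(shapley_shares(users, index_cost))  # every user pays 4.0
```

Enumerating all orderings is exponential in the number of users, so this only conveys the definition; any practical mechanism would exploit structure (here, symmetry) or approximate.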
PerfXplain: Debugging MapReduce Job Performance
While users today have access to many tools that assist in performing large
scale data analysis tasks, understanding the performance characteristics of
their parallel computations, such as MapReduce jobs, remains difficult. We
present PerfXplain, a system that enables users to ask questions about the
relative performances (i.e., runtimes) of pairs of MapReduce jobs. PerfXplain
provides a new query language for articulating performance queries and an
algorithm for generating explanations from a log of past MapReduce job
executions. We formally define the notion of an explanation together with three
metrics, relevance, precision, and generality, that measure explanation
quality. We present the explanation-generation algorithm based on techniques
related to decision-tree building. We evaluate the approach on a log of past
executions on Amazon EC2, and show that our approach can generate quality
explanations, outperforming two naive explanation-generation methods.
Comment: VLDB201
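The decision-tree flavor of the explanation-generation step can be sketched as an information-gain split over logged job pairs. Everything below (feature names, labels) is a hypothetical illustration of split selection, not PerfXplain's query language or its relevance/precision/generality metrics.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def best_split(records, labels, features):
    """Pick the feature whose value best separates the labeled job
    pairs (highest information gain), the core step of
    decision-tree induction."""
    base = entropy(labels)
    best, best_gain = None, -1.0
    for f in features:
        groups = {}
        for record, label in zip(records, labels):
            groups.setdefault(record[f], []).append(label)
        remainder = sum(len(g) / len(labels) * entropy(g)
                        for g in groups.values())
        if base - remainder > best_gain:
            best, best_gain = f, base - remainder
    return best, best_gain

# Hypothetical log of job pairs: did the two jobs share a cluster or
# an input size, and did their relative runtimes match expectations?
pairs = [
    {"same_cluster": True,  "same_input_size": True},
    {"same_cluster": True,  "same_input_size": False},
    {"same_cluster": False, "same_input_size": True},
    {"same_cluster": False, "same_input_size": False},
]
outcomes = ["expected", "unexpected", "expected", "unexpected"]
feature, gain = best_split(pairs, outcomes,
                           ["same_cluster", "same_input_size"])
```

In this toy log, `same_input_size` perfectly predicts whether a pair's runtimes diverge, so it wins the split and would anchor the explanation.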
Efficient Iterative Processing in the SciDB Parallel Array Engine
Many scientific data-intensive applications perform iterative computations on
array data. There exist multiple engines specialized for array processing.
These engines efficiently support various types of operations, but none
includes native support for iterative processing. In this paper, we develop a
model for iterative array computations and a series of optimizations. We
evaluate the benefits of an optimized, native support for iterative array
processing on the SciDB engine and real workloads from the astronomy domain.
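The shape of an iterative array computation can be sketched as a generic driver that reapplies a step function until the array reaches a fixpoint. The step function and tolerance below are illustrative assumptions, not SciDB operators.

```python
def iterate_until_converged(cells, step, epsilon=1e-6, max_iters=1000):
    """Generic driver for an iterative array computation: reapply
    `step` until the largest per-cell change falls below `epsilon`."""
    iters = 0
    for iters in range(1, max_iters + 1):
        new_cells = step(cells)
        delta = max(abs(a - b) for a, b in zip(new_cells, cells))
        cells = new_cells
        if delta < epsilon:
            break
    return cells, iters

def smooth(xs):
    """Toy stencil: average each cell with its neighbors
    (borders clamped)."""
    n = len(xs)
    return [(xs[max(i - 1, 0)] + xs[i] + xs[min(i + 1, n - 1)]) / 3
            for i in range(n)]

# A single bright cell diffuses until the array stops changing.
result, iters = iterate_until_converged([0.0, 0.0, 9.0, 0.0, 0.0], smooth)
```

A native implementation inside an array engine can do much better than this naive loop, e.g. by recomputing only cells whose neighborhoods changed; the driver above just fixes the semantics being optimized.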
Believe It or Not: Adding Belief Annotations to Databases
We propose a database model that allows users to annotate data with belief
statements. Our motivation comes from scientific database applications where a
community of users is working together to assemble, revise, and curate a shared
data repository. As the community accumulates knowledge and the database
content evolves over time, it may contain conflicting information and members
can disagree on the information it should store. For example, Alice may believe
that a tuple should be in the database, whereas Bob disagrees. He may also
insert the reason why he thinks Alice believes the tuple should be in the
database, and explain what he thinks the correct tuple should be instead.
We propose a formal model for Belief Databases that interprets users'
annotations as belief statements. These annotations can refer both to the base
data and to other annotations. We give a formal semantics based on a fragment
of multi-agent epistemic logic and define a query language over belief
databases. We then prove a key technical result, stating that every belief
database can be encoded as a canonical Kripke structure. We use this structure
to describe a relational representation of belief databases, and give an
algorithm for translating queries over the belief database into standard
relational queries. Finally, we report early experimental results with our
prototype implementation on synthetic data.
Comment: 17 pages, 10 figures
Clone-Analysis-Based Redesign of Object-Oriented Systems
Existence of clones in software systems -- Contribution of the research -- Original contributions of the research -- A taxonomy of clones -- Clone comparison -- Difference-extraction process -- Source-code abstraction -- Matching -- Union of the difference sets -- Projecting the differences onto the AST -- Redesign -- Factoring out common parts and parameterizing the differences -- Decoupling from the context -- Choosing complete solutions -- Complete solutions with design patterns -- Formalizing the redesigns -- Classifying the clones of several real systems -- Automatic redesign -- Benefits of the redesign -- Limits of an automatic approach -- An interactive approach