ExplainIt! -- A declarative root-cause analysis engine for time series data (extended version)
We present ExplainIt!, a declarative, unsupervised root-cause analysis engine
that uses time series monitoring data from large complex systems such as data
centres. ExplainIt! empowers operators to succinctly specify a large number of
causal hypotheses to search for causes of interesting events. ExplainIt! then
ranks these hypotheses, reducing the number of causal dependencies from
hundreds of thousands to a handful for human understanding. We show how a
declarative language, such as SQL, can be effective in
enumerating hypotheses that probe the structure of an unknown probabilistic
graphical causal model of the underlying system. Our thesis is that databases
are in a unique position to enable users to rapidly explore the possible causal
mechanisms in data collected from diverse sources. We empirically demonstrate
how ExplainIt! has helped us resolve over 30 performance issues in a commercial
product since late 2014, of which we discuss a few cases in detail.
Comment: SIGMOD Industry Track 201
Efficient Multi-way Theta-Join Processing Using MapReduce
Multi-way Theta-join queries are powerful for describing complex relations and
are therefore widely used in practice. However, existing solutions from
traditional distributed and parallel databases for multi-way Theta-join queries
cannot be easily extended to fit a shared-nothing distributed computing
paradigm, which is proven to be able to support OLAP applications over immense
data volumes. In this work, we study the problem of efficient processing of
multi-way Theta-join queries using MapReduce from a cost-effective perspective.
Although there have been some works using the (key,value) pair-based
programming model to support join operations, efficient processing of multi-way
Theta-join queries has never been fully explored. The substantial challenge
lies in, given a number of processing units (that can run Map or Reduce tasks),
mapping a multi-way Theta-join query to a number of MapReduce jobs and having
them executed in a well-scheduled sequence, such that the total processing time
span is minimized. Our solution mainly includes two parts: 1) cost metrics for
both a single MapReduce job and a number of MapReduce jobs executed in a certain
order; 2) the efficient execution of a chain-typed Theta-join with only one
MapReduce job. Compared with the query evaluation strategy proposed in [23]
and the widely adopted Pig Latin and Hive SQL solutions, our method achieves
significant improvement of the join processing efficiency.
Comment: VLDB201
Quality-Driven Disorder Handling for M-way Sliding Window Stream Joins
Sliding window join is one of the most important operators for stream
applications. To produce high quality join results, a stream processing system
must deal with the ubiquitous disorder within input streams which is caused by
network delay, asynchronous source clocks, etc. Disorder handling involves an
inevitable tradeoff between the latency and the quality of produced join
results. To meet different requirements of stream applications, it is desirable
to provide a user-configurable result-latency vs. result-quality tradeoff.
Existing disorder handling approaches either do not provide such
configurability, or support only user-specified latency constraints.
In this work, we advocate the idea of quality-driven disorder handling, and
propose a buffer-based disorder handling approach for sliding window joins,
which minimizes the sizes of input-sorting buffers, and thus the result latency, while
respecting user-specified result-quality requirements. The core of our approach
is an analytical model which directly captures the relationship between sizes
of input buffers and the produced result quality. Our approach is generic. It
supports m-way sliding window joins with arbitrary join conditions. Experiments
on real-world and synthetic datasets show that, compared to the state of the
art, our approach can reduce the result latency incurred by disorder handling
by up to 95% while providing the same level of result quality.
Comment: 12 pages, 11 figures, IEEE ICDE 201
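The latency versus quality tradeoff of an input-sorting buffer can be illustrated with a minimal sketch of a generic slack-based sorting buffer. This is not the quality-driven approach of the paper above: the class `KSlackBuffer`, the function `reorder`, and the fixed slack parameter `k` are hypothetical names chosen for illustration, and the paper's analytical model for choosing buffer sizes is not reproduced here.

```python
import heapq


class KSlackBuffer:
    """Hypothetical input-sorting buffer with a fixed slack k.

    A tuple is held until it is at least k time units older than the
    largest timestamp seen so far, then released in timestamp order.
    """

    def __init__(self, k):
        self.k = k
        self.max_ts = float("-inf")
        self._heap = []
        self._seq = 0  # tie-breaker so payloads are never compared

    def push(self, ts, payload):
        """Insert one tuple and return the tuples that can now be released."""
        self.max_ts = max(self.max_ts, ts)
        heapq.heappush(self._heap, (ts, self._seq, payload))
        self._seq += 1
        released = []
        while self._heap and self._heap[0][0] <= self.max_ts - self.k:
            t, _, p = heapq.heappop(self._heap)
            released.append((t, p))
        return released

    def flush(self):
        """Release everything still buffered, in timestamp order."""
        released = []
        while self._heap:
            t, _, p = heapq.heappop(self._heap)
            released.append((t, p))
        return released


def reorder(stream, k):
    """Run a finite stream of (timestamp, payload) pairs through the buffer."""
    buf = KSlackBuffer(k)
    out = []
    for ts, payload in stream:
        out.extend(buf.push(ts, payload))
    out.extend(buf.flush())
    return out
```

In this sketch a larger `k` holds tuples longer (higher latency) but absorbs more disorder before tuples reach a downstream join, while `k = 0` forwards tuples immediately in arrival order.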
Coalitional Games with Overlapping Coalitions for Interference Management in Small Cell Networks
In this paper, we study the problem of cooperative interference management in
an OFDMA two-tier small cell network. In particular, we propose a novel
approach for allowing the small cells to cooperate, so as to optimize their
sum-rate, while cooperatively satisfying their maximum transmit power
constraints. Unlike existing work which assumes that only disjoint groups of
cooperative small cells can emerge, we formulate the small cells' cooperation
problem as a coalition formation game with overlapping coalitions. In this
game, each small cell base station can choose to participate in one or more
cooperative groups (or coalitions) simultaneously, so as to optimize the
tradeoff between the benefits and costs associated with cooperation. We study
the properties of the proposed overlapping coalition formation game and we show
that it exhibits negative externalities due to interference. Then, we propose a
novel decentralized algorithm that allows the small cell base stations to
interact and self-organize into a stable overlapping coalitional structure.
Simulation results show that the proposed algorithm results in a notable
performance advantage in terms of the total system sum-rate, relative to the
noncooperative case and the classical algorithms for coalitional games with
non-overlapping coalitions.
The Elusive p-air Cross Section
For the $\bar{p}p$ and $pp$ systems, we have used all of the extensive data of
the Particle Data Group [K. Hagiwara et al. (Particle Data Group), Phys.
Rev. D 66, 010001 (2002)]. We then subject these data to a screening process,
the "Sieve" algorithm [M. M. Block, physics/0506010], in order to eliminate
"outliers" that can skew a $\chi^2$ fit. With the "Sieve" algorithm, a
robust fit using a Lorentzian distribution is first made to all of the data to
sieve out abnormally high $\Delta\chi^2_i$, the individual $i$th point's
contribution to the total $\chi^2$. The $\chi^2$ fits are then made to the
sieved data. We demonstrate that we cleanly discriminate between asymptotic
$\ln s$ and $\ln^2 s$ behavior of total hadronic cross sections when we require
that these amplitudes also describe, on average, low energy data
dominated by resonances. We simultaneously fit real analytic amplitudes to the
"sieved" high energy measurements of $\bar{p}p$ and $pp$ total cross sections
and $\rho$-values for $\sqrt{s} \ge 6$ GeV, while requiring that their asymptotic
fits smoothly join the $\sigma_{\bar{p}p}$ and $\sigma_{pp}$ total cross
sections at $\sqrt{s} = 4.0$ GeV, again both in magnitude and slope. Our
results strongly favor a high energy $\ln^2 s$ fit, basically excluding a
$\ln s$ fit. Finally, we make a screened Glauber fit for the p-air cross section,
using as input our precisely-determined $pp$ cross sections at cosmic ray
energies.
Comment: 15 pages, 6 figures, 2 tables. Paper delivered at c2cr2005 Conference,
Prague, September 7-13, 2005. Fig. 2 was missing from V1. V3 fixes all
figure
- …