
    ExplainIt! -- A declarative root-cause analysis engine for time series data (extended version)

    We present ExplainIt!, a declarative, unsupervised root-cause analysis engine that uses time series monitoring data from large complex systems such as data centres. ExplainIt! empowers operators to succinctly specify a large number of causal hypotheses to search for causes of interesting events. ExplainIt! then ranks these hypotheses, reducing the number of causal dependencies from hundreds of thousands to a handful for human understanding. We show how a declarative language, such as SQL, can be effective in enumerating hypotheses that probe the structure of an unknown probabilistic graphical causal model of the underlying system. Our thesis is that databases are in a unique position to enable users to rapidly explore the possible causal mechanisms in data collected from diverse sources. We empirically demonstrate how ExplainIt! has helped us resolve over 30 performance issues in a commercial product since late 2014, of which we discuss a few cases in detail. Comment: SIGMOD Industry Track 201

    Efficient Multi-way Theta-Join Processing Using MapReduce

    Multi-way Theta-join queries are powerful in describing complex relations and are therefore widely employed in practice. However, existing solutions from traditional distributed and parallel databases for multi-way Theta-join queries cannot be easily extended to fit a shared-nothing distributed computing paradigm, which has been shown to support OLAP applications over immense data volumes. In this work, we study the problem of efficient processing of multi-way Theta-join queries using MapReduce from a cost-effective perspective. Although there have been some works using the (key,value) pair-based programming model to support join operations, efficient processing of multi-way Theta-join queries has never been fully explored. The substantial challenge lies in, given a number of processing units (that can run Map or Reduce tasks), mapping a multi-way Theta-join query to a number of MapReduce jobs and having them executed in a well-scheduled sequence, such that the total processing time span is minimized. Our solution mainly includes two parts: 1) cost metrics for both a single MapReduce job and a number of MapReduce jobs executed in a certain order; 2) the efficient execution of a chain-typed Theta-join with only one MapReduce job. Compared with the query evaluation strategy proposed in [23] and the widely adopted Pig Latin and Hive SQL solutions, our method achieves a significant improvement in join processing efficiency. Comment: VLDB201

    Quality-Driven Disorder Handling for M-way Sliding Window Stream Joins

    Sliding window join is one of the most important operators for stream applications. To produce high quality join results, a stream processing system must deal with the ubiquitous disorder within input streams which is caused by network delay, asynchronous source clocks, etc. Disorder handling involves an inevitable tradeoff between the latency and the quality of produced join results. To meet different requirements of stream applications, it is desirable to provide a user-configurable result-latency vs. result-quality tradeoff. Existing disorder handling approaches either do not provide such configurability, or support only user-specified latency constraints. In this work, we advocate the idea of quality-driven disorder handling, and propose a buffer-based disorder handling approach for sliding window joins, which minimizes the sizes of input-sorting buffers, and thus the result latency, while respecting user-specified result-quality requirements. The core of our approach is an analytical model which directly captures the relationship between sizes of input buffers and the produced result quality. Our approach is generic. It supports m-way sliding window joins with arbitrary join conditions. Experiments on real-world and synthetic datasets show that, compared to the state of the art, our approach can reduce the result latency incurred by disorder handling by up to 95% while providing the same level of result quality. Comment: 12 pages, 11 figures, IEEE ICDE 201
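
The buffer-size versus quality tradeoff described in this abstract can be illustrated with a generic input-sorting buffer. The sketch below is not the paper's analytical model: it assumes a simple fixed-slack ("K-slack" style) buffer, and the function name `k_slack_reorder` and the parameter `k` are hypothetical. A tuple is held until the largest timestamp seen so far exceeds it by at least `k`; tuples that arrive after their slot has already been released are counted as late and dropped.

```python
import heapq


def k_slack_reorder(stream, k):
    """Reorder (timestamp, payload) tuples with a fixed-slack sorting buffer.

    Assumption: this is a generic illustration, not the method of the paper.
    Returns (ordered_tuples, late_count).
    """
    heap = []                      # min-heap keyed on timestamp
    ordered = []                   # tuples released in timestamp order
    late = 0                       # tuples that arrived too late to be ordered
    max_ts = float("-inf")         # largest timestamp seen so far
    watermark = float("-inf")      # everything <= watermark has been released

    for seq, (ts, payload) in enumerate(stream):
        if ts < watermark:
            # The buffer already released tuples past this timestamp.
            late += 1
            continue
        # seq breaks ties so payloads are never compared.
        heapq.heappush(heap, (ts, seq, payload))
        max_ts = max(max_ts, ts)
        watermark = max_ts - k
        while heap and heap[0][0] <= watermark:
            t, _, p = heapq.heappop(heap)
            ordered.append((t, p))

    # End of stream: flush whatever is still buffered.
    while heap:
        t, _, p = heapq.heappop(heap)
        ordered.append((t, p))
    return ordered, late


# A small out-of-order stream used to compare slack values.
demo_stream = [(1, "a"), (3, "b"), (2, "c"), (6, "d"),
               (4, "e"), (5, "f"), (9, "g"), (3, "h")]
```

With a larger `k`, fewer tuples are dropped (higher result quality) but each tuple waits longer in the buffer (higher latency), which is the tradeoff a quality-driven approach would tune.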

    Coalitional Games with Overlapping Coalitions for Interference Management in Small Cell Networks

    In this paper, we study the problem of cooperative interference management in an OFDMA two-tier small cell network. In particular, we propose a novel approach for allowing the small cells to cooperate, so as to optimize their sum-rate, while cooperatively satisfying their maximum transmit power constraints. Unlike existing work which assumes that only disjoint groups of cooperative small cells can emerge, we formulate the small cells' cooperation problem as a coalition formation game with overlapping coalitions. In this game, each small cell base station can choose to participate in one or more cooperative groups (or coalitions) simultaneously, so as to optimize the tradeoff between the benefits and costs associated with cooperation. We study the properties of the proposed overlapping coalition formation game and we show that it exhibits negative externalities due to interference. Then, we propose a novel decentralized algorithm that allows the small cell base stations to interact and self-organize into a stable overlapping coalitional structure. Simulation results show that the proposed algorithm results in a notable performance advantage in terms of the total system sum-rate, relative to the noncooperative case and the classical algorithms for coalitional games with non-overlapping coalitions.

    The Elusive p-air Cross Section

    For the $\bar p p$ and $pp$ systems, we have used all of the extensive data of the Particle Data Group [K. Hagiwara et al. (Particle Data Group), Phys. Rev. D 66, 010001 (2002)]. We then subject these data to a screening process, the "Sieve" algorithm [M. M. Block, physics/0506010], in order to eliminate "outliers" that can skew a $\chi^2$ fit. With the "Sieve" algorithm, a robust fit using a Lorentzian distribution is first made to all of the data to sieve out abnormally high $\Delta\chi^2_i$, the individual $i^{\rm th}$ point's contribution to the total $\chi^2$. The $\chi^2$ fits are then made to the sieved data. We demonstrate that we cleanly discriminate between asymptotic $\ln s$ and $\ln^2 s$ behavior of total hadronic cross sections when we require that these amplitudes also describe, on average, low energy data dominated by resonances. We simultaneously fit real analytic amplitudes to the "sieved" high energy measurements of $\bar p p$ and $pp$ total cross sections and $\rho$-values for $\sqrt s \ge 6$ GeV, while requiring that their asymptotic fits smoothly join the $\sigma_{\bar p p}$ and $\sigma_{pp}$ total cross sections at $\sqrt s = 4.0$ GeV, again both in magnitude and slope. Our results strongly favor a high energy $\ln^2 s$ fit, basically excluding a $\ln s$ fit. Finally, we make a screened Glauber fit for the p-air cross section, using as input our precisely-determined $pp$ cross sections at cosmic ray energies. Comment: 15 pages, 6 figures, 2 tables. Paper delivered at c2cr2005 Conference, Prague, September 7-13, 2005. Fig. 2 was missing from V1. V3 fixes all figure
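
The two-stage screening described in this abstract (a robust Lorentzian fit, removal of points with large $\Delta\chi^2_i$, then an ordinary $\chi^2$ fit to what remains) can be sketched on a toy problem. The code below is only an illustration under stated assumptions: it fits a single constant rather than real analytic amplitudes, and the Lorentzian scale `c=0.179` and the cut `cut=6.0` are assumed illustrative values, not settings confirmed by the abstract. The function names are hypothetical.

```python
def lorentzian_location(ys, sigmas, c=0.179, iters=100):
    """Robust estimate of a constant m minimising sum ln(1 + c * dchi2_i).

    dchi2_i = ((y_i - m) / sigma_i)**2. Solved by iteratively reweighted
    averaging; c is an assumed scale parameter.
    """
    m = sorted(ys)[len(ys) // 2]          # start from a median-like value
    for _ in range(iters):
        # Points far from m get strongly down-weighted.
        ws = [1.0 / (s * s * (1.0 + c * ((y - m) / s) ** 2))
              for y, s in zip(ys, sigmas)]
        m = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    return m


def sieve_constant_fit(ys, sigmas, cut=6.0):
    """Toy sieve: robust fit, drop points with dchi2_i > cut, then chi2 fit.

    Returns (fitted_constant, chi2_of_kept_points, number_removed).
    The cut value is an assumption chosen for illustration.
    """
    m0 = lorentzian_location(ys, sigmas)
    kept = [(y, s) for y, s in zip(ys, sigmas)
            if ((y - m0) / s) ** 2 <= cut]
    # Ordinary chi2 fit of a constant is the inverse-variance weighted mean.
    wsum = sum(1.0 / s ** 2 for _, s in kept)
    m = sum(y / s ** 2 for y, s in kept) / wsum
    chi2 = sum(((y - m) / s) ** 2 for y, s in kept)
    return m, chi2, len(ys) - len(kept)


# Five consistent measurements and one obvious outlier (made-up numbers).
toy_ys = [10.1, 9.9, 10.0, 10.2, 9.8, 15.0]
toy_sigmas = [0.1] * 6
fit_value, fit_chi2, n_removed = sieve_constant_fit(toy_ys, toy_sigmas)
```

The robust first pass matters because an ordinary $\chi^2$ fit to all six points would be pulled toward the outlier, which could in turn cause good points to fail the cut.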