    Explain3D: Explaining Disagreements in Disjoint Datasets

    Data plays an important role in applications, analytic processes, and many aspects of human activity. As data grows in size and complexity, we are met with an imperative need for tools that promote understanding and explanations over data-related operations. Data management research on explanations has focused on the assumption that data resides in a single dataset, under one common schema. But the reality of today's data is that it is frequently un-integrated, coming from different sources with different schemas. When different datasets provide different answers to semantically similar questions, understanding the reasons for the discrepancies is challenging and cannot be handled by the existing single-dataset solutions. In this paper, we propose Explain3D, a framework for explaining the disagreements across disjoint datasets (3D). Explain3D focuses on identifying the reasons for the differences in the results of two semantically similar queries operating on two datasets with potentially different schemas. Our framework leverages the queries to perform a semantic mapping across the relevant parts of their provenance; discrepancies in this mapping point to causes of the queries' differences. Exploiting the queries gives Explain3D an edge over traditional schema matching and record linkage techniques, which are query-agnostic. Our work makes the following contributions: (1) We formalize the problem of deriving optimal explanations for the differences of the results of semantically similar queries over disjoint datasets. (2) We design a 3-stage framework for solving the optimal explanation problem. (3) We develop a smart-partitioning optimizer that improves the efficiency of the framework by orders of magnitude. (4) We experiment with real-world and synthetic data to demonstrate that Explain3D can derive precise explanations efficiently.

    Topology Control for Maintaining Network Connectivity and Maximizing Network Capacity Under the Physical Model

    In this paper we study the issue of topology control under the physical Signal-to-Interference-Noise-Ratio (SINR) model, with the objective of maximizing network capacity. We show that existing graph-model-based topology control captures interference inadequately under the physical SINR model, and as a result, the interference in the topology thus induced is high and the network capacity attained is low. Towards bridging this gap, we propose a centralized approach, called Spatial Reuse Maximizer (MaxSR), that combines a power control algorithm T4P with a topology control algorithm P4T. T4P optimizes the assignment of transmit power given a fixed topology, where by optimality we mean that the transmit power is so assigned that it minimizes the average interference degree (defined as the number of interfering nodes that may interfere with the on-going transmission on a link) in the topology. P4T, on the other hand, constructs, based on the power assignment made in T4P, a new topology by deriving a spanning tree that gives the minimal interference degree. By alternately invoking the two algorithms, the power assignment quickly converges to an operational point that maximizes the network capacity. We formally prove the convergence of MaxSR. We also show via simulation that the topology induced by MaxSR outperforms that derived from existing topology control algorithms by 50%-110% in terms of maximizing the network capacity.
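
    The abstract does not state the SINR formula it uses. As a rough illustration only, the sketch below computes the textbook physical-model ratio: received signal power divided by noise plus the summed power of concurrent transmitters. The function name `sinr`, the uniform transmit power, the path-loss exponent `alpha`, and the noise value are all assumptions made for this example; none of this is MaxSR, T4P, or P4T.

```python
import math

def sinr(tx, rx, interferers, power=1.0, alpha=2.0, noise=0.25):
    """Textbook SINR at receiver `rx` for a transmission from `tx`.

    Assumptions (not from the abstract): every node transmits with the same
    `power`, and received power decays as distance ** (-alpha).
    `interferers` lists the positions of other concurrent transmitters.
    """
    def received(src):
        # Power arriving at rx from a transmitter located at src.
        return power * math.dist(src, rx) ** (-alpha)

    interference = sum(received(s) for s in interferers)
    return received(tx) / (noise + interference)

# One link of length 1, with and without a concurrent transmitter 2 units from rx.
alone = sinr((0.0, 0.0), (1.0, 0.0), [])
crowded = sinr((0.0, 0.0), (1.0, 0.0), [(3.0, 0.0)])
print(alone, crowded)
```

    Under this model every concurrent transmitter lowers the ratio, however far away it is. A graph-based model that counts only neighbouring nodes does not capture that cumulative effect.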

    Shortest Path and Distance Queries on Road Networks: An Experimental Evaluation

    Computing the shortest path between two given locations in a road network is an important problem that finds applications in various map services and commercial navigation products. The state-of-the-art solutions for the problem can be divided into two categories: spatial-coherence-based methods and vertex-importance-based approaches. The two categories of techniques, however, have not been compared systematically under the same experimental framework, as they were developed from two independent lines of research that do not refer to each other. This renders it difficult for a practitioner to decide which technique should be adopted for a specific application. Furthermore, the experimental evaluation of the existing techniques, as presented in previous work, falls short in several aspects. Some methods were tested only on small road networks with up to one hundred thousand vertices; some approaches were evaluated using distance queries (instead of shortest path queries), namely, queries that ask only for the length of the shortest path; a state-of-the-art technique was examined based on a faulty implementation that led to incorrect query results. To address the above issues, this paper presents a comprehensive comparison of the most advanced spatial-coherence-based and vertex-importance-based approaches. Using a variety of real road networks with up to twenty million vertices, we evaluated each technique in terms of its preprocessing time, space consumption, and query efficiency (for both shortest path and distance queries). Our experimental results reveal the characteristics of different techniques, based on which we provide guidelines on selecting appropriate methods for various scenarios. Comment: VLDB201
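
    To make the distinction between the two query types concrete, here is a minimal sketch using plain Dijkstra search on a toy graph. It is a baseline for illustration, not one of the spatial-coherence-based or vertex-importance-based techniques the paper evaluates. The graph, the vertex names, and the function names are hypothetical.

```python
import heapq

def dijkstra(graph, source, target):
    """Return (distance, path) from source to target, or (inf, []) if unreachable.

    `graph` maps a vertex to a list of (neighbour, non-negative weight) pairs.
    """
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        if u == target:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in settled:
        return float("inf"), []
    # Walk parent pointers back from the target to recover the route.
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    return dist[target], path

def distance_query(graph, s, t):
    """Distance query: only the length of the shortest path."""
    return dijkstra(graph, s, t)[0]

def shortest_path_query(graph, s, t):
    """Shortest path query: the actual vertex sequence."""
    return dijkstra(graph, s, t)[1]

# Hypothetical four-vertex road network with symmetric edge weights.
roads = {
    "a": [("b", 4), ("c", 1)],
    "b": [("a", 4), ("c", 2), ("d", 5)],
    "c": [("a", 1), ("b", 2), ("d", 8)],
    "d": [("b", 5), ("c", 8)],
}
print(distance_query(roads, "a", "d"))
print(shortest_path_query(roads, "a", "d"))
```

    In this sketch both queries cost the same, because the path falls out of the parent pointers. With a precomputed index, recovering the route can cost noticeably more than looking up its length, so an evaluation that reports only distance queries can understate path-query cost.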

    Incremental Maintenance of Maximal Cliques in a Dynamic Graph

    We consider the maintenance of the set of all maximal cliques in a dynamic graph that is changing through the addition or deletion of edges. We present nearly tight bounds on the magnitude of change in the set of maximal cliques, as well as the first change-sensitive algorithms for clique maintenance, whose runtime is proportional to the magnitude of the change in the set of maximal cliques. We present experimental results showing these algorithms are efficient in practice and are faster than prior work by two to three orders of magnitude. Comment: 18 pages, 8 figures
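
    The "magnitude of change" can be illustrated with a naive baseline: enumerate all maximal cliques before and after an edge insertion and diff the two sets. The sketch below does this with the standard Bron-Kerbosch algorithm with pivoting. It recomputes everything from scratch, so it is explicitly not the paper's change-sensitive algorithm. The helper names and the example graph are hypothetical.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (pivoting variant).

    `adj` maps each vertex to the set of its neighbours (undirected graph).
    """
    cliques = set()
    if not adj:
        return cliques

    def expand(r, p, x):
        if not p and not x:
            cliques.add(frozenset(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

def with_edge(adj, u, v):
    """Return a copy of `adj` with the undirected edge (u, v) added."""
    new = {k: set(nbrs) for k, nbrs in adj.items()}
    new[u].add(v)
    new[v].add(u)
    return new

def clique_change(adj, u, v):
    """Naive baseline: recompute from scratch and diff the two clique sets."""
    before = maximal_cliques(adj)
    after = maximal_cliques(with_edge(adj, u, v))
    return after - before, before - after  # (new cliques, subsumed cliques)

# Hypothetical graph: triangle 1-2-3 with a pendant vertex 4 attached to 3.
g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
new, subsumed = clique_change(g, 2, 4)
print(new, subsumed)
```

    Adding the edge (2, 4) creates one maximal clique and subsumes one, so the change has size two, yet the baseline still enumerates every clique twice. A change-sensitive algorithm, as described in the abstract, aims for a runtime proportional to the size of that change rather than to the whole graph.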