Explain3D: Explaining Disagreements in Disjoint Datasets
Data plays an important role in applications, analytic processes, and many
aspects of human activity. As data grows in size and complexity, we are met
with an imperative need for tools that promote understanding and explanations
over data-related operations. Data management research on explanations has
focused on the assumption that data resides in a single dataset, under one
common schema. But the reality of today's data is that it is frequently
un-integrated, coming from different sources with different schemas. When
different datasets provide different answers to semantically similar questions,
understanding the reasons for the discrepancies is challenging and cannot be
handled by the existing single-dataset solutions.
In this paper, we propose Explain3D, a framework for explaining the
disagreements across disjoint datasets (3D). Explain3D focuses on identifying
the reasons for the differences in the results of two semantically similar
queries operating on two datasets with potentially different schemas. Our
framework leverages the queries to perform a semantic mapping across the
relevant parts of their provenance; discrepancies in this mapping point to
causes of the queries' differences. Exploiting the queries gives Explain3D an
edge over traditional schema matching and record linkage techniques, which are
query-agnostic. Our work makes the following contributions: (1) We formalize
the problem of deriving optimal explanations for the differences of the results
of semantically similar queries over disjoint datasets. (2) We design a 3-stage
framework for solving the optimal explanation problem. (3) We develop a
smart-partitioning optimizer that improves the efficiency of the framework by
orders of magnitude. (4) We experiment with real-world and synthetic data to
demonstrate that Explain3D can derive precise explanations efficiently.
Topology Control for Maintaining Network Connectivity and Maximizing Network Capacity Under the Physical Model
In this paper we study the issue of topology control under the physical Signal-to-Interference-Noise-Ratio (SINR) model, with the objective of maximizing network capacity. We show that existing graph-model-based topology control captures interference inadequately under the physical SINR model, and as a result, the interference in the topology thus induced is high and the network capacity attained is low. Towards bridging this gap, we propose a centralized approach, called Spatial Reuse Maximizer (MaxSR), that combines a power control algorithm T4P with a topology control algorithm P4T. T4P optimizes the assignment of transmit power given a fixed topology, where by optimality we mean that the transmit power is so assigned that it minimizes the average interference degree (defined as the number of interfering nodes that may interfere with the on-going transmission on a link) in the topology. P4T, on the other hand, constructs, based on the power assignment made in T4P, a new topology by deriving a spanning tree that gives the minimal interference degree. By alternately invoking the two algorithms, the power assignment quickly converges to an operational point that maximizes the network capacity. We formally prove the convergence of MaxSR. We also show via simulation that the topology induced by MaxSR outperforms that derived from existing topology control algorithms by 50%-110% in terms of maximizing the network capacity.
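As a rough illustration of the physical SINR model the abstract refers to, the sketch below computes the SINR at a receiver as signal power over noise plus summed interference. The power-law path-loss model, the exponent `alpha`, the noise value, and all function and variable names are assumptions made for illustration; the abstract does not specify them, and this is not the T4P or P4T algorithm.

```python
import math

def received_power(tx_power, tx_pos, rx_pos, alpha=3.0):
    """Power received at rx_pos under an assumed d^(-alpha) path-loss model."""
    d = math.dist(tx_pos, rx_pos)
    return tx_power / (d ** alpha)

def sinr(sender, receiver, interferers, power, pos, noise=1e-9, alpha=3.0):
    """SINR of the link sender -> receiver given concurrently transmitting interferers."""
    signal = received_power(power[sender], pos[sender], pos[receiver], alpha)
    interference = sum(
        received_power(power[k], pos[k], pos[receiver], alpha)
        for k in interferers
    )
    return signal / (noise + interference)

# Hypothetical three-node layout on a line: s transmits to r while k interferes.
positions = {"s": (0.0, 0.0), "r": (1.0, 0.0), "k": (3.0, 0.0)}
powers = {"s": 1.0, "k": 1.0}
link_sinr = sinr("s", "r", ["k"], powers, positions, noise=0.125)
```

Unlike a graph-based model, every concurrent transmitter contributes to the denominator here, however far away it is, which is why the interference accumulates rather than being a per-neighbor yes/no property.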
Shortest Path and Distance Queries on Road Networks: An Experimental Evaluation
Computing the shortest path between two given locations in a road network is
an important problem that finds applications in various map services and
commercial navigation products. The state-of-the-art solutions for the problem
can be divided into two categories: spatial-coherence-based methods and
vertex-importance-based approaches. The two categories of techniques, however,
have not been compared systematically under the same experimental framework, as
they were developed from two independent lines of research that do not refer to
each other. This renders it difficult for a practitioner to decide which
technique should be adopted for a specific application. Furthermore, the
experimental evaluation of the existing techniques, as presented in previous
work, falls short in several aspects. Some methods were tested only on small
road networks with up to one hundred thousand vertices; some approaches were
evaluated using distance queries (instead of shortest path queries), namely,
queries that ask only for the length of the shortest path; a state-of-the-art
technique was examined based on a faulty implementation that led to incorrect
query results. To address the above issues, this paper presents a comprehensive
comparison of the most advanced spatial-coherence-based and
vertex-importance-based approaches. Using a variety of real road networks with
up to twenty million vertices, we evaluated each technique in terms of its
preprocessing time, space consumption, and query efficiency (for both shortest
path and distance queries). Our experimental results reveal the characteristics
of different techniques, based on which we provide guidelines on selecting
appropriate methods for various scenarios. Comment: VLDB201
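The distinction the abstract draws between distance queries and shortest path queries can be sketched with plain Dijkstra search on a toy graph. This is only a baseline for illustrating the two query types, not one of the spatial-coherence-based or vertex-importance-based techniques evaluated in the paper; the graph, weights, and function names are made up for the example.

```python
import heapq

def dijkstra(graph, source):
    """Return (dist, parent) maps for all vertices reachable from source."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter route to u was already settled
        for v, w in graph.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent

def distance_query(graph, s, t):
    """Distance query: only the length of the shortest path (None if unreachable)."""
    dist, _ = dijkstra(graph, s)
    return dist.get(t)

def shortest_path_query(graph, s, t):
    """Shortest path query: the actual vertex sequence (None if unreachable)."""
    dist, parent = dijkstra(graph, s)
    if t not in dist:
        return None
    path = []
    node = t
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

# Hypothetical undirected road network as an adjacency list of (neighbor, weight).
toy_graph = {
    "a": [("b", 4), ("c", 1)],
    "b": [("a", 4), ("c", 2), ("d", 5)],
    "c": [("a", 1), ("b", 2), ("d", 8)],
    "d": [("b", 5), ("c", 8)],
}
```

A path query has to materialize the vertex sequence, so its cost grows with path length, whereas a distance query returns a single number; this is one reason results measured on distance queries alone may not carry over.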
Incremental Maintenance of Maximal Cliques in a Dynamic Graph
We consider the maintenance of the set of all maximal cliques in a dynamic
graph that is changing through the addition or deletion of edges. We present
nearly tight bounds on the magnitude of change in the set of maximal cliques,
as well as the first change-sensitive algorithms for clique maintenance, whose
runtime is proportional to the magnitude of the change in the set of maximal
cliques. We present experimental results showing these algorithms are efficient
in practice and are faster than prior work by two to three orders of magnitude. Comment: 18 pages, 8 figures
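To make "the magnitude of change in the set of maximal cliques" concrete, the sketch below enumerates maximal cliques before and after an edge insertion and reports which cliques appear and which disappear. It recomputes everything from scratch with a basic Bron-Kerbosch enumeration, so it is the naive baseline rather than the change-sensitive algorithms the abstract describes; all names and the example graph are illustrative assumptions.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques of an undirected graph (basic Bron-Kerbosch)."""
    cliques = set()

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already-explored vertices
        if not p and not x:
            cliques.add(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

def add_edge(adj, u, v):
    """Return a copy of adj with the undirected edge (u, v) added."""
    new_adj = {k: set(vs) for k, vs in adj.items()}
    new_adj.setdefault(u, set()).add(v)
    new_adj.setdefault(v, set()).add(u)
    return new_adj

def clique_change(adj, u, v):
    """Return (new cliques, subsumed cliques) caused by inserting edge (u, v)."""
    before = maximal_cliques(adj)
    after = maximal_cliques(add_edge(adj, u, v))
    return after - before, before - after

# Path 1 - 2 - 3; inserting edge (1, 3) closes the triangle.
path_graph = {1: {2}, 2: {1, 3}, 3: {2}}
new_cliques, subsumed_cliques = clique_change(path_graph, 1, 3)
```

Here one edge insertion creates the clique {1, 2, 3} and removes {1, 2} and {2, 3}, a change of size three. A change-sensitive algorithm aims for running time proportional to that size instead of the cost of re-enumerating the whole graph.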