    Conditional Reliability in Uncertain Graphs

    Network reliability is a well-studied problem that requires measuring the probability that a target node is reachable from a source node in a probabilistic (or uncertain) graph, i.e., a graph where every edge is assigned a probability of existence. Many approaches and problem variants have been considered in the literature, all assuming that edge-existence probabilities are fixed. Nevertheless, in real-world graphs, edge probabilities typically depend on external conditions. In metabolic networks a protein can be converted into another protein with some probability depending on the presence of certain enzymes. In social influence networks the probability that a tweet of some user will be re-tweeted by her followers depends on whether the tweet contains specific hashtags. In transportation networks the probability that a network segment works properly might depend on external conditions such as the weather or the time of day. In this paper we overcome this limitation and focus on conditional reliability, that is, assessing reliability when edge-existence probabilities depend on a set of conditions. In particular, we study the problem of determining the k conditions that maximize the reliability between two nodes. We thoroughly characterize our problem and show that, even employing polynomial-time reliability-estimation methods, it is NP-hard, does not admit any PTAS, and has a non-submodular objective function. We then devise a practical method that targets both accuracy and efficiency. We also study natural generalizations of the problem with multiple source and target nodes. An extensive empirical evaluation on several large, real-life graphs demonstrates the effectiveness and scalability of the proposed methods.
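
    To make the underlying quantity concrete: the standard way to estimate s-t reliability in an uncertain graph is Monte Carlo sampling of "possible worlds". The Python sketch below illustrates that baseline estimator only; it is not the conditional-reliability method the paper proposes, and all names in it are illustrative.

        import random
        from collections import defaultdict, deque

        def estimate_reliability(edges, source, target, samples=10_000, seed=42):
            """Monte Carlo estimate of s-t reliability.

            `edges` is a list of (u, v, p) triples, where p is the independent
            existence probability of the directed edge (u, v).
            """
            rng = random.Random(seed)
            hits = 0
            for _ in range(samples):
                # Sample a possible world: keep each edge with probability p.
                adj = defaultdict(list)
                for u, v, p in edges:
                    if rng.random() < p:
                        adj[u].append(v)
                # BFS: is target reachable from source in this world?
                seen, queue = {source}, deque([source])
                while queue:
                    node = queue.popleft()
                    if node == target:
                        hits += 1
                        break
                    for nxt in adj[node]:
                        if nxt not in seen:
                            seen.add(nxt)
                            queue.append(nxt)
            return hits / samples

    Under conditional reliability, each p would additionally be a function of the chosen condition set, and the paper's problem is to pick the k conditions maximizing this estimate.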

    Improved Graph Indexing Algorithms for Label-Constrained Reachability Queries

    Graph data have become ubiquitous in applications ranging from social and road networks to bio-medical data. Given such graph data, a reachability query asks whether there exists a path from a source vertex to a target vertex in the graph. Due to its immense implications in both theory and applied domains, this query and many of its variants have been extensively studied in the literature. One such variant investigates the reachability between two vertices in an edge-labeled graph while simultaneously constraining the set of labels a path may use. This problem was recently addressed by Valstar et al. [SIGMOD'17], who proposed an approach called landmark indexing (LI) to support faster label-constrained reachability (LCR) queries. In this work, we introduce a simple, practical and space-efficient solution for answering LCR queries even faster. The experimental evaluation shows significant time and space efficiency benefits of our proposed solution over the LI approach on both real-world and synthetic graphs.
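
    For intuition, an LCR query can always be answered online, without any index, by a BFS that traverses only edges whose labels belong to the query's label set. The sketch below shows that naive baseline, which is the point of comparison for index-based schemes such as LI; it is not the paper's indexing solution.

        from collections import defaultdict, deque

        def lcr_query(edges, source, target, allowed_labels):
            """Naive label-constrained reachability: BFS restricted to
            edges whose label is in `allowed_labels`.

            `edges` is a list of (u, v, label) triples of a directed graph.
            """
            adj = defaultdict(list)
            for u, v, label in edges:
                if label in allowed_labels:
                    adj[u].append(v)
            seen, queue = {source}, deque([source])
            while queue:
                node = queue.popleft()
                if node == target:
                    return True
                for nxt in adj[node]:
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
            return False

    Index-based approaches precompute reachability information so that queries like this avoid a full traversal at query time.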

    Answering Regular Path Queries on Workflow Provenance

    This paper proposes a novel approach for efficiently evaluating regular path queries over provenance graphs of workflows that may include recursion. The approach assumes that an execution g of a workflow G is labeled with query-agnostic reachability labels using an existing technique. At query time, given g, G and a regular path query R, the approach decomposes R into a set of subqueries R1, ..., Rk that are safe for G. For each safe subquery Ri, G is rewritten so that, using the reachability labels of nodes in g, whether there is a path matching Ri between two nodes can be decided in constant time. The results of each safe subquery are then composed, possibly with some small unsafe remainder, to produce an answer to R. The approach yields an algorithm that significantly reduces the number of subqueries k compared to existing techniques by increasing their size and complexity, and that evaluates each subquery in time bounded by its input and output size. Experimental results demonstrate the benefit of this approach.
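
    As background, the textbook way to evaluate a regular path query is a BFS over the product of the graph and an automaton for the query; the decomposition-and-rewriting approach above is designed to improve on this. A minimal Python sketch of the product construction, with the query given as a DFA (all names illustrative):

        from collections import deque

        def rpq(edges, source, target, delta, start, accept):
            """Regular path query by BFS over the graph x DFA product.

            `edges`: list of (u, v, label); `delta`: dict mapping
            (state, label) -> state; a path matches if its label word
            drives the DFA from `start` into a state in `accept`.
            """
            adj = {}
            for u, v, label in edges:
                adj.setdefault(u, []).append((v, label))
            init = (source, start)
            seen, queue = {init}, deque([init])
            while queue:
                node, state = queue.popleft()
                if node == target and state in accept:
                    return True
                for nxt, label in adj.get(node, []):
                    nxt_state = delta.get((state, label))
                    pair = (nxt, nxt_state)
                    if nxt_state is not None and pair not in seen:
                        seen.add(pair)
                        queue.append(pair)
            return False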

    Tabled CLP for Reasoning Over Stream Data

    The interest in reasoning over stream data is growing as quickly as the amount of data generated. Our intention is to change the way stream data is analyzed. This is an important problem because we constantly have new sensors collecting information, new events from electronic devices and/or from customers, and we want to reason about this information. For example, information about traffic jams and customer orders could be used to plan a delivery route. When there is a new order or a new traffic jam, we usually restart from scratch in order to recompute the route. However, if we have several deliveries and we analyze the information from thousands of sensors, we would like to reduce the computation requirements, e.g., by reusing results from the previous computation. Nowadays, most of the applications that analyze stream data are specialized for specific problems (using complex algorithms and heuristics) and combine a computation language with a query language. As a result, when the problems become more complex (e.g., in their reasoning requirements), modifying the application requires complex and error-prone coding. We propose a framework based on a high-level language rooted in logic and constraints that will be able to provide customized services to different problems. The framework will discard wrong solutions in early stages and will reuse previous results that are still consistent with the current data set. The use of a constraint logic programming language will make it easier to translate the problem requirements into code and will minimize the amount of re-engineering needed to comply with the requirements when they change.
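
    The reuse idea at the core of tabling can be illustrated outside of CLP as well. The toy Python sketch below caches answers to reachability subgoals and, when a new stream fact (an edge) arrives, invalidates only the cached negative answers, since in a monotonic setting new facts can only create new paths. This is a hypothetical illustration of the principle, not the proposed framework.

        class TabledReachability:
            """Toy tabling: cache subquery answers across stream updates."""

            def __init__(self):
                self.adj = {}      # node -> set of successors
                self.table = {}    # (source, target) -> bool

            def add_edge(self, u, v):
                """New stream fact: keep positive answers, drop negatives,
                because an added edge can only enable new paths."""
                self.adj.setdefault(u, set()).add(v)
                self.table = {k: ans for k, ans in self.table.items() if ans}

            def reachable(self, s, t):
                """Answer from the table when possible, else DFS and table it."""
                if (s, t) in self.table:
                    return self.table[(s, t)]
                seen, stack, found = {s}, [s], s == t
                while stack and not found:
                    for nxt in self.adj.get(stack.pop(), set()):
                        if nxt == t:
                            found = True
                            break
                        if nxt not in seen:
                            seen.add(nxt)
                            stack.append(nxt)
                self.table[(s, t)] = found
                return found

    A tabled CLP system generalizes this pattern: answers to arbitrary subgoals, possibly carrying constraints, are stored and selectively reused or invalidated as the stream evolves.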

    Two- versus three-dimensional connectivity testing of first-order queries to semi-algebraic sets

    This paper addresses the question of whether one can determine the connectivity of a semi-algebraic set in three dimensions by testing the connectivity of a finite number of two-dimensional "samples" of the set, where these samples are defined by first-order queries. The question is answered negatively for two classes of first-order queries: cartesian-product-free queries and positive one-pass queries.

    A survey of parallel execution strategies for transitive closure and logic programs

    An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other focused on the optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods, and address the parallelization of these methods by discussing its various forms. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. Finally, we consider Datalog queries and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of these methods.
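
    As a concrete anchor for the iterative methods discussed above, here is a compact semi-naive (delta-based) transitive closure in Python: each round joins only the newly derived pairs with the base edges, which is the step the surveyed approaches parallelize via data fragmentation. This is a sketch of the sequential baseline, not of any particular parallel method.

        def transitive_closure(edges):
            """Semi-naive iterative transitive closure of a directed graph.

            `edges` is an iterable of (u, v) pairs; returns the set of all
            (u, w) such that w is reachable from u in one or more steps.
            """
            adj = {}
            for u, v in edges:
                adj.setdefault(u, set()).add(v)
            closure = {(u, v) for u in adj for v in adj[u]}
            delta = set(closure)
            while delta:
                # Join only the new pairs (u, v) with base edges (v, w).
                new = {(u, w) for (u, v) in delta
                              for w in adj.get(v, ())} - closure
                closure |= new
                delta = new
            return closure

    In a hash-based fragmentation, the tuples would be partitioned across processors (e.g., by hashing on the join attribute), each processor running this loop on its fragment and exchanging newly derived pairs.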