
    A Practically Efficient Algorithm for Generating Answers to Keyword Search over Data Graphs

    In keyword search over a data graph, an answer is a non-redundant subtree that contains all the keywords of the query. A naive approach to producing all the answers by increasing height is to generalize Dijkstra's algorithm to enumerate all acyclic paths by increasing weight. The idea of freezing is introduced so that (most) non-shortest paths are generated only if they are actually needed for producing answers. The resulting algorithm for generating subtrees, called GTF, is subtle and its proof of correctness is intricate. Extensive experiments show that GTF outperforms existing systems, even ones that for efficiency's sake are incomplete (i.e., cannot produce all the answers). In particular, GTF is scalable and performs well even on large data graphs and when many answers are needed. Comment: Full version of the ICDT'16 paper.
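    The abstract does not spell out GTF or its freezing mechanism, so the following is only a minimal sketch of the naive baseline it mentions: a Dijkstra-style best-first search generalized to enumerate acyclic paths by increasing weight. The function name and adjacency-dict format are illustrative assumptions, not the paper's interface.

```python
import heapq

def paths_by_increasing_weight(adj, source, target, limit=10):
    """Enumerate acyclic source-to-target paths in order of total weight.

    adj: dict mapping a vertex to a list of (neighbor, weight) pairs.
    This is the naive baseline from the abstract, not GTF itself: every
    partial acyclic path sits in one priority queue, so the frontier can
    grow combinatorially without the paper's "freezing" idea.
    """
    heap = [(0.0, [source])]            # (weight so far, path so far)
    found = []
    while heap and len(found) < limit:
        weight, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            found.append((weight, path))
            continue
        for nbr, w in adj.get(node, ()):
            if nbr not in path:         # keep paths acyclic
                heapq.heappush(heap, (weight + w, path + [nbr]))
    return found

# Example: two s-t routes, returned cheapest first.
adj = {"s": [("a", 1.0), ("b", 2.0)], "a": [("t", 1.0)], "b": [("t", 0.5)]}
print(paths_by_increasing_weight(adj, "s", "t"))
# [(2.0, ['s', 'a', 't']), (2.5, ['s', 'b', 't'])]
```

    Because every partial path stays enqueued, this baseline does wasted work on non-shortest paths; per the abstract, freezing defers generating most of them until they are actually needed for answers.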

    Distributed Maximum Matching in Bounded Degree Graphs

    We present deterministic distributed algorithms for computing approximate maximum cardinality matchings and approximate maximum weight matchings. Our algorithm for the unweighted case computes a matching whose size is at least $(1-\epsilon)$ times the optimal in $\Delta^{O(1/\epsilon)} + O(1/\epsilon^2) \cdot \log^*(n)$ rounds, where $n$ is the number of vertices in the graph and $\Delta$ is the maximum degree. Our algorithm for the edge-weighted case computes a matching whose weight is at least $(1-\epsilon)$ times the optimal in $\log(\min\{1/w_{\min}, n/\epsilon\})^{O(1/\epsilon)} \cdot (\Delta^{O(1/\epsilon)} + \log^*(n))$ rounds for edge-weights in $[w_{\min}, 1]$. The best previous algorithms for both the unweighted case and the weighted case are by Lotker, Patt-Shamir, and Pettie (SPAA 2008). For the unweighted case they give a randomized $(1-\epsilon)$-approximation algorithm that runs in $O(\log(n)/\epsilon^3)$ rounds. For the weighted case they give a randomized $(1/2-\epsilon)$-approximation algorithm that runs in $O(\log(\epsilon^{-1}) \cdot \log(n))$ rounds. Hence, our results improve on the previous ones when the parameters $\Delta$, $\epsilon$ and $w_{\min}$ are constants (where we reduce the number of rounds from $O(\log(n))$ to $O(\log^*(n))$), and more generally when $\Delta$, $1/\epsilon$ and $1/w_{\min}$ are sufficiently slowly increasing functions of $n$. Moreover, our algorithms are deterministic rather than randomized. Comment: arXiv admin note: substantial text overlap with arXiv:1402.379
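    The paper's $(1-\epsilon)$-approximation is not reproduced here. As a hedged point of reference for the synchronous round structure such algorithms run in, the sketch below simulates the classic locally-dominant-edge rule, which yields a maximal matching of at least half the optimal weight; the function name and simulation format are assumptions for exposition.

```python
def locally_dominant_matching(n, edges):
    """Simulate a synchronous distributed 1/2-approximate weighted matching.

    edges: list of (u, v, weight) with vertices 0..n-1. In each round,
    every live vertex points to its heaviest remaining incident edge; an
    edge chosen by both endpoints is locally dominant and enters the
    matching. This classic baseline is far simpler than the paper's
    (1 - eps)-approximation, but shows the round-by-round structure.
    """
    alive = set(range(n))
    incident = {u: [] for u in range(n)}
    for u, v, w in edges:
        incident[u].append((w, u, v))
        incident[v].append((w, u, v))
    matching = []
    while True:
        # Each live vertex picks its heaviest edge whose endpoints are both live.
        choice = {}
        for u in alive:
            cands = [(w, a, b) for (w, a, b) in incident[u]
                     if a in alive and b in alive]
            if cands:
                choice[u] = max(cands)
        # Keep only edges picked from both endpoints (locally dominant).
        picked = [e for u, e in choice.items()
                  if choice.get(e[1] if e[1] != u else e[2]) == e]
        if not picked:
            break
        for w, a, b in set(picked):
            matching.append((a, b, w))
            alive.discard(a)
            alive.discard(b)
    return matching

# Example: on a weighted triangle, the heaviest edge wins in round one.
print(locally_dominant_matching(3, [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0)]))
# [(0, 2, 3.0)]
```

    Each round at least the globally heaviest live edge is mutually picked, so the loop terminates; in a real message-passing model each iteration of the while-loop corresponds to a constant number of communication rounds.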

    TopCom: Index for Shortest Distance Query in Directed Graph

    Finding the shortest distance between two vertices in a graph is an important problem due to its numerous applications in diverse domains, including geo-spatial databases, social network analysis, and information retrieval. Classical algorithms (such as Dijkstra's) solve this problem in polynomial time, but they cannot provide real-time responses for a large number of bursty queries on a large graph. So, indexing-based solutions that pre-process the graph for efficiently answering (exactly or approximately) a large number of distance queries in real time are becoming increasingly popular. Existing solutions have varying performance in terms of index size, index building time, query time, and accuracy. In this work, we propose TopCom, a novel indexing-based solution for exactly answering distance queries. Our experiments with two of the existing state-of-the-art methods (IS-Label and TreeMap) show the superiority of TopCom over these two methods considering scalability and query time. Besides, the indexing of TopCom exploits the DAG (directed acyclic graph) structure of the graph, which makes it significantly faster than the existing methods if the SCCs (strongly connected components) of the input graph are relatively small.
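    TopCom's index itself is not described in the abstract, but the SCC condensation it builds on is standard and can be sketched as below (Kosaraju's algorithm; the function name and return format are illustrative assumptions). When SCCs are small, the condensed DAG is nearly as large as the original graph, which is the regime the abstract says favors TopCom.

```python
from collections import defaultdict

def condense_to_dag(edges):
    """Collapse strongly connected components into a DAG (Kosaraju).

    edges: iterable of (u, v) arcs. Returns (comp, dag_edges), where comp
    maps each vertex to a component representative and dag_edges is the
    arc set between distinct components.
    """
    fwd, rev, verts = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
        verts.update((u, v))

    # Pass 1: iterative DFS post-order over the forward graph.
    order, seen = [], set()
    for root in verts:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(fwd[root]))]
        while stack:
            node, it = stack[-1]
            nxt = next((w for w in it if w not in seen), None)
            if nxt is None:
                order.append(node)
                stack.pop()
            else:
                seen.add(nxt)
                stack.append((nxt, iter(fwd[nxt])))

    # Pass 2: sweep the reverse graph in reverse post-order;
    # each DFS tree found is exactly one SCC.
    comp, seen = {}, set()
    for root in reversed(order):
        if root in seen:
            continue
        seen.add(root)
        stack = [root]
        while stack:
            node = stack.pop()
            comp[node] = root
            for w in rev[node]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)

    dag_edges = {(comp[u], comp[v]) for u, v in edges if comp[u] != comp[v]}
    return comp, dag_edges

# Example: the 2-cycle {1, 2} collapses; only the arc to 3 survives.
print(condense_to_dag([(1, 2), (2, 1), (2, 3)])[1])
```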

    Sequences of regressions and their independences

    Ordered sequences of univariate or multivariate regressions provide statistical models for analysing data from randomized, possibly sequential interventions, from cohort or multi-wave panel studies, but also from cross-sectional or retrospective studies. Conditional independences are captured by what we name regression graphs, provided the generated distribution shares some properties with a joint Gaussian distribution. Regression graphs extend purely directed, acyclic graphs by two types of undirected graph, one type for components of joint responses and the other for components of the context vector variable. We review the special features and the history of regression graphs, derive criteria to read all implied independences off a regression graph, and prove criteria for Markov equivalence, that is, for judging whether two different graphs imply the same set of independence statements. Knowledge of Markov equivalence provides alternative interpretations of a given sequence of regressions, is essential for machine learning strategies, and permits using the simple graphical criteria of regression graphs on graphs for which the corresponding criteria are in general more complex. Under the known conditions that a Markov equivalent directed acyclic graph exists for any given regression graph, we give a polynomial time algorithm to find one such graph. Comment: 43 pages with 17 figures. The manuscript is to appear as an invited discussion paper in the journal TEST.
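    The paper's Markov-equivalence criteria for general regression graphs are more involved than anything shown here. As a point of reference only, the classic special case for purely directed acyclic graphs (two DAGs are Markov equivalent iff they have the same skeleton and the same unshielded colliders, due to Verma and Pearl) is easy to check; the sketch below covers just that special case, and its names and edge format are assumptions.

```python
def markov_equivalent_dags(edges_a, edges_b):
    """Check Markov equivalence of two DAGs given as lists of (parent, child).

    DAG special case only: equal skeletons and equal unshielded colliders
    (v-structures a -> c <- b with a, b non-adjacent).
    """
    def skeleton(edges):
        return {frozenset(e) for e in edges}

    def colliders(edges):
        arcs, skel, cols = set(edges), skeleton(edges), set()
        for a, c in arcs:
            for b, c2 in arcs:
                if c2 == c and a != b and frozenset((a, b)) not in skel:
                    cols.add((frozenset((a, b)), c))
        return cols

    return (skeleton(edges_a) == skeleton(edges_b)
            and colliders(edges_a) == colliders(edges_b))

# The two chains a -> b -> c and a <- b <- c are equivalent (no colliders);
# the collider a -> b <- c is not equivalent to either chain.
print(markov_equivalent_dags([("a", "b"), ("b", "c")],
                             [("b", "a"), ("c", "b")]))   # True
print(markov_equivalent_dags([("a", "b"), ("c", "b")],
                             [("a", "b"), ("b", "c")]))   # False
```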

    Fault-Tolerant, but Paradoxical Path-Finding in Physical and Conceptual Systems

    We report our initial investigations into reliability and path-finding based models and propose future areas of interest. Inspired by broken sidewalks during on-campus construction projects, we develop two models for navigating this "unreliable network." Both are based on a concept of "accumulating risk" backward from the destination, and both operate on directed acyclic graphs with a probability of failure associated with each edge. The first serves as an introduction; its faults are addressed by the second, more conservative model. Next, we show a paradox when these models are used to construct polynomials on conceptual networks, such as design processes and software development life cycles. When the risk of a network increases uniformly, the most reliable path changes from wider and longer to shorter and narrower. If we let professional inexperience (such as that of entry-level cooks and software developers) represent the probability of edge failure, does this change in path imply that the novice should follow instructions with fewer "back-up" plans, while the expert should follow the routes with alternative options? Comment: 8 pages.
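    The abstract does not give the models' exact recurrences, so the following is one plausible reading of the first model's backward "accumulating risk" pass on a DAG with independent edge failures; the function name, data format, and recurrence are all illustrative assumptions.

```python
from functools import lru_cache

def path_success_probabilities(succ, p_fail, dest):
    """Backward 'accumulating risk' pass on a DAG with per-edge failure chances.

    succ: dict vertex -> list of successors; p_fail: dict (u, v) -> the
    probability that arc (u, v) fails, independently of other arcs.
    Returns, for each vertex, the success probability of its most
    reliable path to dest, computed backward from the destination.
    """
    @lru_cache(maxsize=None)
    def best(v):
        if v == dest:
            return 1.0
        # Best path through any successor: survive the arc, then the rest.
        return max(((1.0 - p_fail[(v, w)]) * best(w) for w in succ.get(v, ())),
                   default=0.0)

    return {v: best(v) for v in set(succ) | {dest}}

# Example: a direct arc versus a two-hop detour through 'a'.
succ = {"s": ["a", "t"], "a": ["t"]}
p_fail = {("s", "a"): 0.1, ("a", "t"): 0.1, ("s", "t"): 0.15}
print(path_success_probabilities(succ, p_fail, "t"))
```

    Scaling all the failure probabilities up or down and re-running the pass is one way to observe the abstract's paradox, i.e. the most reliable path switching between a shorter route and a longer one as risk grows uniformly.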