
    The Tree Inclusion Problem: In Linear Space and Faster

    Full text link
    Given two rooted, ordered, and labeled trees $P$ and $T$, the tree inclusion problem is to determine if $P$ can be obtained from $T$ by deleting nodes in $T$. This problem has recently been recognized as an important query primitive in XML databases. Kilpeläinen and Mannila [SIAM J. Comput. 1995] presented the first polynomial time algorithm, using quadratic time and space. Since then, several improved results have been obtained for special cases where $P$ and $T$ have a small number of leaves or small depth. However, in the worst case these algorithms still use quadratic time and space. Let $n_S$, $l_S$, and $d_S$ denote the number of nodes, the number of leaves, and the depth of a tree $S \in \{P, T\}$. In this paper we show that the tree inclusion problem can be solved in space $O(n_T)$ and time $O(\min(l_P n_T,\; l_P l_T \log \log n_T + n_T,\; \frac{n_P n_T}{\log n_T} + n_T \log n_T))$. This improves or matches the best known time complexities while using only linear space instead of quadratic. This is particularly important in practical applications, such as XML databases, where space is likely to be a bottleneck.
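
    As an illustration of the problem itself (not of the paper's algorithms), the Python sketch below decides ordered tree inclusion by brute-force recursion; the (label, children) tree encoding is my own, and the sketch takes exponential time in the worst case, whereas the paper achieves the bounds above in linear space.

        # Brute-force check for ordered tree inclusion. Trees are
        # (label, children) tuples; a forest is a list of trees.
        # Worst-case exponential time, unlike the paper's algorithms.

        def forest_included(fp, ft):
            """Can pattern forest fp be obtained from target forest ft
            by deleting nodes (a deleted node is replaced by its children)?"""
            if not fp:
                return True                   # empty pattern always embeds
            if not ft:
                return False                  # nothing left to match against
            (t_label, t_children), t_rest = ft[0], list(ft[1:])
            # Option 1: delete the first target root; its children splice in.
            if forest_included(fp, list(t_children) + t_rest):
                return True
            # Option 2: match the first pattern root to the first target root.
            (p_label, p_children), p_rest = fp[0], list(fp[1:])
            return (p_label == t_label
                    and forest_included(list(p_children), list(t_children))
                    and forest_included(p_rest, t_rest))

        def tree_included(P, T):
            return forest_included([P], [T])

        # Example: P = a(b) is obtained from T = a(c(b), d) by deleting c and d.
        P = ("a", [("b", [])])
        T = ("a", [("c", [("b", [])]), ("d", [])])
        assert tree_included(P, T)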

    Structure-Aware Sampling: Flexible and Accurate Summarization

    Full text link
    In processing large quantities of data, a fundamental problem is to obtain a summary which supports approximate query answering. Random sampling yields flexible summaries which naturally support subset-sum queries with unbiased estimators and well-understood confidence bounds. Classic sample-based summaries, however, are designed for arbitrary subset queries and are oblivious to the structure in the set of keys. This structure, such as hierarchy, order, or product space (multi-dimensional), makes range queries much more relevant for most analyses of the data. Dedicated summarization algorithms for range-sum queries have also been extensively studied. They can outperform existing sampling schemes in terms of accuracy on range queries per summary size. Their accuracy, however, degrades rapidly when, as is often the case, the query spans multiple ranges. They are also less flexible, being targeted at range-sum queries alone, and are often quite costly to build and use. In this paper we propose and evaluate variance-optimal sampling schemes that are structure-aware. These summaries improve on the accuracy of existing structure-oblivious sampling schemes on range queries while retaining the benefits of sample-based summaries: flexible summaries with high accuracy on both range queries and arbitrary subset queries.
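
    For context, here is a minimal Python sketch of the structure-oblivious baseline the paper improves on: a uniform sample paired with an inverse-probability (Horvitz-Thompson) estimator for subset-sum queries. All names and the toy data are my own; the paper's variance-optimal, structure-aware schemes are more involved.

        import random

        def make_summary(data, k, seed=0):
            """Sample k keys uniformly without replacement, so every key
            has inclusion probability k/len(data). 'data' maps keys to
            values; all names here are illustrative."""
            rng = random.Random(seed)
            keys = rng.sample(list(data), k)
            p = k / len(data)
            return {key: data[key] for key in keys}, p

        def estimate_subset_sum(summary, p, predicate):
            """Horvitz-Thompson (inverse-probability) estimator: unbiased
            for the sum of values over all keys satisfying the predicate."""
            return sum(v / p for key, v in summary.items() if predicate(key))

        # Example: estimate the sum of values whose keys fall in [100, 200).
        data = {i: float(i % 7) for i in range(1000)}
        summary, p = make_summary(data, k=100)
        print(estimate_subset_sum(summary, p, lambda key: 100 <= key < 200))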

    Algorithmic Aspects of a General Modular Decomposition Theory

    Get PDF
    A new general decomposition theory inspired by modular graph decomposition is presented. It helps unify modular decomposition on different structures, including (but not restricted to) graphs. Moreover, even in the case of graphs, the term "module" not only captures the classical graph modules but also handles 2-connected components, star-cutsets, and other vertex subsets. The main result is that most of the nice algorithmic tools developed for modular decomposition of graphs still apply efficiently to our generalisation of modules. Besides, when an essential axiom is satisfied, almost all the important properties can be retrieved. For this case, an algorithm given by Ehrenfeucht, Gabow, McConnell and Sullivan (1994) is generalised and yields a very efficient solution to the associated decomposition problem.
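
    To make the classical notion concrete, the Python sketch below checks whether a vertex set is a module of a graph in the usual sense (no outside vertex distinguishes two of its members); the generalised modules of the paper go beyond this.

        def is_module(adj, M):
            """Check whether M is a module of a graph: every vertex
            outside M is adjacent to all of M or to none of it. 'adj'
            maps each vertex to its set of neighbours."""
            M = set(M)
            for v in set(adj) - M:
                hits = len(adj[v] & M)
                if 0 < hits < len(M):         # v distinguishes two members of M
                    return False
            return True

        # Example: in the path 1-2-3-4, {2, 3} is not a module, since vertex 1
        # is adjacent to 2 but not to 3; the whole vertex set trivially is.
        adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
        assert not is_module(adj, {2, 3})
        assert is_module(adj, {1, 2, 3, 4})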

    A survey on algorithmic aspects of modular decomposition

    Full text link
    Modular decomposition is a technique that applies to, but is not restricted to, graphs. The notion of a module naturally appears in the proofs of many graph-theoretic theorems. Computing the modular decomposition tree is an important preprocessing step for solving a large number of combinatorial optimization problems. Since the first polynomial-time algorithm in the early 1970s, the algorithmics of modular decomposition has seen important developments. This paper surveys the ideas and techniques that arose from this line of research.

    DiffNodesets: An Efficient Structure for Fast Mining Frequent Itemsets

    Full text link
    Mining frequent itemsets is an essential problem in data mining and plays an important role in many data mining applications. In recent years, some itemset representations based on node sets have been proposed and have been shown to be very efficient for mining frequent itemsets. In this paper, we propose DiffNodeset, a novel and more efficient itemset representation for mining frequent itemsets. Based on the DiffNodeset structure, we present an efficient algorithm, named dFIN, for mining frequent itemsets. To achieve high efficiency, dFIN finds frequent itemsets using a set-enumeration tree with a hybrid search strategy and, in some cases, directly enumerates frequent itemsets without candidate generation. To evaluate the performance of dFIN, we have conducted extensive experiments comparing it against existing leading algorithms on a variety of real and synthetic datasets. The experimental results show that dFIN is significantly faster than these leading algorithms.
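
    For contrast, a naive level-wise (Apriori-style) frequent itemset miner is sketched below in Python; it is emphatically not dFIN or DiffNodesets, but it fixes the problem's input and output.

        from itertools import combinations

        def frequent_itemsets(transactions, min_support):
            """Level-wise miner: extend frequent k-itemsets to (k+1)-item
            candidates and count support by a full scan. No set-enumeration
            tree, Nodesets, or DiffNodesets are involved."""
            tsets = [set(t) for t in transactions]
            items = sorted({i for t in tsets for i in t})

            def support(itemset):
                return sum(1 for t in tsets if itemset <= t)

            frequent = {}
            level = [frozenset([i]) for i in items]
            while level:
                counts = {c: support(c) for c in level}
                kept = {c: s for c, s in counts.items() if s >= min_support}
                frequent.update(kept)
                # Join step: unions of two frequent k-itemsets of size k+1.
                level = list({a | b for a, b in combinations(kept, 2)
                              if len(a | b) == len(a) + 1})
            return frequent

        # Example: itemsets with support >= 2 in four small transactions.
        ts = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
        print(frequent_itemsets(ts, min_support=2))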

    Computing the Face Lattice of a Polytope from its Vertex-Facet Incidences

    Get PDF
    We give an algorithm that constructs the Hasse diagram of the face lattice of a convex polytope P from its vertex-facet incidences in time O(min{n,m}*a*f), where n is the number of vertices, m is the number of facets, a is the number of vertex-facet incidences, and f is the total number of faces of P. This improves results of Fukuda and Rosta (1994), who described an algorithm for enumerating all faces of a d-polytope in O(min{n,m}*d*f^2) steps. For simple or simplicial d-polytopes our algorithm can be specialized to run in time O(d*a*f). Furthermore, applications of the algorithm to other atomic lattices are discussed, e.g., to face lattices of oriented matroids.
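
    The following Python sketch makes the input and output concrete: it enumerates the nonempty faces of a polytope from its vertex-facet incidences via the standard closure operator (a face's vertex set is the set of vertices lying in every facet containing it). This naive enumeration is far slower than the paper's Hasse diagram algorithm, and the encoding is my own.

        def closure(S, vert_facets, facet_verts):
            """Vertices lying in every facet that contains all of S; the
            nonempty faces are exactly the closed vertex sets."""
            facets = set.intersection(*(vert_facets[v] for v in S))
            if not facets:                    # no common facet: full polytope
                return frozenset(vert_facets)
            return frozenset(set.intersection(*(facet_verts[f] for f in facets)))

        def enumerate_faces(vert_facets, facet_verts):
            """Breadth-first search over joins of vertices. Much slower than
            the O(min{n,m}*a*f) Hasse diagram algorithm of the paper."""
            faces = set()
            queue = [closure({v}, vert_facets, facet_verts) for v in vert_facets]
            while queue:
                F = queue.pop()
                if F in faces:
                    continue
                faces.add(F)
                for v in set(vert_facets) - F:
                    queue.append(closure(F | {v}, vert_facets, facet_verts))
            return faces

        # Example: a triangle with vertices 0,1,2 and facets (edges) a, b, c.
        facet_verts = {"a": {0, 1}, "b": {1, 2}, "c": {0, 2}}
        vert_facets = {0: {"a", "c"}, 1: {"a", "b"}, 2: {"b", "c"}}
        print(len(enumerate_faces(vert_facets, facet_verts)))  # 7 nonempty faces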

    The Complexity of Computing the Size of an Interval

    Get PDF
    Given a p-order A over a universe of strings (i.e., a transitive, reflexive, antisymmetric relation such that if (x, y) is an element of A then |x| is polynomially bounded by |y|), an interval size function of A returns, for each string x in the universe, the number of strings in the interval between strings b(x) and t(x) (with respect to A), where b(x) and t(x) are functions that are polynomial-time computable in the length of x. By choosing sets of interval size functions based on feasibility requirements for their underlying p-orders, we obtain new characterizations of complexity classes. We prove that the set of all interval size functions whose underlying p-orders are polynomial-time decidable is exactly #P. We show that the interval size functions for orders with polynomial-time adjacency checks are closely related to the class FPSPACE(poly). Indeed, FPSPACE(poly) is exactly the class of all nonnegative functions that are an interval size function minus a polynomial-time computable function. We study two important functions in relation to interval size functions. The function #DIV maps each natural number n to the number of nontrivial divisors of n. We show that #DIV is an interval size function of a polynomial-time decidable partial p-order with polynomial-time adjacency checks. The function #MONSAT maps each monotone Boolean formula F to the number of satisfying assignments of F. We show that #MONSAT is an interval size function of a polynomial-time decidable total p-order with polynomial-time adjacency checks. Finally, we explore the related notion of cluster computation.
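
    The two counting functions studied in the paper are easy to state in code. The Python sketch below computes #DIV and #MONSAT by brute force; the tuple encoding of monotone formulas is my own.

        from itertools import product

        def count_div(n):
            """#DIV(n): the number of nontrivial divisors of n."""
            return sum(1 for d in range(2, n) if n % d == 0)

        def count_monsat(formula, num_vars):
            """#MONSAT(F): satisfying assignments of a monotone Boolean
            formula, counted by brute force over all assignments. Formulas
            are nested tuples: ('var', i), ('and', f, g), ('or', f, g)."""
            def ev(f, a):
                if f[0] == "var":
                    return a[f[1]]
                if f[0] == "and":
                    return ev(f[1], a) and ev(f[2], a)
                return ev(f[1], a) or ev(f[2], a)
            return sum(ev(formula, a)
                       for a in product([False, True], repeat=num_vars))

        assert count_div(12) == 4             # divisors 2, 3, 4, 6
        f = ("or", ("var", 0), ("and", ("var", 1), ("var", 2)))
        assert count_monsat(f, 3) == 5        # x0 true (4) or x1 and x2 (1 more)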