The Tree Inclusion Problem: In Linear Space and Faster
Given two rooted, ordered, and labeled trees P and T, the tree inclusion
problem is to determine if P can be obtained from T by deleting nodes in
T. This problem has recently been recognized as an important query primitive
in XML databases. Kilpeläinen and Mannila [SIAM J. Comput. 1995]
presented the first polynomial time algorithm using quadratic time and space.
Since then several improved results have been obtained for special cases when
P and T have a small number of leaves or small depth. However, in the worst
case these algorithms still use quadratic time and space. Let n_S, l_S, and d_S
denote the number of nodes, the number of leaves, and the depth
of a tree S. In this paper we show that the tree inclusion
problem can be solved in linear space and time O(\min(l_P n_T, l_P l_T \log\log n_T + n_T, \frac{n_P n_T}{\log n_T} + n_T \log n_T)). This improves or
matches the best known time complexities while using only linear space instead
of quadratic. This is particularly important in practical applications, such as
XML databases, where the space is likely to be a bottleneck.
Comment: Minor updates from last time
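As a concrete illustration of the problem statement (not the paper's algorithm), the following Python sketch tests ordered tree inclusion by brute force: either the pattern root is matched to the target root, or the target root is deleted and its children are spliced into the target forest. It runs in exponential time in the worst case, and the Node/included/embeds_forest names are illustrative, not from the paper.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Node:
        label: str
        children: List["Node"] = field(default_factory=list)

    def included(P: Node, T: Node) -> bool:
        """True if P can be obtained from T by deleting nodes of T."""
        return embeds_forest([P], [T])

    def embeds_forest(ps: List[Node], ts: List[Node]) -> bool:
        """Can the ordered pattern forest ps be embedded in the ordered forest ts?"""
        if not ps:
            return True                  # nothing left to match
        if not ts:
            return False                 # pattern remains but target is exhausted
        p, t = ps[0], ts[0]
        # Case 1: map p's root onto t's root; then p's children must embed in
        # t's children, and the remaining pattern siblings in the remaining target.
        if p.label == t.label \
                and embeds_forest(p.children, t.children) \
                and embeds_forest(ps[1:], ts[1:]):
            return True
        # Case 2: delete t's root, splicing its children into the target forest.
        return embeds_forest(ps, t.children + ts[1:])

    # P = a(b, c) is included in T = a(x(b), c, d): delete x and d.
    P = Node("a", [Node("b"), Node("c")])
    T = Node("a", [Node("x", [Node("b")]), Node("c"), Node("d")])
    print(included(P, T))  # True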
Structure-Aware Sampling: Flexible and Accurate Summarization
In processing large quantities of data, a fundamental problem is to obtain a
summary which supports approximate query answering. Random sampling yields
flexible summaries which naturally support subset-sum queries with unbiased
estimators and well-understood confidence bounds.
Classic sample-based summaries, however, are designed for arbitrary subset
queries and are oblivious to the structure in the set of keys. The particular
structure, such as hierarchy, order, or product space (multi-dimensional),
makes range queries much more relevant for most analysis of the data.
Dedicated summarization algorithms for range-sum queries have also been
extensively studied. They can outperform existing sampling schemes in terms of
accuracy on range queries per summary size. Their accuracy, however, rapidly
degrades when, as is often the case, the query spans multiple ranges. They are
also less flexible - being targeted for range sum queries alone - and are often
quite costly to build and use.
In this paper we propose and evaluate variance optimal sampling schemes that
are structure-aware. These summaries improve over the accuracy of existing
structure-oblivious sampling schemes on range queries while retaining the
benefits of sample-based summaries: flexible summaries, with high accuracy on
both range queries and arbitrary subset queries.
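For background on the sample-based summaries the abstract builds on, here is a minimal structure-oblivious baseline in Python: a uniform sample of keys together with the Horvitz-Thompson (inverse inclusion probability) estimator for subset-sum queries. This is the kind of structure-oblivious scheme the paper improves on, not the paper's structure-aware sampling; the function names and data format are illustrative.

    import random

    def uniform_sample_summary(data, k, seed=0):
        """Summarize a keyed dataset {key: weight} by a uniform sample of k keys.
        Each sampled key keeps its weight and its inclusion probability k/n."""
        rng = random.Random(seed)
        keys = list(data)
        p = k / len(keys)                  # inclusion probability of each key
        return {key: (data[key], p) for key in rng.sample(keys, k)}

    def estimate_subset_sum(summary, predicate):
        """Horvitz-Thompson estimate of the total weight of keys matching predicate:
        each sampled weight is scaled by 1 / (its inclusion probability), which
        makes the estimator unbiased for any fixed subset."""
        return sum(w / p for key, (w, p) in summary.items() if predicate(key))

    # Example: estimate the total weight of keys in the range [100, 200).
    data = {key: 1.0 + (key % 7) for key in range(1000)}
    summary = uniform_sample_summary(data, k=100)
    print(estimate_subset_sum(summary, lambda key: 100 <= key < 200))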
Algorithmic Aspects of a General Modular Decomposition Theory
A new general decomposition theory inspired by modular graph decomposition
is presented. It helps unify modular decomposition across different
structures, including (but not restricted to) graphs. Moreover, even in the
case of graphs, the term "module" not only captures the classical
graph modules but also makes it possible to handle 2-connected components, star-cutsets,
and other vertex subsets. The main result is that most of the nice algorithmic
tools developed for modular decomposition of graphs still apply efficiently to
our generalisation of modules. In addition, when an essential axiom is satisfied,
almost all the important properties can be retrieved. For this case, an
algorithm given by Ehrenfeucht, Gabow, McConnell, and Sullivan (1994) is
generalised and yields a very efficient solution to the associated
decomposition problem.
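The paper generalises modules beyond graphs; for reference, the classical graph notion it extends can be checked directly: a vertex set M is a module if every vertex outside M is adjacent either to all of M or to none of it. A small Python sketch (the adjacency-set representation and names are illustrative):

    def is_module(adj, M):
        """Check whether the vertex set M is a module of the graph given by
        adjacency sets adj: every vertex outside M sees all of M or none of M."""
        M = set(M)
        for v in adj:
            if v in M:
                continue
            seen = adj[v] & M
            if seen and seen != M:
                return False
        return True

    # Example: in the path a-b-c-d, {b, c} is not a module (a sees b but not c),
    # while the single vertex {a} trivially is.
    adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
    print(is_module(adj, {"b", "c"}))  # False
    print(is_module(adj, {"a"}))       # True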
A survey on algorithmic aspects of modular decomposition
Modular decomposition is a technique that applies to, but is not restricted
to, graphs. The notion of a module naturally appears in the proofs of many graph
theoretical theorems. Computing the modular decomposition tree is an important
preprocessing step to solve a large number of combinatorial optimization
problems. Since the first polynomial-time algorithm in the early 1970s, the
algorithmics of modular decomposition has seen important developments.
This paper surveys the ideas and techniques that arose from this line of
research.
DiffNodesets: An Efficient Structure for Fast Mining Frequent Itemsets
Mining frequent itemsets is an essential problem in data mining and plays an
important role in many data mining applications. In recent years, some itemset
representations based on node sets have been proposed, which have been shown to be
very efficient for mining frequent itemsets. In this paper, we propose
DiffNodeset, a novel and more efficient itemset representation, for mining
frequent itemsets. Based on the DiffNodeset structure, we present an efficient
algorithm, named dFIN, for mining frequent itemsets. To achieve high efficiency,
dFIN finds frequent itemsets using a set-enumeration tree with a hybrid search
strategy and directly enumerates frequent itemsets without candidate generation
in some cases. To evaluate the performance of dFIN, we have conducted
extensive experiments comparing it against existing leading algorithms on
a variety of real and synthetic datasets. The experimental results show that
dFIN is significantly faster than these leading algorithms.
Comment: 22 pages, 13 figures
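The DiffNodeset structure and the dFIN algorithm are defined in the paper itself; as a baseline illustration of the task being solved, here is a minimal level-wise (Apriori-style) frequent itemset miner in Python. It counts support by scanning the transactions and is not the paper's method; names and the transaction format are illustrative.

    from itertools import combinations

    def frequent_itemsets(transactions, min_support):
        """Return every itemset whose support (number of containing transactions)
        is at least min_support, enumerated level by level."""
        transactions = [frozenset(t) for t in transactions]
        items = sorted({i for t in transactions for i in t})
        result = {}
        current = [frozenset([i]) for i in items]
        while current:
            counted = {}
            for cand in current:
                support = sum(1 for t in transactions if cand <= t)
                if support >= min_support:
                    counted[cand] = support
            result.update(counted)
            # Grow candidates by one item; keep only those whose subsets are frequent.
            frequent = list(counted)
            grown = {a | b for a, b in combinations(frequent, 2)
                     if len(a | b) == len(a) + 1}
            current = [c for c in grown
                       if all(frozenset(s) in counted
                              for s in combinations(c, len(c) - 1))]
        return result

    # Example: every single item and every pair occurs in at least 2 transactions.
    ts = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    print(frequent_itemsets(ts, min_support=2))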
Computing the Face Lattice of a Polytope from its Vertex-Facet Incidences
We give an algorithm that constructs the Hasse diagram of the face lattice of
a convex polytope P from its vertex-facet incidences in time O(min{n,m}*a*f),
where n is the number of vertices, m is the number of facets, a is the number
of vertex-facet incidences, and f is the total number of faces of P. This
improves results of Fukuda and Rosta (1994), who described an algorithm for
enumerating all faces of a d-polytope in O(min{n,m}*d*f^2) steps. For simple or
simplicial d-polytopes our algorithm can be specialized to run in time
O(d*a*f). Furthermore, applications of the algorithm to other atomic lattices
are discussed, e.g., to face lattices of oriented matroids.
Comment: 14 pages; to appear in: Comput. Geom.; the new version contains some
minor extensions and corrections as well as a more detailed treatment of
oriented matroids.
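As a naive reference point (not the paper's O(min{n,m}*a*f) algorithm), the face lattice can be recovered from vertex-facet incidences by using the fact that every proper face is the intersection of the facets containing it: close the facet vertex sets under intersection, then compute the covering relations by brute force. A Python sketch with illustrative names:

    def face_lattice(facets, n_vertices):
        """Enumerate all faces of a polytope from its vertex-facet incidences.
        facets: list of frozensets of vertex indices, one per facet."""
        faces = {frozenset(range(n_vertices)), frozenset()} | set(facets)
        changed = True
        while changed:                   # close under intersection with facets
            changed = False
            for F in list(faces):
                for G in facets:
                    H = F & G
                    if H not in faces:
                        faces.add(H)
                        changed = True
        # Hasse diagram: F covers H if H < F with no face strictly between them.
        edges = [(H, F) for F in faces for H in faces
                 if H < F and not any(H < M < F for M in faces)]
        return faces, edges

    # Example: a triangle with vertices 0, 1, 2 and facets (edges) {0,1}, {1,2}, {0,2}.
    faces, edges = face_lattice([frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 2})], 3)
    print(len(faces))  # 8 faces: the empty face, 3 vertices, 3 edges, the triangle itself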
The Complexity of Computing the Size of an Interval
Given a p-order A over a universe of strings (i.e., a transitive, reflexive,
antisymmetric relation such that if (x, y) is an element of A then |x| is
polynomially bounded by |y|), an interval size function of A returns, for each
string x in the universe, the number of strings in the interval between strings
b(x) and t(x) (with respect to A), where b(x) and t(x) are functions that are
polynomial-time computable in the length of x.
By choosing sets of interval size functions based on feasibility requirements
for their underlying p-orders, we obtain new characterizations of complexity
classes. We prove that the set of all interval size functions whose underlying
p-orders are polynomial-time decidable is exactly #P. We show that the interval
size functions for orders with polynomial-time adjacency checks are closely
related to the class FPSPACE(poly). Indeed, FPSPACE(poly) is exactly the class
of all nonnegative functions that are an interval size function minus a
polynomial-time computable function.
We study two important functions in relation to interval size functions. The
function #DIV maps each natural number n to the number of nontrivial divisors
of n. We show that #DIV is an interval size function of a polynomial-time
decidable partial p-order with polynomial-time adjacency checks. The function
#MONSAT maps each monotone boolean formula F to the number of satisfying
assignments of F. We show that #MONSAT is an interval size function of a
polynomial-time decidable total p-order with polynomial-time adjacency checks.
Finally, we explore the related notion of cluster computation.
Comment: This revision fixes a problem in the proof of Theorem 9.
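The two counting functions studied in the paper can be made concrete with brute-force definitions; the following Python sketch only defines what is being counted and does not construct the p-orders or interval size functions from the paper.

    from itertools import product

    def div_count(n):
        """#DIV: the number of nontrivial divisors of n (divisors other than 1 and n)."""
        return sum(1 for d in range(2, n) if n % d == 0)

    def monsat_count(formula, variables):
        """#MONSAT: the number of satisfying assignments of a monotone Boolean
        formula, given as a callable over a dict of variable -> bool (brute force)."""
        return sum(1 for bits in product([False, True], repeat=len(variables))
                   if formula(dict(zip(variables, bits))))

    print(div_count(12))  # 4: the nontrivial divisors of 12 are 2, 3, 4, 6
    print(monsat_count(lambda v: v["x"] and (v["y"] or v["z"]), ["x", "y", "z"]))  # 3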