Path Queries on Compressed XML
Central to any XML query language is a path language such as XPath which operates on the tree structure of the XML document. We demonstrate in this paper that the tree structure can be effectively compressed and manipulated using techniques derived from symbolic model checking. Specifically, we show first that succinct representations of document tree structures based on sharing subtrees are highly effective. Second, we show that compressed structures can be queried directly and efficiently through a process of manipulating selections of nodes and partial decompression.
Tractable Optimization Problems through Hypergraph-Based Structural Restrictions
Several variants of the Constraint Satisfaction Problem have been proposed
and investigated in the literature for modelling those scenarios where
solutions are associated with some given costs. Within these frameworks
computing an optimal solution is an NP-hard problem in general; yet, when
restricted over classes of instances whose constraint interactions can be
modelled via (nearly-)acyclic graphs, this problem is known to be solvable in
polynomial time. In this paper, larger classes of tractable instances are
singled out, by discussing solution approaches based on exploiting hypergraph
acyclicity and, more generally, structural decomposition methods, such as
(hyper)tree decompositions.
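As background for the notion of hypergraph acyclicity mentioned above, the following is a small sketch of the classical GYO reduction, a standard test for alpha-acyclicity. It is not taken from the paper; the function name and the example hypergraphs are assumptions chosen for illustration.

```python
def is_alpha_acyclic(hyperedges):
    """GYO reduction: the hypergraph is alpha-acyclic iff it reduces to nothing."""
    edges = [set(e) for e in hyperedges]
    changed = True
    while changed:
        changed = False
        # Rule 1: drop every vertex that occurs in exactly one hyperedge.
        for e in edges:
            for v in list(e):
                if sum(v in f for f in edges) == 1:
                    e.discard(v)
                    changed = True
        # Rule 2: drop one hyperedge that is empty or contained in another.
        for i, e in enumerate(edges):
            if not e or any(i != j and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return not edges

path = [{"a", "b"}, {"b", "c"}]
triangle = [{"a", "b"}, {"b", "c"}, {"a", "c"}]
covered_triangle = triangle + [{"a", "b", "c"}]
```

The path reduces completely, the triangle does not, and the triangle becomes acyclic once a hyperedge covering all three vertices is added, which is the usual textbook behaviour of alpha-acyclicity.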
Clustering-Based Materialized View Selection in Data Warehouses
Materialized view selection is a non-trivial task. Hence, its complexity must
be reduced. A judicious choice of views must be cost-driven and influenced by
the workload experienced by the system. In this paper, we propose a framework
for materialized view selection that exploits a data mining technique
(clustering), in order to determine clusters of similar queries. We also
propose a view merging algorithm that builds a set of candidate views, as well
as a greedy process for selecting a set of views to materialize. This selection
is based on cost models that evaluate the cost of accessing data using views
and the cost of storing these views. To validate our strategy, we executed a
workload of decision-support queries on a test data warehouse, with and without
using our strategy. Our experimental results demonstrate its efficiency, even
when storage space is limited.
Cut and Paste
The paper develops Editor, a language for manipulating semistructured documents, such as those typically available on the Web. Editor programs are based on two simple ideas, taken from text editors: “search” instructions are used to select regions of interest in a document, and “cut & paste” instructions to restructure them. We study the expressive power and the complexity of these programs. We show that they are computationally complete, in the sense that any computable document restructuring can be expressed in Editor. We also study the complexity of a safe subclass of programs, showing that it captures exactly the class of polynomial-time restructurings. The language has been implemented in Java and is currently used in the Araneus project as a basis for a wrapper-generation toolkit.
On-line analytical processing in distributed data warehouses
The concepts of 'data warehousing' and 'on-line analytical processing' have seen growing interest in the research and commercial product communities. Today, the trend is moving away from complex centralized data warehouses towards distributed data marts integrated in a common conceptual schema. However, as the first part of this paper demonstrates, there are many problems and few solutions for large distributed decision support systems in corporations operating worldwide. After showing the benefits and problems of the distributed approach, this paper outlines possibilities for achieving performance in distributed online analytical processing. Finally, the architectural framework of the prototypical distributed OLAP system CUBESTAR is outlined.
Greedy Selection of Materialized Views
A greedy approach to view selection selects, at each step, a beneficial view that fits within the space available for view materialization. Most of these approaches are built around the HRU algorithm, which uses a multidimensional lattice framework to determine a good set of views to materialize. The HRU algorithm exhibits high run-time complexity because the number of possible views is exponential in the number of dimensions. The PGA algorithm provides a scalable solution to this problem by selecting views for materialization in time polynomial in the number of dimensions. This paper compares the HRU and PGA algorithms. The experiments show that the PGA algorithm, in comparison with the HRU algorithm, achieves an improved execution time with lower memory and CPU usage. The HRU algorithm has an edge over the PGA algorithm in the quality of the views selected for materialization.
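The greedy step described above can be sketched as follows. This is a simplified illustration in the spirit of HRU's benefit measure with a space budget, not the implementation evaluated in the paper; the lattice encoding (views as sets of dimensions), the function name, and all view sizes are hypothetical.

```python
def greedy_select(sizes, space):
    """Greedily pick views to materialize within a space budget.

    sizes: dict mapping a view (frozenset of dimensions) to its size in rows.
    A view v can answer a view w when w is a subset of v. The top view
    (all dimensions) is assumed to be materialized already.
    """
    top = max(sizes, key=len)
    chosen = {top}
    # cost[w] is the size of the smallest chosen view that can answer w.
    cost = {w: sizes[top] for w in sizes}
    while True:
        best, best_benefit = None, 0
        for v in sizes:
            if v in chosen or sizes[v] > space:
                continue
            # Benefit: total reduction in answering cost over all views below v.
            benefit = sum(max(0, cost[w] - sizes[v]) for w in sizes if w <= v)
            if benefit > best_benefit:
                best, best_benefit = v, benefit
        if best is None:
            return chosen
        chosen.add(best)
        space -= sizes[best]
        for w in sizes:
            if w <= best:
                cost[w] = min(cost[w], sizes[best])

# Hypothetical two-dimensional lattice with made-up sizes.
sizes = {
    frozenset("AB"): 100,
    frozenset("A"): 50,
    frozenset("B"): 80,
    frozenset(): 1,
}
chosen = greedy_select(sizes, space=60)
```

With a budget of 60 rows, the sketch first picks the view on A (benefit 100), then the empty grouping (benefit 49), and never picks the view on B because it does not fit. Each round scans every candidate view, which is why the cost grows with the exponential number of views in the lattice.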