67,213 research outputs found
Set-Oriented Mining for Association Rules in Relational Databases
Describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss the optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. SETM uses only simple database primitives, viz. sorting and merge-scan join. SETM is simple, fast and stable over the range of parameter values. The major contribution of this paper is that it shows that at least some aspects of data mining can be carried out by using general query languages such as SQL, rather than by developing specialized black-box algorithms. The set-oriented nature of SETM facilitates the development of extension
Semantic Query Optimization for Bottom-Up Evaluation
Semantic query optimization uses semantic knowledge in databases
(represented in the form of integrity constraints) to rewrite queries and
logic programs for the purpose of more efficient query evaluation. Much
work has been done to develop various techniques for optimization. Most of
it, however, is only applicable to top-down query evaluation strategies.
Moreover, little attention has been paid to the cost of the optimization
itself. In this paper, we address the issue of semantic query optimization
for bottom-up query evaluation strategies with an emphasis on overall
efficiency. We restrict our attention to a single optimization technique,
join elimination. We discuss various factors that influence the cost of
semantic optimization, and present two abstract algorithms for different
optimization approaches. The first one pre-processes a query statically
before it is evaluated; the second approach combines query evaluation with
semantic optimization using heuristics to achieve the largest possible
savings.
(Also cross-referenced as UMIACS-TR-95-109
Set-oriented data mining in relational databases
Data mining is an important real-life application for businesses. It is critical to find efficient ways of mining large data sets. In order to benefit from the experience with relational databases, a set-oriented approach to mining data is needed. In such an approach, the data mining operations are expressed in terms of relational or set-oriented operations. Query optimization technology can then be used for efficient processing.\ud
\ud
In this paper, we describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and thus may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the range of parameter values. It is easily parallelized and we suggest several additional optimizations. The set-oriented nature of Algorithm SETM makes it possible to develop extensions easily and its performance makes it feasible to build interactive data mining tools for large databases
Algorithm Choice For Multiple-Query Evaluation
Traditional query optimization concentrates on the optimization of the execution of each individual query. More recently, it has been observed that by considering a sequence of multiple queries some additional high-level optimizations can be performed. Once these optimizations have been performed, each operation is translated into executable code. The fundamental insight in this paper is that significant improvements can be gained by careful choice of the algorithm to be used for each operation. This choice is not merely based on efficiency of algorithms for individual operations, but rather on the efficiency of the algorithm choices for the entire multiple-query evaluation. An efficient procedure for automatically optimizing these algorithm choices is given
Query DAGs: A Practical Paradigm for Implementing Belief-Network Inference
We describe a new paradigm for implementing inference in belief networks,
which consists of two steps: (1) compiling a belief network into an arithmetic
expression called a Query DAG (Q-DAG); and (2) answering queries using a simple
evaluation algorithm. Each node of a Q-DAG represents a numeric operation, a
number, or a symbol for evidence. Each leaf node of a Q-DAG represents the
answer to a network query, that is, the probability of some event of interest.
It appears that Q-DAGs can be generated using any of the standard algorithms
for exact inference in belief networks (we show how they can be generated using
clustering and conditioning algorithms). The time and space complexity of a
Q-DAG generation algorithm is no worse than the time complexity of the
inference algorithm on which it is based. The complexity of a Q-DAG evaluation
algorithm is linear in the size of the Q-DAG, and such inference amounts to a
standard evaluation of the arithmetic expression it represents. The intended
value of Q-DAGs is in reducing the software and hardware resources required to
utilize belief networks in on-line, real-world applications. The proposed
framework also facilitates the development of on-line inference on different
software and hardware platforms due to the simplicity of the Q-DAG evaluation
algorithm. Interestingly enough, Q-DAGs were found to serve other purposes:
simple techniques for reducing Q-DAGs tend to subsume relatively complex
optimization techniques for belief-network inference, such as network-pruning
and computation-caching.Comment: See http://www.jair.org/ for any accompanying file
New Database Architecture for Smart Query Handler of Spatial Database
AbstractA spatial database system is a database system with additional capabilities for handling spatial data. It also supports spatial data types in its implementation, providing spatial indexing and efficient algorithms for spatial join. The retrieval of data values from a spatial database involves searching through the huge repository of data, involving huge cost. Thus query optimization on spatial database takes more time as compared to RDBMS. The current state of art shows that during the execution of a query in a spatial database management system (SDBMS), the query optimizer creates all possible query evaluation plans. All plans are equivalent in term of their final output but vary in their execution cost, the amount of time to run. Once the data is retrieved the query and its plans are deleted from memory to free the space for future usage. This is repeated for the next query even if the query is already executed. This leads to increased storage overhead and execution time. In this paper, a new database architecture is proposed, which uses a buffer based query optimization technique for faster data retrieval
Main Memory Implementations for Binary Grouping
An increasing number of applications depend on efficient storage and analysis features for XML data. Hence, query optimization and efficient evaluation techniques for the emerging XQuery standard become more and more important. Many XQuery queries require nested expressions. Unnesting them often introduces binary grouping. We introduce several algorithms implementing binary grouping and analyze their time and space complexity. Experiments demonstrate their performance
- …