Search CORE

28,338 research outputs found

Set-oriented data mining in relational databases

Author: Houtsma Maurice
Swami Arun
Publication venue: North Holland
Publication date: 01/01/1995
Field of study

Data mining is an important real-life application for businesses. It is critical to find efficient ways of mining large data sets. In order to benefit from the experience with relational databases, a set-oriented approach to mining data is needed. In such an approach, the data mining operations are expressed in terms of relational or set-oriented operations. Query optimization technology can then be used for efficient processing.\ud \ud In this paper, we describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and thus may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the range of parameter values. It is easily parallelized and we suggest several additional optimizations. The set-oriented nature of Algorithm SETM makes it possible to develop extensions easily and its performance makes it feasible to build interactive data mining tools for large databases

CiteSeerX

University of Twente Research Information

A Genetic Programming Framework for Two Data Mining Tasks: Classification and Generalized Rule Induction

Author: Freitas Alex A.
Publication venue: Morgan Kaufmann
Publication date: 01/01/1997
Field of study

This paper proposes a genetic programming (GP) framework for two major data mining tasks, namely classification and generalized rule induction. The framework emphasizes the integration between a GP algorithm and relational database systems. In particular, the fitness of individuals is computed by submitting SQL queries to a (parallel) database server. Some advantages of this integration from a data mining viewpoint are scalability, data-privacy control and automatic parallelization

CiteSeerX

Kent Academic Repository

Attribute oriented induction with star schema

Author: H Spits Warnars H. L.
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 29/05/2010
Field of study

This paper will propose a novel star schema attribute induction as a new attribute induction paradigm and as improving from current attribute oriented induction. A novel star schema attribute induction will be examined with current attribute oriented induction based on characteristic rule and using non rule based concept hierarchy by implementing both of approaches. In novel star schema attribute induction some improvements have been implemented like elimination threshold number as maximum tuples control for generalization result, there is no ANY as the most general concept, replacement the role concept hierarchy with concept tree, simplification for the generalization strategy steps and elimination attribute oriented induction algorithm. Novel star schema attribute induction is more powerful than the current attribute oriented induction since can produce small number final generalization tuples and there is no ANY in the results.Comment: 23 Pages, IJDM

arXiv.org e-Print Archive

CiteSeerX

Crossref

Using Visualization to Support Data Mining of Large Existing Databases

Author: Keim Daniel A.
Kriegel Hans-Peter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/1994
Field of study

In this paper. we present ideas how visualization technology can be used to improve the difficult process of querying very large databases. With our VisDB system, we try to provide visual support not only for the query specification process. but also for evaluating query results and. thereafter, refining the query accordingly. The main idea of our system is to represent as many data items as possible by the pixels of the display device. By arranging and coloring the pixels according to the relevance for the query, the user gets a visual impression of the resulting data set and of its relevance for the query. Using an interactive query interface, the user may change the query dynamically and receives immediate feedback by the visual representation of the resulting data set. By using multiple windows for different parts of the query, the user gets visual feedback for each part of the query and, therefore, may easier understand the overall result. To support complex queries, we introduce the notion of approximate joins which allow the user to find data items that only approximately fulfill join conditions. We also present ideas how our technique may be extended to support the interoperation of heterogeneous databases. Finally, we discuss the performance problems that are caused by interfacing to existing database systems and present ideas to solve these problems by using data structures supporting a multidimensional search of the database

KOPS - The Institutional Repository of the University of Konstanz

Open Access LMU

Set-Oriented Mining for Association Rules in Relational Databases

Author: Houtsma M.A.W.
Swami A.
Publication venue: IEEE
Publication date: 01/01/1995
Field of study

Describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss the optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. SETM uses only simple database primitives, viz. sorting and merge-scan join. SETM is simple, fast and stable over the range of parameter values. The major contribution of this paper is that it shows that at least some aspects of data mining can be carried out by using general query languages such as SQL, rather than by developing specialized black-box algorithms. The set-oriented nature of SETM facilitates the development of extension

University of Twente Research Information

An introduction to Graph Data Management

Author: A Dries
A Gutiérrez
A Iosup
A Morari
A Poulovassilis
AD Zhu
AO Mendelzon
B Amann
B Elser
C Berge
C Vicknair
C Watters
C Weiss
CS Chang
D Conte
D Dominguez-Sal
D Theodoratos
DC Faye
DW Shipman
EF Codd
FW Tompa
G Malewicz
GM Kuper
H He
HS Kunii
IF Cruz
IF Cruz
J Hidders
J Paredaens
J Peckham
J. Hidders
Jonathan Hayes
K Zeng
L Kowalik
L Zou
M Atre
M Ciglan
M Consens
M Gemis
M Gyssens
M Han
M Levene
M Levene
M Levene
M Mainguenaud
M Schmidt
M Yannakakis
MA Bornea
MA Rodriguez
MA Rodriguez
Marc Andries
MP Consens
MP Consens
N Kiesel
N Roussopoulos
O Erling
P Barceló Baeza
P Buneman
P Yuan
Philippe Cudré-Mauroux
PPS Chen
PT Wood
PT Wood
R Agrawal
R Angles
R Angles
R Brijder
R Ronen
RH Güting
RS Xin
S Abiteboul
S Abiteboul
T Neumann
W Fan
W Kim
Y Guo
Y Low
Y Papakonstantinou
Y Tian
Y Zhao
YA Liu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/12/2017
Field of study

A graph database is a database where the data structures for the schema and/or instances are modeled as a (labeled)(directed) graph or generalizations of it, and where querying is expressed by graph-oriented operations and type constructors. In this article we present the basic notions of graph databases, give an historical overview of its main development, and study the main current systems that implement them

arXiv.org e-Print Archive

Crossref