536 research outputs found

Método Tres-Pasos para integrar fuertemente tareas de minería de datos en un sistema de base de datos relacional [Three-Step Method for Tightly Integrating Data Mining Tasks into a Relational Database System]

Author: Timarán-Pereira Ricardo
Publication date: 28/03/2014

This paper presents a result of a research project that aimed to define new algebraic operators and new SQL primitives for knowledge discovery in an architecture tightly coupled with a Relational Database Management System (RDBMS). To facilitate the tight coupling and to support data mining tasks inside the RDBMS engine, a three-step method is proposed. In the first step, relational algebra is extended with new algebraic operators that support the most computationally expensive processes of data mining tasks. In the second step, so that the SQL language remains relationally complete, these operators are defined as new primitives in the SELECT clause. In the last step, these primitives are unified into a new SQL operator that runs a specific data mining task. Applying this method, new algebraic operators, new SQL primitives, and new SQL operators for association and classification tasks were defined and implemented inside the PostgreSQL DBMS engine, giving it the capacity to discover association and classification rules efficiently.
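To make concrete the kind of work that such in-engine operators take over, the following minimal sketch counts the support of item pairs with plain SQL and keeps the frequent ones. It is an illustration only, not the operators or primitives defined in the paper: it uses SQLite rather than PostgreSQL, and the `basket` table, its columns, and the `frequent_pairs` helper are hypothetical names. It assumes each (tid, item) pair appears at most once.

```python
import sqlite3

def frequent_pairs(rows, min_support):
    """Return {(item_a, item_b): count} for item pairs found in at least
    min_support transactions. `rows` is a list of (tid, item) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE basket (tid INTEGER, item TEXT)")  # hypothetical schema
    conn.executemany("INSERT INTO basket VALUES (?, ?)", rows)
    # Self-join on the transaction id; a.item < b.item keeps each pair once.
    query = """
        SELECT a.item, b.item, COUNT(*) AS support
        FROM basket AS a
        JOIN basket AS b ON a.tid = b.tid AND a.item < b.item
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY a.item, b.item
    """
    result = {(x, y): n for x, y, n in conn.execute(query, (min_support,))}
    conn.close()
    return result

rows = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "milk"), (2, "eggs"),
    (3, "milk"), (3, "eggs"),
]
pairs = frequent_pairs(rows, min_support=2)
print(pairs)
```

The self-join and grouping here are the expensive part for large transaction tables, which is the sort of step a tightly coupled design would run inside the engine.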

Biblioteca Digital de la Universidad del Valle

A Genetic Programming Framework for Two Data Mining Tasks: Classification and Generalized Rule Induction

Author: Freitas Alex A.
Publication venue: Morgan Kaufmann
Publication date: 01/01/1997

This paper proposes a genetic programming (GP) framework for two major data mining tasks, namely classification and generalized rule induction. The framework emphasizes the integration between a GP algorithm and relational database systems. In particular, the fitness of individuals is computed by submitting SQL queries to a (parallel) database server. Some advantages of this integration from a data mining viewpoint are scalability, data-privacy control, and automatic parallelization.
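The idea of computing fitness through SQL queries can be sketched as follows. This is a simplified assumption, not the framework from the paper: an individual is reduced to a SQL predicate plus a predicted class, fitness is the accuracy of that rule on the rows it covers, and an in-memory SQLite database stands in for the database server. The `customers` table, its columns, and `rule_fitness` are hypothetical names.

```python
import sqlite3

def rule_fitness(conn, condition, predicted_class):
    """Accuracy of the rule 'IF condition THEN predicted_class' on covered rows.

    `condition` is a SQL predicate produced by the search algorithm itself
    (trusted input in this sketch, so it is interpolated directly)."""
    covered = conn.execute(
        f"SELECT COUNT(*) FROM customers WHERE {condition}"
    ).fetchone()[0]
    if covered == 0:
        return 0.0  # a rule that covers nothing gets the lowest fitness
    correct = conn.execute(
        f"SELECT COUNT(*) FROM customers WHERE ({condition}) AND label = ?",
        (predicted_class,),
    ).fetchone()[0]
    return correct / covered

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (age INTEGER, income INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(25, 20, "no"), (40, 60, "yes"), (35, 80, "yes"), (50, 30, "no"), (45, 70, "no")],
)

fitness = rule_fitness(conn, "income > 50", "yes")
empty_fitness = rule_fitness(conn, "age > 100", "yes")
print(fitness, empty_fitness)
```

Because only aggregate counts leave the database, the evolutionary loop never needs to load the raw rows, which is one way to read the scalability and data-privacy advantages mentioned in the abstract.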

CiteSeerX

Kent Academic Repository

Declarative Data Analytics: a Survey

Author: Makrynioti Nantia
Vassalos Vasilis
Publication date: 01/01/2004

The area of declarative data analytics explores the application of the declarative paradigm on data science and machine learning. It proposes declarative languages for expressing data analysis tasks and develops systems which optimize programs written in those languages. The execution engine can be either centralized or distributed, as the declarative paradigm advocates independence from particular physical implementations. The survey explores a wide range of declarative data analysis frameworks by examining both the programming model and the optimization techniques used, in order to provide conclusions on the current state of the art in the area and identify open challenges. Comment: 36 pages, 2 figures

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

SciQL, Bridging the Gap between Science and Relational DBMS

Author: Ivanova M.G. (Milena)
Kersten M.L. (Martin)
Nes N.J. (Niels)
Zhang Y. (Ying)
Publication venue: ACM New York, NY, USA
Publication date: 01/09/2011

Scientific discoveries increasingly rely on the ability to efficiently grind massive amounts of experimental data using database technologies. To bridge the gap between the needs of the Data-Intensive Research fields and the current DBMS technologies, we propose SciQL (pronounced as ‘cycle’), the first SQL-based query language for scientific applications with both tables and arrays as first-class citizens. It provides a seamless symbiosis of array-, set-, and sequence-interpretations. A key innovation is the extension of value-based grouping of SQL:2003 with structural grouping, i.e., fixed-sized and unbounded groups based on explicit relationships between element positions. This leads to a generalisation of window-based query processing with wide applicability in science domains. This paper describes the main language features of SciQL and illustrates it using time-series concepts.
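Grouping by position rather than by value can be illustrated outside any query language. The sketch below is plain Python, not SciQL syntax: for every position in a time series it forms a fixed-size group from the neighbouring positions and aggregates it, giving a moving average. The function names and the `before`/`after` parameters are assumptions made for this example.

```python
def structural_groups(values, before, after):
    """For each position i, yield the elements at positions
    i - before .. i + after that actually exist in the series."""
    n = len(values)
    for i in range(n):
        lo = max(0, i - before)
        hi = min(n, i + after + 1)
        yield values[lo:hi]

def moving_average(values, before=1, after=1):
    """Aggregate each positional group with an average."""
    return [sum(group) / len(group)
            for group in structural_groups(values, before, after)]

series = [2.0, 4.0, 6.0, 8.0]
smoothed = moving_average(series)
print(smoothed)
```

Groups at the edges of the series are smaller because some neighbouring positions do not exist; a value-based GROUP BY has no notion of such neighbourhoods, which is the gap structural grouping addresses.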

CWI's Institutional Repository

An introduction to Graph Data Management

Author: A Dries
A Gutiérrez
A Iosup
A Morari
A Poulovassilis
AD Zhu
AO Mendelzon
B Amann
B Elser
C Berge
C Vicknair
C Watters
C Weiss
CS Chang
D Conte
D Dominguez-Sal
D Theodoratos
DC Faye
DW Shipman
EF Codd
FW Tompa
G Malewicz
GM Kuper
H He
HS Kunii
IF Cruz
J Hidders
J Paredaens
J Peckham
Jonathan Hayes
K Zeng
L Kowalik
L Zou
M Atre
M Ciglan
M Consens
M Gemis
M Gyssens
M Han
M Levene
M Mainguenaud
M Schmidt
M Yannakakis
MA Bornea
MA Rodriguez
Marc Andries
MP Consens
N Kiesel
N Roussopoulos
O Erling
P Barceló Baeza
P Buneman
P Yuan
Philippe Cudré-Mauroux
PPS Chen
PT Wood
R Agrawal
R Angles
R Brijder
R Ronen
RH Güting
RS Xin
S Abiteboul
T Neumann
W Fan
W Kim
Y Guo
Y Low
Y Papakonstantinou
Y Tian
Y Zhao
YA Liu
Publication venue: Springer Science and Business Media LLC
Publication date: 29/12/2017

A graph database is a database where the data structures for the schema and/or instances are modeled as a (labeled) (directed) graph or generalizations of it, and where querying is expressed by graph-oriented operations and type constructors. In this article we present the basic notions of graph databases, give a historical overview of their main development, and study the main current systems that implement them.

arXiv.org e-Print Archive

Crossref

Data mining query language design and implementation.

Author: Xiaolei Yuan
Publication date: 01/01/2004

Xiaolei Yuan. Thesis submitted in: December 2003. Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves 95-101). Abstracts in English and Chinese.

Chapter 1 --- Introduction
  1.1 Background
    1.1.1 Data Mining: A New Wave of Database Applications
    1.1.2 Association Rule Mining
  1.2 Motivation
  1.3 Main Contribution
  1.4 Thesis Organization
Chapter 2 --- Literature Review
  2.1 Data mining and association rule mining
  2.2 Integration data mining with DBMS
  2.3 Query language design for association rule mining
  2.4 Unified data mining models
  2.5 Other topics
Chapter 3 --- A New Data Mining Query Language M2MQL
  3.1 Simple item-based association rule
    3.1.1 One rule set
    3.1.2 Rule set and Source data set
    3.1.3 New rule sets from existing ones
  3.2 Generalized item-based association rules
  3.3 CREATE RULE and SELECT RULE Primitive
Chapter 4 --- The Algebra in M2MQL
  4.1 Review of nested relations
    4.1.1 Concepts of nested relation
    4.1.2 Nested relation and association rule mining
  4.2 Nested relational algebra
  4.3 Specific data mining algebra
    4.3.1 POWERSET p
    4.3.2 SET-CONTAINMENT-JOIN xc
    4.3.3 Functional operators
Chapter 5 --- Mining On Top of M2MQL
  5.1 Problem statement
  5.2 Frequency Counting Phase
  5.3 Frequent Itemset Generation Phase
  5.4 Rule Generation Phase
  5.5 Summary
Chapter 6 --- Conclusions and Future Work
  6.1 What we have achieved
  6.2 What is ahead
    6.2.1 Issues of Query Optimization
    6.2.2 Issues of Expanding Table Forms
Chapter A --- General Syntax of M2MQL
Chapter B --- Syntax and Example for MSQL
  B.1 Syntax of MSQL
  B.2 Example
Chapter C --- Syntax and Example for MINE RULE
  C.1 Syntax of MINE RULE
  C.2 Example
    C.2.1 Counting Groups
    C.2.2 Making Couples of Clusters
    C.2.3 Extracting Bodies
    C.2.4 Extracting Rules
Bibliography

CUHK Digital Repository

RAM: array processing over a relational DBMS

Author: Ballegooij A.R. van
Kersten M.L. (Martin)
Vries A.P. (Arjen) de
Publication venue: CWI
Publication date: 01/01/2003

Developing multimedia applications in relational databases is hindered by a mismatch in computational frameworks. Efficient manipulation of multimedia data calls for array-based processing, which at best is available as a database add-on, not supported by the query optimizer. As a result, array-based processing ends up in dedicated programs outside the DBMS: non-reusable black boxes. The goal of our research is to reduce this gap between user needs and system functionality by developing a seamless integration of array processing in a relational algebra engine. The paper introduces a declarative language for array expressions based on the array comprehension, and its mapping to a relational kernel in a prototype implementation. The layered architecture of the resulting array database management system allows the use of structural knowledge available in the array data type. This additional source of information can be exploited for query optimization, which is demonstrated with a case study. The experiments show how the performance of a standard tool for matrix computations can be achieved without sacrificing data independence, highlighting however a critical aspect in the DBMS architecture proposed.
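One common way to run array operations on a relational engine is to store each array as a relation of (index, value) tuples and express the operation as a query. The sketch below shows this for matrix multiplication in SQLite. It is an assumed illustration of the general approach, not the RAM language or its actual mapping; the table layout `(i, j, v)` and the helper names are hypothetical.

```python
import sqlite3

def to_relation(conn, name, matrix):
    """Store a dense matrix as rows (i, j, v): one tuple per array cell."""
    conn.execute(f"CREATE TABLE {name} (i INTEGER, j INTEGER, v REAL)")
    conn.executemany(
        f"INSERT INTO {name} VALUES (?, ?, ?)",
        [(i, j, v) for i, row in enumerate(matrix) for j, v in enumerate(row)],
    )

def matmul(conn, left, right):
    """Matrix product as a join on the shared index plus a grouped sum."""
    query = f"""
        SELECT a.i, b.j, SUM(a.v * b.v)
        FROM {left} AS a
        JOIN {right} AS b ON a.j = b.i
        GROUP BY a.i, b.j
        ORDER BY a.i, b.j
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
to_relation(conn, "A", [[1, 2], [3, 4]])
to_relation(conn, "B", [[5, 6], [7, 8]])
product = matmul(conn, "A", "B")
print(product)
```

In this encoding the engine sees only generic tuples; knowing that the indexes form a dense, regular grid is the kind of structural knowledge the abstract says an array-aware layer can pass to the optimizer.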

CWI's Institutional Repository