536 research outputs found

    Método Tres-Pasos para integrar fuertemente tareas de minería de datos en un sistema de base de datos relacional

    Get PDF
    In this paper, a result of the research project that aimed to define new algebraic operators and new SQL primitives for knowledge discovery in a tightly coupled architecture with a Relational Database Management System (RDBMS) is presented. In order to facilitate the tight coupling and to support the data mining tasks into the RDBMS engine, the three-step approach is proposed. In the first step, the relational algebra is extended with new algebraic operators to facilitate more expensive computationally processes of data mining tasks. In the next step and with the aim that the SQL language is relationally complete, these operators are defined as new primitives in the SELECT clause. In the last step, these primitives are unified into new SQL operator that runs a specific data mining task. Applying this method, new algebraic operators, new SQL primitives and new SQL operators for association and classification tasks were defined and were implemented into the PostgreSQL DBMS engine, giving it the capacity to discover association and classification rules efficiently.En este artículo se presenta uno de los resultados del proyecto de investigación cuyo objetivo fue definir nuevosoperadores algebraicos y nuevas primitivas SQL para el Descubrimiento de Conocimiento en una arquitecturafuertemente acoplada con un Sistema Gestor de Bases de Datos Relacional (SGBDR). Se propone el método trespasoscon el fin de facilitar el acoplamiento fuerte y soportar tareas de minería de datos al interior del motor de unSGBDR. En el primer paso, se extiende el álgebra relacional con nuevos operadores algebraicos que faciliten losprocesos computacionales más costosos de las tareas de minería de datos. En el siguiente paso y con el fin de queel lenguaje SQL sea relacionalmente completo, estos operadores son definidos como nuevas primitivas SQL en lacláusula SELECT. En el último paso, estas primitivas son unificadas en un nuevo operador SQL que ejecuta unatarea específica de minería de datos. Aplicando este método, se definieron nuevos operadores algebraicos, nuevasprimitivas y operadores SQL para las tareas de Asociación y Clasificación y fueron implementados al interiordel motor del SGBD PostgreSQL, dotándolo de la capacidad para descubrir reglas de asociación y clasificacióneficientemente

    A Genetic Programming Framework for Two Data Mining Tasks: Classification and Generalized Rule Induction

    Get PDF
    This paper proposes a genetic programming (GP) framework for two major data mining tasks, namely classification and generalized rule induction. The framework emphasizes the integration between a GP algorithm and relational database systems. In particular, the fitness of individuals is computed by submitting SQL queries to a (parallel) database server. Some advantages of this integration from a data mining viewpoint are scalability, data-privacy control and automatic parallelization

    Declarative Data Analytics: a Survey

    Full text link
    The area of declarative data analytics explores the application of the declarative paradigm on data science and machine learning. It proposes declarative languages for expressing data analysis tasks and develops systems which optimize programs written in those languages. The execution engine can be either centralized or distributed, as the declarative paradigm advocates independence from particular physical implementations. The survey explores a wide range of declarative data analysis frameworks by examining both the programming model and the optimization techniques used, in order to provide conclusions on the current state of the art in the area and identify open challenges.Comment: 36 pages, 2 figure

    SciQL, Bridging the Gap between Science and Relational DBMS

    Get PDF
    Scientific discoveries increasingly rely on the ability to efficiently grind massive amounts of experimental data using database technologies. To bridge the gap between the needs of the Data-Intensive Research fields and the current DBMS technologies, we propose SciQL (pronounced as ‘cycle’), the first SQL-based query language for scientific applications with both tables and arrays as first class citizens. It provides a seamless symbiosis of array-, set- and sequence- interpretations. A key innovation is the extension of value-based grouping of SQL:2003 with structural grouping, i.e., fixed-sized and unbounded groups based on explicit relationships between elements positions. This leads to a generalisation of window-based query processing with wide applicability in science domains. This paper describes the main language features of SciQL and illustrates it using time-series concepts

    Data mining query language design and implementation.

    Get PDF
    Xiaolei Yuan.Thesis submitted in: December 2003.Thesis (M.Phil.)--Chinese University of Hong Kong, 2004.Includes bibliographical references (leaves 95-101).Abstracts in English and Chinese.Chapter 1 --- Introduction --- p.1Chapter 1.1 --- Background --- p.1Chapter 1.1.1 --- Data Mining: A New Wave of Database Applications --- p.1Chapter 1.1.2 --- Association Rule Mining --- p.4Chapter 1.2 --- Motivation --- p.7Chapter 1.3 --- Main Contribution --- p.8Chapter 1.4 --- Thesis Organization --- p.9Chapter 2 --- Literature Review --- p.10Chapter 2.1 --- Data mining and association rule mining --- p.10Chapter 2.2 --- Integration data mining with DBMS --- p.11Chapter 2.3 --- Query language design for association rule mining --- p.12Chapter 2.4 --- Unified data mining models --- p.15Chapter 2.5 --- Other topics --- p.15Chapter 3 --- A New Data Mining Query Language M2MQL --- p.17Chapter 3.1 --- Simple item-based association rule --- p.18Chapter 3.1.1 --- One rule set --- p.19Chapter 3.1.2 --- Rule set and Source data set --- p.22Chapter 3.1.3 --- New rule sets from existing ones --- p.24Chapter 3.2 --- Generalized item-based association rules --- p.25Chapter 3.3 --- CREATE RULE and SELECT RULE Primitive --- p.32Chapter 4 --- The Algebra in M2MQL --- p.33Chapter 4.1 --- Review of nested relations --- p.33Chapter 4.1.1 --- Concepts of nested relation --- p.34Chapter 4.1.2 --- Nested relation and association rule mining --- p.35Chapter 4.2 --- Nested relational algebra --- p.36Chapter 4.3 --- Specific data mining algebra --- p.39Chapter 4.3.1 --- POWERSET p --- p.40Chapter 4.3.2 --- SET-CONTAINMENT-JOIN xc --- p.40Chapter 4.3.3 --- Functional operators --- p.42Chapter 5 --- Mining On Top of M2MQL --- p.50Chapter 5.1 --- Problem statement --- p.50Chapter 5.2 --- Frequency Counting Phase --- p.52Chapter 5.3 --- Frequent Itemset Generation Phase --- p.54Chapter 5.4 --- Rule Generation Phase --- p.57Chapter 5.5 --- Summary --- p.64Chapter 6 --- Conclusions and Future Work --- p.65Chapter 6.1 --- What we have achieved --- p.65Chapter 6.2 --- What is ahead --- p.66Chapter 6.2.1 --- Issues of Query Optimization --- p.66Chapter 6.2.2 --- Issues of Expanding Table Forms --- p.67Chapter A --- General Syntax of M2MQL --- p.68Chapter B --- Syntax and Example for MSQL --- p.71Chapter B.1 --- Syntax of MSQL --- p.71Chapter B.2 --- Example --- p.73Chapter C --- Syntax and Example for MINE RULE --- p.76Chapter C.1 --- syntax of MINE RULE --- p.76Chapter C.2 --- Example --- p.77Chapter C.2.1 --- Counting Groups --- p.78Chapter C.2.2 --- Making Couples of Clusters --- p.79Chapter C.2.3 --- Extracting Bodies --- p.80Chapter C.2.4 --- Extracting Rules --- p.80Bibliography --- p.8

    RAM: array processing over a relational DBMS

    Get PDF
    Developing multimedia applications in relational databases is hindered by a mismatch in computational frameworks. Efficient manipulation of multimedia data calls for array-based processing, which at best is available as a database add-on, not supported by the query optimizer. As a result, array-based processing ends up in dedicated programs outside the DBMS: non-reusable black boxes. The goal of our research is to reduce this gap between user-needs and system functionality by developing a seemless integration of array processing in a relational algebra engine. The paper introduces a declarative language for array-expressions based on the array comprehension, and its mapping to a relational kernel in a prototype implementation. The layered architecture of the resulting array database management system allows the use of structural knowledge available in the array data type. This additional source of information can be exploited for query optimization, which is demonstrated with a case study. The experiments show how the performance of a standard tool for matrix computations can be achieved without sacrificing data independence, highlighting however a critical aspect in the DBMS architecture proposed
    • …
    corecore