    Improving I/O Bandwidth for Data-Intensive Applications

    High disk bandwidth in data-intensive applications is usually achieved with expensive hardware solutions consisting of a large number of disks. In this article we present our current work on software methods for improving disk bandwidth in ColumnBM, a new storage system for the MonetDB/X100 query execution engine. Two novel techniques are discussed: superscalar compression for standalone queries and cooperative scans for multi-query optimization.

    Método Tres-Pasos para integrar fuertemente tareas de minería de datos en un sistema de base de datos relacional (Three-Step Method for Tightly Integrating Data Mining Tasks into a Relational Database System)

    This paper presents a result of a research project that aimed to define new algebraic operators and new SQL primitives for knowledge discovery in an architecture tightly coupled with a Relational Database Management System (RDBMS). To facilitate the tight coupling and to support data mining tasks inside the RDBMS engine, a three-step approach is proposed. In the first step, the relational algebra is extended with new algebraic operators that facilitate the most computationally expensive processes of data mining tasks. In the next step, so that the SQL language remains relationally complete, these operators are defined as new primitives in the SELECT clause. In the last step, these primitives are unified into a new SQL operator that runs a specific data mining task. Applying this method, new algebraic operators, new SQL primitives and new SQL operators for association and classification tasks were defined and implemented in the PostgreSQL DBMS engine, giving it the capacity to discover association and classification rules efficiently.

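    Arquitectura conceptual para combinar los procesos de data warehousing y data mining basada en objetos simbólicos (Conceptual Architecture for Combining the Data Warehousing and Data Mining Processes Based on Symbolic Objects)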

    This work presents a conceptual architecture for combining the Data Warehousing and Data Mining processes by means of symbolic objects. In recent years, companies have collected a very large amount of data, and it is desirable to organize it in order to coordinate analysis tasks and improve decision-making processes. The data is organized by implementing a Data Warehouse, in which information is selected, cleaned and enriched; this makes it possible to integrate several sources and to include business-specific knowledge, also called contextual knowledge. From this point of view, extracting potentially valuable knowledge from the massive volumes of data collected by operational systems is a challenge that is modeled with symbolic objects, which represent the main concepts that define the business or organization. In this way, Knowledge Management is improved, since the implicit knowledge in the minds of the organization's members is made explicit under the formalism of symbolic objects. II Workshop de Ingeniería de Software y Bases de Datos (WISBD). Red de Universidades con Carreras en Informática (RedUNCI).

    Cooperative scans

    Data mining, information retrieval and other application areas exhibit a query load with multiple concurrent queries touching a large fraction of a relation. This leads to individual query plans based on a table scan or large index scan. The implementation of this access path in most database systems is straightforward. The Scan operator issues next page requests to the buffer manager without concern for the system state. Conversely, the buffer manager is not aware of the work ahead and it focuses on keeping the most-recently-used pages in the buffer pool. This paper introduces cooperative scans, a new algorithm based on a better sharing of knowledge and responsibility between the Scan operator and the buffer manager, which significantly improves the performance of concurrent scan queries. In this approach, queries share the buffer content, and the buffer manager optimizes the progress of the scans by minimizing the number of disk transfers in light of the total workload ahead. The experimental results are based on a simulation of the various disk-access scheduling policies and on an implementation of cooperative scans within PostgreSQL and MonetDB/X100. These real-life experiments show that with little effort the performance of existing database systems on concurrent scan queries can be strongly improved.

    NonStop SQL/MX primitives for knowledge discovery

    No full text

    Data mining and database systems: integrating conceptual clustering with a relational database management system

    Many clustering algorithms have been developed and improved over the years to cater for large scale data clustering. However, much of this work has been in developing numeric based algorithms that use efficient summarisations to scale to large data sets. There is a growing need for scalable categorical clustering algorithms as, although numeric based algorithms can be adapted to categorical data, they do not always produce good results. This thesis presents a categorical conceptual clustering algorithm that can scale to large data sets using appropriate data summarisations. Data mining is distinguished from machine learning by the use of larger data sets that are often stored in database management systems (DBMSs). Many clustering algorithms require data to be extracted from the DBMS and reformatted for input to the algorithm. This thesis presents an approach that integrates conceptual clustering with a DBMS. The presented approach makes the algorithm main memory independent and supports on-line data mining. EThOS - Electronic Theses Online Service, GB, United Kingdom.