Search CORE

9 research outputs found

Improving I/O Bandwidth for Data-Intensive Applications

Author: Zukowski M. (Marcin)
Publication date: 01/01/2005

High disk bandwidth in data-intensive applications is usually achieved with expensive hardware solutions consisting of a large number of disks. In this article we present our current work on software methods for improving disk bandwidth in ColumnBM, a new storage system for the MonetDB/X100 query execution engine. Two novel techniques are discussed: superscalar compression for standalone queries and cooperative scans for multi-query optimization.

CiteSeerX

CWI's Institutional Repository

Three-Step Method for Tightly Integrating Data Mining Tasks into a Relational Database System (Método Tres-Pasos para integrar fuertemente tareas de minería de datos en un sistema de base de datos relacional)

Author: Timarán-Pereira Ricardo
Publication date: 28/03/2014

This paper presents one result of a research project that aimed to define new algebraic operators and new SQL primitives for knowledge discovery in an architecture tightly coupled with a Relational Database Management System (RDBMS). To facilitate this tight coupling and to support data mining tasks inside the RDBMS engine, a three-step method is proposed. In the first step, relational algebra is extended with new algebraic operators that support the most computationally expensive processes of data mining tasks. In the second step, so that the SQL language remains relationally complete, these operators are defined as new SQL primitives in the SELECT clause. In the last step, these primitives are unified into a new SQL operator that runs a specific data mining task. Applying this method, new algebraic operators, new SQL primitives, and new SQL operators for association and classification tasks were defined and implemented in the PostgreSQL DBMS engine, giving it the capacity to discover association and classification rules efficiently.

Biblioteca Digital de la Universidad del Valle

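Conceptual Architecture for Combining Data Warehousing and Data Mining Processes Based on Symbolic Objects (Arquitectura conceptual para combinar los procesos de data warehousing y data mining basada en objetos simbólicos)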

Author: González Císaro Sandra
Nigro Oscar
Xodo Daniel
Publication date: 26/10/2012

This work presents a conceptual architecture for combining Data Warehousing processes with Data Mining by means of symbolic objects. In recent years, companies have collected very large amounts of data, which should be organized so that analysis tasks can be coordinated with the goal of improving decision-making processes. This organization is achieved by implementing a Data Warehouse, in which information is selected, cleaned, and enriched; as a result, several sources can be integrated and the business's own knowledge, also called contextual knowledge, can be included. From this point of view, extracting potentially valuable knowledge from the massive volumes of data collected by operational systems is a challenge, which is modeled here with symbolic objects. These objects represent the main concepts that define the business or organization. In this way, we improve Knowledge Management, since the implicit knowledge in the minds of the organization's members is made explicit under the formalism of symbolic objects.

II Workshop de Ingeniería de Software y Bases de Datos (WISBD), Red de Universidades con Carreras en Informática (RedUNCI)

Servicio de Difusión de la Creación Intelectual

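Conceptual Architecture for Combining Data Warehousing and Data Mining Processes Based on Symbolic Objects (Arquitectura conceptual para combinar los procesos de data warehousing y data mining basada en objetos simbólicos)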

Author: González Císaro Sandra
Nigro Oscar
Xodo Daniel
Publication date: 01/10/2005

Cooperative scans

Author: Boncz P.A. (Peter)
Kersten M.L. (Martin)
Zukowski M. (Marcin)
Publication venue: CWI
Publication date: 01/01/2004

Data mining, information retrieval, and other application areas exhibit a query load with multiple concurrent queries touching a large fraction of a relation. This leads to individual query plans based on a table scan or a large index scan. The implementation of this access path in most database systems is straightforward: the Scan operator issues next-page requests to the buffer manager without concern for the system state. Conversely, the buffer manager is not aware of the work ahead and focuses on keeping the most recently used pages in the buffer pool. This paper introduces cooperative scans, a new algorithm based on better sharing of knowledge and responsibility between the Scan operator and the buffer manager, which significantly improves the performance of concurrent scan queries. In this approach, queries share the buffer content, and the buffer manager optimizes the progress of the scans by minimizing the number of disk transfers in light of the total workload ahead. The experimental results are based on a simulation of various disk-access scheduling policies and on implementations of cooperative scans within PostgreSQL and MonetDB/X100. These real-life experiments show that, with little effort, the performance of existing database systems on concurrent scan queries can be strongly improved.
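The contrast described above (each Scan pulling pages on its own versus the buffer manager choosing what to read for all queries) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's algorithm: the table is split into equal-sized chunks, the disk delivers one chunk per time step, each query may consume chunks in any order, and the cooperative policy simply loads the chunk still needed by the most active queries. That greedy rule is a simplified stand-in for the scheduling policies the paper evaluates. The function names `cooperative_scans` and `naive_scans` are hypothetical. The baseline models the traditional behaviour, in which each query reads chunks in order and the shared buffer uses LRU replacement.

```python
from collections import Counter, OrderedDict


def cooperative_scans(num_chunks: int, capacity: int, start_times: list[int]) -> int:
    """Count chunk loads when the buffer manager schedules I/O for all queries.

    Toy model (assumption): one chunk is loaded per time step, and every active
    query immediately consumes any buffered chunk it has not yet seen.
    start_times must be non-negative integers; capacity must be at least 1.
    """
    buffer: set[int] = set()
    remaining: dict[int, set[int]] = {}  # query id -> chunks it still needs
    loads = finished = step = 0
    while finished < len(start_times):
        for q, start in enumerate(start_times):
            if start == step:
                remaining[q] = set(range(num_chunks))
        # Queries share the buffer: each takes every resident chunk it needs.
        for q in list(remaining):
            remaining[q] -= buffer
            if not remaining[q]:
                del remaining[q]
                finished += 1
        if remaining:
            # How many active queries still need each non-resident chunk.
            interest = Counter(c for left in remaining.values() for c in left)
            # Load the most-wanted chunk; break ties by lowest chunk id.
            chunk = max(interest, key=lambda c: (interest[c], -c))
            if len(buffer) >= capacity:
                # All resident chunks were just consumed by every active query,
                # so none is needed now; evict the lowest id for determinism.
                buffer.remove(min(buffer))
            buffer.add(chunk)
            loads += 1
        step += 1
    return loads


def naive_scans(num_chunks: int, capacity: int, start_times: list[int]) -> int:
    """Count chunk loads for independent in-order scans over a shared LRU buffer.

    Only disk transfers are counted; several loads may happen in one step.
    """
    lru: OrderedDict[int, None] = OrderedDict()
    position: dict[int, int] = {}  # query id -> next chunk it will request
    loads = finished = step = 0
    while finished < len(start_times):
        for q, start in enumerate(start_times):
            if start == step:
                position[q] = 0
        for q in sorted(position):
            chunk = position[q]
            if chunk in lru:
                lru.move_to_end(chunk)  # hit: mark as most recently used
            else:
                loads += 1
                if len(lru) >= capacity:
                    lru.popitem(last=False)  # evict the least recently used chunk
                lru[chunk] = None
            position[q] += 1
            if position[q] == num_chunks:
                del position[q]
                finished += 1
        step += 1
    return loads
```

With a 10-chunk table, a 3-chunk buffer, and a second query arriving five steps after the first, `cooperative_scans(10, 3, [0, 5])` returns 12 loads while `naive_scans(10, 3, [0, 5])` returns 20. In the cooperative run the late query reuses the chunks the first query is still reading and only fetches chunks 0 and 1, which had already been evicted; under LRU the two in-order scans never find each other's pages in the buffer.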

CWI's Institutional Repository

NonStop SQL/MX primitives for knowledge discovery

Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/1999

Crossref

Approaches to OLAP Languages (Lähestymistapoja OLAP-kieliin)

Author: KANERVA KAARLO
Publication date: 20/10/2003

Trepo - Institutional Repository of Tampere University

Data mining and database systems: integrating conceptual clustering with a relational database management system

Author: Lepinioti Konstantina

Many clustering algorithms have been developed and improved over the years to cater for large-scale data clustering. However, much of this work has focused on numeric-based algorithms that use efficient summarisations to scale to large data sets. There is a growing need for scalable categorical clustering algorithms because, although numeric-based algorithms can be adapted to categorical data, they do not always produce good results. This thesis presents a categorical conceptual clustering algorithm that can scale to large data sets using appropriate data summarisations. Data mining is distinguished from machine learning by the use of larger data sets that are often stored in database management systems (DBMSs). Many clustering algorithms require data to be extracted from the DBMS and reformatted for input to the algorithm. This thesis presents an approach that integrates conceptual clustering with a DBMS. The presented approach makes the algorithm independent of main memory and supports online data mining.

Bournemouth University Research Online

Data mining and database systems: integrating conceptual clustering with a relational database management system

Author: Lepinioti Konstantina
Publication date: 01/01/2011

OpenGrey Repository