Search CORE

61 research outputs found

Post-processing of association rules.

Author: Baesens Bart
Vanthienen Jan
Viaene Stijn
Publication venue
Publication date
Field of study

In this paper, we situate and motivate the need for a post-processing phase to the association rule mining algorithm when plugged into the knowledge discovery in databases process. Major research effort has already been devoted to optimising the initially proposed mining algorithms. When it comes to effectively extrapolating the most interesting knowledge nuggets from the standard output of these algorithms, one is faced with an extreme challenge, since it is not uncommon to be confronted with a vast amount of association rules after running the algorithms. The sheer multitude of generated rules often clouds the perception of the interpreters. Rightful assessment of the usefulness of the generated output introduces the need to effectively deal with different forms of data redundancy and data being plainly uninteresting. In order to do so, we will give a tentative overview of some of the main post-processing tasks, taking into account the efforts that have already been reported in the literature.

Research Papers in Economics

Mining Closed Itemsets for Coherent Rules: An Inference Analysis Approach

Author: Mr. Kalli Srinivasa Nageswara Prasad
Prof. S.Ramakrishna
Publication venue: Global Journals Inc. (US)
Publication date: 06/10/2011
Field of study

Past observations have shown that a frequent item set mining algorithm are alleged to mine the closed ones because the finish offers a compact and a whole progress set and higher potency. Anyhow, the most recent closed item set mining algorithms works with candidate maintenance combined with check paradigm that is dear in runtime likewise as area usage when support threshold is a smaller amount or the item sets gets long. Here, we show, PEPP with inference analysis that could be a capable approach used for mining closed sequences for coherent rules while not candidate. It implements a unique sequence closure checking format with inference analysis that based mostly on Sequence Graph protruding by an approach labeled Parallel Edge projection and pruning in brief will refer as PEPP. We describe a novel inference analysis approach to prune patterns that tends to derive coherent rules. A whole observation having sparse and dense real-life information sets proved that PEPP with inference analysis performs larger compared to older algorithms because it takes low memory and is quicker than any algorithms those cited in literature frequently

Global Journal of Computer Science and Technology (GJCST)

CURIO: A fast outlier and outlier cluster detection algorithm for large datasets

Author: Ceglar Aaron John
Powers David Martin
Roddick John Francis
Publication venue: Australian Computer Society
Publication date: 01/01/2007
Field of study

Sydney, NS

Flinders Academic Commons

Método Tres-Pasos para integrar fuertemente tareas de minería de datos en un sistema de base de datos relacional

Author: Timarán-Pereira Ricardo
Publication venue
Publication date: 28/03/2014
Field of study

In this paper, a result of the research project that aimed to define new algebraic operators and new SQL primitives for knowledge discovery in a tightly coupled architecture with a Relational Database Management System (RDBMS) is presented. In order to facilitate the tight coupling and to support the data mining tasks into the RDBMS engine, the three-step approach is proposed. In the first step, the relational algebra is extended with new algebraic operators to facilitate more expensive computationally processes of data mining tasks. In the next step and with the aim that the SQL language is relationally complete, these operators are defined as new primitives in the SELECT clause. In the last step, these primitives are unified into new SQL operator that runs a specific data mining task. Applying this method, new algebraic operators, new SQL primitives and new SQL operators for association and classification tasks were defined and were implemented into the PostgreSQL DBMS engine, giving it the capacity to discover association and classification rules efficiently.En este artículo se presenta uno de los resultados del proyecto de investigación cuyo objetivo fue definir nuevosoperadores algebraicos y nuevas primitivas SQL para el Descubrimiento de Conocimiento en una arquitecturafuertemente acoplada con un Sistema Gestor de Bases de Datos Relacional (SGBDR). Se propone el método trespasoscon el fin de facilitar el acoplamiento fuerte y soportar tareas de minería de datos al interior del motor de unSGBDR. En el primer paso, se extiende el álgebra relacional con nuevos operadores algebraicos que faciliten losprocesos computacionales más costosos de las tareas de minería de datos. En el siguiente paso y con el fin de queel lenguaje SQL sea relacionalmente completo, estos operadores son definidos como nuevas primitivas SQL en lacláusula SELECT. En el último paso, estas primitivas son unificadas en un nuevo operador SQL que ejecuta unatarea específica de minería de datos. Aplicando este método, se definieron nuevos operadores algebraicos, nuevasprimitivas y operadores SQL para las tareas de Asociación y Clasificación y fueron implementados al interiordel motor del SGBD PostgreSQL, dotándolo de la capacidad para descubrir reglas de asociación y clasificacióneficientemente

Biblioteca Digital de la Universidad del Valle

Marking time in sequence mining

Author: Mooney Carl Howard
Roddick John Francis
Publication venue: Australian Computer Society
Publication date: 01/01/2006
Field of study

Sequence mining is often conducted over static and temporal datasets as well as over collections of events (episodes). More recently, there has also been a focus on the mining of streaming data. However, while many sequences are associated with absolute time values, most sequence mining routines treat time in a relative sense, only returning patterns that can be described in terms of Allen-style relationships (or simpler). In this work we investigate the accommodation of timing marks within the sequence mining process. The paper discusses the opportunities presented and the problems that may be encountered and presents a novel algorithm, INTEMTM, that provides support for timing marks. This enables sequences to be examined not only in respect of the order and occurrence of tokens but also in terms of pace. Algorithmic considerations are discussed and an example provided for the case of polled sensor data.Sydney, NS

CiteSeerX

Flinders Academic Commons