14 research outputs found

Automated Construction of Relational Attributes ACORA: A Progress Report

Author: Perlich Claudia
Publication venue: Stern School of Business, New York University
Publication date: 01/08/2002

Data mining research has not only developed a large number of algorithms, but also enhanced our knowledge and understanding of their applicability and performance. However, the application of data mining technology in business environments is still not very common, despite the fact that organizations have access to large amounts of data and make decisions that could profit from data mining on a daily basis. One of the reasons is the mismatch between data representation for data storage and data analysis. Data are most commonly stored in multi-table relational databases, whereas data mining methods require that the data be represented as a simple feature vector. This work presents a general framework for feature construction from multiple relational tables for data mining applications. The second part describes our prototype implementation ACORA (Automated Construction of Relational Features).
Information Systems Working Papers Series
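
The mismatch this abstract describes, multi-table storage versus a single feature vector per example, can be illustrated with a minimal sketch. It is not ACORA's method: the tables, column names, and the choice of count, sum, and mean as aggregates are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical target table: one row per customer (the unit to be classified).
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 51},
    {"customer_id": 3, "age": 27},
]

# Hypothetical related table: zero or more rows per customer.
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 10.0},
]


def build_feature_vectors(customers, transactions):
    """Flatten the one-to-many relation into one fixed-length row per customer."""
    # Group transaction amounts by the key they share with the target table.
    amounts = defaultdict(list)
    for t in transactions:
        amounts[t["customer_id"]].append(t["amount"])

    rows = []
    for c in customers:
        a = amounts.get(c["customer_id"], [])
        rows.append({
            "customer_id": c["customer_id"],
            "age": c["age"],
            # Aggregates summarize a variable-size bag of rows as fixed columns.
            "txn_count": len(a),
            "txn_total": sum(a),
            "txn_mean": mean(a) if a else 0.0,  # assumed default for empty bags
        })
    return rows


features = build_feature_vectors(customers, transactions)
```

Each resulting row has the same columns regardless of how many transactions a customer has, which is the shape a single-table learner expects.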

New York University Faculty Digital Archive

A MODULAR APPROACH TO RELATIONAL DATA MINING

Author: Perlich Claudia
Provost Foster
Publication venue: AIS Electronic Library (AISeL)
Publication date: 31/12/2002

AIS Electronic Library (AISeL)

Multi-relational data mining

Author: Blockeel H.
Knobbe A.J. (Arno)
Siebes A.P.J.M. (Arno)
Wallen D.M.G. van der
Publication venue: CWI
Publication date: 01/01/1999

An important aspect of data mining algorithms and systems is that they should scale well to large databases. A consequence of this is that most data mining tools are based on machine learning algorithms that work on data in attribute-value format. Experience has proven that such 'single-table' mining algorithms indeed scale well. The downside of this format is, however, that more complex patterns are simply not expressible in it and, thus, cannot be discovered. One way to enlarge the expressiveness is to generalize, as in ILP, from one-table mining to multiple-table mining, i.e., to support mining on full relational databases. The key step in such a generalization is to ensure that the search space does not explode and that efficiency and, thus, scalability are maintained. In this paper we present a framework and an architecture that provide such a generalization. In this framework the semantic information in the database schema, e.g., foreign keys, is exploited to prune the search space and, in the architecture, database primitives are defined to ensure efficiency. Moreover, the framework induces a canonical generalization of algorithms, i.e., if the generalized algorithms are run on a single-table database, they give the same results as their single-table counterparts. The framework is illustrated by the Warmr algorithm, which is a multi-relational generalization of the Apriori algorithm.
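
The idea of using foreign keys to prune the search space can be sketched as follows. This is only an illustration under assumed names: the schema, the foreign-key list, and both helper functions are hypothetical and are not taken from the paper's framework or architecture.

```python
from itertools import combinations

# Hypothetical schema: table name -> list of column names.
schema = {
    "customer": ["customer_id", "age"],
    "order": ["order_id", "customer_id", "total"],
    "item": ["item_id", "order_id", "price"],
}

# Hypothetical declared foreign keys:
# (referencing table, referencing column, referenced table, referenced column).
foreign_keys = [
    ("order", "customer_id", "customer", "customer_id"),
    ("item", "order_id", "order", "order_id"),
]


def all_candidate_joins(schema):
    """Enumerate every column pair across every pair of distinct tables."""
    joins = []
    for left, right in combinations(schema, 2):
        for left_col in schema[left]:
            for right_col in schema[right]:
                joins.append((left, left_col, right, right_col))
    return joins


def foreign_key_joins(foreign_keys):
    """Keep only the joins that the schema declares as meaningful."""
    return list(foreign_keys)


unrestricted = all_candidate_joins(schema)
restricted = foreign_key_joins(foreign_keys)
print(len(unrestricted), "candidate joins without schema information")
print(len(restricted), "candidate joins along foreign keys")
```

Even on this three-table toy schema the unrestricted enumeration is an order of magnitude larger than the foreign-key-guided one, and the gap widens as tables and columns are added.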

CWI's Institutional Repository

Compact Representation of Knowledge Bases in Inductive Logic Programming

Author: Hendrik Blockeel
Jan Ramon
Jan Struyf
Maurice Bruynooghe
Sofie Verbaeten
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Crossref

MRDTL: a multi-relational decision tree learning algorithm

Author: Leiva Héctor Ariel
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2002

Many real-world data sets are organized in relational databases consisting of multiple tables and associations. Other types of data, such as those in bioinformatics and computational biology or HTML and XML documents, require reasoning about the structure of the objects. However, most of the existing approaches to machine learning assume that the data are stored in a single table, and use a propositional (as opposed to relational) language for discovering predictive models. Hence, there is a need for data mining algorithms for discovery of a priori unknown relationships from multi-relational data. This thesis explores a new framework for multi-relational data mining. It describes experiments with an implementation of a Multi-Relational Decision Tree Learning (MRDTL) algorithm for induction of decision trees from relational databases, based on an approach suggested by Knobbe et al., 1999. Our experiments with widely used benchmark data sets (e.g., the carcinogenesis data) show that the performance of MRDTL is competitive with that of other algorithms for learning classifiers from multiple relations, including Progol (Muggleton, 1995), FOIL (Quinlan, 1993), and Tilde (Blockeel, 1998). Preliminary results indicate that MRDTL, when augmented with principled methods for handling missing attribute values, is likely to be competitive with the state-of-the-art algorithms for learning classifiers from multiple relations on real-world data sets drawn from bioinformatics applications (prediction of gene localization and gene function) used in the KDD Cup 2001 data mining competition (Cheng et al., 2002).

Digital Repository @ Iowa State University (ISU)

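Acquisition of Functional, Spatial, and Causal Relations of Vehicle Parts from Technical Documentation (original title: Erwerb funktionaler, räumlicher und kausaler Beziehungen von Fahrzeugteilen aus einer technischen Dokumentation)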

Author: Siebert Mark
Publication venue: Universität Dortmund
Publication date

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Induction of Knowledge with Uncertainty in Fuzzy Relational Databases (original title: Inducción de conocimiento con incertidumbre en bases de datos relacionales borrosas)

Author: Gómez Flechoso Antonio José
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/1998

This work presents a system for learning logical definitions with uncertainty from a fuzzy relational database. The field of interest is therefore inductive logic programming, to which the work makes several contributions, mainly concerning the input data and the results produced. The input data belong to a fuzzy relational database. They are therefore expressed as tables of tuples (relations), in which each tuple may carry a degree of membership in the corresponding relation. These are thus fuzzy relations, directly identifiable with fuzzy concepts (so common in reality as seen from a human point of view), and not ordinary relations with fuzzy attributes (which is how "fuzziness" is understood in many existing systems). The output data are expressed as logical definitions of a relation (ordinary or fuzzy), each consisting of a Horn clause or a disjunction of several. These Horn clauses are built from literals, generally applied to variables, and associated with fuzzy or ordinary relations. Fuzzy literals can additionally be modified by linguistic labels. These definitions therefore combine predicate logic with fuzzy logic in what may be called "fuzzy predicate logic", which constitutes a contribution to automatic knowledge induction. In addition, the induced definitions carry an associated uncertainty factor, as in other existing systems. The starting point of the work is a well-known system for inducing logical definitions: FOIL, created by Quinlan in 1990 and based on predicate logic.
On top of this initial system, in addition to the fuzzy-logic extensions already mentioned, a further series of modifications and extensions is made, aimed at improving knowledge induction. These improvements concern mainly its heuristic part: a literal evaluation function is defined, based on interest measures, which corrects some deficiencies of the original system and increases the quality of the induced rules. Other modifications are aimed at introducing background knowledge through intensionally defined relations, similarly to other systems such as FOCL. As a tangible result of the thesis, a system, FZFOIL, has been developed and tested; it is publicly available under the GNU license.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Data Mining to Support Business Decision Processes (original title: Data Mining zur Unterstützung betrieblicher Entscheidungsprozesse)

Author: Tillmanns Christoph
Publication venue
Publication date: 07/11/2003

Data mining is known as the application of algorithms for identifying data patterns in large data sets. This dissertation extends the discussion of data mining methods, which in the literature is mostly purely technical, to their potential business applications. It examines how data mining methods can support business decision processes. First, a formal 'construction kit' for developing new data mining methods is introduced, covering the design options for data mining model types and search methods as well as the assessment of the interestingness of economic models. From an examination of business data mining applications, a general scheme for supporting decision processes through data mining is derived. The model type of the decision model is examined in more detail, and a data mining method for generating decision models is developed. Finally, the method is evaluated on test data and applied to the problem of selecting customers for a direct marketing campaign in the insurance market.

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung