58 research outputs found

    Automated data pre-processing via meta-learning

    The final publication is available at link.springer.com. A data mining algorithm may perform differently on datasets with different characteristics; e.g., it might perform better on a dataset with continuous attributes than on one with categorical attributes, or the other way around. As a matter of fact, a dataset usually needs to be pre-processed. Taking into account all the possible pre-processing operators, there exists a staggeringly large number of alternatives, and inexperienced users become overwhelmed. We show that this problem can be addressed by an automated approach, leveraging ideas from meta-learning. Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result of the algorithm on the respective dataset. Our approach will help non-expert users to more effectively identify the transformations appropriate to their applications, and hence to achieve improved results.
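    The abstract's core idea, predicting a helpful transformation from dataset characteristics, can be illustrated with a minimal nearest-neighbour meta-learner. All meta-features, dataset entries, and transformation names below are illustrative assumptions, not the paper's actual meta-knowledge base:

```python
import math

# Hypothetical meta-knowledge base: each entry maps dataset meta-features
# (fraction of categorical attributes, number of instances, number of classes)
# to the pre-processing transformation that worked best for one algorithm.
META_BASE = [
    ((0.9, 1000, 2), "one-hot-encode"),
    ((0.1, 5000, 3), "standardize"),
    ((0.5, 200, 2), "discretize"),
    ((0.0, 10000, 5), "standardize"),
]

def recommend_transformation(meta_features):
    """1-nearest-neighbour meta-learner: recommend the transformation
    that helped on the most similar previously seen dataset."""
    def dist(a, b):
        # Scale instance counts logarithmically so they don't dominate.
        return math.sqrt((a[0] - b[0]) ** 2 +
                         (math.log10(a[1] + 1) - math.log10(b[1] + 1)) ** 2 +
                         (a[2] - b[2]) ** 2)
    best = min(META_BASE, key=lambda entry: dist(entry[0], meta_features))
    return best[1]

# A mostly-categorical dataset is matched to the categorical-heavy entry.
print(recommend_transformation((0.8, 800, 2)))  # → one-hot-encode
```

    A real system would of course learn from measured performance gains across many datasets rather than a hand-written table; the sketch only shows the shape of the meta-learning lookup.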

    Data mining workflow templates for intelligent discovery assistance in RapidMiner

    Knowledge Discovery in Databases (KDD) has evolved during the last years and reached a mature stage, offering plenty of operators to solve complex tasks. User support for building workflows, in contrast, has not increased proportionally. The large number of operators available in current KDD systems makes it difficult for users to analyze data successfully. Moreover, workflows easily contain a large number of operators, and parts of a workflow are applied several times, so it is hard for users to build them manually. In addition, workflows are not checked for correctness before execution; hence, it frequently happens that the execution of a workflow stops with an error after several hours of runtime. In this paper we address these issues by introducing a knowledge-based representation of KDD workflows as a basis for cooperative-interactive planning. Moreover, we discuss workflow templates that can mix executable operators with tasks to be refined later into sub-workflows. This new representation helps users to structure and handle workflows, as it constrains the number of operators that need to be considered. We show that workflows can be grouped into templates, enabling re-use and simplifying KDD workflow construction in RapidMiner.
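    The pre-execution correctness check described above can be sketched as simple type-compatibility validation over a chain of operators. The operator names and data-type labels here are illustrative assumptions, not RapidMiner's actual operator catalogue:

```python
# Each operator declares the data type it consumes and the type it produces.
# `None` means the operator needs no input (a source operator).
OPERATORS = {
    "read_csv":      (None,            "raw_table"),
    "impute":        ("raw_table",     "clean_table"),
    "discretize":    ("clean_table",   "nominal_table"),
    "decision_tree": ("nominal_table", "model"),
}

def check_workflow(steps):
    """Verify operator chaining before running anything, so a type
    mismatch is caught up front rather than hours into execution."""
    produced = None
    for op in steps:
        needs, makes = OPERATORS[op]
        if needs is not None and needs != produced:
            return False, f"{op} expects {needs!r} but got {produced!r}"
        produced = makes
    return True, produced

ok, result = check_workflow(["read_csv", "impute", "discretize", "decision_tree"])
print(ok, result)  # → True model
```

    Running the same check on, say, `["read_csv", "decision_tree"]` fails immediately with a type-mismatch message, which is exactly the kind of error the paper argues should be caught before execution.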

    The quest for companions to post-common envelope binaries: I. Searching a sample of stars from the CSS and SDSS

    As part of an ongoing collaboration between student groups at high schools and professional astronomers, we have searched for the presence of circumbinary planets in a bona-fide unbiased sample of twelve post-common envelope binaries (PCEBs) from the Catalina Sky Survey (CSS) and the Sloan Digital Sky Survey (SDSS). Although the present ephemerides are significantly more accurate than previous ones, we find no clear evidence for orbital period variations between 2005 and 2011 or during the 2011 observing season. The sparse long-term coverage still permits O-C variations with a period of years and an amplitude of tens of seconds, as found in other systems. Our observations provide the basis for future inferences about the frequency with which planet-sized or brown-dwarf companions have either formed in these evolved systems or survived the common envelope (CE) phase. Comment: accepted by A&

    eProPlan: a tool to model automatic generation of data mining workflows

    This paper introduces the first ontological modeling environment for planning Knowledge Discovery in Databases (KDD) workflows. We use ontological reasoning combined with AI planning techniques to automatically generate workflows for solving Data Mining (DM) problems. KDD researchers can easily model not only their DM and pre-processing operators but also their DM tasks, which are used to guide the workflow generation.
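    The planning step described above can be illustrated with a toy forward-search planner over operators with preconditions and effects. The operator names and state labels are invented for illustration and bear no relation to eProPlan's actual ontology:

```python
from collections import deque

# Each operator maps a required input state (precondition) to an output
# state (effect); states and names here are illustrative assumptions.
OPS = {
    "clean":       ("raw",        "cleaned"),
    "normalize":   ("cleaned",    "normalized"),
    "train_model": ("normalized", "model"),
    "discretize":  ("cleaned",    "discrete"),
}

def plan(start, goal):
    """Breadth-first search over operator applications: returns the
    shortest operator sequence transforming `start` into `goal`."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for name, (pre, post) in OPS.items():
            if pre == state and post not in seen:
                seen.add(post)
                queue.append((post, path + [name]))
    return None  # no operator sequence reaches the goal

print(plan("raw", "model"))  # → ['clean', 'normalize', 'train_model']
```

    A real planner additionally consults an ontology to decide which operators are applicable and decomposes abstract DM tasks hierarchically; the sketch only shows the search skeleton.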

    An overview of intelligent data assistants for data analysis

    Today's intelligent data assistants (IDAs) for data analysis focus on how to do effective and intelligent data analysis. However, this is not a trivial task, since one must take into consideration all the influencing factors: data analysis in general on the one hand, and the communication and interaction with data analysts on the other. The basic approach of building an IDA where data analysis is (1) better as well as (2) faster at the same time is not a very rewarding criterion and does not help in designing good IDAs. Therefore, this paper tries to (a) identify constructive criteria that allow us to compare existing systems and help design better IDAs, and (b) review all previous IDAs based on these criteria to find out which problems IDAs should solve as well as which method works best for which problem. In conclusion, we try to learn from previous experience which features should be incorporated into a new IDA that would solve the problems of today's analysts.

    Computing Probabilistic Least Common Subsumers in Description Logics

    No full text