58 research outputs found

    Automated data pre-processing via meta-learning

    The final publication is available at link.springer.com. A data mining algorithm may perform differently on datasets with different characteristics; e.g., it might perform better on a dataset with continuous attributes than on one with categorical attributes, or the other way around. As a matter of fact, a dataset usually needs to be pre-processed. Taking into account all the possible pre-processing operators, there exists a staggeringly large number of alternatives, and inexperienced users become overwhelmed. We show that this problem can be addressed by an automated approach, leveraging ideas from meta-learning. Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result of the algorithm on the respective dataset. Our approach will help non-expert users to more effectively identify the transformations appropriate to their applications, and hence to achieve improved results.
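    The abstract's core idea, predicting a helpful transformation from dataset characteristics, can be illustrated with a minimal nearest-neighbour meta-learner. All meta-features, dataset entries, and transformation names below are illustrative assumptions, not the paper's actual meta-knowledge base:

```python
import math

# Hypothetical meta-knowledge base: each entry maps dataset meta-features
# (fraction of categorical attributes, number of instances, number of classes)
# to the pre-processing transformation that worked best for one algorithm.
META_BASE = [
    ((0.9, 1000, 2), "one-hot-encode"),
    ((0.1, 5000, 3), "standardize"),
    ((0.5, 200, 2), "discretize"),
    ((0.0, 10000, 5), "standardize"),
]

def recommend_transformation(meta_features):
    """1-nearest-neighbour meta-learner: recommend the transformation
    that helped on the most similar previously seen dataset."""
    def dist(a, b):
        # Scale instance counts logarithmically so they don't dominate.
        return math.sqrt((a[0] - b[0]) ** 2 +
                         (math.log10(a[1] + 1) - math.log10(b[1] + 1)) ** 2 +
                         (a[2] - b[2]) ** 2)
    best = min(META_BASE, key=lambda entry: dist(entry[0], meta_features))
    return best[1]

# A mostly-categorical dataset is matched to the categorical-heavy entry.
print(recommend_transformation((0.8, 800, 2)))  # → one-hot-encode
```

    A real system would of course learn from measured performance gains across many datasets rather than a hand-written table; the sketch only shows the shape of the meta-learning lookup.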

    Data mining workflow templates for intelligent discovery assistance in RapidMiner

    Knowledge Discovery in Databases (KDD) has evolved during the last years and reached a mature stage, offering plenty of operators to solve complex tasks. User support for building workflows, in contrast, has not increased proportionally. The large number of operators available in current KDD systems makes it difficult for users to analyze data successfully. Moreover, workflows easily contain a large number of operators, and parts of a workflow are applied several times, so it is hard for users to build them manually. In addition, workflows are not checked for correctness before execution; hence, it frequently happens that the execution of a workflow stops with an error after several hours of runtime. In this paper we address these issues by introducing a knowledge-based representation of KDD workflows as a basis for cooperative-interactive planning. Moreover, we discuss workflow templates that can mix executable operators with tasks to be refined later into sub-workflows. This new representation helps users to structure and handle workflows, as it constrains the number of operators that need to be considered. We show that workflows can be grouped into templates, enabling re-use and simplifying KDD workflow construction in RapidMiner.
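    The pre-execution correctness check described above can be sketched as simple type-compatibility validation over a chain of operators. The operator names and data-type labels here are illustrative assumptions, not RapidMiner's actual operator catalogue:

```python
# Each operator declares the data type it consumes and the type it produces.
# `None` means the operator needs no input (a source operator).
OPERATORS = {
    "read_csv":      (None,            "raw_table"),
    "impute":        ("raw_table",     "clean_table"),
    "discretize":    ("clean_table",   "nominal_table"),
    "decision_tree": ("nominal_table", "model"),
}

def check_workflow(steps):
    """Verify operator chaining before running anything, so a type
    mismatch is caught up front rather than hours into execution."""
    produced = None
    for op in steps:
        needs, makes = OPERATORS[op]
        if needs is not None and needs != produced:
            return False, f"{op} expects {needs!r} but got {produced!r}"
        produced = makes
    return True, produced

ok, result = check_workflow(["read_csv", "impute", "discretize", "decision_tree"])
print(ok, result)  # → True model
```

    Running the same check on, say, `["read_csv", "decision_tree"]` fails immediately with a type-mismatch message, which is exactly the kind of error the paper argues should be caught before execution.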

    The quest for companions to post-common envelope binaries: I. Searching a sample of stars from the CSS and SDSS

    As part of an ongoing collaboration between student groups at high schools and professional astronomers, we have searched for the presence of circumbinary planets in a bona-fide unbiased sample of twelve post-common envelope binaries (PCEBs) from the Catalina Sky Survey (CSS) and the Sloan Digital Sky Survey (SDSS). Although the present ephemerides are significantly more accurate than previous ones, we find no clear evidence for orbital period variations between 2005 and 2011 or during the 2011 observing season. The sparse long-term coverage still permits O-C variations with a period of years and an amplitude of tens of seconds, as found in other systems. Our observations provide the basis for future inferences about the frequency with which planet-sized or brown-dwarf companions have either formed in these evolved systems or survived the common envelope (CE) phase. Comment: accepted by A&

    eProPlan: a tool to model automatic generation of data mining workflows

    This paper introduces the first ontological modeling environment for planning Knowledge Discovery in Databases (KDD) workflows. We use ontological reasoning combined with AI planning techniques to automatically generate workflows for solving Data Mining (DM) problems. KDD researchers can easily model not only their DM and pre-processing operators but also their DM tasks, which are used to guide the workflow generation.
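    The planning step described above can be illustrated with a toy forward-search planner over operators with preconditions and effects. The operator names and state labels are invented for illustration and bear no relation to eProPlan's actual ontology:

```python
from collections import deque

# Each operator maps a required input state (precondition) to an output
# state (effect); states and names here are illustrative assumptions.
OPS = {
    "clean":       ("raw",        "cleaned"),
    "normalize":   ("cleaned",    "normalized"),
    "train_model": ("normalized", "model"),
    "discretize":  ("cleaned",    "discrete"),
}

def plan(start, goal):
    """Breadth-first search over operator applications: returns the
    shortest operator sequence transforming `start` into `goal`."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for name, (pre, post) in OPS.items():
            if pre == state and post not in seen:
                seen.add(post)
                queue.append((post, path + [name]))
    return None  # no operator sequence reaches the goal

print(plan("raw", "model"))  # → ['clean', 'normalize', 'train_model']
```

    A real planner additionally consults an ontology to decide which operators are applicable and decomposes abstract DM tasks hierarchically; the sketch only shows the search skeleton.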

    An overview of intelligent data assistants for data analysis

    Today's intelligent data assistants (IDAs) for data analysis focus on how to do effective and intelligent data analysis. However, this is not a trivial task, since one must take into consideration all the influencing factors: data analysis in general on the one hand, and the communication and interaction with data analysts on the other. The basic approach of building an IDA where data analysis is (1) better as well as (2) faster at the same time is not a very rewarding criterion and does not help in designing good IDAs. Therefore, this paper tries to (a) identify constructive criteria that allow us to compare existing systems and help design better IDAs, and (b) review all previous IDAs based on these criteria to find out which problems IDAs should solve as well as which method works best for which problem. In conclusion, we try to learn from previous experience which features should be incorporated into a new IDA that would solve the problems of today's analysts.

    Computing Probabilistic Least Common Subsumers in Description Logics

    No full text