6 research outputs found

    DISCOVERING INTERESTING PATTERNS FOR INVESTMENT DECISION MAKING WITH GLOWER - A GENETIC LEARNER OVERLAID WITH ENTROPY REDUCTION

    Prediction in financial domains is notoriously difficult for a number of reasons. First, theories tend to be weak or non-existent, which makes problem formulation open-ended by forcing us to consider a large number of independent variables, thereby increasing the dimensionality of the search space. Second, the weak relationships among variables tend to be nonlinear, and may hold only in limited areas of the search space. Third, in financial practice, where analysts conduct extensive manual analysis of historically well-performing indicators, a key is to find the hidden interactions among variables that perform well in combination. Unfortunately, these are exactly the patterns that the greedy search biases incorporated by many standard rule algorithms will miss. In this paper, we describe and evaluate several variations of a new genetic learning algorithm (GLOWER) on a variety of data sets. The design of GLOWER has been motivated by financial prediction problems, but incorporates successful ideas from tree induction and rule learning. We examine the performance of several GLOWER variants on two UCI data sets as well as on a standard financial prediction problem (S&P500 stock returns), using the results to identify and use one of the better variants for further comparisons. We introduce a new (to KDD) financial prediction problem (predicting positive and negative earnings surprises), and experiment with GLOWER, contrasting it with tree- and rule-induction approaches. Our results are encouraging, showing that GLOWER has the ability to uncover effective patterns for difficult problems that have weak structure and significant nonlinearities.
    Information Systems Working Papers Serie

    ART: A Hybrid Classification Model

    Evaluación del desarrollo de biofilms en los sistemas de distribución de agua potable mediante la extracción de conocimiento a través de los datos (Knowledge Discovery in Databases)

    One of the main challenges for drinking water utilities is to ensure a supply of high microbiological quality. However, biofilms invariably develop in all drinking water distribution systems (DWDSs), despite the presence of residual disinfectant, so utilities cannot ensure total bacteriological control. Biofilms therefore remain a central problem in water quality management for DWDSs. Biofilms are complex communities of microorganisms bound by an extracellular polymer that gives them structure, helps them retain food, and protects them from toxic agents. Besides the health risk they pose as a shelter for pathogens, biofilm development in DWDSs is associated with other problems, including aesthetic deterioration of the water, biocorrosion, and disinfectant consumption. Numerous investigations have been carried out in this field. Nevertheless, with a few notable exceptions, the joint influence of the various DWDS characteristics on biofilm development has scarcely been studied, owing to the complexity of the community and of the environment under study. The present work aims to fill this gap by studying the effect of the interaction among the relevant physical and hydraulic characteristics of DWDSs on biofilm development. To this end we use the Knowledge Discovery in Databases (KDD) methodology. In addition, we introduce ensemble methods that increase the robustness and precision of the results and thereby improve the proposed decision-support methodology.
    This work confirms the need to study the impact that the DWDS characteristics, taken together, have on biofilm development. We show that the effect of one variable depends on the values of the remaining variables and, as a result, we identify joint physical and hydraulic conditions that determine greater or lesser biofilm development inside the pipes.
    Ramos Martínez, E. (2012). Evaluación del desarrollo de biofilms en los sistemas de distribución de agua potable mediante la extracción de conocimiento a través de los datos (Knowledge Discovery in Databases). http://hdl.handle.net/10251/19124
    Archivo delegad

    Flexible information management strategies in machine learning and data mining

    In recent times, a number of data mining and machine learning techniques have been applied successfully to discover useful knowledge from data. Of the available techniques, rule induction and data clustering are two of the most useful and popular. Knowledge discovered by rule induction techniques in the form of If-Then rules is easy for users to understand and verify, and can be employed as classification or prediction models. Data clustering techniques are used to explore irregularities in the data distribution. Although rule induction and data clustering techniques have been applied successfully in several applications, assumptions and constraints in their approaches have limited their capabilities. The main aim of this work is to develop flexible management strategies for these techniques to improve their performance. The first part of the thesis introduces a new covering algorithm, called Rule Extraction System with Adaptivity, which forms the whole rule set simultaneously instead of a single rule at a time. The rule set in the proposed algorithm is managed flexibly during the learning phase. Rules can be added to or omitted from the rule set depending on knowledge at the time. In addition, facilities to process continuous attributes directly and to prune the rule set automatically are implemented in the Rule Extraction System with Adaptivity algorithm. The second part introduces improvements to the K-means algorithm in data clustering. Flexible management of clusters is applied during the learning process to help the algorithm find the optimal solution. Another flexible management strategy is used to facilitate the processing of very large data sets. Finally, an effective method to determine the most suitable number of clusters for the K-means algorithm is proposed. The method has overcome all deficiencies of K-means