16 research outputs found

    DISCOVERING INTERESTING PATTERNS FOR INVESTMENT DECISION MAKING WITH GLOWER - A GENETIC LEARNER OVERLAID WITH ENTROPY REDUCTION

    Prediction in financial domains is notoriously difficult for a number of reasons. First, theories tend to be weak or non-existent, which makes problem formulation open-ended by forcing us to consider a large number of independent variables and thereby increasing the dimensionality of the search space. Second, the weak relationships among variables tend to be nonlinear, and may hold only in limited areas of the search space. Third, in financial practice, where analysts conduct extensive manual analysis of historically well-performing indicators, a key is to find the hidden interactions among variables that perform well in combination. Unfortunately, these are exactly the patterns that the greedy search biases incorporated by many standard rule algorithms will miss. In this paper, we describe and evaluate several variations of a new genetic learning algorithm (GLOWER) on a variety of data sets. The design of GLOWER has been motivated by financial prediction problems, but incorporates successful ideas from tree induction and rule learning. We examine the performance of several GLOWER variants on two UCI data sets as well as on a standard financial prediction problem (S&P500 stock returns), using the results to identify and use one of the better variants for further comparisons. We introduce a new (to KDD) financial prediction problem (predicting positive and negative earnings surprises), and experiment with GLOWER, contrasting it with tree- and rule-induction approaches. Our results are encouraging, showing that GLOWER has the ability to uncover effective patterns for difficult problems that have weak structure and significant nonlinearities. Information Systems Working Papers Series

    GPR: A Data Mining Tool Using Genetic Programming

    This paper proposes an inductive data mining technique (named GPR) based on genetic programming. Unlike other mining systems, the particularity of our technique is its ability to discover business rules that satisfy multiple (and possibly conflicting) decision or search criteria simultaneously. We present a step-by-step method to implement GPR, and introduce a prototype that generates production rules from real-life data. We also report in this article on the use of GPR in an organization that seeks to understand how its employees make decisions in a voluntary separation program. Using a personnel database of 12,787 employees with 35 descriptive variables, our technique is able to discover employees' hidden decision-making patterns in the form of production rules. As our approach does not require any domain-specific knowledge, it can be used without any major modification in different domains.

    PREDICTING INTRADAY STOCK RETURNS BY INTEGRATING MARKET DATA AND FINANCIAL NEWS REPORTS

    Forecasting in the financial domain is undoubtedly a challenging undertaking in data mining. While the majority of previous studies in this field utilize historical market data to predict future stock returns, we explore whether there is benefit in augmenting the prediction model with supplementary domain knowledge obtained from financial news reports. To this end, we empirically evaluate how the integration of these data sources helps to predict intraday stock returns. We consider several types of integration methods: variable-based as well as bundling methods. To discern whether the integration methods are sensitive to the type of forecasting algorithm, we have implemented each integration method using three different data mining algorithms. The results show several scenarios in which appending market-based data with textual news-based data helps to improve forecasting performance. The successful integration strongly depends on which forecasting algorithm and variable representation method are utilized. The findings are promising enough to warrant further studies in this direction.

    A hybrid decision tree/genetic algorithm method for data mining


    Expert Stock Picker: The Wisdom of (Experts in) Crowds

    The phrase "the wisdom of crowds" suggests that good verdicts can be achieved by averaging the opinions and insights of large, diverse groups of people who possess varied types of information. Online user-generated content enables researchers to view the opinions of large numbers of users publicly. These opinions, in the form of reviews and votes, can be used to automatically generate remarkably accurate verdicts (collective estimations of future performance) about companies, products, and people on the Web to resolve very tough problems. The wealth and richness of user-generated content may enable firms and individuals to aggregate consumer-think for better business understanding. Our main contribution, here applied to user-generated stock pick votes from a widely used online financial newsletter, is a genetic algorithm approach that can be used to identify the appropriate vote weights for users based on their prior individual voting success. Our method allows us to identify and rank experts within the crowd, enabling better stock pick decisions than the S&P 500. We show that the online crowd performs better, on average, than the S&P 500 for two test time periods, 2008 and 2009, in terms of both overall returns and risk-adjusted returns, as measured by the Sharpe ratio. Furthermore, we show that giving more weight to the votes of the experts in the crowds increases the accuracy of the verdicts, yielding an even greater return in the same time periods. We test our approach by utilizing more than three years of publicly available stock pick data. We compare our method to approaches derived from both the computer science and finance literature. We believe that our approach can be generalized to other domains where user opinions are publicly available early and where those opinions can be evaluated. For example, YouTube video ratings may be used to predict downloads, or online reviewer ratings on Digg may be used to predict the success or popularity of a story.

    Uncovering exceptional predictions using exploratory analysis of second stage machine learning.

    Nowadays, algorithmic systems are widely used to facilitate decisions in a variety of fields such as medicine, banking, university admissions, and network security. However, many machine learning algorithms are well known for their complex mathematical internal workings, which turn them into black boxes and make their decision-making process difficult to understand even for experts. In this thesis, we try to develop a methodology to explain why a certain exceptional machine-learned decision was made incorrectly by using the interpretability of the decision tree classifier. Our approach can provide insights about potential flaws in feature definition or completeness, as well as potential incorrect training data and outliers. It also promises to help find the stereotypes learned by machine learning algorithms that lead to incorrect predictions and, especially, to prevent discrimination in making socially sensitive decisions, such as credit decisions as well as crime-related and policing predictions.

    Discovering Interesting Patterns for Investment Decision Making with GLOWER - A Genetic Learner Overlaid With Entropy Reduction

    Prediction in financial domains is notoriously difficult for a number of reasons. First, theories tend to be weak or non-existent, which makes problem formulation open ended by forcing us to consider a large number of independent variables and thereby increasing the dimensionality of the search space. Second, the weak relationships among variables tend to be nonlinear, and may hold only in limited areas of the search space. Third, in financial practice, where analysts conduct extensive manual analysis of historically well performing indicators, a key is to find the hidden interactions among variables that perform well in combination. Unfortunately, these are exactly the patterns that the greedy search biases incorporated by many standard rule learning algorithms will miss. In this paper, we describe and evaluate several variations of a new genetic learning algorithm (GLOWER) on a variety of data sets. The design of GLOWER has been motivated by financial prediction problems, but incorpo..