
    Using attribute construction to improve the predictability of a GP financial forecasting algorithm

    Financial forecasting is an important area in computational finance. EDDIE 8 is an established Genetic Programming financial forecasting algorithm, which has successfully been applied to a number of international datasets. The purpose of this paper is to further increase the algorithm’s predictive performance by improving its data space representation. To achieve this, we use attribute construction to create new (high-level) attributes from the original (low-level) attributes. To examine the effectiveness of this method, we test the extended EDDIE’s predictive performance across 25 datasets and compare it to the performance of two previous EDDIE algorithms. Results show that the introduction of attribute construction benefits the algorithm, allowing EDDIE to explore the use of new attributes to improve its predictive accuracy.

    Interpretable Categorization of Heterogeneous Time Series Data

    Understanding heterogeneous multivariate time series data is important in many applications ranging from smart homes to aviation. Learning models of heterogeneous multivariate time series that are also human-interpretable is challenging and not adequately addressed by the existing literature. We propose grammar-based decision trees (GBDTs) and an algorithm for learning them. GBDTs extend decision trees with a grammar framework. Logical expressions derived from a context-free grammar are used for branching in place of simple thresholds on attributes. The added expressivity enables support for a wide range of data types while retaining the interpretability of decision trees. In particular, when a grammar based on temporal logic is used, we show that GBDTs can be used for the interpretable classification of high-dimensional and heterogeneous time series data. Furthermore, we show how GBDTs can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply GBDTs to analyze the classic Australian Sign Language dataset as well as data on near mid-air collisions (NMACs). The NMAC data comes from aircraft simulations used in the development of the next-generation Airborne Collision Avoidance System (ACAS X). Comment: 9 pages, 5 figures, 2 tables, SIAM International Conference on Data Mining (SDM) 201

    A Comparative Study on the Use of Classification Algorithms in Financial Forecasting

    Financial forecasting is a vital area in computational finance, where several studies have taken place over the years. One way of viewing financial forecasting is as a classification problem, where the goal is to find a model that represents the predictive relationships between predictor attribute values and class attribute values. In this paper we present a comparative study of two bio-inspired classification algorithms: a genetic programming algorithm especially designed for financial forecasting, and an ant colony optimization algorithm designed for classification problems. In addition, we compare these algorithms with two other state-of-the-art classification algorithms, namely C4.5 and RIPPER. Results show that the ant colony optimization classification algorithm is very successful, significantly outperforming all other algorithms in the given classification problems, which provides insights for improving the design of specific financial forecasting algorithms.

    A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

    Due to recent advances in technology, large volumes of medical data are being collected. These data contain valuable information, so data mining techniques can be used to extract useful patterns. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining. We focus mainly on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various methodologies is highlighted. Generally, association mining is suitable for extracting rules. It has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we summarize the different uses of classification in dermatology. It is one of the most important methods for diagnosis of erythemato-squamous diseases. Methods used for this topic include Neural Networks, Genetic Algorithms and fuzzy classification. Clustering is a useful method in medical image mining. The purpose of clustering techniques is to find a structure for the given data by finding similarities between data according to data characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we investigate some challenges which exist in mining skin data.

    Decision Making in the Medical Domain: Comparing the Effectiveness of GP-Generated Fuzzy Intelligent Structures

    In this work, we examine the effectiveness of two intelligent models in medical domains. Namely, we apply grammar-guided genetic programming to produce fuzzy intelligent structures, such as fuzzy rule-based systems and fuzzy Petri nets, in medical data mining tasks. First, we use two context-free grammars to describe fuzzy rule-based systems and fuzzy Petri nets with genetic programming. Then, we apply cellular encoding in order to express fuzzy Petri nets of arbitrary size and topology. The models are examined thoroughly on four real-world medical data sets. Results are presented in detail, and the competitive advantages and drawbacks of the selected methodologies are discussed with respect to the nature of each application domain. Conclusions are drawn on the effectiveness and efficiency of the presented approach.