535 research outputs found

    Discovering Regression Rules with Ant Colony Optimization

    Get PDF
    The majority of Ant Colony Optimization (ACO) algorithms for data mining have dealt with classification or clustering problems. Regression remains an unexplored research area to the best of our knowledge. This paper proposes a new ACO algorithm that generates regression rules for data mining applications. The new algorithm combines components from an existing deterministic (greedy) separate and conquer algorithm—employing the same quality metrics and continuous attribute processing techniques—allowing a comparison of the two. The new algorithm has been shown to decrease the relative root mean square error when compared to the greedy algorithm. Additionally a different approach to handling continuous attributes was investigated showing further improvements were possible

    Condition monitoring of an advanced gas-cooled nuclear reactor core

    Get PDF
    A critical component of an advanced gas-cooled reactor station is the graphite core. As a station ages, the graphite bricks that comprise the core can distort and may eventually crack. Since the core cannot be replaced, the core integrity ultimately determines the station life. Monitoring these distortions is usually restricted to the routine outages, which occur every few years, as this is the only time that the reactor core can be accessed by external sensing equipment. This paper presents a monitoring module based on model-based techniques using measurements obtained during the refuelling process. A fault detection and isolation filter based on unknown input observer techniques is developed. The role of this filter is to estimate the friction force produced by the interaction between the wall of the fuel channel and the fuel assembly supporting brushes. This allows an estimate to be made of the shape of the graphite bricks that comprise the core and, therefore, to monitor any distortion on them

    Using an Ant Colony Optimization Algorithm for Monotonic Regression Rule Discovery

    Get PDF
    Many data mining algorithms do not make use of existing domain knowledge when constructing their models. This can lead to model rejection as users may not trust models that behave contrary to their expectations. Semantic constraints provide a way to encapsulate this knowledge which can then be used to guide the construction of models. One of the most studied semantic constraints in the literature is monotonicity, however current monotonically-aware algorithms have focused on ordinal classification problems. This paper proposes an extension to an ACO-based regression algorithm in order to extract a list of monotonic regression rules. We compared the proposed algorithm against a greedy regression rule induction algorithm that preserves monotonic constraints and the well-known M5’ Rules. Our experiments using eight publicly available data sets show that the proposed algorithm successfully creates monotonic rules while maintaining predictive accuracy

    Learning Interestingness of Streaming Classification Rules

    Get PDF
    Inducing classification rules on domains from which information is gathered at regular periods lead the number of such classification rules to be generally so huge that selection of interesting ones among all discovered rules becomes an important task. At each period, using the newly gathered information from the domain, the new classification rules are induced. Therefore, these rules stream through time and are so called streaming classification rules. In this paper, an interactive rule interestingness-learning algorithm (IRIL) is developed to automatically label the classification rules either as "interesting" or "uninteresting" with limited user interaction. In our study, VFP (Voting Feature Projections), a feature projection based incremental classification learning algorithm, is also developed in the framework of IRIL. The concept description learned by the VFP algorithm constitutes a novel approach for interestingness analysis of streaming classification rules. © Springer-Verlag 2004

    Detecting Large Concept Extensions for Conceptual Analysis

    Full text link
    When performing a conceptual analysis of a concept, philosophers are interested in all forms of expression of a concept in a text---be it direct or indirect, explicit or implicit. In this paper, we experiment with topic-based methods of automating the detection of concept expressions in order to facilitate philosophical conceptual analysis. We propose six methods based on LDA, and evaluate them on a new corpus of court decision that we had annotated by experts and non-experts. Our results indicate that these methods can yield important improvements over the keyword heuristic, which is often used as a concept detection heuristic in many contexts. While more work remains to be done, this indicates that detecting concepts through topics can serve as a general-purpose method for at least some forms of concept expression that are not captured using naive keyword approaches

    A Mixed-Attribute Approach in Ant-Miner Classification Rule Discovery Algorithm

    Get PDF
    In this paper, we introduce Ant-MinerMA to tackle mixed-attribute classification problems. Most classification problems involve continuous, ordinal and categorical attributes. The majority of Ant Colony Optimization (ACO) classification algorithms have the limitation of being able to handle categorical attributes only, with few exceptions that use a discretisation procedure when handling continuous attributes either in a preprocessing stage or during the rule creation. Using a solution archive as a pheromone model, inspired by the ACO for mixed-variable optimization (ACO-MV), we eliminate the need for a discretisation procedure and attributes can be treated directly as continuous, ordinal, or categorical. We compared the proposed Ant-MinerMA against cAnt-Miner, an ACO-based classification algorithm that uses a discretisation procedure in the rule construction process. Our results show that Ant-MinerMA achieved significant improvements on computational time due to the elimination of the discretisation procedure without affecting the predictive performance

    Data mining: a tool for detecting cyclical disturbances in supply networks.

    Get PDF
    Disturbances in supply chains may be either exogenous or endogenous. The ability automatically to detect, diagnose, and distinguish between the causes of disturbances is of prime importance to decision makers in order to avoid uncertainty. The spectral principal component analysis (SPCA) technique has been utilized to distinguish between real and rogue disturbances in a steel supply network. The data set used was collected from four different business units in the network and consists of 43 variables; each is described by 72 data points. The present paper will utilize the same data set to test an alternative approach to SPCA in detecting the disturbances. The new approach employs statistical data pre-processing, clustering, and classification learning techniques to analyse the supply network data. In particular, the incremental k-means clustering and the RULES-6 classification rule-learning algorithms, developed by the present authors’ team, have been applied to identify important patterns in the data set. Results show that the proposed approach has the capability automatically to detect and characterize network-wide cyclical disturbances and generate hypotheses about their root cause

    An intelligent assistant for exploratory data analysis

    Get PDF
    In this paper we present an account of the main features of SNOUT, an intelligent assistant for exploratory data analysis (EDA) of social science survey data that incorporates a range of data mining techniques. EDA has much in common with existing data mining techniques: its main objective is to help an investigator reach an understanding of the important relationships ina data set rather than simply develop predictive models for selectd variables. Brief descriptions of a number of novel techniques developed for use in SNOUT are presented. These include heuristic variable level inference and classification, automatic category formation, the use of similarity trees to identify groups of related variables, interactive decision tree construction and model selection using a genetic algorithm
    • …
    corecore