32,152 research outputs found

    Identifying Mislabeled Training Data

    This paper presents a new approach to identifying and eliminating mislabeled training instances for supervised learning. The goal of this approach is to improve classification accuracies produced by learning algorithms by improving the quality of the training data. Our approach uses a set of learning algorithms to create classifiers that serve as noise filters for the training data. We evaluate single-algorithm, majority-vote, and consensus filters on five datasets that are prone to labeling errors. Our experiments illustrate that filtering significantly improves classification accuracy for noise levels up to 30 percent. An analytical and empirical evaluation of the precision of our approach shows that consensus filters are conservative, retaining bad data in order to avoid throwing away good data, whereas majority-vote filters are better at detecting bad data at the expense of discarding good data. This suggests that consensus filters are preferable in situations where there is a paucity of data, whereas majority-vote filters are preferable in situations with an abundance of data.
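    As a rough illustration of the filtering scheme described above, here is a minimal sketch of majority-vote and consensus filters built from cross-validated ensemble predictions. The particular filter algorithms (a decision tree, naive Bayes, and k-nearest neighbors) and the scikit-learn API are assumptions for illustration, not the paper's exact setup:

```python
# Hypothetical sketch of majority-vote vs. consensus noise filtering;
# the three filter algorithms are illustrative choices, not the paper's ensemble.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

def filter_mislabeled(X, y, mode="majority", cv=5):
    """Flag training instances whose label disagrees with the
    cross-validated (out-of-fold) predictions of an ensemble of filters."""
    filters = [DecisionTreeClassifier(random_state=0),
               GaussianNB(),
               KNeighborsClassifier()]
    # Each filter "votes" that an instance is mislabeled when its
    # out-of-fold prediction disagrees with the given label.
    votes = np.stack([cross_val_predict(f, X, y, cv=cv) != y
                      for f in filters])
    if mode == "consensus":
        flagged = votes.all(axis=0)                 # all filters must agree: conservative
    else:
        flagged = votes.sum(axis=0) > len(filters) / 2  # simple majority: more aggressive
    return flagged                                  # boolean mask of suspect instances
```

    Because a consensus filter flags an instance only when every filter disagrees with its label, it necessarily flags a subset of what the majority filter flags, matching the conservative behavior the abstract describes.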

    Complexity, BioComplexity, the Connectionist Conjecture and Ontology of Complexity

    This paper develops and integrates major ideas and concepts on complexity and biocomplexity: the connectionist conjecture, a universal ontology of complexity, the irreducible complexity of totality and inherent randomness, the perpetual evolution of information, the emergence of criticality, and the equivalence of symmetry and complexity. This paper introduces the Connectionist Conjecture, which states that the one and only representation of Totality is the connectionist one, i.e. in terms of nodes and edges. This paper also introduces the idea of a Universal Ontology of Complexity and develops concepts in that direction. The paper also develops ideas and concepts on the perpetual evolution of information and the irreducibility and computability of totality, all in the context of the Connectionist Conjecture. The paper indicates that control and communication are the prime functionals responsible for the symmetry and complexity of complex phenomena. The paper takes the stand that the phenomenon of life (including its evolution) is probably the nearest to what we can describe with the term “complexity”. The paper also assumes that signaling and communication within the living world, and of the living world with the environment, create the connectionist structure of biocomplexity. With life and its evolution as the substrate, the paper develops ideas towards the ontology of complexity. The paper introduces new complexity-theoretic interpretations of fundamental biomolecular parameters. The paper also develops ideas on the methodology to determine the complexity of “true” complex phenomena.

    A rigorous comparison of different planet detection algorithms

    The idea of finding extrasolar planets (ESPs) through observations of drops in stellar brightness due to transiting objects has been around for decades. It has only been in the last ten years, however, that any serious attempts to find ESPs became practical. The discovery of a transiting planet around the star HD 209458 (Charbonneau et al. 2000) has led to a veritable explosion of research, because the photometric method is the only way to search a large number of stars for ESPs simultaneously with current technology. To this point, however, there has been limited research into the various techniques used to extract the subtle transit signals from noise, mainly brief summaries in various papers focused on publishing transit-like signatures in observations. The scheduled launches over the next few years of satellites whose primary or secondary science missions will be ESP discovery motivate a review and a comparative study of the various algorithms used to perform the transit identification, to determine rigorously and fairly which one is the most sensitive under which circumstances, to maximize the results of past, current, and future observational campaigns. Comment: Accepted for publication by Astronomy and Astrophysics.
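    As a toy illustration of the transit-identification task these algorithms address, the following box search slides a fixed-width window across a light curve and scores the mean flux drop inside it. This is a simplified stand-in for BLS-style matched-filter detection, not any specific algorithm from the comparison; all names are assumptions:

```python
# Illustrative box search over a light curve: find the window whose mean
# flux is lowest relative to the out-of-window baseline (a brightness drop).
import numpy as np

def best_box(flux, width):
    """Return (start_index, depth) of the `width`-sample window with the
    deepest flux drop relative to the rest of the light curve."""
    n = len(flux)
    best = (0, 0.0)
    for start in range(n - width + 1):
        inside = flux[start:start + width].mean()
        outside = np.concatenate([flux[:start], flux[start + width:]]).mean()
        depth = outside - inside  # positive depth indicates a dimming
        if depth > best[1]:
            best = (start, depth)
    return best
```

    Real transit searches additionally phase-fold the light curve over trial periods and account for correlated noise, which is where the algorithms compared in the paper differ.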

    A geometric interpretation of the permutation p-value and its application in eQTL studies

    Permutation p-values have been widely used to assess the significance of linkage or association in genetic studies. However, their application in large-scale studies is hindered by a heavy computational burden. We propose a geometric interpretation of permutation p-values, and based on this geometric interpretation, we develop an efficient permutation p-value estimation method in the context of regression with binary predictors. An application to a study of gene expression quantitative trait loci (eQTL) shows that our method provides reliable estimates of permutation p-values while requiring less than 5% of the computational time compared with direct permutations. In fact, our method takes a constant time to estimate permutation p-values, no matter how small the p-value. Our method enables a study of the relationship between nominal p-values and permutation p-values over a wide range, and provides a geometric perspective on the effective number of independent tests. Comment: Published at http://dx.doi.org/10.1214/09-AOAS298 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
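    For contrast with the constant-time estimator described above, the direct permutation approach (the computationally heavy baseline the paper improves on) can be sketched as follows. The two-sample mean-difference statistic is an illustrative choice, not the paper's regression setting:

```python
# Minimal sketch of a *direct* permutation p-value, the baseline whose
# cost the paper's geometric method is designed to avoid.
import numpy as np

def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Two-sided permutation p-value for a difference in means, e.g.
    expression levels split by a binary genotype predictor."""
    rng = np.random.default_rng(seed)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of the pooled sample
        stat = abs(pooled[:n].mean() - pooled[n:].mean())
        if stat >= observed:
            count += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (count + 1) / (n_perm + 1)
```

    The cost scales with `n_perm`, and resolving a p-value near 10^-k requires on the order of 10^k permutations, which is exactly the burden that makes direct permutation impractical for genome-scale studies.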

    Inference in classifier systems

    Classifier systems (CSs) provide a rich framework for learning and induction, and they have been successfully applied in the artificial intelligence literature for some time. In this paper, both the architecture and the inferential mechanisms in general CSs are reviewed, and a number of limitations and extensions of the basic approach are summarized. A system based on the CS approach that is capable of quantitative data analysis is outlined and some of its peculiarities discussed.

    Physicists, stamp collectors, human mobility forecasters

    One of the two reviewers studied in high school to be a physicist. In the end, he became something else, but he never lost his awe of physics. The other reviewer never intended to become a physicist, but he sometimes asks himself why he didn’t become one. Today, they are both sociologists who practice their science on an action theory basis and believe that regularities exist in the world of social actions which can be perceived, understood, explained – and even used for making predictions.

    Four PPPPerspectives on Computational Creativity

    From what perspective should creativity of a system be considered? Are we interested in the creativity of the system’s output? The creativity of the system itself? Or of its creative processes? Creativity as measured by internal features or by external feedback? Traditionally within computational creativity the focus has been on the creativity of the system’s Products or of its Processes, though this focus has widened recently to consider the role of the audience or the field surrounding the creative system. In the wider creativity research community a broader take is prevalent: the creative Person is considered as well as the environment or Press within which the creative entity operates. Here we have the Four Ps of creativity: Person, Product, Process and Press. This paper presents the Four Ps, explaining each of them in the context of creativity research and how it relates to computational creativity. To illustrate how useful the Four Ps can be in taking a fuller perspective on creativity, the concepts of novelty and value are explored from each of the Four P perspectives, uncovering aspects that may otherwise be overlooked. This paper argues that the broader view of creativity afforded by the Four Ps is vital in guiding us towards more encompassing and comprehensive computational investigations of creativity.