Identifying Mislabeled Training Data
This paper presents a new approach to identifying and eliminating mislabeled
training instances for supervised learning. The goal of this approach is to
improve classification accuracies produced by learning algorithms by improving
the quality of the training data. Our approach uses a set of learning
algorithms to create classifiers that serve as noise filters for the training
data. We evaluate single algorithm, majority vote and consensus filters on five
datasets that are prone to labeling errors. Our experiments illustrate that
filtering significantly improves classification accuracy for noise levels up to
30 percent. An analytical and empirical evaluation of the precision of our
approach shows that consensus filters are conservative at throwing away good
data at the expense of retaining bad data and that majority filters are better
at detecting bad data at the expense of throwing away good data. This suggests
that for situations in which there is a paucity of data, consensus filters are
preferable, whereas majority vote filters are preferable for situations with an
abundance of data.
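The filtering scheme this abstract describes can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the classifier choices, the synthetic dataset, and the injected noise level are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

def filter_noisy_labels(X, y, classifiers, scheme="majority", n_splits=5):
    """Flag training instances whose labels the cross-validated classifiers
    disagree with. scheme='majority' flags an instance when most classifiers
    misclassify it; scheme='consensus' flags it only when all of them do."""
    votes = np.zeros((len(y), len(classifiers)), dtype=bool)  # True = misclassified
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, test_idx in kf.split(X):
        for j, clf in enumerate(classifiers):
            clf.fit(X[train_idx], y[train_idx])
            votes[test_idx, j] = clf.predict(X[test_idx]) != y[test_idx]
    wrong = votes.sum(axis=1)
    if scheme == "consensus":
        return wrong == len(classifiers)
    return wrong > len(classifiers) / 2  # majority

# Synthetic data with 10% of labels flipped (assumed setup, for illustration)
X, y = make_classification(n_samples=500, random_state=1)
rng = np.random.default_rng(2)
noisy_idx = rng.choice(len(y), size=50, replace=False)
y_noisy = y.copy()
y_noisy[noisy_idx] ^= 1  # flip binary labels

clfs = [LogisticRegression(max_iter=1000),
        DecisionTreeClassifier(random_state=0),
        KNeighborsClassifier()]
flagged = filter_noisy_labels(X, y_noisy, clfs, scheme="consensus")
```

Because the consensus rule discards an instance only when every filter misclassifies it, it always flags a subset of what the majority rule flags, which mirrors the trade-off stated above: conservative about throwing away good data at the cost of retaining more bad data.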
Complexity, BioComplexity, the Connectionist Conjecture and Ontology of Complexity
This paper develops and integrates major ideas and concepts on complexity and biocomplexity - the connectionist conjecture, universal ontology of complexity, irreducible complexity of totality & inherent randomness, perpetual evolution of information, emergence of criticality, and equivalence of symmetry & complexity. This paper introduces the Connectionist Conjecture, which states that the one and only representation of Totality is the connectionist one, i.e. in terms of nodes and edges. This paper also introduces the idea of a Universal Ontology of Complexity and develops concepts in that direction. The paper also develops ideas and concepts on the perpetual evolution of information, and on the irreducibility and computability of totality, all in the context of the Connectionist Conjecture. The paper indicates that control and communication are the prime functionals responsible for the symmetry and complexity of complex phenomena. The paper takes the stand that the phenomenon of life (including its evolution) is probably the nearest to what we can describe with the term "complexity". The paper also assumes that signaling and communication within the living world, and of the living world with the environment, create the connectionist structure of biocomplexity. With life and its evolution as the substrate, the paper develops ideas towards the ontology of complexity. The paper introduces new complexity-theoretic interpretations of fundamental biomolecular parameters. The paper also develops ideas on the methodology to determine the complexity of "true" complex phenomena.
A rigorous comparison of different planet detection algorithms
The idea of finding extrasolar planets (ESPs) through observations of drops
in stellar brightness due to transiting objects has been around for decades. It
has only been in the last ten years, however, that any serious attempts to find
ESPs became practical. The discovery of a transiting planet around the star HD
209458 (Charbonneau et al. 2000) has led to a veritable explosion of research,
because the photometric method is the only way to search a large number of
stars for ESPs simultaneously with current technology. To this point, however,
there has been limited research into the various techniques used to extract the
subtle transit signals from noise, mainly brief summaries in various papers
focused on publishing transit-like signatures in observations. The scheduled
launches over the next few years of satellites whose primary or secondary
science missions will be ESP discovery motivate a review and a comparative
study of the various algorithms used to perform transit identification, to
determine rigorously and fairly which one is the most sensitive under which
circumstances, and to maximize the results of past, current, and future
observational campaigns. Comment: Accepted for publication by Astronomy and Astrophysics
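The core idea being compared here, extracting a box-shaped brightness dip from noisy photometry, can be illustrated with a crude single-window box search on a synthetic light curve. Real transit-search algorithms (e.g. box least squares) also scan over period and duration; the noise level, transit depth, and window width below are assumptions for illustration only.

```python
import numpy as np

def box_search(flux, width):
    """Slide a box of `width` samples over a normalized light curve and
    return, at each position, how far the in-box mean flux drops below the
    out-of-box mean -- a crude matched filter for one box-shaped transit."""
    n = len(flux)
    scores = np.empty(n - width)
    for i in range(n - width):
        inside = flux[i:i + width]
        outside = np.concatenate([flux[:i], flux[i + width:]])
        scores[i] = outside.mean() - inside.mean()  # positive = dip
    return scores

# Synthetic light curve: unit flux, 0.2% noise, one 20-sample transit of depth 1%
rng = np.random.default_rng(0)
flux = 1.0 + 0.002 * rng.standard_normal(1000)
flux[400:420] -= 0.01
scores = box_search(flux, width=20)
best = int(np.argmax(scores))  # expected to land near sample 400
```

The transit is recovered here because its 1% depth dwarfs the noise on a 20-sample mean; the papers under review differ precisely in how well they handle the marginal cases where it does not.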
A geometric interpretation of the permutation p-value and its application in eQTL studies
Permutation p-values have been widely used to assess the significance of
linkage or association in genetic studies. However, the application in
large-scale studies is hindered by a heavy computational burden. We propose a
geometric interpretation of permutation p-values, and based on this geometric
interpretation, we develop an efficient permutation p-value estimation method
in the context of regression with binary predictors. An application to a study
of gene expression quantitative trait loci (eQTL) shows that our method
provides reliable estimates of permutation p-values while requiring less than
5% of the computational time compared with direct permutations. In fact, our
method takes a constant time to estimate permutation p-values, no matter how
small the p-value. Our method enables a study of the relationship between
nominal p-values and permutation p-values in a wide range, and provides a
geometric perspective on the effective number of independent tests. Comment: Published in at http://dx.doi.org/10.1214/09-AOAS298 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
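The direct-permutation baseline that such estimation methods are compared against can be sketched as follows. This is a minimal illustration with a simulated binary predictor (as in genotype-vs-expression tests); the data, effect size, and permutation count are assumptions, not from the paper.

```python
import numpy as np

def permutation_pvalue(x, group, n_perm=10000, seed=None):
    """Two-sample permutation p-value for the difference in means between
    group==0 and group==1. Each permutation reshuffles the group labels;
    the p-value is the fraction of permuted statistics at least as extreme
    as the observed one (with the standard +1 correction)."""
    rng = np.random.default_rng(seed)
    obs = abs(x[group == 1].mean() - x[group == 0].mean())
    count = 0
    for _ in range(n_perm):
        g = rng.permutation(group)
        if abs(x[g == 1].mean() - x[g == 0].mean()) >= obs:
            count += 1
    return (count + 1) / (n_perm + 1)

# Simulated expression values with a real group effect (assumed setup)
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 50)
x = rng.standard_normal(100) + 1.0 * group
p = permutation_pvalue(x, group, n_perm=2000, seed=0)
```

The cost is one pass over the data per permutation, and resolving very small p-values requires very many permutations, which is exactly the burden a constant-time estimator avoids.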
Inference in classifier systems
Classifier systems (CSs) provide a rich framework for learning and induction, and they have been successfully applied in the artificial intelligence literature for some time. In this paper, both the architecture and the inferential mechanisms in general CSs are reviewed, and a number of limitations and extensions of the basic approach are summarized. A system based on the CS approach that is capable of quantitative data analysis is outlined and some of its peculiarities discussed.
Physicists, stamp collectors, human mobility forecasters
One of the two reviewers studied in high school to be a physicist. In the end, he became something else, but he never lost his awe of physics. The other reviewer never intended to become a physicist, but he sometimes asks himself why he didn't become one. Today, they are both sociologists who practice their science on an action theory basis and believe that regularities exist in the world of social actions which can be perceived, understood, explained, and even used for making predictions.
Four PPPPerspectives on Computational Creativity
From what perspective should creativity of a system be considered? Are we interested in the creativity of the system's output? The creativity of the system itself? Or of its creative processes? Creativity as measured by internal features or by external feedback? Traditionally within computational creativity the focus had been on the creativity of the system's Products or of its Processes, though this focus has widened recently regarding the role of the audience or the field surrounding the creative system. In the wider creativity research community a broader take is prevalent: the creative Person is considered as well as the environment or Press within which the creative entity operates. Here we have the Four Ps of creativity: Person, Product, Process and Press. This paper presents the Four Ps, explaining each of them in the context of creativity research and how it relates to computational creativity. To illustrate how useful the Four Ps can be in taking a fuller perspective on creativity, the concepts of novelty and value are explored from each of the Four P perspectives, uncovering aspects that may otherwise be overlooked. This paper argues that the broader view of creativity afforded by the Four Ps is vital in guiding us towards more encompassing and comprehensive computational investigations of creativity.
- …