72,953 research outputs found
Recommended from our members
Semantic memory redux: an experimental test of hierarchical category representation
Four experiments investigated the classic issue in semantic memory of whether people organize categorical information in hierarchies and use inference to retrieve information from them, as proposed by Collins & Quillian (1969). Past evidence has focused on RT to confirm sentences such as “All birds are animals” or “Canaries breathe.” However, confounding variables such as familiarity and associations between the terms have led to contradictory results. Our experiments avoided such problems by teaching subjects novel materials. Experiment 1 tested an implicit hierarchical structure in the features of a set of studied objects (e.g., all brown objects were large). Experiment 2 taught subjects nested categories of artificial bugs. In Experiment 3, subjects learned a tree structure of novel category hierarchies. In all three, the results differed from the predictions of the hierarchical inference model. In Experiment 4, subjects learned a hierarchy by means of paired associates of novel category names. Here we finally found the RT signature of hierarchical inference. We conclude that it is possible to store information in a hierarchy and retrieve it via inference, but it is difficult and avoided whenever possible. The results are more consistent with feature comparison models than hierarchical models of semantic memory
On Machine-Learned Classification of Variable Stars with Sparse and Noisy Time-Series Data
With the coming data deluge from synoptic surveys, there is a growing need
for frameworks that can quickly and automatically produce calibrated
classification probabilities for newly-observed variables based on a small
number of time-series measurements. In this paper, we introduce a methodology
for variable-star classification, drawing from modern machine-learning
techniques. We describe how to homogenize the information gleaned from light
curves by selection and computation of real-numbered metrics ("feature"),
detail methods to robustly estimate periodic light-curve features, introduce
tree-ensemble methods for accurate variable star classification, and show how
to rigorously evaluate the classification results using cross validation. On a
25-class data set of 1542 well-studied variable stars, we achieve a 22.8%
overall classification error using the random forest classifier; this
represents a 24% improvement over the best previous classifier on these data.
This methodology is effective for identifying samples of specific science
classes: for pulsational variables used in Milky Way tomography we obtain a
discovery efficiency of 98.2% and for eclipsing systems we find an efficiency
of 99.1%, both at 95% purity. We show that the random forest (RF) classifier is
superior to other machine-learned methods in terms of accuracy, speed, and
relative immunity to features with no useful class information; the RF
classifier can also be used to estimate the importance of each feature in
classification. Additionally, we present the first astronomical use of
hierarchical classification methods to incorporate a known class taxonomy in
the classifier, which further reduces the catastrophic error rate to 7.8%.
Excluding low-amplitude sources, our overall error rate improves to 14%, with a
catastrophic error rate of 3.5%.Comment: 23 pages, 9 figure
Conformal Prediction: a Unified Review of Theory and New Challenges
In this work we provide a review of basic ideas and novel developments about
Conformal Prediction -- an innovative distribution-free, non-parametric
forecasting method, based on minimal assumptions -- that is able to yield in a
very straightforward way predictions sets that are valid in a statistical sense
also in in the finite sample case. The in-depth discussion provided in the
paper covers the theoretical underpinnings of Conformal Prediction, and then
proceeds to list the more advanced developments and adaptations of the original
idea.Comment: arXiv admin note: text overlap with arXiv:0706.3188,
arXiv:1604.04173, arXiv:1709.06233, arXiv:1203.5422 by other author
Text categorization and similarity analysis: similarity measure, architecture and design
This research looks at the most appropriate similarity measure to use for a document classification problem. The goal is to find a method that is accurate in finding both semantically and version related documents. A necessary requirement is that the method is efficient in its speed and disk usage. Simhash is found to be the measure best suited to the application and it can be combined with other software to increase the accuracy. Pingar have provided an API that will extract the entities from a document and create a taxonomy displaying the relationships and this extra information can be used to accurately classify input documents. Two algorithms are designed incorporating the Pingar API and then finally an efficient comparison algorithm is introduced to cut down the comparisons required
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
applications domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
defining class boundary just with the knowledge of positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
algorithms used and the application domains applied. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure
- …