A systematic comparison of supervised classifiers
Pattern recognition techniques have been employed in a myriad of industrial,
medical, commercial and academic applications. To tackle such a diversity of
data, many techniques have been devised. However, despite the long tradition of
pattern recognition research, there is no technique that yields the best
classification in all scenarios. Therefore, considering as many techniques as
possible is a fundamental practice in applications
aiming at high accuracy. Typical works comparing methods either emphasize the
performance of a given algorithm in validation tests or systematically compare
various algorithms, assuming that the practical use of these methods is done by
experts. On many occasions, however, researchers have to deal with their
practical classification tasks without in-depth knowledge of the
mechanisms underlying the parameters. Actually, the adequate choice of
classifiers and parameters alike in such practical circumstances constitutes a
long-standing problem and is the subject of the current paper. We carried out a
study on the performance of nine well-known classifiers implemented in the Weka
framework and examined how their accuracy depends on their parameter
configurations. The analysis of performance with default parameters
revealed that the k-nearest neighbors method outperforms the other methods by a
large margin when high-dimensional datasets are considered. When other
parameter configurations were allowed, we found that it is possible to
improve the quality of SVM by more than 20% even if parameters are set
randomly. Taken together, the investigation conducted in this paper suggests
that, apart from the SVM implementation, Weka's default configuration of
parameters provides performance close to that achieved with the optimal
configuration.
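The default-versus-random-parameters comparison described in this abstract can be illustrated with a small sketch. This is not the paper's protocol and does not use Weka: it assumes a hand-written k-nearest neighbors classifier, synthetic Gaussian data, and hypothetical names (make_blobs, compare_default_vs_random, default_k), and only shows the general shape of such an experiment.

```python
import random

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    neighbours = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = {}
    for _, label in neighbours:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def accuracy(train, samples, k):
    """Fraction of `samples` whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in samples)
    return hits / len(samples)

def make_blobs(rng, n_per_class, dim):
    """Synthetic two-class data (an assumption, not the paper's datasets)."""
    data = []
    for label, centre in ((0, 0.0), (1, 1.5)):
        for _ in range(n_per_class):
            point = [rng.gauss(centre, 1.0) for _ in range(dim)]
            data.append((point, label))
    rng.shuffle(data)
    return data

def compare_default_vs_random(seed=0, n_trials=10, default_k=1):
    """Compare a fixed default k with the best of several randomly drawn k."""
    rng = random.Random(seed)
    data = make_blobs(rng, n_per_class=60, dim=10)
    train, valid, test = data[:80], data[80:100], data[100:]

    # Random search: draw odd k values and keep the best one on validation data.
    candidates = [rng.randrange(1, 30, 2) for _ in range(n_trials)]
    best_k = max(candidates, key=lambda k: accuracy(train, valid, k))

    # Both configurations are scored on test data that was never used for selection.
    return accuracy(train, test, default_k), best_k, accuracy(train, test, best_k)

if __name__ == "__main__":
    default_acc, best_k, tuned_acc = compare_default_vs_random()
    print(f"default k: {default_acc:.2f}, random-search k={best_k}: {tuned_acc:.2f}")
```

Selecting the parameter on a validation split and reporting accuracy on a separate test split keeps the comparison between the default and the randomly chosen configuration fair.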
Systematic analysis of primary sequence domain segments for the discrimination between class C GPCR subtypes
G-protein-coupled receptors (GPCRs) are a large and diverse super-family of eukaryotic cell membrane proteins that play an important physiological role as transmitters of extracellular signals. In this paper, we investigate Class C, a member of this super-family that has attracted much attention in pharmacology. The limited knowledge about the complete 3D crystal structure of Class C receptors makes it necessary to use their primary amino acid sequences for analytical purposes. Here, we provide a systematic analysis of distinct receptor sequence segments with regard to their ability to differentiate between seven class C GPCR subtypes according to their topological location in the extracellular, transmembrane, or intracellular domains. We build on the results of previous research that provided preliminary evidence of the potential use of separated domains of complete class C GPCR sequences as the basis for subtype classification. The use of the extracellular N-terminus domain alone was shown to result in a minor decrease in subtype discrimination in comparison with the complete sequence, despite discarding much of the sequence information. In this paper, we describe the use of Support Vector Machine-based classification models to evaluate the subtype-discriminating capacity of the specific topological sequence segments. Peer Reviewed. Postprint (author's final draft)
Cross-task weakly supervised learning from instructional videos
In this paper we investigate learning visual models for the steps of ordinary
tasks using weak supervision via instructional narrations and an ordered list
of steps instead of strong supervision via temporal annotations. At the heart
of our approach is the observation that weakly supervised learning may be
easier if a model shares components while learning different steps: `pour egg'
should be trained jointly with other tasks involving `pour' and `egg'. We
formalize this in a component model for recognizing steps and a weakly
supervised learning framework that can learn this model under temporal
constraints from narration and the list of steps. Existing data do not permit a
systematic study of sharing, so we also gather a new dataset, CrossTask,
aimed at assessing cross-task sharing. Our experiments demonstrate that sharing
across tasks improves performance, especially when done at the component level
and that our component model can parse previously unseen tasks by virtue of its
compositionality. Comment: 18 pages, 17 figures, to be published in proceedings of the CVPR,
201
Variational Autoencoders for New Physics Mining at the Large Hadron Collider
Using variational autoencoders trained on known physics processes, we develop
a one-sided threshold test to isolate previously unseen processes as outlier
events. Since the autoencoder training does not depend on any specific new
physics signature, the proposed procedure doesn't make specific assumptions on
the nature of new physics. An event selection based on this algorithm would be
complementary to classic LHC searches, typically based on model-dependent
hypothesis testing. Such an algorithm would deliver a list of anomalous events
that the experimental collaborations could further scrutinize and even release
as a catalog, similarly to what is typically done in other scientific domains.
Event topologies repeating in this dataset could inspire new-physics model
building and new experimental searches. Running in the trigger system of the
LHC experiments, such an application could identify anomalous events that would
otherwise be lost, extending the scientific reach of the LHC. Comment: 29 pages, 12 figures, 5 tables
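The one-sided threshold test mentioned in this abstract can be sketched as follows. The sketch assumes that each event already has a scalar anomaly score (for example, an autoencoder loss) and that the threshold is set from scores of known-physics events at a chosen false-positive rate. The function names and the thresholding rule are illustrative assumptions, not the authors' implementation, and the autoencoder itself is not shown.

```python
import math

def quantile_threshold(background_scores, false_positive_rate):
    """Return a cut such that at most the requested fraction of
    background (known-physics) scores lies strictly above it."""
    if not background_scores:
        raise ValueError("need at least one background score")
    if not 0.0 <= false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be in [0, 1)")
    ordered = sorted(background_scores)
    n = len(ordered)
    # Number of background events allowed above the cut (small epsilon
    # guards against floating-point rounding of the product).
    allowed = min(n - 1, math.floor(false_positive_rate * n + 1e-9))
    return ordered[n - 1 - allowed]

def select_outliers(scores, threshold):
    """One-sided test: keep indices of events scoring above the threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]

if __name__ == "__main__":
    # Hypothetical scores standing in for losses on known processes.
    background = [float(i) for i in range(1, 101)]
    cut = quantile_threshold(background, false_positive_rate=0.05)
    print(cut, select_outliers([10.0, 96.0, 95.0, 200.0], cut))
```

The test is one-sided because only events with unusually high scores are flagged; events that the model reconstructs unusually well are not treated as anomalous.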