23,816 research outputs found

    A systematic comparison of supervised classifiers

    Pattern recognition techniques have been employed in a myriad of industrial, medical, commercial and academic applications. To tackle such a diversity of data, many techniques have been devised. However, despite the long tradition of pattern recognition research, there is no technique that yields the best classification in all scenarios. Therefore, considering as many techniques as possible is a fundamental practice in applications aiming at high accuracy. Typical works comparing methods either emphasize the performance of a given algorithm in validation tests or systematically compare various algorithms, assuming that the practical use of these methods is done by experts. On many occasions, however, researchers have to deal with their practical classification tasks without in-depth knowledge of the underlying mechanisms behind the parameters. Indeed, the adequate choice of classifiers and parameters in such practical circumstances constitutes a long-standing problem and is the subject of the current paper. We carried out a study on the performance of nine well-known classifiers implemented by the Weka framework and compared the dependence of their accuracy on their parameter configurations. The analysis of performance with default parameters revealed that the k-nearest neighbors method outperforms the other methods by a large margin when high-dimensional datasets are considered. When other parameter configurations were allowed, we found that it is possible to improve the quality of SVM by more than 20% even if parameters are set randomly. Taken together, the investigation conducted in this paper suggests that, apart from the SVM implementation, Weka's default configuration of parameters provides a performance close to the one achieved with the optimal configuration.
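    The comparison described in this abstract, default parameters versus randomly drawn ones, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: it uses a hand-written k-nearest-neighbors classifier instead of Weka, synthetic two-class data instead of the paper's datasets, and `DEFAULT_K = 1` as a stand-in default. It does not reproduce the paper's results.

```python
import random

def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def accuracy(train, test, k):
    """Fraction of held-out points classified correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def make_data(n, rng):
    """Two overlapping Gaussian blobs in 2D (synthetic, illustrative only)."""
    data = []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 2.0
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data

rng = random.Random(0)
train, test = make_data(200, rng), make_data(100, rng)

DEFAULT_K = 1  # assumed default for this sketch
default_acc = accuracy(train, test, DEFAULT_K)

# Draw parameter values at random and keep the best one on the held-out set.
random_ks = [rng.randint(1, 25) for _ in range(10)]
random_accs = {k: accuracy(train, test, k) for k in random_ks}
best_k = max(random_accs, key=random_accs.get)

print("default k:", DEFAULT_K, "accuracy:", default_acc)
print("best random k:", best_k, "accuracy:", random_accs[best_k])
```

    A proper study would select parameters on a validation split that is separate from the final test split; the single hold-out set here only keeps the sketch short.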

    Systematic analysis of primary sequence domain segments for the discrimination between class C GPCR subtypes

    G-protein-coupled receptors (GPCRs) are a large and diverse super-family of eukaryotic cell membrane proteins that play an important physiological role as transmitters of extracellular signals. In this paper, we investigate Class C, a member of this super-family that has attracted much attention in pharmacology. The limited knowledge about the complete 3D crystal structure of Class C receptors makes it necessary to use their primary amino acid sequences for analytical purposes. Here, we provide a systematic analysis of distinct receptor sequence segments with regard to their ability to differentiate between seven class C GPCR subtypes according to their topological location in the extracellular, transmembrane, or intracellular domains. We build on the results of previous research that provided preliminary evidence of the potential use of separated domains of complete class C GPCR sequences as the basis for subtype classification. The use of the extracellular N-terminus domain alone was shown to result in only a minor decrease in subtype discrimination in comparison with the complete sequence, despite discarding much of the sequence information. In this paper, we describe the use of Support Vector Machine-based classification models to evaluate the subtype-discriminating capacity of the specific topological sequence segments. Peer reviewed. Postprint (author's final draft)

    Cross-task weakly supervised learning from instructional videos

    In this paper we investigate learning visual models for the steps of ordinary tasks using weak supervision via instructional narrations and an ordered list of steps instead of strong supervision via temporal annotations. At the heart of our approach is the observation that weakly supervised learning may be easier if a model shares components while learning different steps: 'pour egg' should be trained jointly with other tasks involving 'pour' and 'egg'. We formalize this in a component model for recognizing steps and a weakly supervised learning framework that can learn this model under temporal constraints from narration and the list of steps. Existing data do not permit a systematic study of sharing, so we also gather a new dataset, CrossTask, aimed at assessing cross-task sharing. Our experiments demonstrate that sharing across tasks improves performance, especially when done at the component level, and that our component model can parse previously unseen tasks by virtue of its compositionality. Comment: 18 pages, 17 figures, to be published in proceedings of the CVPR, 201

    Variational Autoencoders for New Physics Mining at the Large Hadron Collider

    Using variational autoencoders trained on known physics processes, we develop a one-sided threshold test to isolate previously unseen processes as outlier events. Since the autoencoder training does not depend on any specific new physics signature, the proposed procedure makes no specific assumptions about the nature of new physics. An event selection based on this algorithm would be complementary to classic LHC searches, which are typically based on model-dependent hypothesis testing. Such an algorithm would deliver a list of anomalous events that the experimental collaborations could further scrutinize and even release as a catalog, similar to what is typically done in other scientific domains. Event topologies repeating in this dataset could inspire new-physics model building and new experimental searches. Running in the trigger system of the LHC experiments, such an application could identify anomalous events that would otherwise be lost, extending the scientific reach of the LHC. Comment: 29 pages, 12 figures, 5 tables
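    The one-sided threshold test mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: each event is assumed to already carry a scalar anomaly score (for example, an autoencoder loss), the cut is set from a reference sample of known processes at a chosen false-positive rate, and the function names `one_sided_threshold` and `select_anomalies` are hypothetical.

```python
import random

def one_sided_threshold(reference_scores, false_positive_rate):
    """Return a cut such that at most the requested fraction of the
    reference scores lies strictly above it (assumes 0 <= rate < 1)."""
    ordered = sorted(reference_scores)
    n_allowed = min(int(false_positive_rate * len(ordered)), len(ordered) - 1)
    return ordered[len(ordered) - n_allowed - 1]

def select_anomalies(scores, threshold):
    """Indices of events whose score exceeds the cut (upper tail only)."""
    return [i for i, score in enumerate(scores) if score > threshold]

# Synthetic stand-in scores: the reference sample mimics known processes,
# and the new sample contains a few events shifted to higher scores.
rng = random.Random(1)
reference = [rng.gauss(0.0, 1.0) for _ in range(1000)]
new_events = [rng.gauss(0.0, 1.0) for _ in range(95)] + \
             [rng.gauss(6.0, 1.0) for _ in range(5)]

cut = one_sided_threshold(reference, 0.01)
flagged = select_anomalies(new_events, cut)
print("threshold:", cut)
print("flagged event indices:", flagged)
```

    The test is one-sided because only unusually high scores are flagged; events that the model reconstructs unusually well are not treated as anomalous.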