
    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This situation constrains the learning of efficient classifiers, because the class boundary must be defined using knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, the algorithms used and the application domains. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures
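As a rough illustration of learning a class boundary from positive examples only, the sketch below fits a centroid and a distance threshold to the positive class and flags anything outside that radius as non-target. It is not taken from the paper: the centroid-and-radius rule, the function names `fit_one_class` and `is_target`, the `quantile` parameter and the toy data are all assumptions chosen for brevity.

```python
import math

def fit_one_class(positives, quantile=0.95):
    """Fit a centroid-and-radius boundary using positive examples only (illustrative)."""
    dim = len(positives[0])
    centroid = [sum(p[i] for p in positives) / len(positives) for i in range(dim)]
    # Distances of the training points to the centroid, in ascending order.
    distances = sorted(math.dist(p, centroid) for p in positives)
    # Radius: the distance within which roughly `quantile` of the training points fall.
    index = min(len(distances) - 1, int(quantile * len(distances)))
    return centroid, distances[index]

def is_target(model, point):
    """Return True if `point` lies inside the learned boundary."""
    centroid, radius = model
    return math.dist(point, centroid) <= radius

# Hypothetical positive-class samples; no negative examples are used for training.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
model = fit_one_class(train, quantile=1.0)

print(is_target(model, (0.4, 0.6)))  # near the training data
print(is_target(model, (5.0, 5.0)))  # far from the training data
```

Lowering `quantile` tightens the boundary and treats the outermost training points as outliers, which is one simple way to trade false accepts against false rejects when no negative data is available.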

    Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks

    We participated in three of the protein-protein interaction subtasks of the Second BioCreative Challenge: classification of abstracts relevant for protein-protein interaction (IAS), discovery of protein pairs (IPS) and text passages characterizing protein interaction (ISS) in full text documents. We approached the abstract classification task with a novel, lightweight linear model inspired by spam-detection techniques, as well as an uncertainty-based integration scheme. We also used a Support Vector Machine and the Singular Value Decomposition on the same features for comparison purposes. Our approach to the full text subtasks (protein pair and passage identification) includes a feature expansion method based on word-proximity networks. Our approach to the abstract classification task (IAS) was among the top submissions for this task in terms of the measures of performance used in the challenge evaluation (accuracy, F-score and AUC). We also report on a web-tool we produced using our approach: the Protein Interaction Abstract Relevance Evaluator (PIARE). Our approach to the full text tasks resulted in one of the highest recall rates as well as mean reciprocal rank of correct passages. Our approach to abstract classification shows that a simple linear model, using relatively few features, is capable of generalizing and uncovering the conceptual nature of protein-protein interaction from the bibliome. Since the novel approach is based on a very lightweight linear model, it can be easily ported and applied to similar problems. In full text problems, the expansion of word features with word-proximity networks is shown to be useful, though the need for some improvements is discussed.

    Reducing model bias in a deep learning classifier using domain adversarial neural networks in the MINERvA experiment

    We present a simulation-based study using deep convolutional neural networks (DCNNs) to identify neutrino interaction vertices in the MINERvA passive targets region, and illustrate the application of domain adversarial neural networks (DANNs) in this context. DANNs are designed to be trained in one domain (simulated data) but tested in a second domain (physics data) and utilize unlabeled data from the second domain so that during training only features which are unable to discriminate between the domains are promoted. MINERvA is a neutrino-nucleus scattering experiment using the NuMI beamline at Fermilab. A-dependent cross sections are an important part of the physics program, and these measurements require vertex finding in complicated events. To illustrate the impact of the DANN we used a modified set of simulation in place of physics data during the training of the DANN and then used the label of the modified simulation during the evaluation of the DANN. We find that deep learning based methods offer significant advantages over our prior track-based reconstruction for the task of vertex finding, and that DANNs are able to improve the performance of deep networks by leveraging available unlabeled data and by mitigating network performance degradation rooted in biases in the physics models used for training. Comment: 41 pages