One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This situation constrains the learning of efficient classifiers, because the
class boundary must be defined using knowledge of the positive class alone. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data, the
algorithms used and the application domains. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.
Comment: 24 pages + 11 pages of references, 8 figures
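To make the OCC setting concrete, here is a minimal sketch of a classifier fitted from positive examples only. It is not a method taken from the survey: it assumes a simple centroid-plus-radius boundary, and the names `fit_one_class`, `is_positive` and the `quantile` parameter are hypothetical, chosen for illustration.

```python
import math

def fit_one_class(positives, quantile=0.95):
    """Fit a spherical boundary from positive samples only (illustrative assumption)."""
    dim = len(positives[0])
    # Centroid of the positive class.
    centroid = [sum(p[i] for p in positives) / len(positives) for i in range(dim)]
    # Distances of the training points to the centroid, sorted ascending.
    dists = sorted(math.dist(p, centroid) for p in positives)
    # The radius is the distance at the chosen quantile, so a few extreme
    # training points may fall outside the boundary.
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return centroid, dists[idx]

def is_positive(model, x):
    """Accept x if it lies within the learned radius of the centroid."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius

model = fit_one_class([(0, 0), (1, 0), (0, 1), (1, 1)], quantile=1.0)
```

No negative examples are used at any point; a new point is rejected only because it lies far from the positive training data.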
Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks
We participated in three of the protein-protein interaction subtasks of the
Second BioCreative Challenge: classification of abstracts relevant for
protein-protein interaction (IAS), discovery of protein pairs (IPS) and text
passages characterizing protein interaction (ISS) in full text documents. We
approached the abstract classification task with a novel, lightweight linear
model inspired by spam-detection techniques, as well as an uncertainty-based
integration scheme. We also used a Support Vector Machine and the Singular
Value Decomposition on the same features for comparison purposes. Our approach
to the full text subtasks (protein pair and passage identification) includes a
feature expansion method based on word-proximity networks. Our approach to the
abstract classification task (IAS) was among the top submissions for this task
in terms of the measures of performance used in the challenge evaluation
(accuracy, F-score and AUC). We also report on a web-tool we produced using our
approach: the Protein Interaction Abstract Relevance Evaluator (PIARE). Our
approach to the full text tasks resulted in one of the highest recall rates as
well as mean reciprocal rank of correct passages. Our approach to abstract
classification shows that a simple linear model, using relatively few features,
is capable of generalizing and uncovering the conceptual nature of
protein-protein interaction from the bibliome. Since the novel approach is
based on a very lightweight linear model, it can be easily ported and applied
to similar problems. In full text problems, the expansion of word features with
word-proximity networks is shown to be useful, though the need for some
improvements is discussed.
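The abstract does not specify the linear model, so the following is only a sketch of the general idea of a lightweight, spam-filter-style linear scorer over word features. It assumes smoothed per-word log-odds as weights; the functions `train` and `score` and the toy documents are hypothetical and are not the authors' method or data.

```python
import math
from collections import Counter

def train(relevant_docs, irrelevant_docs, smoothing=1.0):
    """Learn one weight per word: smoothed log-odds of appearing in a relevant document."""
    pos = Counter(w for d in relevant_docs for w in set(d.lower().split()))
    neg = Counter(w for d in irrelevant_docs for w in set(d.lower().split()))
    n_pos, n_neg = len(relevant_docs), len(irrelevant_docs)
    weights = {}
    for w in set(pos) | set(neg):
        p_rel = (pos[w] + smoothing) / (n_pos + 2 * smoothing)
        p_irr = (neg[w] + smoothing) / (n_neg + 2 * smoothing)
        weights[w] = math.log(p_rel) - math.log(p_irr)
    return weights

def score(weights, doc):
    """Linear score: sum of the weights of the distinct known words in the document."""
    return sum(weights.get(w, 0.0) for w in set(doc.lower().split()))

weights = train(
    ["protein binds protein", "kinase interacts with receptor"],
    ["patients received treatment", "clinical trial with patients"],
)
```

A document is labelled relevant when its score is positive. Words that occur equally often in both classes, such as "with" in the toy data, receive a weight of zero and do not affect the decision.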
Reducing model bias in a deep learning classifier using domain adversarial neural networks in the MINERvA experiment
We present a simulation-based study using deep convolutional neural networks
(DCNNs) to identify neutrino interaction vertices in the MINERvA passive
targets region, and illustrate the application of domain adversarial neural
networks (DANNs) in this context. DANNs are designed to be trained in one
domain (simulated data) but tested in a second domain (physics data) and
utilize unlabeled data from the second domain so that during training only
features which are unable to discriminate between the domains are promoted.
MINERvA is a neutrino-nucleus scattering experiment using the NuMI beamline at
Fermilab. A-dependent cross sections are an important part of the physics
program, and these measurements require vertex finding in complicated events.
To illustrate the impact of the DANN, we used a modified set of simulations in
place of physics data during the training of the DANN and then used the labels
of the modified simulations during the evaluation of the DANN. We find that deep
learning based methods offer significant advantages over our prior track-based
reconstruction for the task of vertex finding, and that DANNs are able to
improve the performance of deep networks by leveraging available unlabeled data
and by mitigating network performance degradation rooted in biases in the
physics models used for training.
Comment: 41 pages
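The mechanism by which a DANN promotes only domain-indiscriminate features is usually implemented with a gradient reversal step between the feature extractor and the domain classifier. The sketch below illustrates that idea on a toy scalar model; it is an assumption-laden illustration, not the network used in the study. The names `w` (feature weight), `v` (domain-classifier weight), `lam` (reversal strength) and `dann_step` are hypothetical, and the label-prediction loss that a full DANN also trains on is omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def domain_loss(w, v, x, d):
    """Cross-entropy of the domain classifier on feature f = w * x (d is 0 or 1)."""
    p = sigmoid(v * (w * x))
    return -(d * math.log(p) + (1 - d) * math.log(1 - p))

def dann_step(w, v, x, d, lr=0.1, lam=1.0):
    """One update on the domain loss with gradient reversal at the feature."""
    f = w * x
    dz = sigmoid(v * f) - d      # dL/dz for z = v * f
    grad_v = dz * f              # the domain classifier descends the loss as usual
    grad_f = dz * v              # gradient arriving at the feature
    grad_w = (-lam * grad_f) * x # reversal: the feature extractor sees the negated gradient
    return w - lr * grad_w, v - lr * grad_v

w0, v0 = 1.0, 1.0
w1, v1 = dann_step(w0, v0, x=1.0, d=1)
```

After the step, the domain classifier weight moves to classify the domain better, while the feature weight moves in the direction that makes the domain harder to classify, which is the adversarial behaviour described in the abstract.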