8,353 research outputs found
Dissimilarity-based Ensembles for Multiple Instance Learning
In multiple instance learning, objects are sets (bags) of feature vectors
(instances) rather than individual feature vectors. In this paper we address
the problem of how these bags can best be represented. Two standard approaches
are to use (dis)similarities between bags and prototype bags, or between bags
and prototype instances. The first approach results in a relatively
low-dimensional representation determined by the number of training bags, while
the second approach results in a relatively high-dimensional representation,
determined by the total number of instances in the training set. In this paper
a third, intermediate approach is proposed, which links the two approaches and
combines their strengths. Our classifier is inspired by a random subspace
ensemble, and considers subspaces of the dissimilarity space, defined by
subsets of instances, as prototypes. We provide guidelines for using such an
ensemble, and show state-of-the-art performances on a range of multiple
instance learning problems.Comment: Submitted to IEEE Transactions on Neural Networks and Learning
Systems, Special Issue on Learning in Non-(geo)metric Space
One-class classifiers based on entropic spanning graphs
One-class classifiers offer valuable tools to assess the presence of outliers
in data. In this paper, we propose a design methodology for one-class
classifiers based on entropic spanning graphs. Our approach takes into account
the possibility to process also non-numeric data by means of an embedding
procedure. The spanning graph is learned on the embedded input data and the
outcoming partition of vertices defines the classifier. The final partition is
derived by exploiting a criterion based on mutual information minimization.
Here, we compute the mutual information by using a convenient formulation
provided in terms of the -Jensen difference. Once training is
completed, in order to associate a confidence level with the classifier
decision, a graph-based fuzzy model is constructed. The fuzzification process
is based only on topological information of the vertices of the entropic
spanning graph. As such, the proposed one-class classifier is suitable also for
data characterized by complex geometric structures. We provide experiments on
well-known benchmarks containing both feature vectors and labeled graphs. In
addition, we apply the method to the protein solubility recognition problem by
considering several representations for the input samples. Experimental results
demonstrate the effectiveness and versatility of the proposed method with
respect to other state-of-the-art approaches.Comment: Extended and revised version of the paper "One-Class Classification
Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN,
Vancouver, Canad
Classifying sequences by the optimized dissimilarity space embedding approach: a case study on the solubility analysis of the E. coli proteome
We evaluate a version of the recently-proposed classification system named
Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space
of sequences of generic objects. The ODSE system has been originally presented
as a classification system for patterns represented as labeled graphs. However,
since ODSE is founded on the dissimilarity space representation of the input
data, the classifier can be easily adapted to any input domain where it is
possible to define a meaningful dissimilarity measure. Here we demonstrate the
effectiveness of the ODSE classifier for sequences by considering an
application dealing with the recognition of the solubility degree of the
Escherichia coli proteome. Solubility, or analogously aggregation propensity,
is an important property of protein molecules, which is intimately related to
the mechanisms underlying the chemico-physical process of folding. Each protein
of our dataset is initially associated with a solubility degree and it is
represented as a sequence of symbols, denoting the 20 amino acid residues. The
herein obtained computational results, which we stress that have been achieved
with no context-dependent tuning of the ODSE system, confirm the validity and
generality of the ODSE-based approach for structured data classification.Comment: 10 pages, 49 reference
'Part'ly first among equals: Semantic part-based benchmarking for state-of-the-art object recognition systems
An examination of object recognition challenge leaderboards (ILSVRC,
PASCAL-VOC) reveals that the top-performing classifiers typically exhibit small
differences amongst themselves in terms of error rate/mAP. To better
differentiate the top performers, additional criteria are required. Moreover,
the (test) images, on which the performance scores are based, predominantly
contain fully visible objects. Therefore, `harder' test images, mimicking the
challenging conditions (e.g. occlusion) in which humans routinely recognize
objects, need to be utilized for benchmarking. To address the concerns
mentioned above, we make two contributions. First, we systematically vary the
level of local object-part content, global detail and spatial context in images
from PASCAL VOC 2010 to create a new benchmarking dataset dubbed PPSS-12.
Second, we propose an object-part based benchmarking procedure which quantifies
classifiers' robustness to a range of visibility and contextual settings. The
benchmarking procedure relies on a semantic similarity measure that naturally
addresses potential semantic granularity differences between the category
labels in training and test datasets, thus eliminating manual mapping. We use
our procedure on the PPSS-12 dataset to benchmark top-performing classifiers
trained on the ILSVRC-2012 dataset. Our results show that the proposed
benchmarking procedure enables additional differentiation among
state-of-the-art object classifiers in terms of their ability to handle missing
content and insufficient object detail. Given this capability for additional
differentiation, our approach can potentially supplement existing benchmarking
procedures used in object recognition challenge leaderboards.Comment: Extended version of our ACCV-2016 paper. Author formatting modifie
Designing labeled graph classifiers by exploiting the R\'enyi entropy of the dissimilarity representation
Representing patterns as labeled graphs is becoming increasingly common in
the broad field of computational intelligence. Accordingly, a wide repertoire
of pattern recognition tools, such as classifiers and knowledge discovery
procedures, are nowadays available and tested for various datasets of labeled
graphs. However, the design of effective learning procedures operating in the
space of labeled graphs is still a challenging problem, especially from the
computational complexity viewpoint. In this paper, we present a major
improvement of a general-purpose classifier for graphs, which is conceived on
an interplay between dissimilarity representation, clustering,
information-theoretic techniques, and evolutionary optimization algorithms. The
improvement focuses on a specific key subroutine devised to compress the input
data. We prove different theorems which are fundamental to the setting of the
parameters controlling such a compression operation. We demonstrate the
effectiveness of the resulting classifier by benchmarking the developed
variants on well-known datasets of labeled graphs, considering as distinct
performance indicators the classification accuracy, computing time, and
parsimony in terms of structural complexity of the synthesized classification
models. The results show state-of-the-art standards in terms of test set
accuracy and a considerable speed-up for what concerns the computing time.Comment: Revised versio
- …