12,315 research outputs found
Novel support vector machines for diverse learning paradigms
This dissertation introduces novel support vector machines (SVM) for the following traditional and non-traditional learning paradigms: Online classification, Multi-Target Regression, Multiple-Instance classification, and Data Stream classification.
Three multi-target support vector regression (SVR) models are first presented. The first involves building independent, single-target SVR models for each target. The second builds an ensemble of randomly chained models using the first single-target method as a base model. The third calculates the targets\u27 correlations and forms a maximum correlation chain, which is used to build a single chained SVR model, improving the model\u27s prediction performance, while reducing computational complexity.
Under the multi-instance paradigm, a novel SVM multiple-instance formulation and an algorithm with a bag-representative selector, named Multi-Instance Representative SVM (MIRSVM), are presented. The contribution trains the SVM based on bag-level information and is able to identify instances that highly impact classification, i.e. bag-representatives, for both positive and negative bags, while finding the optimal class separation hyperplane. Unlike other multi-instance SVM methods, this approach eliminates possible class imbalance issues by allowing both positive and negative bags to have at most one representative, which constitute as the most contributing instances to the model.
Due to the shortcomings of current popular SVM solvers, especially in the context of large-scale learning, the third contribution presents a novel stochastic, i.e. online, learning algorithm for solving the L1-SVM problem in the primal domain, dubbed OnLine Learning Algorithm using Worst-Violators (OLLAWV). This algorithm, unlike other stochastic methods, provides a novel stopping criteria and eliminates the need for using a regularization term. It instead uses early stopping. Because of these characteristics, OLLAWV was proven to efficiently produce sparse models, while maintaining a competitive accuracy.
OLLAWV\u27s online nature and success for traditional classification inspired its implementation, as well as its predecessor named OnLine Learning Algorithm - List 2 (OLLA-L2), under the batch data stream classification setting. Unlike other existing methods, these two algorithms were chosen because their properties are a natural remedy for the time and memory constraints that arise from the data stream problem. OLLA-L2\u27s low spacial complexity deals with memory constraints imposed by the data stream setting, and OLLAWV\u27s fast run time, early self-stopping capability, as well as the ability to produce sparse models, agrees with both memory and time constraints. The preliminary results for OLLAWV showed a superior performance to its predecessor and was chosen to be used in the final set of experiments against current popular data stream methods.
Rigorous experimental studies and statistical analyses over various metrics and datasets were conducted in order to comprehensively compare the proposed solutions against modern, widely-used methods from all paradigms. The experimental studies and analyses confirm that the proposals achieve better performances and more scalable solutions than the methods compared, making them competitive in their respected fields
Multiple Instance Hybrid Estimator for Learning Target Signatures
Signature-based detectors for hyperspectral target detection rely on knowing
the specific target signature in advance. However, target signature are often
difficult or impossible to obtain. Furthermore, common methods for obtaining
target signatures, such as from laboratory measurements or manual selection
from an image scene, usually do not capture the discriminative features of
target class. In this paper, an approach for estimating a discriminative target
signature from imprecise labels is presented. The proposed approach maximizes
the response of the hybrid sub-pixel detector within a multiple instance
learning framework and estimates a set of discriminative target signatures.
After learning target signatures, any signature based detector can then be
applied on test data. Both simulated and real hyperspectral target detection
experiments are shown to illustrate the effectiveness of the method
Multiple Instance Learning: A Survey of Problem Characteristics and Applications
Multiple instance learning (MIL) is a form of weakly supervised learning
where training instances are arranged in sets, called bags, and a label is
provided for the entire bag. This formulation is gaining interest because it
naturally fits various problems and allows to leverage weakly labeled data.
Consequently, it has been used in diverse application fields such as computer
vision and document classification. However, learning from bags raises
important challenges that are unique to MIL. This paper provides a
comprehensive survey of the characteristics which define and differentiate the
types of MIL problems. Until now, these problem characteristics have not been
formally identified and described. As a result, the variations in performance
of MIL algorithms from one data set to another are difficult to explain. In
this paper, MIL problem characteristics are grouped into four broad categories:
the composition of the bags, the types of data distribution, the ambiguity of
instance labels, and the task to be performed. Methods specialized to address
each category are reviewed. Then, the extent to which these characteristics
manifest themselves in key MIL application areas are described. Finally,
experiments are conducted to compare the performance of 16 state-of-the-art MIL
methods on selected problem characteristics. This paper provides insight on how
the problem characteristics affect MIL algorithms, recommendations for future
benchmarking and promising avenues for research
Object Tracking with Multiple Instance Learning and Gaussian Mixture Model
Recently, Multiple Instance Learning (MIL) technique has been introduced for object tracking\linebreak applications, which has shown its good performance to handle drifting problem. While some instances in positive bags not only contain objects, but also contain the background, it is not reliable to simply assume that each feature of instances in positive bags obeys a single Gaussian distribution. In this paper, a tracker based on online multiple instance boosting has been developed, which employs Gaussian Mixture Model (GMM) and single Gaussian distribution respectively to model features of instances in positive and negative bags. The differences between samples and the model are integrated into the process of updating the parameters for GMM. With the Haar-like features extracted from the bags, a set of weak classifiers are trained to construct a strong classifier, which is used to track the object location at a new frame. And the classifier can be updated online frame by frame. Experimental results have shown that our tracker is more stable and efficient when dealing with the illumination, rotation, pose and appearance changes
A Feature Selection Method for Multivariate Performance Measures
Feature selection with specific multivariate performance measures is the key
to the success of many applications, such as image retrieval and text
classification. The existing feature selection methods are usually designed for
classification error. In this paper, we propose a generalized sparse
regularizer. Based on the proposed regularizer, we present a unified feature
selection framework for general loss functions. In particular, we study the
novel feature selection paradigm by optimizing multivariate performance
measures. The resultant formulation is a challenging problem for
high-dimensional data. Hence, a two-layer cutting plane algorithm is proposed
to solve this problem, and the convergence is presented. In addition, we adapt
the proposed method to optimize multivariate measures for multiple instance
learning problems. The analyses by comparing with the state-of-the-art feature
selection methods show that the proposed method is superior to others.
Extensive experiments on large-scale and high-dimensional real world datasets
show that the proposed method outperforms -SVM and SVM-RFE when choosing a
small subset of features, and achieves significantly improved performances over
SVM in terms of -score
Multiple instance learning for sequence data with across bag dependencies
In Multiple Instance Learning (MIL) problem for sequence data, the instances
inside the bags are sequences. In some real world applications such as
bioinformatics, comparing a random couple of sequences makes no sense. In fact,
each instance may have structural and/or functional relations with instances of
other bags. Thus, the classification task should take into account this across
bag relation. In this work, we present two novel MIL approaches for sequence
data classification named ABClass and ABSim. ABClass extracts motifs from
related instances and use them to encode sequences. A discriminative classifier
is then applied to compute a partial classification result for each set of
related sequences. ABSim uses a similarity measure to discriminate the related
instances and to compute a scores matrix. For both approaches, an aggregation
method is applied in order to generate the final classification result. We
applied both approaches to solve the problem of bacterial Ionizing Radiation
Resistance prediction. The experimental results of the presented approaches are
satisfactory
- …