A Kernel Classification Framework for Metric Learning
Learning a distance metric from the given training samples plays a crucial
role in many machine learning tasks, and various models and optimization
algorithms have been proposed in the past decade. In this paper, we generalize
several state-of-the-art metric learning methods, such as large margin nearest
neighbor (LMNN) and information theoretic metric learning (ITML), into a kernel
classification framework. First, doublets and triplets are constructed from the
training samples, and a family of degree-2 polynomial kernel functions is
proposed for pairs of doublets or triplets. Then, a kernel classification
framework is established, which can not only generalize many popular metric
learning methods such as LMNN and ITML, but also suggest new metric learning
methods, which can be efficiently implemented, interestingly, by using the
standard support vector machine (SVM) solvers. Two novel metric learning
methods, namely doublet-SVM and triplet-SVM, are then developed under the
proposed framework. Experimental results show that doublet-SVM and triplet-SVM
achieve classification accuracies competitive with state-of-the-art metric
learning methods such as ITML and LMNN but with significantly less training
time. Comment: 11 pages, 7 figures
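The doublet construction and a degree-2 polynomial kernel between doublets can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the all-pairs construction, the label convention (-1 for same-class pairs, +1 for different-class pairs), and the specific kernel (the squared inner product of the two difference vectors) are assumptions made for the example, and the names `make_doublets` and `doublet_kernel` are hypothetical.

```python
from itertools import combinations

def make_doublets(samples, labels):
    """Build doublets from all sample pairs (an assumed, simplified construction).

    Each doublet is stored as (difference vector, doublet label), with
    label -1 for a same-class pair and +1 for a different-class pair.
    """
    doublets = []
    for (xa, la), (xb, lb) in combinations(zip(samples, labels), 2):
        diff = [a - b for a, b in zip(xa, xb)]
        doublets.append((diff, -1 if la == lb else +1))
    return doublets

def doublet_kernel(diff_i, diff_j):
    """Degree-2 polynomial kernel: squared inner product of difference vectors."""
    dot = sum(a * b for a, b in zip(diff_i, diff_j))
    return dot ** 2

samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
labels = [0, 0, 1]
doublets = make_doublets(samples, labels)

# Gram matrix over doublets; a matrix like this could be passed to any
# SVM solver that accepts a precomputed kernel.
gram = [[doublet_kernel(di, dj) for dj, _ in doublets] for di, _ in doublets]
```

A Gram matrix of this form is what lets a standard SVM solver be reused: the classifier separates same-class doublets from different-class doublets instead of separating the original samples.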
The return of AdaBoost.MH: multi-class Hamming trees
Within the framework of AdaBoost.MH, we propose to train vector-valued
decision trees to optimize the multi-class edge without reducing the
multi-class problem to binary one-against-all classifications. The key
element of the method is a vector-valued decision stump, factorized into an
input-independent vector with one entry per class and a label-independent scalar classifier.
At inner tree nodes, the label-dependent vector is discarded and the binary
classifier can be used for partitioning the input space into two regions. The
algorithm retains the conceptual elegance, power, and computational efficiency
of binary AdaBoost. In experiments it is on par with support vector machines
and with the best existing multi-class boosting algorithm AOSOLogitBoost, and
it is significantly better than other known implementations of AdaBoost.MH.
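The factorized stump described above can be illustrated with a minimal sketch. The assumptions here are that the input-independent vector holds one ±1 vote per class and that the scalar classifier is a single-feature threshold test; the function names and the toy values are hypothetical and not taken from the paper.

```python
def stump_scalar(x, feature, threshold):
    """Label-independent scalar classifier: +1 or -1 from one feature test."""
    return 1 if x[feature] >= threshold else -1

def stump_vector(x, votes, feature, threshold):
    """Vector-valued stump: the per-class vote vector scaled by the scalar output."""
    phi = stump_scalar(x, feature, threshold)
    return [v * phi for v in votes]

votes = [+1, -1, -1]  # assumed: one vote per class, independent of the input

right_side = stump_vector([0.7, 0.2], votes, feature=0, threshold=0.5)
left_side = stump_vector([0.1, 0.2], votes, feature=0, threshold=0.5)

# At an inner tree node only the scalar part would be used, to route x
# to one of two children.
goes_right = stump_scalar([0.7, 0.2], feature=0, threshold=0.5) == 1
```

Flipping the sign of the scalar output flips every class vote at once, which is why the scalar part alone is enough to partition the input space at inner nodes.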
DCSVM: Fast Multi-class Classification using Support Vector Machines
We present DCSVM, an efficient algorithm for multi-class classification using
Support Vector Machines. DCSVM is a divide and conquer algorithm which relies
on data sparsity in high dimensional space and performs a smart partitioning of
the whole training data set into disjoint subsets that are easily separable. A
single prediction performed between two partitions eliminates at once one or
more classes in one partition, leaving only a reduced number of candidate
classes for subsequent steps. The algorithm continues recursively, reducing the
number of classes at each step, until a final binary decision is made between
the last two classes left in the competition. In the best case scenario, our
algorithm makes a final decision between classes in decision
steps and in the worst case scenario DCSVM makes a final decision in
steps, which is not worse than the existing techniques.
Learning Discriminative Features Via Weights-biased Softmax Loss
Loss functions play a key role in training superior deep neural networks. In
convolutional neural networks (CNNs), the popular cross entropy loss together
with softmax does not explicitly guarantee minimization of intra-class variance
or maximization of inter-class variance. Earlier studies offer neither
theoretical analysis nor experiments that explicitly indicate how to choose the
number of units in the fully connected (FC) layer. To help CNNs learn
discriminative features faster, this paper makes two contributions. First, we
determine the minimum number of units in the FC layer by rigorous theoretical
analysis and extensive experiments, which reduces CNNs' parameter memory and
training time. Second, we propose a negative-focused weights-biased softmax
(W-Softmax) loss to help CNNs learn more discriminative features. The proposed
W-Softmax loss not only theoretically formulates intra-class compactness and
inter-class separability, but can also avoid overfitting by enlarging decision
margins. Moreover, the size of decision margins can be flexibly controlled by
adjusting a hyperparameter. Extensive experimental results on several
benchmark datasets show the superiority of W-Softmax in image classification
tasks.
Active Multi-Kernel Domain Adaptation for Hyperspectral Image Classification
Recent years have witnessed rapid progress in hyperspectral image (HSI)
classification. Most existing studies either rely heavily on expensive label
information through supervised learning or can hardly exploit the
discriminative information borrowed from related domains. To address these
issues, this paper presents a novel framework for HSI classification
based on domain adaptation (DA) with active learning (AL). The main idea of
our method is to retrain the multi-kernel classifier by utilizing the available
labeled samples from the source domain and adding a minimum number of the most
informative samples with active queries in the target domain. The proposed
method adaptively combines multiple kernels, forming a DA classifier that
minimizes the bias between the source and target domains. Equipped further with
a nested active updating process, it sequentially expands the training set
and gradually converges to a satisfactory level of classification performance. We
study this active adaptation framework with the Margin Sampling (MS) strategy
in the HSI classification task. Our experimental results on two popular HSI
datasets demonstrate its effectiveness.
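A common form of margin sampling can be sketched as below. The abstract does not spell out its exact query rule, so this is an assumption: the sketch queries the unlabeled samples whose two highest class scores are closest together. The function name and the toy scores are hypothetical.

```python
def margin_sampling(scores, n_queries):
    """Return indices of the n_queries samples with the smallest top-two margin.

    scores: one list of per-class scores for each unlabeled sample.
    """
    margins = []
    for idx, row in enumerate(scores):
        top, second = sorted(row, reverse=True)[:2]
        margins.append((top - second, idx))
    margins.sort()  # smallest margin first, i.e. the most ambiguous samples
    return [idx for _, idx in margins[:n_queries]]

# Toy class scores for three target-domain samples (hypothetical values).
scores = [
    [0.9, 0.1, 0.0],      # confident prediction, large margin
    [0.5, 0.25, 0.25],    # margin 0.25
    [0.5, 0.375, 0.125],  # margin 0.125, the most ambiguous
]
picked = margin_sampling(scores, 2)
```

In an active adaptation loop, the picked samples would be labeled by an oracle and added to the training set before the classifier is retrained.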
A Large-Scale Exploration of Effective Global Features for a Joint Entity Detection and Tracking Model
Entity detection and tracking (EDT) is the task of identifying textual
mentions of real-world entities in documents, extending the named entity
detection and coreference resolution task by considering mentions other than
names (pronouns, definite descriptions, etc.). Like NE tagging and coreference
resolution, most solutions to the EDT task separate out the mention detection
aspect from the coreference aspect. By doing so, these solutions are limited to
using only local features for learning. In contrast, by modeling both aspects
of the EDT task simultaneously, we are able to learn using highly complex,
non-local features. We develop a new joint EDT model and explore the utility of
many features, demonstrating their effectiveness on this task.
Multithreshold Entropy Linear Classifier
Linear classifiers separate the data with a hyperplane. In this paper we
focus on a novel method for constructing a multithreshold linear classifier,
which separates the data with multiple parallel hyperplanes. The proposed model
is based on information-theoretic concepts, namely Renyi's quadratic entropy
and Cauchy-Schwarz divergence.
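For reference, the two quantities named above have the following standard definitions for densities p and q. The notation is chosen here and is not taken from the paper; some authors define the Cauchy-Schwarz divergence as twice the value shown.

```latex
% Renyi's quadratic entropy of a density p
H_2(p) = -\log \int p(x)^2 \, dx

% Cauchy-Schwarz divergence between densities p and q
D_{CS}(p, q) = -\log \frac{\int p(x)\, q(x) \, dx}
                          {\sqrt{\int p(x)^2 \, dx \, \int q(x)^2 \, dx}}
```

By the Cauchy-Schwarz inequality the ratio inside the logarithm is at most 1, so D_CS is non-negative and equals zero exactly when p = q.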
We begin with some general properties, including data scale invariance. Then
we prove that our method is a multithreshold large margin classifier, which
shows the analogy to the SVM, while at the same time working with a much
broader class of hypotheses. Interestingly, the proposed method aims at
maximizing a balanced quality measure (such as the Matthews Correlation
Coefficient) as opposed to the very common maximization of accuracy. This
feature comes directly from the optimization problem statement and is further
confirmed by the experiments on the UCI datasets.
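The difference between accuracy and a balanced measure can be made concrete with a small sketch. The data are invented for illustration, and returning 0 when the denominator vanishes is a common convention assumed here rather than something stated in the abstract.

```python
from math import sqrt

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient; 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Imbalanced toy problem: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10                      # ignores the minority class
finds_positives = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # catches both positives

acc_a, mcc_a = accuracy(y_true, always_negative), mcc(y_true, always_negative)
acc_b, mcc_b = accuracy(y_true, finds_positives), mcc(y_true, finds_positives)
```

Both predictors reach the same accuracy, but only the second has a positive MCC, which is the kind of distinction a balanced measure is meant to capture.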
Our Multithreshold Entropy Linear Classifier (MELC) obtains scores similar to
or higher than those given by SVM on both synthetic and real data. We show how
the proposed approach can be beneficial for cheminformatics in the task of
ligand activity prediction, where, besides better classification results, MELC
gives some additional insight into the data structure (classes of
underrepresented chemical compounds).
A Fast and Robust TSVM for Pattern Classification
Twin support vector machine (TSVM) is a powerful learning algorithm that
solves a pair of smaller SVM-type problems. However, it still suffers from
issues such as low efficiency and weak robustness in some real
applications. In this paper, we propose a Fast and Robust
TSVM (FR-TSVM) to deal with these issues. In order to alleviate the effects
of noisy inputs, we propose an effective fuzzy membership function and
reformulate the TSVMs such that different input instances can make different
contributions to the learning of the separating hyperplanes. To further speed
up the training procedure, we develop an efficient coordinate descent algorithm
with shrinking to solve the pair of quadratic programming problems
(QPPs) involved. Moreover, theoretical foundations of the proposed model are
analyzed in detail. The experimental results on several artificial and
benchmark datasets indicate that the FR-TSVM not only learns quickly but also
shows robust classification performance. Code has been made available at:
https://github.com/gaobb/FR-TSVM. Comment: 14 pages, Under Review
Adaptive Image Stream Classification via Convolutional Neural Network with Intrinsic Similarity Metrics
When performing data classification over a stream of continuously occurring
instances, a key challenge is to develop an open-world classifier that
anticipates instances from an unknown class. Studies addressing this problem,
typically called novel class detection, have considered classification methods
that reactively adapt to such changes along the stream. Importantly, they rely
on the property of cohesion and separation among instances in feature space.
Instances belonging to the same class are assumed to be closer to each other
(cohesion) than those belonging to different classes (separation).
Unfortunately, this assumption may not have large support when dealing with
high dimensional data such as images. In this paper, we address this key
challenge by proposing a semi-supervised multi-task learning framework called
CSIM which aims to intrinsically search for a latent space suitable for
detecting labels of instances from both known and unknown classes.
In particular, we utilize a convolutional neural network layer that aids in the
learning of a latent feature space suitable for novel class detection. We
empirically measure the performance of CSIM over multiple real-world image
datasets and demonstrate its superiority by comparing its performance with
existing semi-supervised methods. Comment: 10 pages; KDD'18 Deep Learning Day, August 2018, London, UK
Native Language Identification using Stacked Generalization
Ensemble methods using multiple classifiers have proven to be the most
successful approach for the task of Native Language Identification (NLI),
achieving the current state of the art. However, a systematic examination of
ensemble methods for NLI has yet to be conducted. Additionally, deeper ensemble
architectures such as classifier stacking have not been closely evaluated. We
present a set of experiments using three ensemble-based models, testing each
with multiple configurations and algorithms. This includes a rigorous
application of meta-classification models for NLI, achieving state-of-the-art
results on three datasets from different languages. We also present the first
use of statistical significance testing for comparing NLI systems, showing that
our results are significantly better than the previous state of the art. We
make available a collection of test set predictions to facilitate future
statistical tests.
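Classifier stacking, as mentioned above, can be sketched in a few lines. This is a deliberately simplified illustration and not the authors' setup: it assumes the base classifiers are already trained, fits the meta-classifier on a held-out set, and uses a lookup table of majority labels as a stand-in for a real meta-learner. All names, features, and labels are hypothetical.

```python
from collections import Counter, defaultdict

def fit_meta(base_models, X_held_out, y_held_out):
    """Fit a toy meta-classifier on the base models' held-out predictions.

    The meta-classifier maps each tuple of base predictions to the true
    label that most often accompanied it in the held-out data.
    """
    table = defaultdict(Counter)
    for x, y in zip(X_held_out, y_held_out):
        key = tuple(model(x) for model in base_models)
        table[key][y] += 1
    mapping = {key: counts.most_common(1)[0][0] for key, counts in table.items()}
    fallback = Counter(y_held_out).most_common(1)[0][0]  # for unseen tuples

    def predict(x):
        key = tuple(model(x) for model in base_models)
        return mapping.get(key, fallback)

    return predict

# Two hypothetical, already-trained base classifiers over two features.
def model_a(x):
    return "DE" if x[0] > 0.5 else "FR"

def model_b(x):
    return "DE" if x[1] > 0.5 else "FR"

X_held_out = [(0.9, 0.9), (0.9, 0.1), (0.1, 0.9), (0.1, 0.1)]
y_held_out = ["DE", "FR", "DE", "FR"]

stacked = fit_meta([model_a, model_b], X_held_out, y_held_out)
prediction = stacked((0.8, 0.2))  # base models disagree; meta-classifier decides
```

The point of the second level is visible in the disagreement case: the meta-classifier has learned from held-out data which base classifier to trust for a given pattern of base outputs.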