A no-regret generalization of hierarchical softmax to extreme multi-label classification
Extreme multi-label classification (XMLC) is a problem of tagging an instance
with a small subset of relevant labels chosen from an extremely large pool of
possible labels. Large label spaces can be efficiently handled by organizing
labels as a tree, like in the hierarchical softmax (HSM) approach commonly used
for multi-class problems. In this paper, we investigate probabilistic label
trees (PLTs) that have been recently devised for tackling XMLC problems. We
show that PLTs are a no-regret multi-label generalization of HSM when
precision@k is used as a model evaluation metric. Critically, we prove that
the pick-one-label heuristic, a reduction technique from multi-label to
multi-class that is routinely used along with HSM, is not consistent in
general. We also show that our implementation of PLTs, referred to as
extremeText (XT), obtains significantly better results than HSM with the
pick-one-label heuristic and XML-CNN, a deep network specifically designed for
XMLC problems. Moreover, XT is competitive with many state-of-the-art
approaches in terms of statistical performance, model size, and prediction
time, which makes it amenable to deployment in an online system.
Comment: Accepted at NIPS 201
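The scoring rule behind a probabilistic label tree can be sketched as follows: a label's marginal probability is the product of node-conditional probabilities along its root-to-leaf path, and precision@k is computed over the top-k scored labels. The tree, node probabilities, and label names below are hypothetical illustrations, not the extremeText implementation.

```python
# Sketch of probabilistic label tree (PLT) scoring: a label's marginal
# probability is the product of node-conditional probabilities along
# its root-to-leaf path. The tree and probabilities are hypothetical.

def plt_score(path_probs):
    """Multiply conditional probabilities along a root-to-leaf path."""
    score = 1.0
    for p in path_probs:
        score *= p
    return score

def precision_at_k(scores, relevant, k):
    """Fraction of the top-k scored labels that are truly relevant."""
    topk = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(1 for label in topk if label in relevant) / k

# Hypothetical 3-label tree, with path probabilities per label.
paths = {"A": [0.9, 0.8], "B": [0.9, 0.2], "C": [0.1, 0.7]}
scores = {label: plt_score(p) for label, p in paths.items()}
# scores: A -> 0.72, B -> 0.18, C -> 0.07
```

Because a tree over L labels has depth O(log L), only a logarithmic number of node probabilities needs evaluating per path, which is what makes such methods attractive at extreme scale.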
Streaming Label Learning for Modeling Labels on the Fly
It is challenging to handle a large volume of labels in multi-label learning.
However, existing approaches explicitly or implicitly assume that all the
labels in the learning process are given, which could be easily violated in
changing environments. In this paper, we define and study streaming label
learning (SLL), in which labels arrive on the fly, to model newly arrived
labels with the help of the knowledge learned from past labels. The core of SLL
is to explore and exploit the relationships between new labels and past labels
and then inherit the relationship into hypotheses of labels to boost the
performance of new classifiers. Specifically, we use label
self-representation to model the label relationship, and SLL is divided
into two steps: a regression problem and an empirical risk minimization (ERM)
problem. Both problems are simple and can be efficiently solved. We further
show that SLL can generate a tighter generalization error bound for new labels
than the general ERM framework with trace norm or Frobenius norm
regularization. Finally, we conduct extensive experiments on various
benchmark datasets to validate the new setting. The results show that SLL can
effectively handle constantly emerging new labels and provides excellent
classification performance.
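The label self-representation idea behind the first SLL step can be sketched as regressing a new label's assignments onto the matrix of past labels, so the learned weights express the new-to-old relationship. All data below are toy values, and the ridge term is an assumed stand-in for the paper's regularized formulation.

```python
import numpy as np

# Sketch of label self-representation: regress a new label's
# assignments onto the past label matrix; the weights capture how the
# new label relates to old ones. Toy data; the ridge regularizer is an
# assumed stand-in for the paper's regularized formulation.
rng = np.random.default_rng(0)
n, m = 100, 5
Y_past = rng.integers(0, 2, size=(n, m)).astype(float)  # past label matrix
w_true = np.array([0.7, 0.0, 0.3, 0.0, 0.0])            # hidden relationship
y_new = Y_past @ w_true                                  # new label column

# Step 1 of SLL: a regression problem (ridge-regularized least squares).
lam = 1e-3
w_hat = np.linalg.solve(Y_past.T @ Y_past + lam * np.eye(m), Y_past.T @ y_new)
```

The recovered weights can then be carried into the second step's ERM problem to warm-start the new label's classifier rather than training it from scratch.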
A Hierarchical Spectral Method for Extreme Classification
Extreme classification problems are multiclass and multilabel classification
problems where the number of outputs is so large that straightforward
strategies are neither statistically nor computationally viable. One strategy
for dealing with the computational burden is via a tree decomposition of the
output space. While this typically leads to training and inference that scales
sublinearly with the number of outputs, it also results in reduced statistical
performance. In this work, we identify two shortcomings of tree decomposition
methods, and describe two heuristic mitigations. We compose these with an
eigenvalue technique for constructing the tree. The end result is a
computationally efficient algorithm that provides good statistical performance
on several extreme data sets.
Comment: Reference implementation available at https://github.com/pmineiro/xls
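One way an eigenvalue technique can construct such a tree is spectral partitioning: at each node, split the labels by the sign of a leading eigenvector of a label affinity matrix. The toy affinity matrix below is an assumption for illustration, not the paper's construction.

```python
import numpy as np

# Sketch of an eigenvalue-based node split for a label tree: labels
# fall into two children by the sign of a leading eigenvector of a
# label affinity matrix. This toy affinity matrix is an assumed
# example, not the paper's construction.
A = np.array([
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.9],
    [0.1, 0.1, 0.9, 0.0],
])  # labels 0 and 1 co-occur strongly, as do labels 2 and 3

# eigh returns eigenpairs sorted by ascending eigenvalue; the
# eigenvector of the second-largest eigenvalue separates the groups.
vals, vecs = np.linalg.eigh(A)
v = vecs[:, -2]
left = {i for i in range(4) if v[i] < 0}
right = {i for i in range(4) if v[i] >= 0}
```

Applied recursively, such sign-based splits yield a balanced-ish tree whose training and inference cost scales sublinearly in the number of outputs.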
Hybrid Generative/Discriminative Learning for Automatic Image Annotation
Automatic image annotation (AIA) raises tremendous challenges to machine
learning as it requires modeling of data that are both ambiguous in input and
output, e.g., images containing multiple objects and labeled with multiple
semantic tags. Even more challenging is that the number of candidate tags is
usually huge (as large as the vocabulary size) yet each image is only related
to a few of them. This paper presents a hybrid generative-discriminative
classifier to simultaneously address the extreme data-ambiguity and
overfitting-vulnerability issues in tasks such as AIA. Particularly: (1) an
Exponential-Multinomial Mixture (EMM) model is established to capture both the
input and output ambiguity while encouraging prediction
sparsity; and (2) the prediction ability of the EMM model is explicitly
maximized through discriminative learning that integrates variational inference
of graphical models and the pairwise formulation of ordinal regression.
Experiments show that our approach achieves both superior annotation
performance and better tag scalability.
Comment: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI 2010).
Accelerating Extreme Classification via Adaptive Feature Agglomeration
Extreme classification seeks to assign to each data point the most relevant
labels from a universe of a million or more labels. This task faces the
dual challenge of high precision and scalability, with millisecond-level
prediction times being the benchmark. We propose DEFRAG, an adaptive feature
agglomeration technique to accelerate extreme classification algorithms.
Unlike past work on feature clustering and selection, DEFRAG distinguishes
itself by scaling to millions of features, and is especially
beneficial when feature sets are sparse, which is typical of recommendation and
multi-label datasets. The method comes with provable performance guarantees and
performs efficient task-driven agglomeration to reduce feature dimensionalities
by an order of magnitude or more. Experiments show that DEFRAG can not only
reduce training and prediction times of several leading extreme classification
algorithms by as much as 40%, but also be used for feature reconstruction to
address the problem of missing features, as well as offer superior coverage on
rare labels.
Comment: A version of this paper without the appendices will appear at the
28th International Joint Conference on Artificial Intelligence (IJCAI 2019).
Code for this paper is available at https://github.com/purushottamkar/defrag
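Feature agglomeration of this kind can be sketched as merging clustered feature columns by summation, which cuts dimensionality by the ratio of features to clusters. The cluster assignment below is hypothetical; DEFRAG derives its own task-driven clustering.

```python
import numpy as np

# Sketch of feature agglomeration: features assigned to the same
# cluster are merged by summing their columns, cutting dimensionality
# while preserving sparse structure. The cluster assignment is
# hypothetical; DEFRAG derives its own task-driven clustering.

def agglomerate(X, clusters, n_clusters):
    """Sum the feature columns that share a cluster id."""
    Z = np.zeros((X.shape[0], n_clusters))
    for j, c in enumerate(clusters):
        Z[:, c] += X[:, j]
    return Z

X = np.array([[1.0, 2.0, 0.0, 3.0],
              [0.0, 1.0, 4.0, 0.0]])
clusters = [0, 0, 1, 1]          # 4 features agglomerated into 2
Z = agglomerate(X, clusters, 2)  # Z == [[3., 3.], [1., 4.]]
```

Any downstream extreme classifier then trains on the much smaller matrix Z instead of X, which is where the reported training and prediction speedups come from.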
The Extreme Value Machine
It is often desirable to be able to recognize when inputs to a recognition
function learned in a supervised manner correspond to classes unseen at
training time. With this ability, new class labels could be assigned to these
inputs by a human operator, allowing them to be incorporated into the
recognition function --- ideally under an efficient incremental update
mechanism. While good algorithms that assume inputs from a fixed set of classes
exist, e.g., artificial neural networks and kernel machines, it is not
immediately obvious how to extend them to perform incremental learning in the
presence of unknown query classes. Existing algorithms take little to no
distributional information into account when learning recognition functions and
lack a strong theoretical foundation. We address this gap by formulating a
novel, theoretically sound classifier --- the Extreme Value Machine (EVM). The
EVM has a well-grounded interpretation derived from statistical Extreme Value
Theory (EVT), and is the first classifier to be able to perform nonlinear
kernel-free variable bandwidth incremental learning. Compared to other
classifiers in the same deep network derived feature space, the EVM is accurate
and efficient on an established benchmark partition of the ImageNet dataset.
Comment: Pre-print of a manuscript accepted to the IEEE Transactions on
Pattern Analysis and Machine Intelligence (T-PAMI) journal.
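The EVM's membership estimate can be sketched as a Weibull-shaped decay of class-inclusion probability with distance from a stored extreme vector. The EVM fits the shape and scale parameters per point using Extreme Value Theory on margin distances; the constants below are assumed values for illustration only.

```python
import math

# Sketch of a Weibull-shaped radial inclusion function in the spirit
# of the EVM: class-membership probability decays with distance. The
# EVM fits shape and scale per extreme vector from margin distances
# via Extreme Value Theory; here they are assumed constants.

def psi(distance, scale, shape):
    """Probability that a query at this distance belongs to the class."""
    return math.exp(-((distance / scale) ** shape))

near = psi(0.1, scale=1.0, shape=2.0)  # close query: near-certain member
far = psi(3.0, scale=1.0, shape=2.0)   # distant query: near-zero member
```

Queries whose probability is low under every class's inclusion function can be flagged as unknown, which is how such a model supports recognizing classes unseen at training time.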
Cortex Neural Network: learning with Neural Network groups
Neural Network has been successfully applied to many real-world problems,
such as image recognition and machine translation. However, for the current
architecture of neural networks, it is hard to perform complex cognitive tasks,
for example, to process image and audio inputs together. The cortex, an
important structure in the brain, is essential for animals to perform
complex cognitive tasks. We view the architecture of the cortex as a
missing part in the design of current artificial neural networks. In this
paper, we propose the Cortex Neural Network (CrtxNN). The Cortex Neural Network
is an upper-level architecture of neural networks, motivated by the cerebral
cortex in the brain, that handles different tasks in the same learning system. It is able
to identify different tasks and solve them with different methods. In our
implementation, the Cortex Neural Network is able to process different
cognitive tasks and perform reflection to get a higher accuracy. We provide a
series of experiments to examine the capability of the cortex architecture on
traditional neural networks. Our experiments show that the Cortex
Neural Network can reach 98.32% accuracy on MNIST and 62% on CIFAR10 at the
same time, and can promisingly reduce the loss by 40%.
How to Learn a Model Checker
We show how machine-learning techniques, particularly neural networks, offer
a very effective and highly efficient solution to the approximate
model-checking problem for continuous and hybrid systems, a solution where the
general-purpose model checker is replaced by a model-specific classifier
trained by sampling model trajectories. To the best of our knowledge, we are
the first to establish this link from machine learning to model checking. Our
method comprises a pipeline of analysis techniques for estimating and obtaining
statistical guarantees on the classifier's prediction performance, as well as
tuning techniques to improve such performance. Our experimental evaluation
considers the time-bounded reachability problem for three well-established
benchmarks in the hybrid systems community. On these examples, we achieve an
accuracy of 99.82% to 100% and a false-negative rate (incorrectly predicting
that unsafe states are not reachable from a given state) of 0.0007 to 0. We
believe that this level of accuracy is acceptable in many practical
applications and we show how the approximate model checker can be made more
conservative by tuning the classifier through further training and selection of
the classification threshold.
Comment: 16 pages, 13 figures, short version submitted to HSCC201
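The conservative tuning step can be sketched as selecting the lowest decision threshold that rejects every known-unsafe validation sample, so no unsafe state is predicted safe on held-out data. The scores below are toy stand-ins for a trained classifier's "safe" probabilities.

```python
# Sketch of conservative threshold selection for an approximate model
# checker: pick the decision threshold so that no known-unsafe
# validation sample is predicted safe. Scores are toy stand-ins for a
# trained classifier's "safe" probabilities.

def conservative_threshold(scores, unsafe):
    """Smallest threshold that rejects every known-unsafe sample.

    A sample is predicted safe when score >= threshold, so the
    threshold must sit just above the highest unsafe score.
    """
    worst_unsafe = max(s for s, u in zip(scores, unsafe) if u)
    return worst_unsafe + 1e-9

scores = [0.95, 0.80, 0.40, 0.30, 0.85]
unsafe = [False, False, True, True, False]
t = conservative_threshold(scores, unsafe)
predicted_safe = [s >= t for s in scores]
# no unsafe sample slips through: zero false negatives on this set
```

Raising the threshold trades false positives (safe states flagged unsafe) for fewer false negatives, which is the safe direction for a model checker to err.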
Predicting Outcome of Indian Premier League (IPL) Matches Using Machine Learning
Cricket, especially in the Twenty20 format, carries great uncertainty: a
single over can completely change the momentum of the game. With millions of
people following the Indian Premier League (IPL), developing a model for
predicting the outcome of its matches is a real-world problem. A cricket match
depends upon various factors, and in this work, the factors which significantly
influence the outcome of a Twenty20 cricket match are identified. Each player's
performance in the field is considered to determine the overall weight (relative
strength) of the teams. A multivariate regression-based solution is proposed to
calculate points for each player in the league and the overall weight of a team
is computed based on the past performance of the players who have appeared most
for the team. Finally, a dataset is modeled based on the identified seven
factors which influence the outcome of an IPL match. Six machine learning
models were trained and used for predicting the outcome of each 2018 IPL match,
15 minutes before the gameplay, immediately after the toss. Three of the
trained models correctly predicted more than 40 matches, with the
Multilayer Perceptron outperforming all other models with an impressive
accuracy of 71.66%.
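The team-weight computation can be sketched as averaging the points of the players who have appeared most for a team. The player points and appearance counts here are hypothetical placeholders for the values the paper fits by multivariate regression.

```python
# Sketch of the team-weight computation: a team's relative strength is
# the average points of the players who have appeared most for it. The
# player points are hypothetical placeholders for values fitted by the
# paper's multivariate regression.

def team_weight(player_points, appearances, squad, top_n=3):
    """Average points of the top_n most-appearing players in the squad."""
    regulars = sorted(squad, key=lambda p: appearances[p], reverse=True)[:top_n]
    return sum(player_points[p] for p in regulars) / len(regulars)

player_points = {"A": 9.0, "B": 7.0, "C": 5.0, "D": 8.0}
appearances = {"A": 50, "B": 40, "C": 10, "D": 45}
w = team_weight(player_points, appearances, ["A", "B", "C", "D"])
# regulars are A, D, B -> (9 + 8 + 7) / 3 = 8.0
```

The two team weights then enter the feature vector alongside the other identified factors, which is why the prediction can be made right after the toss.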
Security Matters: A Survey on Adversarial Machine Learning
Adversarial machine learning is a fast-growing research area that considers
scenarios in which machine learning systems may face adversarial
attackers who intentionally synthesize input data to make a well-trained model
make mistakes. It always involves a defending side, usually a classifier, and
an attacking side that aims to cause incorrect output. The earliest studies of
adversarial examples for machine learning algorithms come from the
information security area, which considers a much wider variety of attacking
methods. But the recent research focus, popularized by the deep learning
community, places strong emphasis on how "imperceivable" perturbations of
normal inputs may cause dramatic mistakes by deep learning models with
supposedly super-human accuracy. This paper gives a comprehensive
introduction to a range of aspects of the adversarial deep learning topic,
including its foundations, typical attacking and defending strategies, and some
extended studies.
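A minimal sketch of the "imperceivable"-perturbation attack family such surveys cover, using a fast-gradient-sign-style step against a toy linear classifier; the weights, input, and step size are made up for illustration.

```python
import numpy as np

# Sketch of a fast-gradient-sign-style attack on a toy linear
# classifier (score = w . x): step the input against the sign of the
# gradient of the score. Weights, input, and step size are made up.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, -0.2, 0.1])   # classified positive: score > 0
eps = 0.5                        # perturbation budget per coordinate

score = float(w @ x)             # 0.3 + 0.4 + 0.05 = 0.75
# For a linear score the gradient w.r.t. x is w itself, so the
# worst-case step within the budget is -eps * sign(w).
x_adv = x - eps * np.sign(w)
adv_score = float(w @ x_adv)     # 0.75 - eps * sum(|w|) = -1.0
```

Even though every coordinate moves by at most eps, the signs align with the model's weights, so the score flips; deep networks exhibit the same vulnerability with far smaller, visually imperceptible perturbations.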