Bayesian Network Based Label Correlation Analysis For Multi-label Classifier Chain
Classifier chain (CC) is a multi-label learning approach that constructs a
sequence of binary classifiers according to a label order. Each classifier in
the sequence is responsible for predicting the relevance of one label. When
training the classifier for a label, preceding labels will be taken as
extended features. If the extended features are highly correlated with the label,
the performance will be improved; otherwise, the performance will be
unaffected or even degraded. How to discover label correlation and determine
the label order is critical for the CC approach. This paper employs Bayesian
network (BN) to model the label correlations and proposes a new BN-based CC
method (BNCC). First, conditional entropy is used to describe the dependency
relations among labels. Then, a BN is built up by taking nodes as labels and
weights of edges as their dependency relations. A new scoring function is
proposed to evaluate a BN structure, and a heuristic algorithm is introduced to
optimize the BN. Finally, by applying topological sorting on the nodes of the
optimized BN, the label order for constructing the CC model is derived.
Experimental comparisons demonstrate the feasibility and effectiveness of the
proposed method.
Comment: 27 page
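The two building blocks named in this abstract, conditional entropy between labels and topological sorting of a label graph, can be illustrated with a minimal sketch. This is not the BNCC scoring function or its heuristic search, neither of which is specified in the abstract; the function names `conditional_entropy` and `label_order` and the example graph are assumptions made for illustration only.

```python
import math
from collections import Counter
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def conditional_entropy(target, given):
    """Estimate H(target | given) in bits from two equal-length label columns."""
    n = len(target)
    joint = Counter(zip(given, target))   # counts of (given, target) pairs
    marginal = Counter(given)             # counts of each value of `given`
    h = 0.0
    for (g, _t), count in joint.items():
        # p(g, t) * log2 p(t | g), summed over observed pairs
        h -= (count / n) * math.log2(count / marginal[g])
    return h


def label_order(parents):
    """Return a topological order of a DAG given as {label: set of parent labels}.

    Parents always appear before their children, so the result can serve
    as a chain order in which each label follows the labels it depends on.
    """
    return list(TopologicalSorter(parents).static_order())


# Hypothetical label columns: y2 copies y1, y3 is independent of y1.
y1 = [0, 0, 1, 1]
y2 = [0, 0, 1, 1]
y3 = [0, 1, 0, 1]
print(conditional_entropy(y2, y1))  # low value: y1 determines y2
print(conditional_entropy(y3, y1))  # high value: y1 says nothing about y3

# Hypothetical dependency graph: a -> b -> c.
print(label_order({"b": {"a"}, "c": {"b"}}))
```

A low conditional entropy H(Y_j | Y_i) suggests that Y_i is a useful extended feature for Y_j, which is the intuition behind weighting edges by dependency before ordering the labels.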
A Mixtures-of-Experts Framework for Multi-Label Classification
We develop a novel probabilistic approach for multi-label classification that
is based on the mixtures-of-experts architecture combined with recently
introduced conditional tree-structured Bayesian networks. Our approach captures
different input-output relations from multi-label data using the efficient
tree-structured classifiers, while the mixtures-of-experts architecture aims to
compensate for the tree-structured restrictions and build a more accurate
model. We develop and present algorithms for learning the model from data and
for performing multi-label predictions on future data instances. Experiments on
multiple benchmark datasets demonstrate that our approach achieves highly
competitive results and outperforms the existing state-of-the-art multi-label
classification methods.
Efficient Monte Carlo Methods for Multi-Dimensional Learning with Classifier Chains
Multi-dimensional classification (MDC) is the supervised learning problem
where an instance is associated with multiple classes, rather than with a
single class, as in traditional classification problems. Since these classes
are often strongly correlated, modeling the dependencies between them allows
MDC methods to improve their performance - at the expense of an increased
computational cost. In this paper we focus on the classifier chains (CC)
approach for modeling dependencies, one of the most popular and highest-
performing methods for multi-label classification (MLC), a particular case of
MDC which involves only binary classes (i.e., labels). The original CC
algorithm makes a greedy approximation, and is fast but tends to propagate
errors along the chain. Here we present novel Monte Carlo schemes, both for
finding a good chain sequence and performing efficient inference. Our
algorithms remain tractable for high-dimensional data sets and obtain the best
predictive performance across several real data sets.
Comment: Submitted to Pattern Recognitio
Ensemble Methods for Multi-label Classification
Ensemble methods have been shown to be an effective tool for solving
multi-label classification tasks. In the RAndom k-labELsets (RAKEL) algorithm,
each member of the ensemble is associated with a small randomly-selected subset
of k labels. Then, a single-label classifier is trained according to each
combination of elements in the subset. In this paper we adopt a similar
approach; however, instead of randomly choosing subsets, we select the minimum
required subsets of k labels that cover all labels and meet additional
constraints such as coverage of inter-label correlations. Construction of the
cover is achieved by formulating the subset selection as a minimum set covering
problem (SCP) and solving it by using approximation algorithms. Every cover
needs only to be prepared once by offline algorithms. Once prepared, a cover
may be applied to the classification of any given multi-label dataset whose
properties conform with those of the cover. The contribution of this paper is
two-fold. First, we introduce SCP as a general framework for constructing label
covers while allowing the user to incorporate cover construction constraints.
We demonstrate the effectiveness of this framework by proposing two
construction constraints whose enforcement produces covers that improve the
prediction performance of random selection. Second, we provide theoretical
bounds that quantify the probabilities of random selection to produce covers
that meet the proposed construction criteria. The experimental results indicate
that the proposed methods improve multi-label classification accuracy and
stability compared with the RAKEL algorithm and with other state-of-the-art
algorithms.
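The idea of choosing k-label subsets as a set cover rather than at random can be sketched with the classic greedy approximation for set covering. This is an illustrative sketch only: it covers individual labels and ignores the additional constraints mentioned in the abstract (such as coverage of inter-label correlations), and the function name `greedy_label_cover` is an assumption, not the authors' algorithm.

```python
from itertools import combinations


def greedy_label_cover(num_labels, k):
    """Greedily pick k-label subsets until every label is in some subset.

    At each step the subset covering the most still-uncovered labels is
    chosen (the standard greedy set-cover heuristic). Ties are broken by
    the lexicographic order in which combinations are generated.
    """
    if not 1 <= k <= num_labels:
        raise ValueError("k must satisfy 1 <= k <= num_labels")
    candidates = list(combinations(range(num_labels), k))
    uncovered = set(range(num_labels))
    cover = []
    while uncovered:
        best = max(candidates, key=lambda s: len(uncovered.intersection(s)))
        cover.append(best)
        uncovered.difference_update(best)
    return cover


# Five labels, subsets of size two: three subsets are needed.
cover = greedy_label_cover(5, 2)
print(cover)
```

Because the candidate list enumerates all k-subsets, this sketch is only practical for small label sets; a cover prepared once in this way could then be reused for any dataset with the same number of labels.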
Simpler is better: a novel genetic algorithm to induce compact multi-label chain classifiers
Multi-label classification (MLC) is the task of assigning multiple class labels to an object based on the features that describe the object. One of the most effective MLC methods is known as Classifier Chains (CC). This approach consists in training q binary classifiers linked in a chain, y1 → y2 → ... → yq, with each responsible for classifying a specific label in {l1, l2, ..., lq}. The chaining mechanism allows each individual classifier to incorporate the predictions of the previous ones as additional information at classification time. Thus, possible correlations among labels can be automatically exploited. Nevertheless, CC suffers from two important drawbacks: (i) the label ordering is decided at random, although it usually has a strong effect on predictive accuracy; (ii) all labels are inserted into the chain, although some of them might carry irrelevant information to discriminate the others. In this paper we tackle both problems at once, by proposing a novel genetic algorithm capable of searching for a single optimized label ordering, while at the same time taking into consideration the utilization of partial chains. Experiments on benchmark datasets demonstrate that our approach is able to produce models that are both simpler and more accurate.
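The chaining mechanism described above can be sketched in a few lines. This is a minimal illustration, not the genetic algorithm of the paper: the base learner is a hypothetical 1-nearest-neighbour classifier chosen only to keep the example dependency-free, `order` is assumed to be a full permutation of the label indices (no partial chains), and all function names are assumptions.

```python
def nn_fit(X, y):
    """'Train' a 1-nearest-neighbour classifier by storing the examples."""
    return list(zip(X, y))


def nn_predict(model, x):
    """Return the label of the stored example closest to x (squared Euclidean)."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda pair: dist(pair[0], x))[1]


def fit_chain(X, Y, order):
    """Fit one binary classifier per label, following the given label order.

    The classifier at position `pos` sees the original features extended
    with the true values of the labels that precede it in the chain.
    """
    models = []
    for pos, label in enumerate(order):
        X_ext = [tuple(x) + tuple(y[l] for l in order[:pos]) for x, y in zip(X, Y)]
        models.append(nn_fit(X_ext, [y[label] for y in Y]))
    return models


def predict_chain(models, order, x):
    """Predict all labels for x, feeding earlier predictions to later links."""
    pred = {}
    for pos, label in enumerate(order):
        x_ext = tuple(x) + tuple(pred[l] for l in order[:pos])
        pred[label] = nn_predict(models[pos], x_ext)
    return [pred[l] for l in range(len(order))]


# Toy data: one feature, two perfectly correlated labels.
X = [(0.0,), (1.0,), (2.0,), (3.0,)]
Y = [[0, 0], [0, 0], [1, 1], [1, 1]]
order = [1, 0]  # predict label 1 first, then label 0
models = fit_chain(X, Y, order)
print(predict_chain(models, order, (2.9,)))
print(predict_chain(models, order, (0.2,)))
```

Changing `order` changes which labels are available as extended features for each link, which is why the choice of ordering can affect predictive accuracy.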
Interpretable Selection and Visualization of Features and Interactions Using Bayesian Forests
It is becoming increasingly important for machine learning methods to make
predictions that are interpretable as well as accurate. In many practical
applications, it is of interest which features and feature interactions are
relevant to the prediction task. We present a novel method, Selective Bayesian
Forest Classifier, that strikes a balance between predictive power and
interpretability by simultaneously performing classification, feature
selection, feature interaction detection and visualization. It builds
parsimonious yet flexible models using tree-structured Bayesian networks, and
samples an ensemble of such models using Markov chain Monte Carlo. We build in
feature selection by dividing the trees into two groups according to their
relevance to the outcome of interest. Our method performs competitively on
classification and feature selection benchmarks in low and high dimensions, and
includes a visualization tool that provides insight into relevant features and
interactions.
Comment: R package: github.com/vkrakovna/sbf
SGD on Neural Networks Learns Functions of Increasing Complexity
We perform an experimental study of the dynamics of Stochastic Gradient
Descent (SGD) in learning deep neural networks for several real and synthetic
classification tasks. We show that in the initial epochs, almost all of the
performance improvement of the classifier obtained by SGD can be explained by a
linear classifier. More generally, we give evidence for the hypothesis that, as
iterations progress, SGD learns functions of increasing complexity. This
hypothesis can be helpful in explaining why SGD-learned classifiers tend to
generalize well even in the over-parameterized regime. We also show that the
linear classifier learned in the initial stages is "retained" throughout the
execution even if training is continued to the point of zero training error,
and complement this with a theoretical result in a simplified model. Key to our
work is a new measure of how well one classifier explains the performance of
another, based on conditional mutual information.
Comment: Submitted to NeurIPS 201
MCODE: Multivariate Conditional Outlier Detection
Outlier detection aims to identify unusual data instances that deviate from
expected patterns. Outlier detection is particularly challenging when
outliers are context dependent and when they are defined by unusual
combinations of multiple outcome variable values. In this paper, we develop and
study a new conditional outlier detection approach for multivariate outcome
spaces that works by (1) transforming the conditional detection to the outlier
detection problem in a new (unconditional) space and (2) defining outlier
scores by analyzing the data in the new space. Our approach relies on the
classifier chain decomposition of the multi-dimensional classification problem
that lets us transform the output space into a probability vector, one
probability for each dimension of the output space. Outlier scores applied to
these transformed vectors are then used to detect the outliers. Experiments on
multiple multi-dimensional classification problems with different outlier
injection rates show that our methodology is robust and able to successfully
identify outliers when outliers are either sparse (manifested in one or very
few dimensions) or dense (affecting multiple dimensions).
Generative Models of Visually Grounded Imagination
It is easy for people to imagine what a man with pink hair looks like, even
if they have never seen such a person before. We call the ability to create
images of novel semantic concepts visually grounded imagination. In this paper,
we show how we can modify variational auto-encoders to perform this task. Our
method uses a novel training objective, and a novel product-of-experts
inference network, which can handle partially specified (abstract) concepts in
a principled and efficient way. We also propose a set of easy-to-compute
evaluation metrics that capture our intuitive notions of what it means to have
good visual imagination, namely correctness, coverage, and compositionality
(the 3 C's). Finally, we perform a detailed comparison of our method with two
existing joint image-attribute VAE methods (the JMVAE method of Suzuki et al.
and the BiVCCA method of Wang et al.) by applying them to two datasets: the
MNIST-with-attributes dataset (which we introduce here), and the CelebA
dataset.
Comment: International Conference on Learning Representations (ICLR), 201
Supervised cross-modal factor analysis for multiple modal data classification
In this paper we study the problem of learning from multi-modal data for the
purpose of document classification. In this problem, each document is composed
of two different modalities of data, i.e., an image and a text. Cross-modal factor
analysis (CFA) has been proposed to project the two modalities of data to
a shared data space, so that the classification of an image or a text can be
performed directly in this space. A disadvantage of CFA is that it ignores
the supervision information. In this paper, we improve CFA by incorporating the
supervision information to represent and classify both the image and text modalities of
documents. We project both image and text data to a shared data space by factor
analysis, and then train a class label predictor in the shared space to use the
class label information. The factor analysis parameter and the predictor
parameter are learned jointly by solving a single objective function. With
this objective function, we minimize the distance between the projections of
the image and text of the same document, and the classification error of the
projections measured by the hinge loss function. The objective function is optimized
by an alternating optimization strategy in an iterative algorithm. Experiments on
two different multi-modal document data sets show the advantage of the
proposed algorithm over other CFA methods.