Bayes-optimal Hierarchical Classification over Asymmetric Tree-Distance Loss
Hierarchical classification is a supervised multi-class classification problem
over a set of class labels organized according to a hierarchy. In this
report, we study the work by Ramaswamy et al. on hierarchical classification
over the symmetric tree-distance loss. We extend the consistency of the
hierarchical classification algorithm to the asymmetric tree-distance loss. We
design an algorithm to find the Bayes-optimal classification for a
k-ary tree hierarchy. We show that, under reasonable assumptions on the
asymmetric loss function, the Bayes-optimal classification over this asymmetric
loss can be found efficiently. We exploit this insight and
attempt to extend the OvA-Cascade algorithm \citet{ramaswamy2015convex} for
hierarchical classification over the asymmetric loss.
Comment: CS 396 Undergraduate Project Report, 17 pages, 3 figures
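The objective this report builds on can be illustrated with a small brute-force sketch (the function names and the example tree are hypothetical; the report's point is finding this minimizer efficiently rather than by enumeration): predict the node that minimizes the expected asymmetric tree-distance loss under the conditional class probabilities.

```python
def asym_tree_loss(parent, y, yhat, up=1.0, down=1.0):
    # Asymmetric tree-distance loss: edges from the true label y up to the
    # lowest common ancestor cost `up`; edges from the LCA down to the
    # predicted node yhat cost `down`. `parent` maps each node to its
    # parent (the root maps to itself).
    def path_to_root(x):
        p = [x]
        while parent[x] != x:
            x = parent[x]
            p.append(x)
        return p
    py, ph = path_to_root(y), path_to_root(yhat)
    common = set(py) & set(ph)
    d_up = next(i for i, a in enumerate(py) if a in common)
    d_down = next(i for i, a in enumerate(ph) if a in common)
    return up * d_up + down * d_down

def bayes_optimal(parent, probs, up=1.0, down=1.0):
    # Brute-force Bayes-optimal prediction: the node minimizing the
    # expected loss under the conditional class probabilities `probs`.
    return min(parent, key=lambda n: sum(p * asym_tree_loss(parent, y, n, up, down)
                                         for y, p in probs.items()))

# toy hierarchy: root 0 with children 1 and 2; leaves 3,4 under 1; 5,6 under 2
parent = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
probs = {3: 0.3, 4: 0.3, 5: 0.2, 6: 0.2}
```

With symmetric costs the optimum hedges at the internal node covering the likeliest leaves; making downward edges more expensive pushes the Bayes-optimal prediction toward the root.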
Efficient Set-Valued Prediction in Multi-Class Classification
In cases of uncertainty, a multi-class classifier preferably returns a set of
candidate classes instead of predicting a single class label with little
guarantee. More precisely, the classifier should strive for an optimal balance
between the correctness (the true class is among the candidates) and the
precision (the candidates are not too many) of its prediction. We formalize
this problem within a general decision-theoretic framework that unifies most of
the existing work in this area. In this framework, uncertainty is quantified in
terms of conditional class probabilities, and the quality of a predicted set is
measured in terms of a utility function. We then address the problem of finding
the Bayes-optimal prediction, i.e., the subset of class labels with highest
expected utility. For this problem, which is computationally challenging as
there are exponentially (in the number of classes) many predictions to choose
from, we propose efficient algorithms that can be applied to a broad family of
utility functions. Our theoretical results are complemented by experimental
studies, in which we analyze the proposed algorithms in terms of predictive
accuracy and runtime efficiency.
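The efficiency claim can be made concrete for one common utility family. A hedged sketch, assuming utilities of the form u(Y; y) = 1{y in Y} * g(|Y|): for these, the expected utility of a set Y is g(|Y|) times the sum of its class probabilities, so for each size k the best candidate is the top-k set and only K sets need checking instead of 2^K - 1 (the function name is illustrative, not the paper's API):

```python
def bayes_optimal_set(probs, g):
    # For u(Y; y) = 1{y in Y} * g(|Y|), the expected utility of Y is
    # g(|Y|) * sum of p(y) over Y, so for each size k the best set is the
    # top-k most probable classes: K candidates instead of 2^K - 1.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    best_k, best_u, cum = 1, float("-inf"), 0.0
    for k, i in enumerate(order, start=1):
        cum += probs[i]
        eu = g(k) * cum                 # expected utility of the top-k set
        if eu > best_u:
            best_k, best_u = k, eu
    return sorted(order[:best_k]), best_u
```

With the precision utility g(k) = 1/k the optimal set collapses to the single most probable class, while size-discounted utilities such as g(k) = 2/(k+1) trade a larger set for a better chance of covering the true class.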
Topic Grouper: An Agglomerative Clustering Approach to Topic Modeling
We introduce Topic Grouper as a complementary approach in the field of
probabilistic topic modeling. Topic Grouper creates a disjunctive partitioning
of the training vocabulary in a stepwise manner such that resulting partitions
represent topics. It is governed by a simple generative model, where the
likelihood to generate the training documents via topics is optimized. The
algorithm starts with one-word topics and joins two topics at every step. It
therefore generates a solution for every desired number of topics ranging
between the size of the training vocabulary and one. The process represents an
agglomerative clustering that corresponds to a binary tree of topics. A
resulting tree may act as a containment hierarchy, typically with more general
topics towards the root of the tree and more specific topics towards the leaves.
Topic Grouper is not governed by a background distribution such as the
Dirichlet and avoids hyperparameter optimization.
We show that Topic Grouper has reasonable predictive power and also a
reasonable theoretical and practical complexity. Topic Grouper can deal well
with stop words and function words and tends to push them into their own
topics. Also, it can handle topic distributions, where some topics are more
frequent than others. We present typical examples of computed topics from
evaluation datasets, where topics appear conclusive and coherent. In this
context, the fact that each word belongs to exactly one topic is not a major
limitation; in some scenarios this can even be a genuine advantage, e.g., a
related shopping basket analysis may aid in optimizing groupings of articles in
sales catalogs.
Every Untrue Label is Untrue in its Own Way: Controlling Error Type with the Log Bilinear Loss
Deep learning has become the method of choice in many application domains of
machine learning in recent years, especially for multi-class classification
tasks. The most common loss function used in this context is the cross-entropy
loss, which reduces to the log loss in the typical case when there is a single
correct response label. While this loss is insensitive to the identity of the
assigned class in the case of misclassification, in practice it is often the
case that some errors may be more detrimental than others. Here we present the
bilinear-loss (and related log-bilinear-loss) which differentially penalizes
the different wrong assignments of the model. We thoroughly test this method
using standard models and benchmark image datasets. As one application, we show
the ability of this method to better contain error within the correct
super-class, in the hierarchically labeled CIFAR100 dataset, without affecting
the overall performance of the classifier.
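The idea of differentially penalizing wrong assignments can be sketched with a cost row per true class (the exact bilinear and log-bilinear definitions are in the paper; the forms below are a hedged illustration with hypothetical costs):

```python
import math

def bilinear_loss(cost_row, p):
    # Expected cost of predicted distribution p when the true class has
    # cost row `cost_row` (cost_row[j] = penalty for mass on class j,
    # zero on the true class itself).
    return sum(c * q for c, q in zip(cost_row, p))

def log_bilinear_loss(cost_row, p, eps=1e-12):
    # Log variant: the penalty for a costly wrong class grows without
    # bound as its predicted probability approaches 1, as with log loss.
    return -sum(c * math.log(1.0 - q + eps) for c, q in zip(cost_row, p))

# hypothetical cost row for true class 0: confusing it with class 1 is
# cheap (cost 1), confusing it with class 2 is expensive (cost 5)
cost0 = [0.0, 1.0, 5.0]
safe_err = bilinear_loss(cost0, [0.6, 0.3, 0.1])  # mass on the cheap error
bad_err = bilinear_loss(cost0, [0.6, 0.1, 0.3])   # mass on the costly error
```

Two predictions with identical probability on the true class thus receive different losses depending on where the remaining mass goes, which is exactly the error-type control the title describes.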
Developing Bayesian Information Entropy-based Techniques for Spatially Explicit Model Assessment
The aim of this paper is to explore and develop advanced spatial Bayesian
assessment methods and techniques for land use modeling. The paper provides a
comprehensive guide for assessing additional informational entropy value of
model predictions at the spatially explicit domain of knowledge, and proposes a
few alternative metrics and indicators for extracting higher-order information
dynamics from simulation tournaments. A seven-county study area in
South-Eastern Wisconsin (SEWI) has been used to simulate and assess the
accuracy of historical land use changes (1963-1990) using artificial neural
network simulations of the Land Transformation Model (LTM). The analysis
and the performance of the metrics help: (a) understand how
well the model runs fit different combinations of presence and absence of
transitions in a landscape, not simply how well the model fits our given data;
(b) derive (estimate) a theoretical accuracy that we would expect a model to
achieve under incomplete information and measurement; (c)
understand the spatially explicit role and patterns of uncertainty in
simulations and model estimations, by comparing results across simulation runs;
(d) compare the significance or estimation contribution of transitional
presence and absence (change versus no change) to model performance, and the
contribution of the spatial drivers and variables to the explanatory value of
our model; and (e) compare measurements of informational uncertainty at
different scales of spatial resolution.
Comment: 13 pages, 10 figures, 3 tables, 25 equations. Submitted to IEEE Transactions on Information Theory
Cost-Sensitive Label Embedding for Multi-Label Classification
Label embedding (LE) is an important family of multi-label classification
algorithms that digest the label information jointly for better performance.
Different real-world applications evaluate performance by different cost
functions of interest. Current LE algorithms often aim to optimize one specific
cost function, but they can suffer from bad performance with respect to other
cost functions. In this paper, we resolve the performance issue by proposing a
novel cost-sensitive LE algorithm that takes the cost function of interest into
account. The proposed algorithm, cost-sensitive label embedding with
multidimensional scaling (CLEMS), approximates the cost information with the
distances of the embedded vectors by using the classic multidimensional scaling
approach for manifold learning. CLEMS is able to deal with both symmetric and
asymmetric cost functions, and effectively makes cost-sensitive decisions by
nearest-neighbor decoding within the embedded vectors. We derive theoretical
results that justify how CLEMS achieves the desired cost-sensitivity.
Furthermore, extensive experimental results demonstrate that CLEMS is
significantly better than a wide spectrum of existing LE algorithms and
state-of-the-art cost-sensitive algorithms across different cost functions.
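The embedding step can be sketched with classical multidimensional scaling (a plain eigendecomposition variant of the MDS family CLEMS draws on; the cost numbers below are hypothetical):

```python
import numpy as np

def classical_mds(D, dim=2):
    # Classical multidimensional scaling: embed points so that pairwise
    # Euclidean distances approximate the dissimilarities in D.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:dim]          # keep the largest eigenpairs
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))

# hypothetical cost matrix over three labels: confusing labels 0 and 1
# costs 1, confusing either with label 2 costs 4 (an asymmetric cost
# matrix would be symmetrized before embedding)
C = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 4.0],
              [4.0, 4.0, 0.0]])
E = classical_mds(C)
# nearest-neighbor decoding: map a point in the embedded space to a label
query = E[0] + 0.1 * (E[1] - E[0])           # a prediction landing near label 0
decoded = int(np.argmin(np.linalg.norm(E - query, axis=1)))
```

Because embedded distances mirror the costs, a nearest-neighbor decode of a predicted embedding automatically prefers cheap confusions over expensive ones.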
Taskonomy: Disentangling Task Transfer Learning
Do visual tasks have a relationship, or are they unrelated? For instance,
could having surface normals simplify estimating the depth of an image?
Intuition answers these questions positively, implying existence of a structure
among visual tasks. Knowing this structure has notable values; it is the
concept underlying transfer learning and provides a principled way for
identifying redundancies across tasks, e.g., to seamlessly reuse supervision
among related tasks or solve many tasks in one system without piling up the
complexity.
We propose a fully computational approach for modeling the structure of the
space of visual tasks. This is done via finding (first and higher-order)
transfer learning dependencies across a dictionary of twenty-six 2D, 2.5D, 3D,
and semantic tasks in a latent space. The product is a computational taxonomic
map for task transfer learning. We study the consequences of this structure,
e.g., nontrivial emergent relationships, and exploit them to reduce the demand
for labeled data. For example, we show that the total number of labeled
datapoints needed for solving a set of 10 tasks can be reduced by roughly 2/3
(compared to training independently) while keeping the performance nearly the
same. We provide a set of tools for computing and probing this taxonomical
structure including a solver that users can employ to devise efficient
supervision policies for their use cases.
Comment: CVPR 2018 (Oral). See project website and live demos at http://taskonomy.vision
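The idea of devising a supervision policy from transfer dependencies can be sketched with a toy greedy selector (the affinity numbers and task names are hypothetical, and the paper solves this step with a boolean integer program rather than greedily):

```python
def pick_sources(affinity, budget):
    # Greedy sketch: choose `budget` source tasks to label so that the sum,
    # over all target tasks, of the best transfer from any chosen source
    # is as large as possible.
    tasks = list(affinity)
    def coverage(sources):
        return sum(max(affinity[s][t] for s in sources) for t in tasks)
    chosen = []
    while len(chosen) < budget:
        best = max((s for s in tasks if s not in chosen),
                   key=lambda s: coverage(chosen + [s]))
        chosen.append(best)
    return chosen

# hypothetical transfer affinities affinity[source][target]
affinity = {
    "normals":  {"normals": 1.0, "depth": 0.9, "edges": 0.4, "semantic": 0.3},
    "depth":    {"normals": 0.8, "depth": 1.0, "edges": 0.3, "semantic": 0.3},
    "edges":    {"normals": 0.3, "depth": 0.3, "edges": 1.0, "semantic": 0.4},
    "semantic": {"normals": 0.3, "depth": 0.4, "edges": 0.5, "semantic": 1.0},
}
chosen = pick_sources(affinity, budget=2)
```

With these toy numbers the selector labels surface normals and semantics and serves depth and edges by transfer, which is the kind of labeled-data saving the abstract quantifies.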
A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications
Graphs are an important data representation appearing in a wide variety
of real-world scenarios. Effective graph analytics provides users with a deeper
understanding of what is behind the data and can thus benefit many useful
applications such as node classification, node recommendation, link prediction,
etc. However, most graph analytics methods suffer from high computation and
space costs. Graph embedding is an effective yet efficient way to solve the
graph analytics problem. It converts the graph data into a low-dimensional
space in which the graph structural information and graph properties are
maximally preserved. In this survey, we conduct a comprehensive review of the
literature in graph embedding. We first introduce the formal definition of
graph embedding as well as the related concepts. After that, we propose two
taxonomies of graph embedding, corresponding to the challenges that exist in
different graph embedding problem settings and how existing work addresses
these challenges. Finally, we summarize the applications
that graph embedding enables and suggest four promising future research
directions in terms of computation efficiency, problem settings, techniques and
application scenarios.
Comment: A 20-page comprehensive survey of graph/network embedding covering over 150 papers up to 2018. It provides a systematic categorization of problems, techniques and applications. Accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE). Comments and suggestions are welcomed for continuously improving this survey
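The matrix-factorization family of graph embedding techniques surveyed here can be illustrated with a minimal spectral sketch (a generic example over a toy graph, not a method from the survey):

```python
import numpy as np

def spectral_embed(A, dim):
    # Matrix-factorization flavor of graph embedding: keep the top
    # eigenpairs of the symmetric adjacency matrix, so nodes with similar
    # neighborhood structure receive nearby low-dimensional vectors.
    w, V = np.linalg.eigh(A)
    top = np.argsort(w)[::-1][:dim]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))

# two triangles joined by a bridge edge (2-3): a toy two-community graph
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
E = spectral_embed(A, dim=2)
```

Nodes in the same triangle land close together while nodes across the bridge land apart, which is the structure-preservation property that downstream tasks such as node classification and link prediction exploit.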
Hierarchical Density Order Embeddings
By representing words with probability densities rather than point vectors,
probabilistic word embeddings can capture rich and interpretable semantic
information and uncertainty. The uncertainty information can be particularly
meaningful in capturing entailment relationships -- whereby general words such
as "entity" correspond to broad distributions that encompass more specific
words such as "animal" or "instrument". We introduce density order embeddings,
which learn hierarchical representations through encapsulation of probability
densities. In particular, we propose simple yet effective loss functions and
distance metrics, as well as graph-based schemes to select negative samples to
better learn hierarchical density representations. Our approach provides
state-of-the-art performance on the WordNet hypernym relationship prediction
task and the challenging HyperLex lexical entailment dataset -- while retaining
a rich and interpretable density representation.
Comment: Published at ICLR 2018
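The role of an asymmetric divergence in encapsulation can be sketched with diagonal Gaussians and the KL divergence (a minimal illustration; the paper proposes its own losses and divergence variants, and the densities below are hypothetical):

```python
import math

def kl_diag_gauss(mu0, var0, mu1, var1):
    # KL(N0 || N1) for diagonal Gaussians. The divergence is asymmetric,
    # so it can serve as an order penalty: it stays small when N1 (the
    # general concept) is broad enough to encapsulate N0 (the specific one).
    return 0.5 * sum(v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0)
                     for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1))

# hypothetical 2-d densities: "entity" broad, "animal" narrower, "dog" narrowest
entity = ([0.0, 0.0], [4.0, 4.0])
animal = ([0.5, 0.0], [1.0, 1.0])
dog    = ([0.7, 0.1], [0.2, 0.2])
```

The penalty in the true direction of entailment (specific into general) comes out smaller than in the reverse direction, which is the ordering signal a density order embedding is trained to produce.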
Online Machine Learning in Big Data Streams
The area of online machine learning in big data streams covers algorithms
that are (1) distributed and (2) work from data streams with only a limited
possibility to store past data. The first requirement mostly concerns software
architectures and efficient algorithms. The second one also imposes nontrivial
theoretical restrictions on the modeling methods: In the data stream model,
older data is no longer available to revise earlier suboptimal modeling
decisions as the fresh data arrives.
In this article, we provide an overview of distributed software architectures
and libraries as well as machine learning models for online learning. We
highlight the most important ideas for classification, regression,
recommendation, and unsupervised modeling from streaming data, and we show how
they are implemented in various distributed data stream processing systems.
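A single-pass update of the kind these systems implement can be sketched with online logistic regression (a generic illustration, not code from any of the systems discussed):

```python
import math
import random

def sgd_step(w, x, y, lr=0.1):
    # One online update of logistic regression: the example (x, y) is seen
    # once and never stored, as the data stream model requires.
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    grad = p - y                     # gradient of the log loss w.r.t. z
    return [wi - lr * grad * xi for wi, xi in zip(w, x)]

# simulate a stream whose label is 1 exactly when feature 0 exceeds feature 1
random.seed(0)
w = [0.0, 0.0]
for _ in range(2000):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    w = sgd_step(w, x, 1.0 if x[0] > x[1] else 0.0)
```

Each example updates the model and is then discarded, so earlier suboptimal updates can only be corrected by fresh data, never by revisiting old data.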
This article is reference material, not a survey. We do not attempt to
be comprehensive in describing all existing methods and solutions; rather, we
give pointers to the most important resources in the field. The related
sub-fields of online algorithms, online learning, and distributed data processing
are all highly active in current research and development, with conceptually new
research results and software components emerging at the time of writing. In
this article, we refer to several survey results, both for distributed data
processing and for online machine learning. Compared to past surveys, our
article is different because we discuss recommender systems in extended detail
- …