Hierarchical Label Partitioning for Large Scale Classification
Extreme classification, where the number of classes is very large, has received significant attention over the last decade. Standard multi-class classification approaches were not designed to handle such a large number of classes. A particular issue in large-scale problems is the computational complexity of classification: the best multi-class approaches generally have complexity linear in the number of classes, which prevents them from scaling up. Recent work has focused on hierarchical classification in order to speed up the classification of new instances. A priori information on labels is not always available, nor always useful, for building hierarchical models. Finding a suitable hierarchical organization of the labels is thus a crucial issue, as the accuracy of the model depends strongly on how labels are assigned through the label tree. In this work we propose a new algorithm that iteratively builds a hierarchical label structure: a partitioning algorithm that simultaneously optimizes the structure in terms of classification complexity and solves the label partitioning problem in order to achieve high classification performance. Starting from a flat tree structure, our algorithm iteratively selects a node to expand by adding a new level of nodes between the selected node and its children. This operation increases the speed-up of the classification process. Once the node is selected, the best partitioning of the classes must be computed. We propose a measure based on maximizing the expected loss of the sub-levels in order to minimize the global error of the structure. This choice forces hardly separable classes to be grouped together in the same partitions at the first levels of the tree structure, and it delays errors to deeper levels of the structure, where they have no incidence on the accuracy of other classes.
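The node-expansion step described in this abstract can be sketched roughly as follows. This is only an illustrative sketch: the greedy grouping by a pairwise confusion score stands in for the paper's expected-loss maximization, and all names (`expand_node`, `confusion`, `k`) are assumptions, not the authors' code.

```python
# Sketch: expand one node of a label tree by partitioning its children so
# that hardly separable (frequently confused) classes land in the same
# partition, pushing errors between them deeper into the tree.
def expand_node(children, confusion, k=2):
    """Greedily group up to k classes per partition, attaching to each seed
    the classes it is most confused with (per the `confusion` dict)."""
    groups = []
    remaining = list(children)
    while remaining:
        seed = remaining.pop(0)
        group = [seed]
        # order the remaining classes by how confusable they are with the seed
        remaining.sort(key=lambda c: confusion[(seed, c)], reverse=True)
        while remaining and len(group) < k:
            group.append(remaining.pop(0))
        groups.append(group)
    return groups
```

Each returned group becomes a new internal node between the expanded node and its former children, which shortens the average root-to-leaf classification path.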
Hierarchical Text Classification with Reinforced Label Assignment
While existing hierarchical text classification (HTC) methods attempt to
capture label hierarchies for model training, they either make local decisions
regarding each label or completely ignore the hierarchy information during
inference. To solve the mismatch between training and inference as well as
modeling label dependencies in a more principled way, we formulate HTC as a
Markov decision process and propose to learn a Label Assignment Policy via deep
reinforcement learning to determine where to place an object and when to stop
the assignment process. The proposed method, HiLAP, explores the hierarchy
during both training and inference time in a consistent manner and makes
inter-dependent decisions. As a general framework, HiLAP can incorporate
different neural encoders as base models for end-to-end training. Experiments
on five public datasets and four base models show that HiLAP yields an average
improvement of 33.4% in Macro-F1 over flat classifiers and outperforms
state-of-the-art HTC methods by a large margin. Data and code can be found at
https://github.com/morningmoni/HiLAP.
Comment: EMNLP 2019
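The "where to place an object and when to stop" decision process can be sketched as a walk over the label tree. This is a minimal illustration of the sequential view, not HiLAP itself: the greedy `score` function stands in for the learned policy, and the hierarchy, threshold, and names are assumptions.

```python
# Sketch: hierarchical label assignment as a sequence of decisions.
# At each node the policy either moves to a child (placing the object
# deeper) or takes a stop action, so training and inference can follow
# the same consistent traversal.
def assign_labels(hierarchy, score, root="ROOT", stop_threshold=0.5):
    """Walk the label tree from the root, appending each chosen node,
    and stop when no child scores above the threshold (or at a leaf)."""
    path, node = [], root
    while True:
        children = hierarchy.get(node, [])
        if not children:
            break  # leaf reached: stopping is forced
        best = max(children, key=score)
        if score(best) < stop_threshold:
            break  # the policy chooses the stop action
        path.append(best)
        node = best
    return path
```

In the actual method the per-step choice is made by a policy trained with reinforcement learning, so placements at one level can account for decisions made at other levels.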
BatchRank: A Novel Batch Mode Active Learning Framework for Hierarchical Classification
Active learning algorithms automatically identify the salient and exemplar instances from large amounts of unlabeled data and thus reduce the human annotation effort needed to induce a classification model. More recently, Batch Mode Active Learning (BMAL) techniques have been proposed, where a batch of data samples is selected simultaneously from an unlabeled set. Most active learning algorithms assume a flat label space, that is, they consider the class labels to be independent. However, in many applications, the set of class labels is organized in a hierarchical tree structure, with the leaf nodes as outputs and the internal nodes as clusters of outputs at multiple levels of granularity. In this paper, we propose a novel BMAL algorithm (BatchRank) for hierarchical classification. Sample selection is posed as an NP-hard integer quadratic programming problem, and a convex relaxation (based on linear programming) is derived, whose solution is further improved by an iterative truncated power method. Finally, a deterministic bound is established on the quality of the solution. Our empirical results on several challenging, real-world datasets from multiple domains corroborate the potential of the proposed framework for real-world hierarchical classification applications.
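The truncated power method mentioned here can be sketched for a generic sparse quadratic objective of the form maximize x^T A x with at most k nonzero entries, which is the shape of the batch-selection problem after relaxation. The matrix `A`, the value of `k`, and the function name are illustrative assumptions; this is not the paper's exact formulation or bound.

```python
# Sketch: iterative truncated power method for sparse batch selection.
# Each iteration is a power-method step (y = A @ x) followed by a
# truncation that keeps only the k largest entries, enforcing that at
# most k samples are selected for the batch.
import numpy as np

def truncated_power_method(A, k, iters=50):
    n = A.shape[0]
    x = np.ones(n) / np.sqrt(n)  # dense, feasible starting point
    for _ in range(iters):
        y = A @ x
        idx = np.argsort(y)[-k:]  # the truncation step: top-k entries
        x = np.zeros(n)
        x[idx] = y[idx]
        norm = np.linalg.norm(x)
        if norm == 0:
            break
        x /= norm
    return np.sort(np.argsort(x)[-k:])  # indices of the selected batch
```

For a diagonal `A` encoding per-sample informativeness, the iteration concentrates mass on the k most informative samples, which matches the intuition of selecting the most useful batch.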
Multi-Instance Multi-Label Learning
In this paper, we propose the MIML (Multi-Instance Multi-Label learning)
framework where an example is described by multiple instances and associated
with multiple class labels. Compared to traditional learning frameworks, the
MIML framework is more convenient and natural for representing complicated
objects which have multiple semantic meanings. To learn from MIML examples, we
propose the MimlBoost and MimlSvm algorithms based on a simple degeneration
strategy, and experiments show that solving problems involving complicated
objects with multiple semantic meanings in the MIML framework can lead to good
performance. Considering that the degeneration process may lose information, we
propose the D-MimlSvm algorithm which tackles MIML problems directly in a
regularization framework. Moreover, we show that even when we do not have
access to the real objects and thus cannot capture more information from real
objects by using the MIML representation, MIML is still useful. We propose the
InsDif and SubCod algorithms. InsDif works by transforming single-instances
into the MIML representation for learning, while SubCod works by transforming
single-label examples into the MIML representation for learning. Experiments
show that in some tasks they are able to achieve better performance than
learning the single-instances or single-label examples directly.Comment: 64 pages, 10 figures; Artificial Intelligence, 201