Label Embedding by Johnson-Lindenstrauss Matrices
We present a simple and scalable framework for extreme multiclass
classification based on Johnson-Lindenstrauss matrices (JLMs). Using the
columns of a JLM to embed the labels, a C-class classification problem is
transformed into a regression problem with O(log C) output dimension. We
derive an excess risk bound, revealing a tradeoff between computational
efficiency and prediction accuracy, and further show that under the Massart
noise condition, the penalty for dimension reduction vanishes. Our approach is
easily parallelizable, and experimental results demonstrate its effectiveness
and scalability in large-scale applications.
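As a minimal sketch of the idea (not the paper's implementation), the following assumes a Gaussian JL matrix whose columns embed C labels into a d-dimensional space; training would then fit a regressor toward these embeddings, and prediction decodes by the nearest column under the inner product. All names and sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

C, d = 1000, 256  # number of classes; embedding dimension (O(log C) in the paper)

# Gaussian JL matrix: each of its C columns is the embedding of one label.
G = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, C))

def embed_label(y):
    """Regression target for class y: column y of the JL matrix."""
    return G[:, y]

def decode(z):
    """Map a d-dimensional prediction back to a class index:
    the nearest column of G under the inner product."""
    return int(np.argmax(G.T @ z))

# Sanity check: a label's own embedding decodes back to that label with
# overwhelming probability once d is large enough relative to log C.
y = 123
assert decode(embed_label(y)) == y
```

The tradeoff the abstract mentions is visible here: shrinking d saves computation but raises the chance that two label embeddings become confusable, which is where the excess risk enters.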
A Survey on Extreme Multi-label Learning
Multi-label learning has attracted significant attention from both academia
and industry in recent decades. Although existing multi-label learning
algorithms have achieved good performance in various tasks, they implicitly
assume that the size of the target label space is not huge, which can be restrictive for
real-world scenarios. Moreover, it is infeasible to directly adapt them to
extremely large label space because of the compute and memory overhead.
Therefore, eXtreme Multi-label Learning (XML) has become an important task, and
many effective approaches have been proposed. To fully understand XML, we conduct a
survey study in this paper. We first give a formal definition of XML from
the perspective of supervised learning. Then, based on different model
architectures and challenges of the problem, we provide a thorough discussion
of the advantages and disadvantages of each category of methods. For the
benefit of conducting empirical studies, we collect abundant resources
regarding XML, including code implementations and useful tools. Lastly, we
propose possible research directions in XML, such as new evaluation metrics,
the tail label problem, and weakly supervised XML.
Learning from eXtreme Bandit Feedback
We study the problem of batch learning from bandit feedback in the setting of
extremely large action spaces. Learning from extreme bandit feedback is
ubiquitous in recommendation systems, in which billions of decisions are made
over sets consisting of millions of choices in a single day, yielding massive
observational data. In these large-scale real-world applications, supervised
learning frameworks such as eXtreme Multi-label Classification (XMC) are widely
used despite the fact that they incur significant biases due to the mismatch
between bandit feedback and supervised labels. Such biases can be mitigated by
importance sampling techniques, but these techniques suffer from impractical
variance when dealing with a large number of actions. In this paper, we
introduce a selective importance sampling estimator (sIS) that operates in a
significantly more favorable bias-variance regime. The sIS estimator is
obtained by performing importance sampling on the conditional expectation of
the reward with respect to a small subset of actions for each instance (a form
of Rao-Blackwellization). We employ this estimator in a novel algorithmic
procedure -- named Policy Optimization for eXtreme Models (POXM) -- for
learning from bandit feedback on XMC tasks. In POXM, the selected actions for
the sIS estimator are the top-p actions of the logging policy, where p is
adjusted from the data and is significantly smaller than the size of the action
space. We use a supervised-to-bandit conversion on three XMC datasets to
benchmark our POXM method against three competing methods: BanditNet, a
previously applied partial matching pruning strategy, and a supervised learning
baseline. Whereas BanditNet sometimes improves marginally over the logging
policy, our experiments show that POXM systematically and significantly
improves over all baselines.
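The core of the sIS estimator can be sketched as follows. This is a simplified illustration, restricting importance weights to the top-p actions of the logging policy with probabilities renormalized over that subset; it is not POXM itself, and all variable names are assumptions.

```python
import numpy as np

def sis_estimate(mu, pi, actions, rewards, p=2):
    """Selective importance sampling (sIS) sketch.

    mu, pi : (n, A) action probabilities of the logging and target policies
    actions: (n,) logged action index per instance
    rewards: (n,) observed reward per instance

    Per instance, only the top-p actions of the logging policy contribute;
    instances whose logged action falls outside that subset are conditioned
    away, trading a little bias for much lower variance.
    """
    n, _ = mu.shape
    vals = np.zeros(n)
    for i in range(n):
        top = np.argsort(mu[i])[-p:]            # top-p logging-policy actions
        if actions[i] not in top:
            continue                            # outside the subset: skipped
        mu_s = mu[i, top] / mu[i, top].sum()    # renormalize over the subset
        pi_s = pi[i, top] / max(pi[i, top].sum(), 1e-12)
        j = int(np.where(top == actions[i])[0][0])
        vals[i] = (pi_s[j] / mu_s[j]) * rewards[i]
    return vals.mean()

# Tiny demo: the target policy upweights action 1 relative to the logger,
# so the estimated value of the observed reward is scaled up accordingly.
mu = np.array([[0.5, 0.3, 0.1, 0.1]])
pi = np.array([[0.1, 0.7, 0.1, 0.1]])
value = sis_estimate(mu, pi, np.array([1]), np.array([1.0]), p=2)
```

Because the weights are ratios of renormalized probabilities over a small subset, they stay bounded even when the full action space has millions of actions, which is the favorable bias-variance regime the abstract describes.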
The Emerging Trends of Multi-Label Learning
Exabytes of data are generated daily by humans, leading to the growing need
for new efforts in dealing with the grand challenges for multi-label learning
brought by big data. For example, extreme multi-label classification is an
active and rapidly growing research area that deals with classification tasks
with an extremely large number of classes or labels; utilizing massive data
with limited supervision to build a multi-label classification model becomes
valuable for practical applications. Beyond these, tremendous effort has gone
into harnessing the strong learning capability of deep learning to better
capture label dependencies in multi-label learning, which is the key
for deep learning to address real-world classification tasks. However, it is
noted that there has been a lack of systematic studies that focus explicitly on
analyzing the emerging trends and new challenges of multi-label learning in the
era of big data. It is imperative to call for a comprehensive survey to fulfill
this mission and delineate future research directions and new applications.
Comment: Accepted to TPAMI 202
Scalable Label Distribution Learning for Multi-Label Classification
Multi-label classification (MLC) refers to the problem of tagging a given
instance with a set of relevant labels. Most existing MLC methods are based on
the assumption that the correlation of two labels in each label pair is
symmetric, which is violated in many real-world scenarios. Moreover, most
existing methods design learning processes associated with the number of
labels, which makes their computational complexity a bottleneck when scaling up
to large-scale output space. To tackle these issues, we propose a novel MLC
method named Scalable Label Distribution Learning (SLDL), which describes
different labels as distributions in a latent space, where the label
correlation is asymmetric and the dimension is independent of the number of
labels. Specifically, SLDL first converts
labels into continuous distributions within a low-dimensional latent space and
leverages the asymmetric metric to establish the correlation between different
labels. Then, it learns the mapping from the feature space to the latent space,
so that the computational complexity no longer depends on the number of
labels. Finally, SLDL leverages a nearest-neighbor-based strategy to decode the
latent representations and obtain the final predictions. Our extensive
experiments illustrate that SLDL achieves very competitive classification
performance at low computational cost.
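The pipeline shape described above can be sketched as follows. This is not SLDL itself: the paper learns label distributions under an asymmetric metric, whereas this sketch substitutes random latent vectors per label and a plain least-squares map, purely to show how the cost decouples from the number of labels. All sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_feat, C, d_lat = 200, 10, 50, 8   # instances, features, labels, latent dim

# Stand-in for SLDL's learned label representations: one latent vector per
# label (the actual method learns distributions with an asymmetric metric).
label_latent = rng.normal(size=(C, d_lat))

X = rng.normal(size=(n, d_feat))
Y = rng.integers(0, 2, size=(n, C)).astype(float)   # binary multi-label targets

# Regression target: mean latent vector of each instance's relevant labels.
T = (Y @ label_latent) / np.maximum(Y.sum(axis=1, keepdims=True), 1.0)

# Linear map from feature space to latent space via least squares; the
# learning cost scales with d_lat, not with the number of labels C.
W, *_ = np.linalg.lstsq(X, T, rcond=None)

def predict(x, k=5):
    """Nearest-neighbor decoding: return the k labels whose latent
    vectors are closest to the projected instance."""
    z = x @ W
    dist = np.linalg.norm(label_latent - z, axis=1)
    return np.argsort(dist)[:k]
```

The nearest-neighbor decoding step is the only place the label count re-enters, and it can be accelerated with standard approximate nearest-neighbor indices when C is very large.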
GUDN: A novel guide network with label reinforcement strategy for extreme multi-label text classification
In natural language processing, extreme multi-label text classification is an
emerging but essential task. The problem of extreme multi-label text
classification (XMTC) is to recall some of the most relevant labels for a text
from an extremely large label set. Large-scale pre-trained models have brought
a new trend to this problem. Although large-scale pre-trained models have made
significant progress on this problem, effective fine-tuning methods have yet to
be fully studied. Moreover, although label semantics have been introduced in
XMTC, the vast semantic gap between texts and labels has yet to receive enough
attention. This paper builds a new guide network (GUDN) that helps fine-tune
the pre-trained model and guide the subsequent classification. Furthermore, GUDN uses raw
label semantics combined with a helpful label reinforcement strategy to
effectively explore the latent space between texts and labels, narrowing the
semantic gap, which can further improve prediction accuracy. Experimental
results demonstrate that GUDN outperforms state-of-the-art methods on Eurlex-4k
and has competitive results on other popular datasets. In an additional
experiment, we investigated the influence of input length on the accuracy of
Transformer-based models. Our source code is released at
https://t.hk.uy/aFSH.
Comment: 12 pages, 6 figures
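The idea of using raw label semantics to bridge the text-label gap can be illustrated in a toy form: score each label by the similarity between the document and the label's raw name in a shared vector space. GUDN does this with a large pre-trained encoder and a guide network; the bag-of-words encoder below is only a hypothetical stand-in to make the scoring step concrete.

```python
import numpy as np

def bow(text, vocab):
    """Bag-of-words vector over a shared vocabulary, L2-normalized."""
    toks = text.lower().split()
    v = np.array([toks.count(w) for w in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def score_labels(doc, label_names):
    """Score each label by cosine similarity between the document and the
    label's raw name; a pre-trained encoder would replace bow() in practice."""
    vocab = sorted(set(doc.lower().split()).union(
        *[set(name.lower().split()) for name in label_names]))
    d = bow(doc, vocab)
    return np.array([d @ bow(name, vocab) for name in label_names])

labels = ["machine learning", "agriculture", "text classification"]
scores = score_labels("extreme multi-label text classification", labels)
best = labels[int(np.argmax(scores))]   # label sharing the most words wins
```

In the full method, both sides are embedded by the fine-tuned pre-trained model, so semantically related but lexically different text-label pairs can also score highly, which is precisely the semantic gap the paper targets.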