DSI++: Updating Transformer Memory with New Documents
Differentiable Search Indices (DSIs) encode a corpus of documents in model
parameters and use the same model to answer user queries directly. Despite the
strong performance of DSI models, deploying them in situations where the corpus
changes over time is computationally expensive because reindexing the corpus
requires re-training the model. In this work, we introduce DSI++, a continual
learning challenge for DSI to incrementally index new documents while being
able to answer queries related to both previously and newly indexed documents.
Across different model scales and document identifier representations, we show
that continual indexing of new documents leads to considerable forgetting of
previously indexed documents. We also hypothesize and verify that the model
experiences forgetting events during training, leading to unstable learning. To
mitigate these issues, we investigate two approaches. The first focuses on
modifying the training dynamics: since flatter minima implicitly alleviate
forgetting, we optimize for flatter loss basins and show that the model
stably memorizes more documents. Next, we introduce a generative
memory that samples pseudo-queries for previously indexed documents and
supplements them during continual indexing to prevent forgetting on the
retrieval task. Extensive
experiments on novel continual indexing benchmarks based on Natural Questions
(NQ) and MS MARCO demonstrate that our proposed solution mitigates forgetting
significantly. Concretely, it improves average Hits@10 over
competitive baselines for NQ and requires substantially fewer model updates
than re-training the DSI model when incrementally indexing five corpora
in a sequence.
Comment: Accepted at EMNLP 2023 main conference.
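The generative-memory replay idea described in the abstract can be sketched as batch construction: mix indexing examples for newly arriving documents with pseudo-query-to-docid pairs sampled for already-indexed documents. This is an illustrative sketch, not the paper's implementation; the function names (`build_rehearsal_batch`, `generate_pseudo_query`) and the 50/50 replay ratio are assumptions.

```python
import random

def build_rehearsal_batch(new_doc_examples, old_docids, generate_pseudo_query,
                          batch_size=8, replay_ratio=0.5, rng=None):
    """Mix indexing examples for new documents with generative-memory
    replay examples (pseudo-query -> docid) for previously indexed docs,
    so continual indexing keeps rehearsing the old retrieval task."""
    rng = rng or random.Random(0)
    n_replay = int(batch_size * replay_ratio)   # examples replayed from memory
    n_new = batch_size - n_replay               # examples indexing new docs
    batch = [rng.choice(new_doc_examples) for _ in range(n_new)]
    for _ in range(n_replay):
        docid = rng.choice(old_docids)
        batch.append((generate_pseudo_query(docid), docid))
    rng.shuffle(batch)
    return batch
```

In a real system the pseudo-query generator would itself be a trained model; here any callable mapping a docid to a query string suffices.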
A Survey on Extreme Multi-label Learning
Multi-label learning has attracted significant attention from both academia
and industry in recent decades. Although existing multi-label learning
algorithms achieve good performance in various tasks, they implicitly assume
that the target label space is modest in size, which can be restrictive in
real-world scenarios. Moreover, it is infeasible to adapt them directly to
extremely large label spaces because of the compute and memory overhead.
Therefore, eXtreme Multi-label Learning (XML) has become an important task,
and many effective approaches have been proposed. To fully understand XML, we
conduct a survey study in this paper. We first give a formal definition of XML from
the perspective of supervised learning. Then, based on different model
architectures and challenges of the problem, we provide a thorough discussion
of the advantages and disadvantages of each category of methods. For the
benefit of conducting empirical studies, we collect extensive resources
regarding XML, including code implementations and useful tools. Lastly, we
propose possible research directions in XML, such as new evaluation metrics,
the tail label problem, and weakly supervised XML.
Comment: A preliminary version
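Evaluation in XML typically relies on top-k ranking metrics such as Precision@k, since only a handful of the (possibly millions of) labels can be inspected per instance. A minimal sketch of Precision@k, assuming dense per-label scores:

```python
def precision_at_k(scores, true_labels, k=5):
    """Precision@k: fraction of the k highest-scoring labels that are
    relevant. `scores[i]` is the model's score for label i; `true_labels`
    is the set of ground-truth label indices for one instance."""
    topk = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    return sum(1 for i in topk if i in true_labels) / k
```

Production XML toolkits use sparse score representations rather than a dense list, but the metric itself is the same; propensity-scored variants additionally down-weight head labels to reward tail-label predictions.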
Extreme Classification (Dagstuhl Seminar 18291)
Extreme classification is a rapidly growing research area within machine learning focusing on multi-class and multi-label problems involving an extremely large number of labels (even more than a million). Many applications of extreme classification have been found in diverse areas ranging from language modeling to document tagging in NLP, face recognition to learning universal feature representations in computer vision, gene function prediction in bioinformatics, etc. Extreme classification has also opened up a new paradigm for key industrial applications such as ranking and recommendation by reformulating them as multi-label learning tasks where each item to be ranked or recommended is treated as a separate label. Such reformulations have led to significant gains over traditional collaborative filtering and content-based recommendation techniques. Consequently, extreme classifiers have been deployed in many real-world applications in industry.
Extreme classification has raised many new research challenges beyond the pale of traditional machine learning, including developing log-time and log-space algorithms, deriving theoretical bounds that scale logarithmically with the number of labels, learning from biased training data, developing performance metrics, etc. The seminar aimed at bringing together experts in machine learning, NLP, computer vision, web search, and recommendation from academia and industry to make progress on these problems. We believe that this seminar has encouraged interdisciplinary collaboration in the area of extreme classification, started discussions on identifying thrust areas and important research problems, and motivated participants to improve upon state-of-the-art algorithms as well as to work on the theoretical foundations of extreme classification.
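One standard route to the log-time algorithms mentioned above is a balanced label tree: prediction walks from the root to a leaf, evaluating a small scorer at each internal node, so an L-label problem needs only O(log L) scorer calls instead of scoring all L labels. A toy sketch, in which the tuple tree encoding and the `score` callback are illustrative assumptions rather than any particular published method:

```python
def predict_log_time(x, node, score):
    """Descend a binary label tree: at each internal node, follow the
    child subtree the scorer prefers for input x. For L leaf labels this
    makes O(log L) scorer calls rather than scoring every label."""
    while isinstance(node, tuple):  # internal node encoded as (left, right)
        left, right = node
        node = left if score(x, left) >= score(x, right) else right
    return node  # leaf: a single label id
```

Real tree-based extreme classifiers (and their multi-label variants, which keep a beam of promising subtrees rather than a single path) train one lightweight classifier per node; the sketch abstracts all of that into the `score` callback.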