66 research outputs found

    Towards Memory-Efficient Training for Extremely Large Output Spaces -- Learning with 500k Labels on a Single Commodity GPU

    In classification problems with large output spaces (up to millions of labels), the last layer can require an enormous amount of memory. Using sparse connectivity would drastically reduce the memory requirements, but, as we show below, it can severely degrade the model's predictive performance. Fortunately, we found that this can be mitigated by introducing a penultimate layer of intermediate size. We further demonstrate that the connectivity of the sparse layer can be constrained to be uniform, in the sense that each output neuron has exactly the same number of incoming connections. This allows for efficient implementations of sparse matrix multiplication and connection redistribution on GPU hardware. With a custom CUDA implementation, we show that the proposed approach scales to datasets with 670,000 labels on a single commodity GPU with only 4 GB of memory.
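
    A minimal PyTorch sketch of the core idea (module and tensor names are our own, and the paper's custom CUDA kernel is not reproduced): every output neuron keeps exactly the same number of incoming connections to the penultimate layer, stored as an index matrix plus a weight matrix, so the sparse multiplication reduces to a gather followed by a per-label dot product.

        import torch
        import torch.nn as nn

        class UniformSparseOutput(nn.Module):
            """Output layer in which every label has exactly `fan_in` incoming weights."""
            def __init__(self, in_features, num_labels, fan_in):
                super().__init__()
                # For each label, the indices of the penultimate units it connects to.
                self.register_buffer(
                    "conn", torch.randint(0, in_features, (num_labels, fan_in)))
                self.weight = nn.Parameter(0.01 * torch.randn(num_labels, fan_in))
                self.bias = nn.Parameter(torch.zeros(num_labels))

            def forward(self, h):                      # h: (batch, in_features)
                gathered = h[:, self.conn]             # (batch, num_labels, fan_in)
                # Per-label dot product replaces the dense matrix multiplication.
                return (gathered * self.weight).sum(-1) + self.bias

        # Penultimate layer of intermediate size, as suggested in the abstract.
        model = nn.Sequential(nn.Linear(768, 4096), nn.ReLU(),
                              UniformSparseOutput(4096, 670_000, fan_in=32))

    This naive version materialises the gathered activations, so it is far less memory-frugal than the fused kernel the paper describes; it only illustrates the uniform fan-in layout.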

    CascadeXML: Rethinking Transformers for End-to-end Multi-resolution Training in Extreme Multi-label Classification

    Extreme Multi-label Text Classification (XMC) involves learning a classifier that can assign to an input a subset of the most relevant labels from millions of label choices. Recent approaches, such as XR-Transformer and LightXML, leverage a transformer instance to achieve state-of-the-art performance. However, in doing so, these approaches must make various trade-offs between performance and computational requirements. A major shortcoming, compared to the Bi-LSTM based AttentionXML, is that they fail to keep separate feature representations for each resolution of a label tree. We therefore propose CascadeXML, an end-to-end multi-resolution learning pipeline that harnesses the multi-layered architecture of a transformer model to attend to different label resolutions with separate feature representations. CascadeXML significantly outperforms all existing approaches, with non-trivial gains on benchmark datasets of up to three million labels. Code for CascadeXML will be made publicly available at \url{https://github.com/xmc-aalto/cascadexml}.
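
    One way to read the multi-resolution idea, sketched below with Hugging Face transformers (layer choices, label-tree sizes and class names are illustrative assumptions, not the released CascadeXML code): attach a separate classification head to a different encoder layer for each resolution of the label tree, so coarse and fine resolutions keep their own feature representations.

        import torch.nn as nn
        from transformers import AutoModel

        class MultiResolutionHeads(nn.Module):
            """One classifier per label resolution, each fed from a different
            transformer layer so every resolution keeps its own features."""
            def __init__(self, name="bert-base-uncased",
                         resolutions=(1024, 32768, 670000), layers=(4, 8, 12)):
                super().__init__()
                self.encoder = AutoModel.from_pretrained(name, output_hidden_states=True)
                hidden = self.encoder.config.hidden_size
                self.heads = nn.ModuleList(nn.Linear(hidden, r) for r in resolutions)
                self.layers = layers

            def forward(self, input_ids, attention_mask):
                states = self.encoder(input_ids, attention_mask=attention_mask).hidden_states
                # [CLS] vector of a different layer for each label resolution.
                return [head(states[l][:, 0]) for head, l in zip(self.heads, self.layers)]

    In practice the finest head would only be evaluated on a shortlist of candidate labels produced by the coarser resolutions; the sketch omits that step.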

    Maximum-Margin Framework for Training Data Synchronization in Large-Scale Hierarchical Classification

    In the context of supervised learning, the training data for large-scale hierarchical classification consist of (i) a set of input-output pairs, and (ii) a hierarchy structure defining parent-child relations among class labels. It is often the case that the hierarchy structure given a priori is not optimal for achieving high classification accuracy. This is especially true for web taxonomies such as the Yahoo! directory, which consist of tens of thousands of classes. Furthermore, an important goal of hierarchy design is to render better navigability and browsing. In this work, we propose a maximum-margin framework for automatically adapting the given hierarchy, using the set of input-output pairs to yield a new hierarchy. The proposed method is not only theoretically justified but also offers a more principled alternative to the hierarchy flattening techniques proposed earlier, which are ad hoc and empirical in nature. Empirical results on publicly available large-scale datasets demonstrate that classification with the new hierarchy leads to generalization performance that is better than or comparable to that of the hierarchy flattening techniques.
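
    For contrast with the proposed framework, the hierarchy-flattening baseline mentioned above can be written in a few lines (a sketch under our own naming; the maximum-margin adaptation itself is not reproduced here): selected internal nodes are deleted and their children are reattached one level up.

        def flatten(parent, nodes_to_remove):
            """`parent` maps each node to its parent; remove the given internal
            nodes and reattach their children to the grandparent."""
            new_parent = dict(parent)
            for node in nodes_to_remove:
                grandparent = parent[node]
                for child, p in parent.items():
                    if p == node:
                        new_parent[child] = grandparent
                del new_parent[node]
            return new_parent

        # Root 'r' with internal node 'a'; flattening 'a' attaches its leaves to 'r'.
        taxonomy = {"a": "r", "leaf1": "a", "leaf2": "a", "leaf3": "r"}
        print(flatten(taxonomy, {"a"}))   # {'leaf1': 'r', 'leaf2': 'r', 'leaf3': 'r'}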

    Detecting Sequential Genre Change in Eighteenth-Century Texts

    Machine classification of historical books into genres is a common task for NLP-based classifiers and has a number of applications, from literary analysis to information retrieval. However, it is not a straightforward task, as genre labels can be ambiguous and subject to temporal change, and moreover many books consist of mixed or miscellaneous genres. In this paper we describe a work-in-progress method by which genre predictions can be used to determine longer sequences of genre change within books, which we test with visualisations of some hand-picked texts. We apply state-of-the-art methods to the task, including a BERT-based transformer and a character-level Perceiver model, both pre-trained on a large collection of eighteenth-century works (ECCO), using a new set of hand-annotated documents created to reflect historical divisions. Results show that both models perform significantly better than a linear baseline, particularly when ECCO-BERT is combined with tf-idf features, though for this task the character-level model provides no obvious advantage. Initial evaluation of the genre sequence method shows it may in the future be useful in determining and dividing the multiple genres of miscellaneous and hybrid historical texts.
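
    A small sketch of how chunk-level predictions can be turned into a genre sequence (the classifier, the chunking and all names are assumptions; the paper's ECCO-BERT + tf-idf pipeline is not reproduced): classify consecutive chunks of a book, then merge runs of identical predictions into segments.

        from itertools import groupby

        def genre_sequence(chunks, classifier):
            """Predict a genre per chunk, then collapse consecutive identical
            predictions into (genre, first_chunk, last_chunk) segments.
            `classifier` is any fitted model exposing .predict()."""
            labels = classifier.predict(chunks)
            segments, start = [], 0
            for genre, run in groupby(labels):
                length = len(list(run))
                segments.append((genre, start, start + length - 1))
                start += length
            return segments

    Plotting the resulting segments over the course of a book is one way to produce the kind of visualisations mentioned above.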

    Explainable Publication Year Prediction of Eighteenth Century Texts with the BERT Model

    In this paper, we describe a BERT model trained on the Eighteenth Century Collections Online (ECCO) dataset of digitized documents. The ECCO dataset poses unique modelling challenges due to the presence of Optical Character Recognition (OCR) artifacts. We establish the performance of the BERT model on a publication year prediction task against linear baseline models and human judgement, finding the BERT model to be superior to both and able to date the works with a mean absolute error of less than 7 years. We also explore how language change over time affects the model by analyzing the features the model uses for publication year predictions, as given by the Integrated Gradients model explanation method.
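
    The Integrated Gradients attribution referred to above can be approximated generically as below (a sketch for a differentiable scalar regressor; the paper fine-tunes BERT, and the function and variable names here are our own): the gradient of the predicted year is averaged along a straight path from a baseline input to the actual input.

        import torch

        def integrated_gradients(model, x, baseline=None, steps=50):
            """Riemann-sum approximation of Integrated Gradients for a model
            that maps a feature vector to a predicted publication year."""
            if baseline is None:
                baseline = torch.zeros_like(x)
            alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
            path = baseline + alphas * (x - baseline)    # interpolated inputs
            path.requires_grad_(True)
            grads = torch.autograd.grad(model(path).sum(), path)[0]
            return (x - baseline) * grads.mean(dim=0)    # per-feature attribution

    For a transformer, the same integral would typically be taken over the token embeddings rather than over raw inputs.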

    Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change
