
    Nonparametric Uncertainty Quantification for Single Deterministic Neural Network

    This paper proposes a fast and scalable method for uncertainty quantification of machine learning models' predictions. First, we show a principled way to measure the uncertainty of a classifier's predictions based on the Nadaraya-Watson nonparametric estimate of the conditional label distribution. Importantly, the proposed approach allows one to explicitly disentangle aleatoric and epistemic uncertainties. The resulting method works directly in the feature space; however, it can be applied to any neural network by considering the embedding of the data induced by the network. We demonstrate the strong performance of the method on uncertainty estimation tasks for text classification problems and a variety of real-world image datasets, such as MNIST, SVHN, CIFAR-100, and several versions of ImageNet.
    Comment: NeurIPS 2022 paper
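
    As an illustration of the core idea, here is a minimal sketch of a Nadaraya-Watson estimate of p(y|x) in an embedding space, assuming a Gaussian kernel; the function and the exact aleatoric/epistemic split are illustrative, not the paper's precise estimator.

        import numpy as np

        def nadaraya_watson_uq(query, train_emb, train_labels, n_classes, bandwidth=1.0):
            """Kernel estimate of the conditional label distribution p(y|x),
            plus rough aleatoric/epistemic scores. Illustrative sketch only."""
            # Gaussian kernel weights between the query and each training embedding.
            d2 = ((train_emb - query) ** 2).sum(axis=1)
            w = np.exp(-d2 / (2 * bandwidth ** 2))

            # Nadaraya-Watson estimate: kernel-weighted share of each class.
            mass = np.array([w[train_labels == c].sum() for c in range(n_classes)])
            total = mass.sum() + 1e-12
            p = mass / total

            aleatoric = -(p * np.log(p + 1e-12)).sum()  # entropy of p(y|x): label noise
            epistemic = 1.0 / (1.0 + total)             # low kernel mass: sparse region
            return p, aleatoric, epistemic

    To attach this to a trained network, one would take query and train_emb from the network's embedding of the data (e.g., penultimate-layer features), as the abstract describes.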

    Towards Computationally Feasible Deep Active Learning

    Active learning (AL) is a prominent technique for reducing the annotation effort required to train machine learning models. Deep learning offers a solution for several essential obstacles to deploying AL in practice but introduces many others. One such problem is the excessive computational resources required to train an acquisition model and estimate its uncertainty on instances in the unlabeled pool. We propose two techniques that tackle this issue for text classification and tagging tasks, offering a substantial reduction in the duration of AL iterations and in the computational overhead introduced by deep acquisition models. We also demonstrate that our algorithm, which leverages pseudo-labeling and distilled models, overcomes an essential obstacle revealed previously in the literature: because the acquisition model used to select instances during AL differs from the successor model trained on the labeled data, the benefits of AL can diminish. We show that our algorithm, despite using a smaller and faster acquisition model, is capable of training a more expressive successor model with higher performance.
    Comment: Accepted at Findings of NAACL-2022
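
    A minimal sketch of such a loop, with scikit-learn models as hypothetical stand-ins for the distilled acquisition model and the larger successor; the paper's exact distillation and pseudo-labeling recipe is not reproduced.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def al_loop_with_small_acquirer(X_pool, y_oracle, X_seed, y_seed, rounds=5, batch=50):
            """AL loop: a small, fast acquisition model picks uncertain instances,
            then pseudo-labels the rest for training a larger successor model."""
            X_lab, y_lab = X_seed.copy(), y_seed.copy()
            pool_idx = np.arange(len(X_pool))
            small = LogisticRegression(max_iter=1000)  # stand-in for a distilled model

            for _ in range(rounds):
                small.fit(X_lab, y_lab)
                probs = small.predict_proba(X_pool[pool_idx])
                uncertainty = 1.0 - probs.max(axis=1)          # least-confidence sampling
                picked = pool_idx[np.argsort(-uncertainty)[:batch]]
                X_lab = np.vstack([X_lab, X_pool[picked]])
                y_lab = np.concatenate([y_lab, y_oracle[picked]])  # query the oracle
                pool_idx = np.setdiff1d(pool_idx, picked)

            # Pseudo-label the remaining pool with the acquisition model, then train
            # the (larger) successor on gold labels plus confident pseudo-labels.
            probs = small.predict_proba(X_pool[pool_idx])
            keep = probs.max(axis=1) > 0.95
            X_ps = X_pool[pool_idx][keep]
            y_ps = small.classes_[probs.argmax(axis=1)][keep]
            successor = LogisticRegression(max_iter=1000)  # stand-in for a BERT-scale model
            successor.fit(np.vstack([X_lab, X_ps]), np.concatenate([y_lab, y_ps]))
            return successor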

    LM-Polygraph: Uncertainty Estimation for Language Models

    Recent advancements in the capabilities of large language models (LLMs) have paved the way for a myriad of groundbreaking applications in various fields. However, a significant challenge arises because these models often "hallucinate", i.e., fabricate facts without providing users an apparent means to discern the veracity of their statements. Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of LLMs. However, to date, research on UE methods for LLMs has focused primarily on theoretical rather than engineering contributions. In this work, we tackle this issue by introducing LM-Polygraph, a framework implementing a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python. The framework also provides an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores, empowering end users to discern unreliable responses. LM-Polygraph is compatible with the most recent LLMs, including BLOOMz, LLaMA-2, ChatGPT, and GPT-4, and is designed to support future releases of similarly styled LMs.
    Comment: Accepted at EMNLP-2023
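
    For context, the sketch below computes mean token entropy, a classic white-box UE score of the kind such a framework implements; it uses plain Hugging Face transformers and is not LM-Polygraph's own interface.

        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        def mean_token_entropy(model_name, prompt, max_new_tokens=40):
            """Generate a continuation and average the entropy of the model's
            token distribution at each step; higher means less confident."""
            tok = AutoTokenizer.from_pretrained(model_name)
            model = AutoModelForCausalLM.from_pretrained(model_name)
            inputs = tok(prompt, return_tensors="pt")
            out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                 output_scores=True, return_dict_in_generate=True)
            entropies = []
            for step_logits in out.scores:  # one logits tensor per generated token
                logp = torch.log_softmax(step_logits[0], dim=-1)
                entropies.append(-(logp.exp() * logp).sum().item())
            text = tok.decode(out.sequences[0], skip_special_tokens=True)
            return text, sum(entropies) / len(entropies)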

    M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection

    Large language models (LLMs) have demonstrated a remarkable capability to generate fluent responses to a wide variety of user queries, but this has also raised concerns regarding the potential misuse of such texts in journalistic, educational, and academic contexts. In this work, we aim to develop automatic systems to identify machine-generated text and to detect potential misuse. We first introduce M4, a large-scale benchmark that is a multi-generator, multi-domain, and multi-lingual corpus for machine-generated text detection. Using the dataset, we experiment with a number of methods and show that it is challenging for detectors to generalize well to unseen examples that come from different domains or are generated by different large language models. In such cases, detectors tend to misclassify machine-generated text as human-written. These results show that the problem is far from solved and that there is much room for improvement. We believe that our M4 dataset, which covers different generators, domains, and languages, will enable future research towards more robust approaches to this pressing societal problem. The M4 dataset is available at https://github.com/mbzuai-nlp/M4.
    Comment: 11 pages
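
    A tiny baseline illustrating the cross-domain evaluation setup described above, with TF-IDF and logistic regression as a hypothetical detector; the label convention (1 = machine-generated, 0 = human-written) is an assumption.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import accuracy_score

        def crossdomain_detector_baseline(train_texts, train_labels, test_texts, test_labels):
            """Train on one domain/generator, test on another; out-of-domain
            accuracy typically drops sharply compared to in-domain."""
            vec = TfidfVectorizer(ngram_range=(1, 2), max_features=50_000)
            clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(train_texts), train_labels)
            preds = clf.predict(vec.transform(test_texts))
            return accuracy_score(test_labels, preds)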

    Towards Text Processing System for Emergency Event Detection in the Arctic Zone

    We present ongoing work on a text processing system for the detection and analysis of events related to emergencies in the Arctic zone. The task is peculiar in that data are sparse and tools / language resources for processing such specific texts are scarce. The system performs focused crawling of documents related to emergencies in the Arctic region, text parsing including named entity recognition and geotagging, and indexing of texts with their metadata for faceted search. The system aims at processing both English and Russian text messages and documents. We report preliminary results of the experimental evaluation of the system components on Twitter data.
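
    A rough sketch of the parsing stage under stated assumptions: spaCy's English model as the NER component and a toy gazetteer for geotagging, both hypothetical stand-ins for the system's actual components.

        import spacy  # assumes the en_core_web_sm model is installed

        nlp = spacy.load("en_core_web_sm")
        GAZETTEER = {"Murmansk": (68.97, 33.08), "Svalbard": (78.22, 15.63)}  # toy sample

        def process_document(text, url):
            """Run NER, geotag recognized places via the gazetteer, and emit a
            metadata record ready for a faceted-search index."""
            doc = nlp(text)
            entities = [(ent.text, ent.label_) for ent in doc.ents]
            places = {e: GAZETTEER[e] for e, lbl in entities
                      if lbl in ("GPE", "LOC") and e in GAZETTEER}
            return {"url": url, "entities": entities, "geotags": places,
                    "facets": {"has_location": bool(places)}}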

    Active learning with deep pre-trained models for sequence tagging of clinical and biomedical texts

    Active learning is a technique that helps to minimize the annotation budget required for the creation of a labeled dataset while maximizing the performance of a model trained on this dataset. It has been shown that active learning can be successfully applied to sequence tagging tasks in text processing in conjunction with deep learning models, even when a limited amount of labeled data is available. Recent advances in transfer learning methods for natural language processing based on deep pre-trained models such as ELMo and BERT offer a much better ability to generalize on small annotated datasets than their shallow counterparts. The combination of deep pre-trained models and active learning leads to a powerful approach to dealing with annotation scarcity. In this work, we investigate the potential of this approach on clinical and biomedical data. The experimental evaluation shows that the combination of active learning and deep pre-trained models outperforms standard active learning methods. We also suggest a modification to the standard uncertainty sampling strategy and empirically show that it can be beneficial for the annotation of very skewed datasets. Finally, we propose an annotation tool empowered with active learning and deep pre-trained models that can be used for entity annotation directly in the Jupyter IDE.
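
    For reference, a minimal sketch of the standard strategy being modified: MNLP-style uncertainty sampling for sequence tagging. The paper's skew-aware modification is not reproduced, and the input format (per-token numpy arrays of tag log-probabilities) is an assumption.

        import numpy as np

        def mnlp_scores(tag_logprobs):
            """Maximum Normalized Log-Probability: mean log-probability of the
            most likely tag per token; lower means a less confident sentence."""
            return np.array([np.mean([tok.max() for tok in sent]) for sent in tag_logprobs])

        def select_for_annotation(tag_logprobs, k=20):
            # Query the k sentences the tagger is least confident about.
            return np.argsort(mnlp_scores(tag_logprobs))[:k]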
