
    Modelling Instance-Level Annotator Reliability for Natural Language Labelling Tasks

    When constructing models that learn from noisy labels produced by multiple annotators, it is important to accurately estimate the reliability of annotators. Annotators may provide labels of inconsistent quality due to their varying expertise and reliability in a domain. Previous studies have mostly focused on estimating each annotator's overall reliability on the entire annotation task. However, in practice, the reliability of an annotator may depend on each specific instance. Only a limited number of studies have investigated modelling per-instance reliability, and these only considered binary labels. In this paper, we propose an unsupervised model which can handle both binary and multi-class labels. It can automatically estimate the per-instance reliability of each annotator and the correct label for each instance. We specify our model as a probabilistic model which incorporates neural networks to model the dependency between latent variables and instances. For evaluation, the proposed method is applied to both synthetic and real data, including two labelling tasks: text classification and textual entailment. Experimental results demonstrate that our method can not only accurately estimate the reliability of annotators across different instances, but also achieve superior performance in predicting the correct labels and detecting the least reliable annotators compared to state-of-the-art baselines. Comment: 9 pages, 1 figure, 10 tables; 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2019).
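    A minimal sketch of the general idea, not the paper's model: per-instance reliability is modelled as a function of instance features (here a per-annotator logistic score), alternated with a reliability-weighted vote over labels. The toy data, shapes, and update rule are illustrative assumptions.

# Minimal sketch (assumptions, not the paper's model): reliability r_ij = sigmoid(w_j . x_i)
# is alternated with a reliability-weighted vote that estimates each instance's label.
import numpy as np

rng = np.random.default_rng(0)
N, J, K, D = 200, 5, 3, 8              # instances, annotators, classes, feature dim
X = rng.normal(size=(N, D))            # instance features
true_y = rng.integers(0, K, size=N)    # hidden ground truth (toy data only)

# Simulate annotations: annotator j becomes unreliable when feature j is negative.
A = np.empty((N, J), dtype=int)
for j in range(J):
    noisy = X[:, j] < 0
    A[:, j] = np.where(noisy, rng.integers(0, K, size=N), true_y)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = np.zeros((J, D))                   # per-annotator reliability weights
for _ in range(30):
    # Weighted vote over classes using the current per-instance reliabilities.
    R = sigmoid(X @ W.T)               # (N, J) per-instance reliabilities
    votes = np.zeros((N, K))
    for j in range(J):
        votes[np.arange(N), A[:, j]] += (
            np.log(R[:, j] + 1e-6) - np.log((1 - R[:, j]) / (K - 1) + 1e-6)
        )
    y_hat = votes.argmax(axis=1)
    # One gradient step pushing r_ij toward agreement with the current label estimate.
    agree = (A == y_hat[:, None]).astype(float)                     # (N, J)
    grad = ((agree - R)[:, :, None] * X[:, None, :]).mean(axis=0)   # (J, D)
    W += 1.0 * grad

print("estimated-label accuracy:", (y_hat == true_y).mean())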

    Human-in-the-Loop Learning From Crowdsourcing and Social Media

    Computational social studies using public social media data have become more and more popular because of the large amount of user-generated data available. The richness of social media data, coupled with its noise and subjectivity, raises significant challenges for computationally studying social issues in a feasible and scalable manner. Machine learning problems are, as a result, often subjective or ambiguous when humans are involved: humans solving the same problems might come to legitimate but completely different conclusions, based on their personal experiences and beliefs. When building supervised learning models, particularly when using crowdsourced training data, multiple annotations per data item are usually reduced to a single label representing ground truth. This inevitably hides a rich source of diversity and subjectivity of opinions about the labels. Label distribution learning associates with each data item a probability distribution over the labels for that item, so it can preserve the diversity of opinions, beliefs, etc. that conventional learning hides or ignores. We propose a human-in-the-loop learning framework to model and study large volumes of unlabeled subjective social media data with less human effort. We study various annotation tasks given to crowdsourced annotators and methods for aggregating their contributions in a manner that preserves subjectivity and disagreement. We introduce a strategy for learning label distributions with only five to ten labels per item by aggregating human-annotated labels over multiple, semantically related data items. We conduct experiments using our learning framework on data related to two subjective social issues (work and employment, and suicide prevention) that touch many people worldwide. Our methods can be applied to a broad variety of problems, particularly social problems. Our experimental results suggest that specific label aggregation methods can help provide reliable representative semantics at the population level.
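    A minimal sketch of the kind of aggregation described, not the authors' pipeline: each item keeps its handful of crowd labels, and its label distribution is smoothed with the pooled labels of semantically related items. The group keys, mixing weight, and toy data below are assumptions.

# Minimal sketch (assumptions, not the authors' method): build a per-item label
# distribution by blending the item's own crowd labels with labels pooled across
# a semantically related group of items.
from collections import Counter, defaultdict

LABELS = ["positive", "neutral", "negative"]
items = {
    "t1": {"group": "job_loss", "labels": ["negative", "negative", "neutral", "negative", "neutral"]},
    "t2": {"group": "job_loss", "labels": ["negative", "neutral", "negative", "negative", "negative"]},
    "t3": {"group": "new_job",  "labels": ["positive", "positive", "neutral", "positive", "positive"]},
}

# Pool raw label counts within each semantic group.
group_counts = defaultdict(Counter)
for item in items.values():
    group_counts[item["group"]].update(item["labels"])

def label_distribution(item, alpha=0.5):
    """Blend the item's own label counts with its group's pooled counts."""
    own = Counter(item["labels"])
    pooled = group_counts[item["group"]]
    dist = {}
    for lab in LABELS:
        own_p = own[lab] / sum(own.values())
        pool_p = pooled[lab] / sum(pooled.values())
        dist[lab] = alpha * own_p + (1 - alpha) * pool_p
    return dist

for name, item in items.items():
    print(name, {k: round(v, 2) for k, v in label_distribution(item).items()})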

    Efficient Learning Framework for Training Deep Learning Models with Limited Supervision

    In recent years, deep learning has shown tremendous success in different applications; however, these models mostly need a large labeled dataset for training their parameters. In this work, we aim to explore the potential of efficient learning frameworks for training deep models on different problems in the case of limited supervision or noisy labels. For the image clustering problem, we introduce a new deep convolutional autoencoder with an unsupervised learning framework. We employ relative entropy minimization as the clustering objective, regularized by the frequency of cluster assignments and a reconstruction loss. For the case of noisy labels obtained from crowdsourcing platforms, we propose a novel deep hybrid model for sentiment analysis of text data such as tweets based on noisy crowd labels. The proposed model consists of a crowdsourcing aggregation model and a deep text autoencoder. We combine these sub-models within a probabilistic framework rather than in a heuristic way, and derive an efficient optimization algorithm to jointly solve the corresponding problem. In order to improve the performance of unsupervised deep hash functions for image similarity search in big datasets, we adopt generative adversarial networks to propose a new deep image retrieval model, where the adversarial loss is employed as a data-dependent regularization in our objective function. We also introduce a balanced self-paced learning algorithm for training a GAN-based model for image clustering, where the input samples are gradually included into training from easy to difficult, while the diversity of samples selected from all clusters is also considered. In addition, we explore discriminative approaches for unsupervised visual representation learning rather than generative algorithms, such as maximizing the mutual information between an input image and its representation and using a contrastive loss to decrease the distance between the representations of original and augmented image data.
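    A minimal PyTorch sketch of a clustering objective of the kind described for the convolutional autoencoder: a relative-entropy (KL) clustering term, a regularizer on the empirical frequency of cluster assignments, and a reconstruction loss. The target distribution and weightings are illustrative assumptions, not the thesis' exact formulation.

# Minimal sketch (assumptions, not the thesis' formulation): KL clustering term +
# cluster-frequency regularizer + autoencoder reconstruction loss.
import torch
import torch.nn.functional as F

def clustering_loss(q, x, x_recon, lambda_freq=1.0, lambda_rec=1.0):
    """q: (batch, K) soft cluster assignments; x, x_recon: autoencoder input/output."""
    # Sharpened target distribution p, a common choice for KL-based deep clustering.
    p = (q ** 2) / q.sum(dim=0, keepdim=True)
    p = p / p.sum(dim=1, keepdim=True)
    kl = F.kl_div(q.log(), p, reduction="batchmean")        # relative entropy term
    freq = q.mean(dim=0)                                     # empirical cluster frequencies
    uniform = torch.full_like(freq, 1.0 / q.size(1))
    freq_reg = (freq * (freq / uniform).log()).sum()         # penalize imbalanced clusters
    rec = F.mse_loss(x_recon, x)                             # reconstruction loss
    return kl + lambda_freq * freq_reg + lambda_rec * rec

# Toy usage with random tensors standing in for an encoder/decoder.
q = torch.softmax(torch.randn(32, 10), dim=1)
x = torch.randn(32, 784)
x_recon = x + 0.1 * torch.randn_like(x)
print(clustering_loss(q, x, x_recon).item())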

    A critical look at the evaluation of GNNs under heterophily: Are we really making progress?

    Node classification is a classical graph machine learning task on which Graph Neural Networks (GNNs) have recently achieved strong results. However, it is often believed that standard GNNs only work well for homophilous graphs, i.e., graphs where edges tend to connect nodes of the same class. Graphs without this property are called heterophilous, and it is typically assumed that specialized methods are required to achieve strong performance on such graphs. In this work, we challenge this assumption. First, we show that the standard datasets used for evaluating heterophily-specific models have serious drawbacks, making results obtained by using them unreliable. The most significant of these drawbacks is the presence of a large number of duplicate nodes in the datasets Squirrel and Chameleon, which leads to train-test data leakage. We show that removing duplicate nodes strongly affects GNN performance on these datasets. Then, we propose a set of heterophilous graphs of varying properties that we believe can serve as a better benchmark for evaluating the performance of GNNs under heterophily. We show that standard GNNs achieve strong results on these heterophilous graphs, almost always outperforming specialized models. Our datasets and the code for reproducing our experiments are available at https://github.com/yandex-research/heterophilous-graph
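    A minimal sketch of the kind of check involved, not the authors' code: nodes with identical features and identical neighbour sets are flagged as duplicates, since such duplicates can leak label information across train/test splits. The toy graph below is an assumption.

# Minimal sketch (assumptions, not the authors' code): flag duplicate nodes whose
# feature vectors and neighbor sets coincide, before creating train/test splits.
import numpy as np
from collections import defaultdict

features = np.array([[1, 0], [1, 0], [0, 1], [1, 1]])
edges = [(0, 2), (1, 2), (3, 2)]                 # undirected edge list

neighbors = defaultdict(set)
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)

signatures = defaultdict(list)
for node in range(len(features)):
    sig = (tuple(features[node]), tuple(sorted(neighbors[node])))
    signatures[sig].append(node)

duplicates = [group for group in signatures.values() if len(group) > 1]
print("duplicate node groups:", duplicates)      # [[0, 1]] for this toy graph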

    Sentiment Analysis of Persian Language: Review of Algorithms, Approaches and Datasets

    Sentiment analysis aims to extract people's emotions and opinions from their comments on the web. It is widely used in businesses to detect sentiment in social data, gauge brand reputation, and understand customers. Most articles in this area have concentrated on the English language, whereas there are limited resources for the Persian language. In this review paper, recent articles published between 2018 and 2022 on sentiment analysis in Persian have been collected, and their methods, approaches, and datasets are explained and analyzed. Almost all of the methods used to solve sentiment analysis are machine learning and deep learning. The purpose of this paper is to examine 40 different approaches to sentiment analysis in Persian, to analyze the datasets along with the accuracy of the algorithms applied to them, and to review the strengths and weaknesses of each. Among all the methods, transformers such as BERT and recurrent neural networks such as LSTM and Bi-LSTM have achieved higher accuracy in sentiment analysis. In addition to the methods and approaches, the datasets published between 2018 and 2022 are listed, and information about each dataset and its details is provided.
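    For context, a minimal Keras sketch of a Bi-LSTM sentiment classifier of the kind the review reports as a strong baseline; it is not tied to any specific surveyed paper, and the vocabulary size, sequence length, and data are assumptions.

# Minimal sketch (illustrative only): a Bi-LSTM binary sentiment classifier over
# integer-encoded tokens standing in for a tokenized Persian corpus.
import numpy as np
import tensorflow as tf

vocab_size, max_len = 20000, 64
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 128),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # positive/negative sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Toy data: random token ids and random binary sentiment labels.
X = np.random.randint(1, vocab_size, size=(256, max_len))
y = np.random.randint(0, 2, size=(256,))
model.fit(X, y, epochs=1, batch_size=32, verbose=0)
print(model.predict(X[:2], verbose=0))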

    Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks

    There are two approaches for pairwise sentence scoring: cross-encoders, which perform full attention over the input pair, and bi-encoders, which map each input independently to a dense vector space. While cross-encoders often achieve higher performance, they are too slow for many practical use cases. Bi-encoders, on the other hand, require substantial training data and fine-tuning over the target task to achieve competitive performance. We present a simple yet efficient data augmentation strategy called Augmented SBERT, where we use the cross-encoder to label a larger set of input pairs to augment the training data for the bi-encoder. We show that, in this process, selecting the sentence pairs is non-trivial and crucial for the success of the method. We evaluate our approach on multiple tasks (in-domain) as well as on a domain adaptation task. Augmented SBERT achieves an improvement of up to 6 points for in-domain and of up to 37 points for domain adaptation tasks compared to the original bi-encoder performance. Comment: Accepted at NAACL 2021.
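    A minimal sketch of the augmentation recipe using the sentence-transformers library (v2-style API): a cross-encoder scores unlabeled pairs, and the resulting silver-labeled pairs are added to the bi-encoder's training data. The model names, toy pairs, and hyperparameters are illustrative assumptions rather than the paper's exact setup.

# Minimal sketch (assumptions, not the paper's exact setup): cross-encoder labels
# unlabeled pairs, and the silver pairs then train the bi-encoder.
from sentence_transformers import SentenceTransformer, InputExample, losses
from sentence_transformers.cross_encoder import CrossEncoder
from torch.utils.data import DataLoader

# 1) A trained cross-encoder scores unlabeled sentence pairs ("silver" data).
cross_encoder = CrossEncoder("cross-encoder/stsb-roberta-base")
unlabeled_pairs = [
    ("A man is playing a guitar.", "Someone plays an instrument."),
    ("A dog runs in the park.", "The stock market fell sharply."),
]
silver_scores = cross_encoder.predict(unlabeled_pairs)

# 2) The silver-labeled pairs augment the bi-encoder's training data.
train_examples = [
    InputExample(texts=list(pair), label=float(score))
    for pair, score in zip(unlabeled_pairs, silver_scores)
]
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
loader = DataLoader(train_examples, shuffle=True, batch_size=2)
loss = losses.CosineSimilarityLoss(bi_encoder)
bi_encoder.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)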