7,699 research outputs found

    Bandit-Based Task Assignment for Heterogeneous Crowdsourcing

    Full text link
    We consider a task assignment problem in crowdsourcing, which is aimed at collecting as many reliable labels as possible within a limited budget. A challenge in this scenario is how to cope with the diversity of tasks and the task-dependent reliability of workers, e.g., a worker may be good at recognizing the name of sports teams, but not be familiar with cosmetics brands. We refer to this practical setting as heterogeneous crowdsourcing. In this paper, we propose a contextual bandit formulation for task assignment in heterogeneous crowdsourcing, which is able to deal with the exploration-exploitation trade-off in worker selection. We also theoretically investigate the regret bounds for the proposed method, and demonstrate its practical usefulness experimentally

    Using machine-learning to assign function labels to parser output for Spanish

    Get PDF
    Data-driven grammatical function tag assignment has been studied for English using the Penn-II Treebank data. In this paper we address the question of whether such methods can be applied successfully to other languages and treebank resources. In addition to tag assignment accuracy and f-scores we also present results of a task-based evaluation. We use three machine-learning methods to assign Cast3LB function tags to sentences parsed with Bikelā€™s parser trained on the Cast3LB treebank. The best performing method, SVM, achieves an f-score of 86.87% on gold-standard trees and 66.67% on parser output - a statistically significant improvement of 6.74% over the baseline. In a task-based evaluation we generate LFG functional-structures from the function tag-enriched trees. On this task we achive an f-score of 75.67%, a statistically significant 3.4% improvement over the baseline

    Econometrics meets sentiment : an overview of methodology and applications

    Get PDF
    The advent of massive amounts of textual, audio, and visual data has spurred the development of econometric methodology to transform qualitative sentiment data into quantitative sentiment variables, and to use those variables in an econometric analysis of the relationships between sentiment and other variables. We survey this emerging research field and refer to it as sentometrics, which is a portmanteau of sentiment and econometrics. We provide a synthesis of the relevant methodological approaches, illustrate with empirical results, and discuss useful software

    Enhancing Sentiment Analysis Results through Outlier Detection Optimization

    Full text link
    When dealing with text data containing subjective labels like speaker emotions, inaccuracies or discrepancies among labelers are not uncommon. Such discrepancies can significantly affect the performance of machine learning algorithms. This study investigates the potential of identifying and addressing outliers in text data with subjective labels, aiming to enhance classification outcomes. We utilized the Deep SVDD algorithm, a one-class classification method, to detect outliers in nine text-based emotion and sentiment analysis datasets. By employing both a small-sized language model (DistilBERT base model with 66 million parameters) and non-deep learning machine learning algorithms (decision tree, KNN, Logistic Regression, and LDA) as the classifier, our findings suggest that the removal of outliers can lead to enhanced results in most cases. Additionally, as outliers in such datasets are not necessarily unlearnable, we experienced utilizing a large language model -- DeBERTa v3 large with 131 million parameters, which can capture very complex patterns in data. We continued to observe performance enhancements across multiple datasets.Comment: 11 pages, 5 figure
    • ā€¦
    corecore