Bandit-Based Task Assignment for Heterogeneous Crowdsourcing
We consider a task assignment problem in crowdsourcing, which is aimed at
collecting as many reliable labels as possible within a limited budget. A
challenge in this scenario is how to cope with the diversity of tasks and the
task-dependent reliability of workers, e.g., a worker may be good at
recognizing the name of sports teams, but not be familiar with cosmetics
brands. We refer to this practical setting as heterogeneous crowdsourcing. In
this paper, we propose a contextual bandit formulation for task assignment in
heterogeneous crowdsourcing, which is able to deal with the
exploration-exploitation trade-off in worker selection. We also theoretically
investigate the regret bounds for the proposed method, and demonstrate its
practical usefulness experimentally.
Using machine-learning to assign function labels to parser output for Spanish
Data-driven grammatical function tag assignment has been studied for English using the Penn-II Treebank data. In this paper we address the question of whether such methods can be applied successfully to other languages and treebank resources. In addition to tag assignment accuracy
and f-scores we also present results of a task-based evaluation. We use three machine-learning methods to assign
Cast3LB function tags to sentences parsed with Bikel's parser trained on the Cast3LB treebank. The best performing method, SVM, achieves an f-score of 86.87% on gold-standard trees and 66.67% on parser output - a statistically significant improvement of 6.74% over the baseline. In a
task-based evaluation we generate LFG functional-structures from the function tag-enriched trees. On this task we achieve
an f-score of 75.67%, a statistically significant 3.4% improvement over the baseline.
Econometrics meets sentiment: an overview of methodology and applications
The advent of massive amounts of textual, audio, and visual data has spurred the development of econometric methodology to transform qualitative sentiment data into quantitative sentiment variables, and to use those variables in an econometric analysis of the relationships between sentiment and other variables. We survey this emerging research field and refer to it as sentometrics, which is a portmanteau of sentiment and econometrics. We provide a synthesis of the relevant methodological approaches, illustrate with empirical results, and discuss useful software.
Enhancing Sentiment Analysis Results through Outlier Detection Optimization
When dealing with text data containing subjective labels like speaker
emotions, inaccuracies or discrepancies among labelers are not uncommon. Such
discrepancies can significantly affect the performance of machine learning
algorithms. This study investigates the potential of identifying and addressing
outliers in text data with subjective labels, aiming to enhance classification
outcomes. We utilized the Deep SVDD algorithm, a one-class classification
method, to detect outliers in nine text-based emotion and sentiment analysis
datasets. By employing both a small-sized language model (DistilBERT base model
with 66 million parameters) and non-deep learning machine learning algorithms
(decision tree, KNN, Logistic Regression, and LDA) as the classifier, our
findings suggest that the removal of outliers can lead to enhanced results in
most cases. Additionally, as outliers in such datasets are not necessarily
unlearnable, we experimented with a large language model -- DeBERTa v3
large with 131 million parameters, which can capture very complex patterns in
data. We continued to observe performance enhancements across multiple
datasets.