OpinionRank: Extracting Ground Truth Labels from Unreliable Expert Opinions with Graph-Based Spectral Ranking
As larger and more comprehensive datasets become standard in contemporary machine learning, it becomes increasingly difficult to obtain reliable, trustworthy label information with which to train sophisticated models. To address this problem, crowdsourcing has emerged as a popular, inexpensive, and efficient data mining solution for performing distributed label collection. However, crowdsourced annotations are inherently untrustworthy, as the labels are provided by anonymous volunteers who may have varying, unreliable expertise. Worse yet, some participants on commonly used platforms such as Amazon Mechanical Turk may be adversarial, and provide intentionally incorrect label information without the end user's knowledge. We discuss three conventional models of the label generation process, describing their parameterizations and the model-based approaches used to solve them. We then propose OpinionRank, a model-free, interpretable, graph-based spectral algorithm for integrating crowdsourced annotations into reliable labels for performing supervised or semi-supervised learning. Our experiments show that OpinionRank performs favorably when compared against more highly parameterized algorithms. We also show that OpinionRank is scalable to very large datasets and numbers of label sources, and requires considerably fewer computational resources than previous approaches.
A finite mixture modelling perspective for combining experts’ opinions with an application to quantile-based risk measures
The key purpose of this paper is to present an alternative viewpoint for combining expert opinions based on finite mixture models. Moreover, we consider that the components of the mixture are not necessarily assumed to be from the same parametric family. This approach can enable the agent to make informed decisions about the uncertain quantity of interest in a flexible manner that accounts for multiple sources of heterogeneity involved in the opinions expressed by the experts in terms of the parametric family, the parameters of each component density, and also the mixing weights. Finally, the proposed models are employed for numerically computing quantile-based risk measures in a collective decision-making context.
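To make the idea of a quantile-based risk measure on a mixture of expert opinions concrete, here is a minimal sketch. It is not the paper's method: the two component families (normal and lognormal), their parameters, the mixing weights, the bracketing interval, and the helper names `mixture_cdf`, `mixture_quantile`, and `lognormal_cdf` are all illustrative assumptions. It computes a 95% quantile (a Value-at-Risk-style measure) by numerically inverting the mixture CDF with bisection.

```python
import math
from statistics import NormalDist


def mixture_cdf(x, components):
    """CDF of a finite mixture: weighted sum of the component CDFs.

    `components` is a list of (weight, cdf) pairs whose weights sum to 1.
    """
    return sum(weight * cdf(x) for weight, cdf in components)


def mixture_quantile(p, components, lo, hi, tol=1e-9):
    """Invert the mixture CDF on [lo, hi] by bisection.

    The caller must supply an interval that brackets the p-quantile.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not mixture_cdf(lo, components) <= p <= mixture_cdf(hi, components):
        raise ValueError("[lo, hi] does not bracket the requested quantile")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, components) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lognormal_cdf(mu, sigma):
    """Return the CDF of a lognormal with log-scale parameters mu, sigma."""
    base = NormalDist(mu, sigma)
    return lambda x: base.cdf(math.log(x)) if x > 0 else 0.0


# Hypothetical opinions: expert A states a normal belief, expert B a lognormal one.
expert_a = NormalDist(mu=10.0, sigma=2.0).cdf
expert_b = lognormal_cdf(mu=2.5, sigma=0.4)

# Assumed mixing weights reflecting the trust placed in each expert.
components = [(0.6, expert_a), (0.4, expert_b)]

# 95% quantile of the combined opinion.
var_95 = mixture_quantile(0.95, components, lo=0.0, hi=200.0)
```

Because the mixture quantile generally has no closed form when the components come from different families, a numerical inversion of this kind is one straightforward option; bisection is used here only for its simplicity and guaranteed convergence on a bracketing interval.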
Coordinated Multi-Agent Imitation Learning
We study the problem of imitation learning from demonstrations of multiple
coordinating agents. One key challenge in this setting is that learning a good
model of coordination can be difficult, since coordination is often implicit in
the demonstrations and must be inferred as a latent variable. We propose a
joint approach that simultaneously learns a latent coordination model along
with the individual policies. In particular, our method integrates unsupervised
structure learning with conventional imitation learning. We illustrate the
power of our approach on a difficult problem of learning multiple policies for
fine-grained behavior modeling in team sports, where different players occupy
different roles in the coordinated team strategy. We show that having a
coordination model to infer the roles of players yields substantially improved
imitation loss compared to conventional baselines. Comment: International Conference on Machine Learning 201
A Bayesian Approach for Sequence Tagging with Crowds
Current methods for sequence tagging, a core task in NLP, are data hungry,
which motivates the use of crowdsourcing as a cheap way to obtain labelled
data. However, annotators are often unreliable and current aggregation methods
cannot capture common types of span annotation errors. To address this, we
propose a Bayesian method for aggregating sequence tags that reduces errors by
modelling sequential dependencies between the annotations as well as the
ground-truth labels. By taking a Bayesian approach, we account for uncertainty
in the model due to both annotator errors and the lack of data for modelling
annotators who complete few tasks. We evaluate our model on crowdsourced data
for named entity recognition, information extraction and argument mining,
showing that our sequential model outperforms the previous state of the art. We
also find that our approach can reduce crowdsourcing costs through more
effective active learning, as it better captures uncertainty in the sequence
labels when there are few annotations. Comment: Accepted for EMNLP 201