24,324 research outputs found
Modelling Instance-Level Annotator Reliability for Natural Language Labelling Tasks
When constructing models that learn from noisy labels produced by multiple
annotators, it is important to accurately estimate the reliability of
annotators. Annotators may provide labels of inconsistent quality due to their
varying expertise and reliability in a domain. Previous studies have mostly
focused on estimating each annotator's overall reliability on the entire
annotation task. However, in practice, the reliability of an annotator may
depend on each specific instance. Only a limited number of studies have
investigated modelling per-instance reliability and these only considered
binary labels. In this paper, we propose an unsupervised model which can handle
both binary and multi-class labels. It can automatically estimate the
per-instance reliability of each annotator and the correct label for each
instance. We specify our model as a probabilistic model which incorporates
neural networks to model the dependency between latent variables and instances.
For evaluation, the proposed method is applied to both synthetic and real data,
including two labelling tasks: text classification and textual entailment.
Experimental results demonstrate our novel method can not only accurately
estimate the reliability of annotators across different instances, but also
achieve superior performance in predicting the correct labels and detecting the
least reliable annotators compared to state-of-the-art baselines.Comment: 9 pages, 1 figures, 10 tables, 2019 Annual Conference of the North
American Chapter of the Association for Computational Linguistics (NAACL2019
Active Discovery of Network Roles for Predicting the Classes of Network Nodes
Nodes in real world networks often have class labels, or underlying
attributes, that are related to the way in which they connect to other nodes.
Sometimes this relationship is simple, for instance nodes of the same class are
may be more likely to be connected. In other cases, however, this is not true,
and the way that nodes link in a network exhibits a different, more complex
relationship to their attributes. Here, we consider networks in which we know
how the nodes are connected, but we do not know the class labels of the nodes
or how class labels relate to the network links. We wish to identify the best
subset of nodes to label in order to learn this relationship between node
attributes and network links. We can then use this discovered relationship to
accurately predict the class labels of the rest of the network nodes.
We present a model that identifies groups of nodes with similar link
patterns, which we call network roles, using a generative blockmodel. The model
then predicts labels by learning the mapping from network roles to class labels
using a maximum margin classifier. We choose a subset of nodes to label
according to an iterative margin-based active learning strategy. By integrating
the discovery of network roles with the classifier optimisation, the active
learning process can adapt the network roles to better represent the network
for node classification. We demonstrate the model by exploring a selection of
real world networks, including a marine food web and a network of English
words. We show that, in contrast to other network classifiers, this model
achieves good classification accuracy for a range of networks with different
relationships between class labels and network links
Entropy-based feature extraction for electromagnetic discharges classification in high-voltage power generation
This work exploits four entropy measures known as Sample, Permutation, Weighted Permutation, and Dispersion Entropy to extract relevant information from Electromagnetic Interference (EMI) discharge signals that are useful in fault diagnosis of High-Voltage (HV) equipment. Multi-class classification algorithms are used to classify or distinguish between various discharge sources such as Partial Discharges (PD), Exciter, Arcing, micro Sparking and Random Noise. The signals were measured and recorded on different sites followed by EMI expert’s data analysis in order to identify and label the discharge source type contained within the signal. The classification was performed both within each site and across all sites. The system performs well for both cases with extremely high classification accuracy within site. This work demonstrates the ability to extract relevant entropy-based features from EMI discharge sources from time-resolved signals requiring minimal computation making the system ideal for a potential application to online condition monitoring based on EMI
Zero-Shot Learning for Semantic Utterance Classification
We propose a novel zero-shot learning method for semantic utterance
classification (SUC). It learns a classifier for problems where
none of the semantic categories are present in the training set. The
framework uncovers the link between categories and utterances using a semantic
space. We show that this semantic space can be learned by deep neural networks
trained on large amounts of search engine query log data. More precisely, we
propose a novel method that can learn discriminative semantic features without
supervision. It uses the zero-shot learning framework to guide the learning of
the semantic features. We demonstrate the effectiveness of the zero-shot
semantic learning algorithm on the SUC dataset collected by (Tur, 2012).
Furthermore, we achieve state-of-the-art results by combining the semantic
features with a supervised method
- …