Reduced perplexity: Uncertainty measures without entropy
Conference paper presented at Recent Advances in Info-Metrics, Washington, DC, 2014. Under review for a book chapter in "Recent innovations in info-metrics: a cross-disciplinary perspective on information and information processing" by Oxford University Press. A simple, intuitive approach to the assessment of probabilistic inferences is introduced. The Shannon information metrics are translated to the probability domain. The translation shows that the negative logarithmic score and the geometric mean are equivalent measures of the accuracy of a probabilistic inference. Thus there is both a quantitative reduction in perplexity, as good inference algorithms reduce the uncertainty, and a qualitative reduction due to the increased clarity between the original set of inferences and their average, the geometric mean. Further insight is provided by showing that the Renyi and Tsallis entropy functions, translated to the probability domain, are both the weighted generalized mean of the distribution. The generalized mean of probabilistic inferences forms a Risk Profile of the performance. The arithmetic mean is used to measure the decisiveness, while the -2/3 mean is used to measure the robustness.
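The equivalence stated in the abstract can be checked numerically. The sketch below is illustrative only: the probabilities are made-up values, the function names are hypothetical, and equal weights are assumed where the abstract speaks of a weighted generalized mean.

```python
import math

def log_score(probs):
    """Average negative log probability assigned to the outcomes that occurred."""
    return -sum(math.log(p) for p in probs) / len(probs)

def generalized_mean(probs, r):
    """Equal-weight power mean of order r; r = 0 is the geometric mean (limit case)."""
    n = len(probs)
    if r == 0:
        return math.exp(sum(math.log(p) for p in probs) / n)
    return (sum(p ** r for p in probs) / n) ** (1.0 / r)

# Hypothetical probabilities a model assigned to the true outcomes.
probs = [0.9, 0.6, 0.3, 0.8]

geometric = generalized_mean(probs, 0)
# The geometric mean is the exponential of the negated average log score.
assert math.isclose(geometric, math.exp(-log_score(probs)))

# The three means named in the abstract, using its labels.
risk_profile = {
    "decisiveness (r = 1)": generalized_mean(probs, 1),
    "accuracy (r = 0)": geometric,
    "robustness (r = -2/3)": generalized_mean(probs, -2 / 3),
}
```

By the power mean inequality, the -2/3 mean never exceeds the geometric mean, which never exceeds the arithmetic mean, so the three values bracket the accuracy from below and above.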
Drawing Elena Ferrante's Profile. Workshop Proceedings, Padova, 7 September 2017
Elena Ferrante is an internationally acclaimed Italian novelist whose real identity has been kept secret by E/O publishing house for more than 25 years. Owing to her popularity, major Italian and foreign newspapers have long tried to discover her real identity. However, only a few attempts have been made to foster a scientific debate on her work.
In 2016, Arjuna Tuzzi and Michele Cortelazzo led an Italian research team that conducted a preliminary study and collected a well-founded, large corpus of Italian novels comprising 150 works published in the last 30 years by 40 different authors. Moreover, they shared their data with a select group of international experts on authorship attribution, profiling, and analysis of textual data: Maciej Eder and Jan Rybicki (Poland), Patrick Juola (United States), Vittorio Loreto and his research team, Margherita Lalli and Francesca Tria (Italy), George Mikros (Greece), Pierre Ratinaud (France), and Jacques Savoy (Switzerland).
The chapters of this volume report the results of this endeavour that were first presented during the international workshop Drawing Elena Ferrante's Profile in Padua on 7 September 2017 as part of the 3rd IQLA-GIAT Summer School in Quantitative Analysis of Textual Data. The fascinating research findings suggest that Elena Ferrante’s work definitely deserves “many hands” as well as an extensive effort to understand her distinct writing style and the reasons for her worldwide success.
Modelling Instance-Level Annotator Reliability for Natural Language Labelling Tasks
When constructing models that learn from noisy labels produced by multiple
annotators, it is important to accurately estimate the reliability of
annotators. Annotators may provide labels of inconsistent quality due to their
varying expertise and reliability in a domain. Previous studies have mostly
focused on estimating each annotator's overall reliability on the entire
annotation task. However, in practice, the reliability of an annotator may
depend on each specific instance. Only a limited number of studies have
investigated modelling per-instance reliability and these only considered
binary labels. In this paper, we propose an unsupervised model which can handle
both binary and multi-class labels. It can automatically estimate the
per-instance reliability of each annotator and the correct label for each
instance. We specify our model as a probabilistic model which incorporates
neural networks to model the dependency between latent variables and instances.
For evaluation, the proposed method is applied to both synthetic and real data,
including two labelling tasks: text classification and textual entailment.
Experimental results demonstrate our novel method can not only accurately
estimate the reliability of annotators across different instances, but also
achieve superior performance in predicting the correct labels and detecting the
least reliable annotators compared to state-of-the-art baselines. Comment: 9 pages, 1 figure, 10 tables, 2019 Annual Conference of the North
American Chapter of the Association for Computational Linguistics (NAACL 2019).
Active Discovery of Network Roles for Predicting the Classes of Network Nodes
Nodes in real world networks often have class labels, or underlying
attributes, that are related to the way in which they connect to other nodes.
Sometimes this relationship is simple; for instance, nodes of the same class
may be more likely to be connected. In other cases, however, this is not true,
and the way that nodes link in a network exhibits a different, more complex
relationship to their attributes. Here, we consider networks in which we know
how the nodes are connected, but we do not know the class labels of the nodes
or how class labels relate to the network links. We wish to identify the best
subset of nodes to label in order to learn this relationship between node
attributes and network links. We can then use this discovered relationship to
accurately predict the class labels of the rest of the network nodes.
We present a model that identifies groups of nodes with similar link
patterns, which we call network roles, using a generative blockmodel. The model
then predicts labels by learning the mapping from network roles to class labels
using a maximum margin classifier. We choose a subset of nodes to label
according to an iterative margin-based active learning strategy. By integrating
the discovery of network roles with the classifier optimisation, the active
learning process can adapt the network roles to better represent the network
for node classification. We demonstrate the model by exploring a selection of
real world networks, including a marine food web and a network of English
words. We show that, in contrast to other network classifiers, this model
achieves good classification accuracy for a range of networks with different
relationships between class labels and network links.
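The margin-based selection step mentioned in the abstract can be illustrated with a generic margin-sampling rule. This is a sketch under assumptions, not the authors' method: the abstract does not define the margin, so the gap between the two highest class scores is used here, and the function name and scores are hypothetical.

```python
def smallest_margin_index(scores):
    """Return the index of the node whose top two class scores are closest.

    scores: one list of per-class classifier scores for each unlabelled node.
    Assumption: margin = best score minus second-best score.
    """
    def margin(node_scores):
        ranked = sorted(node_scores, reverse=True)
        return ranked[0] - ranked[1]

    return min(range(len(scores)), key=lambda i: margin(scores[i]))

# Hypothetical classifier scores for three unlabelled nodes and two classes.
unlabelled_scores = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]

# The node the classifier is least sure about is queried for a label next.
next_node = smallest_margin_index(unlabelled_scores)
```

In an iterative loop, the chosen node would be labelled, the classifier refitted, and the scores recomputed before the next selection.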