41,906 research outputs found

    Reduced perplexity: Uncertainty measures without entropy

    Full text link
    Conference paper presented at Recent Advances in Info-Metrics, Washington, DC, 2014. Under review for a book chapter in "Recent innovations in info-metrics: a cross-disciplinary perspective on information and information processing" by Oxford University Press.A simple, intuitive approach to the assessment of probabilistic inferences is introduced. The Shannon information metrics are translated to the probability domain. The translation shows that the negative logarithmic score and the geometric mean are equivalent measures of the accuracy of a probabilistic inference. Thus there is both a quantitative reduction in perplexity as good inference algorithms reduce the uncertainty and a qualitative reduction due to the increased clarity between the original set of inferences and their average, the geometric mean. Further insight is provided by showing that the Renyi and Tsallis entropy functions translated to the probability domain are both the weighted generalized mean of the distribution. The generalized mean of probabilistic inferences forms a Risk Profile of the performance. The arithmetic mean is used to measure the decisiveness, while the -2/3 mean is used to measure the robustness

    Drawing Elena Ferrante's Profile. Workshop Proceedings, Padova, 7 September 2017

    Get PDF
    Elena Ferrante is an internationally acclaimed Italian novelist whose real identity has been kept secret by E/O publishing house for more than 25 years. Owing to her popularity, major Italian and foreign newspapers have long tried to discover her real identity. However, only a few attempts have been made to foster a scientific debate on her work. In 2016, Arjuna Tuzzi and Michele Cortelazzo led an Italian research team that conducted a preliminary study and collected a well-founded, large corpus of Italian novels comprising 150 works published in the last 30 years by 40 different authors. Moreover, they shared their data with a select group of international experts on authorship attribution, profiling, and analysis of textual data: Maciej Eder and Jan Rybicki (Poland), Patrick Juola (United States), Vittorio Loreto and his research team, Margherita Lalli and Francesca Tria (Italy), George Mikros (Greece), Pierre Ratinaud (France), and Jacques Savoy (Switzerland). The chapters of this volume report the results of this endeavour that were first presented during the international workshop Drawing Elena Ferrante's Profile in Padua on 7 September 2017 as part of the 3rd IQLA-GIAT Summer School in Quantitative Analysis of Textual Data. The fascinating research findings suggest that Elena Ferrante\u2019s work definitely deserves \u201cmany hands\u201d as well as an extensive effort to understand her distinct writing style and the reasons for her worldwide success

    Modelling Instance-Level Annotator Reliability for Natural Language Labelling Tasks

    Full text link
    When constructing models that learn from noisy labels produced by multiple annotators, it is important to accurately estimate the reliability of annotators. Annotators may provide labels of inconsistent quality due to their varying expertise and reliability in a domain. Previous studies have mostly focused on estimating each annotator's overall reliability on the entire annotation task. However, in practice, the reliability of an annotator may depend on each specific instance. Only a limited number of studies have investigated modelling per-instance reliability and these only considered binary labels. In this paper, we propose an unsupervised model which can handle both binary and multi-class labels. It can automatically estimate the per-instance reliability of each annotator and the correct label for each instance. We specify our model as a probabilistic model which incorporates neural networks to model the dependency between latent variables and instances. For evaluation, the proposed method is applied to both synthetic and real data, including two labelling tasks: text classification and textual entailment. Experimental results demonstrate our novel method can not only accurately estimate the reliability of annotators across different instances, but also achieve superior performance in predicting the correct labels and detecting the least reliable annotators compared to state-of-the-art baselines.Comment: 9 pages, 1 figures, 10 tables, 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL2019

    Active Discovery of Network Roles for Predicting the Classes of Network Nodes

    Full text link
    Nodes in real world networks often have class labels, or underlying attributes, that are related to the way in which they connect to other nodes. Sometimes this relationship is simple, for instance nodes of the same class are may be more likely to be connected. In other cases, however, this is not true, and the way that nodes link in a network exhibits a different, more complex relationship to their attributes. Here, we consider networks in which we know how the nodes are connected, but we do not know the class labels of the nodes or how class labels relate to the network links. We wish to identify the best subset of nodes to label in order to learn this relationship between node attributes and network links. We can then use this discovered relationship to accurately predict the class labels of the rest of the network nodes. We present a model that identifies groups of nodes with similar link patterns, which we call network roles, using a generative blockmodel. The model then predicts labels by learning the mapping from network roles to class labels using a maximum margin classifier. We choose a subset of nodes to label according to an iterative margin-based active learning strategy. By integrating the discovery of network roles with the classifier optimisation, the active learning process can adapt the network roles to better represent the network for node classification. We demonstrate the model by exploring a selection of real world networks, including a marine food web and a network of English words. We show that, in contrast to other network classifiers, this model achieves good classification accuracy for a range of networks with different relationships between class labels and network links
    • …
    corecore