58 research outputs found

    Evaluation of Distributional Semantic Models of Ancient Greek:Preliminary Results and a Road Map for Future Work

    Get PDF
    We evaluate four count-based and predictive distributional semantic models of Ancient Greek against AGREE, a composite benchmark of human judgements, to assess their ability to retrieve semantic relatedness. On the basis of the observations deriving from the analysis of the results, we design a procedure for a largerscale intrinsic evaluation of count-based and predictive language models, including syntactic embeddings. We also propose possible ways of exploiting the different layers of the whole AGREE benchmark (including both humanand machine-generated data) and different evaluation metrics

    Evaluation of Distributional Semantic Models of Ancient Greek:Preliminary Results and a Road Map for Future Work

    Get PDF
    We evaluate four count-based and predictive distributional semantic models of Ancient Greek against AGREE, a composite benchmark of human judgements, to assess their ability to retrieve semantic relatedness. On the basis of the observations deriving from the analysis of the results, we design a procedure for a largerscale intrinsic evaluation of count-based and predictive language models, including syntactic embeddings. We also propose possible ways of exploiting the different layers of the whole AGREE benchmark (including both humanand machine-generated data) and different evaluation metrics

    Evaluation of Distributional Semantic Models of Ancient Greek:Preliminary Results and a Road Map for Future Work

    Get PDF
    We evaluate four count-based and predictive distributional semantic models of Ancient Greek against AGREE, a composite benchmark of human judgements, to assess their ability to retrieve semantic relatedness. On the basis of the observations deriving from the analysis of the results, we design a procedure for a largerscale intrinsic evaluation of count-based and predictive language models, including syntactic embeddings. We also propose possible ways of exploiting the different layers of the whole AGREE benchmark (including both humanand machine-generated data) and different evaluation metrics

    Evaluation of Distributional Semantic Models of Ancient Greek:Preliminary Results and a Road Map for Future Work

    Get PDF
    We evaluate four count-based and predictive distributional semantic models of Ancient Greek against AGREE, a composite benchmark of human judgements, to assess their ability to retrieve semantic relatedness. On the basis of the observations deriving from the analysis of the results, we design a procedure for a largerscale intrinsic evaluation of count-based and predictive language models, including syntactic embeddings. We also propose possible ways of exploiting the different layers of the whole AGREE benchmark (including both humanand machine-generated data) and different evaluation metrics

    "What is on your mind?" Automated Scoring of Mindreading in Childhood and Early Adolescence

    Full text link
    In this paper we present the first work on the automated scoring of mindreading ability in middle childhood and early adolescence. We create MIND-CA, a new corpus of 11,311 question-answer pairs in English from 1,066 children aged 7 to 14. We perform machine learning experiments and carry out extensive quantitative and qualitative evaluation. We obtain promising results, demonstrating the applicability of state-of-the-art NLP solutions to a new domain and task.Comment: Accepted in COLING 202

    Discovering multiword expressions

    Get PDF
    In this paper, we provide an overview of research on multiword expressions (MWEs), from a natural lan- guage processing perspective. We examine methods developed for modelling MWEs that capture some of their linguistic properties, discussing their use for MWE discovery and for idiomaticity detection. We con- centrate on their collocational and contextual preferences, along with their fixedness in terms of canonical forms and their lack of word-for-word translatatibility. We also discuss a sample of the MWE resources that have been used in intrinsic evaluation setups for these methods

    English Machine Reading Comprehension Datasets: A Survey

    Get PDF
    This paper surveys 60 English Machine Reading Comprehension datasets, with a view to providing a convenient resource for other researchers interested in this problem. We categorize the datasets according to their question and answer form and compare them across various dimensions including size, vocabulary, data source, method of creation, human performance level, and first question word. Our analysis reveals that Wikipedia is by far the most common data source and that there is a relative lack of why, when, and where questions across datasets.Comment: Will appear at EMNLP 2021. Dataset survey paper: 9 pages, 5 figures, 2 tables + attachmen

    Creating expert knowledge by relying on language learners : a generic approach for mass-producing language resources by combining implicit crowdsourcing and language learning

    Get PDF
    We introduce in this paper a generic approach to combine implicit crowdsourcing and language learning in order to mass-produce language resources (LRs) for any language for which a crowd of language learners can be involved. We present the approach by explaining its core paradigm that consists in pairing specific types of LRs with specific exercises, by detailing both its strengths and challenges, and by discussing how much these challenges have been addressed at present. Accordingly, we also report on on-going proof-of-concept efforts aiming at developing the first prototypical implementation of the approach in order to correct and extend an LR called ConceptNet based on the input crowdsourced from language learners. We then present an international network called the European Network for Combining Language Learning with Crowdsourcing Techniques (enetCollect) that provides the context to accelerate the implementation of the generic approach. Finally, we exemplify how it can be used in several language learning scenarios to produce a multitude of NLP resources and how it can therefore alleviate the long-standing NLP issue of the lack of LRs.peer-reviewe

    Emotion Embeddings \unicode{x2014} Learning Stable and Homogeneous Abstractions from Heterogeneous Affective Datasets

    Full text link
    Human emotion is expressed in many communication modalities and media formats and so their computational study is equally diversified into natural language processing, audio signal analysis, computer vision, etc. Similarly, the large variety of representation formats used in previous research to describe emotions (polarity scales, basic emotion categories, dimensional approaches, appraisal theory, etc.) have led to an ever proliferating diversity of datasets, predictive models, and software tools for emotion analysis. Because of these two distinct types of heterogeneity, at the expressional and representational level, there is a dire need to unify previous work on increasingly diverging data and label types. This article presents such a unifying computational model. We propose a training procedure that learns a shared latent representation for emotions, so-called emotion embeddings, independent of different natural languages, communication modalities, media or representation label formats, and even disparate model architectures. Experiments on a wide range of heterogeneous affective datasets indicate that this approach yields the desired interoperability for the sake of reusability, interpretability and flexibility, without penalizing prediction quality. Code and data are archived under https://doi.org/10.5281/zenodo.7405327 .Comment: 18 pages, 6 figure

    Extracting locations from sport and exercise-related social media messages using a neural network-based bilingual toponym recognition model

    Get PDF
    Sport and exercise contribute to health and well-being in cities. While previous research has mainly focused on activities at specific locations such as sport facilities, "informal sport" that occur at arbitrary locations across the city have been largely neglected. Such activities are more challenging to observe, but this challenge may be addressed using data collected from social media platforms, because social media users regularly generate content related to sports and exercise at given locations. This allows studying all sport, including those "informal sport" which are at arbitrary locations, to better understand sports and exercise-related activities in cities. However, user-generated geographical information available on social media platforms is becoming scarcer and coarser. This places increased emphasis on extracting location information from free-form text content on social media, which is complicated by multilingualism and informal language. To support this effort, this article presents an end-to-end deep learning-based bilingual toponym recognition model for extracting location information from social media content related to sports and exercise. We show that our approach outperforms five state-of-the-art deep learning and machine learning models. We further demonstrate how our model can be deployed in a geoparsing framework to support city planners in promoting healthy and active lifestyles.Peer reviewe
    corecore