
    Type-driven natural language analysis

    The purpose of this thesis is to show how recent developments in logic programming can be exploited to encode the features of certain linguistic theories in a computational environment. In this way, we make sophisticated linguistic-analysis capabilities, directly justified by well-developed grammatical frameworks, available for natural language processing. More specifically, we exploit hypothetical reasoning, recently proposed as one possible direction for extending logic programming, to account for the syntax of filler-gap dependencies along the lines of linguistic theories such as Generalized Phrase Structure Grammar and Categorial Grammar. Moreover, for the semantic analysis of the same kind of phenomena, we use another recently proposed extension, closely related to the first: replacing first-order terms with the more expressive λ-terms of the λ-calculus.
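
    The abstract combines two mechanisms: a hypothetical assumption that licenses an empty NP (the gap) inside a relative clause, and λ-terms for meaning. The sketch below imitates both in Python rather than in an extended logic-programming language such as λProlog. A relative clause is parsed under a temporary "NP gap = x" assumption that must be used up, and word meanings are Python lambdas standing in for λ-terms. The grammar, lexicon, and function names (parse_s, parse_np, analyse) are hypothetical toy choices, not the thesis's actual encoding.

```python
import re

# Hypothetical toy lexicon. Word meanings are Python lambdas standing in for λ-terms.
def noun(name):
    return lambda x: f"{name}({x})"                     # λx. name(x)

def trans_verb(name):
    return lambda subj, obj: f"{name}({subj},{obj})"    # λ(s, o). name(s, o)

NAMES = {"john", "mary"}
DETS = {"a"}
NOUNS = {"man": noun("man"), "book": noun("book")}
VERBS = {"saw": trans_verb("saw"), "likes": trans_verb("likes")}

def parse_np(toks, i, gaps):
    """Yield (term, conditions, next_index, remaining_gaps) for every NP reading."""
    # Hypothetical NP: while a gap assumption is in force, an NP may be empty;
    # it is interpreted as the variable the assumption is bound to.
    if gaps:
        yield gaps[0], (), i, gaps[1:]
    if i < len(toks) and toks[i] in NAMES:
        yield toks[i], (), i + 1, gaps
    if i + 1 < len(toks) and toks[i] in DETS and toks[i + 1] in NOUNS:
        var = f"x{i}"                                   # fresh variable, named by position
        head = (NOUNS[toks[i + 1]](var),)
        yield var, head, i + 2, gaps
        if i + 2 < len(toks) and toks[i + 2] == "that":
            # Relative clause: parse an S under the assumption "NP gap = var"
            # and accept it only if that assumption was actually consumed.
            for conds, j, left in parse_s(toks, i + 3, (var,) + gaps):
                if left == gaps:
                    yield var, head + conds, j, gaps

def parse_vp(toks, i, gaps, subj):
    """Yield (conditions, next_index, remaining_gaps) for VP -> V NP."""
    if i < len(toks) and toks[i] in VERBS:
        for obj, conds, j, g in parse_np(toks, i + 1, gaps):
            yield conds + (VERBS[toks[i]](subj, obj),), j, g

def parse_s(toks, i, gaps):
    """Yield (conditions, next_index, remaining_gaps) for S -> NP VP."""
    for subj, c1, j, g1 in parse_np(toks, i, gaps):
        for c2, k, g2 in parse_vp(toks, j, g1, subj):
            yield c1 + c2, k, g2

def analyse(sentence):
    """Return the logical forms of all complete parses with no unused gap."""
    toks = sentence.lower().split()
    forms = []
    for conds, j, left in parse_s(toks, 0, ()):
        if j == len(toks) and not left:
            body = " & ".join(conds)
            variables = sorted(set(re.findall(r"\bx\d+\b", body)))
            forms.append(f"exists {' '.join(variables)}. {body}" if variables else body)
    return forms
```

    For example, analyse("mary likes a man that john saw") yields one logical form in which the object gap of saw is bound to the same variable as the noun man. In contrast, "mary likes a man that john saw mary" gets no parse, because the gap assumption is never consumed.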

    Extracting Narrative Patterns in Different Textual Genres: A Multilevel Feature Discourse Analysis

    We present a data-driven approach to discovering and extracting patterns in textual genres, with the aim of identifying whether linguistic features vary in interesting ways across narrative genres depending on their respective communicative purposes. We pursue this goal through a multilevel discourse analysis according to (1) the type of feature studied (shallow, syntactic, semantic, and discourse-related); (2) the texts at the document level; and (3) the textual genres of news, reviews, and children's tales. To accomplish this, several corpora from the three textual genres were gathered from different sources to ensure a heterogeneous representation, paying attention to the presence and frequency of a series of features extracted with computational tools. This in-depth analysis aims to obtain more detailed knowledge of the linguistic phenomena that directly shape each of the genres studied, showing both the particularities that set each genre apart and those that place all of them within the narrative typology. The findings suggest that this type of multilevel linguistic analysis could benefit research areas within natural language processing such as computational narratology, since it allows a better understanding of the fundamental features that define each genre and its communicative purpose. Likewise, this approach could support the creation of more consistent automatic story generation tools in language generation.
    This research work is part of the R&D project "PID2021-123956OB-I00", funded by MCIN/AEI/10.13039/501100011033/ and by "ERDF A way of making Europe". Moreover, it was also partially funded by the project "CLEAR.TEXT: Enhancing the modernization public sector organizations by deploying natural language processing to make their digital content CLEARER to those with cognitive disabilities" (TED2021-130707B-I00), by the Generalitat Valenciana through the project "NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation" with grant reference CIPROM/2021/21, and finally by the European Commission ICT COST Action "Multi-task, Multilingual, Multi-modal Language Generation" (CA18231).
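
    The study extracts shallow, syntactic, semantic, and discourse features per document and compares them across genres, but the abstract does not name the tools or the exact features. As an assumption, the sketch below computes three illustrative shallow features (token count, mean sentence length, type-token ratio) per document and averages them per genre; the feature set, the naive regex tokenisation, and the function names are hypothetical, not the paper's pipeline.

```python
import re
from statistics import mean

def shallow_features(text):
    """Compute three illustrative shallow (surface-level) features for one document."""
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive tokenisation: runs of lower-case letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "tokens": len(tokens),
        "mean_sentence_length": len(tokens) / len(sentences) if sentences else 0.0,
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
    }

def genre_profile(corpus):
    """Average each feature over the documents of each genre.

    corpus: dict mapping a genre name to a non-empty list of document strings.
    """
    profile = {}
    for genre, docs in corpus.items():
        feats = [shallow_features(d) for d in docs]
        profile[genre] = {name: mean(f[name] for f in feats) for name in feats[0]}
    return profile
```

    Comparing the resulting per-genre profiles (e.g. news vs. reviews vs. tales) is the simplest form of the cross-genre comparison; deeper levels would add parser- or discourse-tool output as further feature columns.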

    Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis

    We propose using automatically generated natural language definitions of contextualised word usages as interpretable word and word sense representations. Given a collection of usage examples for a target word and the corresponding data-driven usage clusters (i.e., word senses), a definition is generated for each usage with a specialised Flan-T5 language model, and the most prototypical definition in a usage cluster is chosen as the sense label. We demonstrate how the resulting sense labels can make existing approaches to semantic change analysis more interpretable, and how they can allow users (historical linguists, lexicographers, or social scientists) to explore and intuitively explain diachronic trajectories of word meaning. Semantic change analysis is only one of many possible applications of the 'definitions as representations' paradigm. Beyond being human-readable, contextualised definitions also outperform token or usage sentence embeddings in word-in-context semantic similarity judgements, making them a promising new type of lexical representation for NLP.
    Comment: ACL 202
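
    The abstract says the most prototypical definition in each usage cluster becomes the sense label, but this excerpt does not say how prototypicality is measured. As an assumption, the sketch below takes the medoid: the definition with the highest mean cosine similarity to the other definitions in its cluster, using plain bag-of-words vectors in place of the sentence embeddings a real system would more likely use. The generated definitions and cluster ids are taken as given (the Flan-T5 generation step is not reproduced), and all function names are hypothetical.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector: a Counter of lower-cased whitespace tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def sense_label(definitions):
    """Pick the definition with the highest mean similarity to the others (the medoid)."""
    if len(definitions) == 1:
        return definitions[0]
    vecs = [bow(d) for d in definitions]

    def centrality(i):
        others = [cosine(vecs[i], vecs[j]) for j in range(len(vecs)) if j != i]
        return sum(others) / len(others)

    return definitions[max(range(len(definitions)), key=centrality)]

def label_clusters(usage_definitions, cluster_ids):
    """Map each cluster id to the sense label chosen from its usages' definitions."""
    clusters = {}
    for definition, cid in zip(usage_definitions, cluster_ids):
        clusters.setdefault(cid, []).append(definition)
    return {cid: sense_label(defs) for cid, defs in clusters.items()}
```

    Choosing an existing definition rather than synthesising a new one keeps every sense label a real, human-readable model output, which is what makes the labels directly interpretable.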

    A linguistically-driven methodology for detecting impending and unfolding emergencies from social media messages

    Natural disasters have demonstrated the crucial role of social media before, during and after emergencies (Haddow & Haddow 2013). Within our EU project Slándáil, we aim to improve, in an ethical way, the use of social media to enhance the response of disaster-related agencies. To this end, we have collected corpora of social and formal media to study the newsroom communication of emergency management organisations in English and Italian. Emergency management agencies in English-speaking countries currently use social media to different extents, whereas Italy's national Protezione Civile uses only Twitter at present. Our method is designed to identify communicative strategies and detect sentiment in order to distinguish warnings from actual disasters, and major from minor disasters. In our linguistic analysis, human annotators classify messages as alerts/warnings or as emergency response and mitigation messages, based on the terminology used and the sentiment expressed. The results of this analysis are then used to train an application that tags messages and detects disaster- and/or emergency-related terminology and emotive language, in order to simulate human rating and forward information to an emergency management system.
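
    The application is described as tagging messages by disaster-related terminology and emotive language to separate warnings from actual emergencies. The sketch below shows that idea as a simple lexicon-based tagger; the three word lists, the precedence rule (reported impact outranks warning vocabulary), and the function name are hypothetical illustrations, not the project's terminology resources or trained model.

```python
import re

# Hypothetical lexicons for illustration only; not the project's terminology resources.
WARNING_TERMS = {"warning", "alert", "expected", "forecast", "possible", "advisory"}
IMPACT_TERMS = {"flooded", "collapsed", "evacuated", "injured", "destroyed", "trapped"}
EMOTIVE_TERMS = {"terrifying", "devastating", "tragic", "scared", "panic", "heartbreaking"}

def tag_message(text):
    """Tag one message with matched terms, a coarse type, and an emotive-word count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    warning = [t for t in tokens if t in WARNING_TERMS]
    impact = [t for t in tokens if t in IMPACT_TERMS]
    emotive = [t for t in tokens if t in EMOTIVE_TERMS]
    if impact:
        kind = "emergency"   # reported impact outranks warning vocabulary
    elif warning:
        kind = "warning"
    else:
        kind = "other"
    return {
        "type": kind,
        "warning_terms": warning,
        "impact_terms": impact,
        "emotive_score": len(emotive),
    }
```

    In a setup like the one described, human-annotated labels would replace these hand-written rules as training targets, and the tags would serve as features for the classifier that forwards messages onward.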