1,032 research outputs found

    Diatopic variation in digital space: What Twitter can tell us about Texas English dialect areas

    The availability of large amounts of social media text offers tremendous potential for studies of diatopic variation. A case in point is the linguistic geography of Texas, which is at present insufficiently described in traditional dialectological research. This paper summarises previous work on diatopic variation in Texas English on the basis of Twitter and presents an approach that foregrounds functional interpretability over a maximally clear geographical signal. In a multi-dimensional analysis based on 45 linguistic features in over 3 million tweets from across the state, two dimensions of variation are identified that pattern in geographically meaningful ways. The first of these relates to creative uses of typography and distinguishes urban centres from the rest of the state. The second dimension encompasses characteristics of interpersonal, spoken discourse and shows an East-West geographical divide. While the linguistic features of relevance for the dimensions are not generally considered in dialectological research, their geographic patterning reflects major tendencies attested in the literature on diatopic variation in Texas.[1]
    [1] I am grateful to Alex Rosenfeld for sharing his data with me. This work was initially presented at a panel on Twitter in sociolinguistic research at NWAV 49, organised by Stef Grondelaers and Jane Stuart-Smith. I would like to thank both of them for giving me this opportunity and the attendees of the panel, especially Lars Hinrichs and Alex Rosenfeld, for fruitful discussion. Finally, my gratitude goes to Erling Strudsholm and Anita Berit Hansen for their invitation to participate in the Coseriu Symposium and their patience in organising this special issue.

    You Shall Know a User by the Company It Keeps: Dynamic Representations for Social Media Users in NLP

    Information about individuals can help to better understand what they say, particularly in social media where texts are short. Current approaches to modelling social media users pay attention to their social connections, but exploit this information in a static way, treating all connections uniformly. This ignores the fact, well known in sociolinguistics, that an individual may be part of several communities which are not equally relevant in all communicative situations. We present a model based on Graph Attention Networks that captures this observation. It dynamically explores the social graph of a user, computes a user representation given the most relevant connections for a target task, and combines it with linguistic information to make a prediction. We apply our model to three different tasks, evaluate it against alternative models, and analyse the results extensively, showing that it significantly outperforms other current methods.
    Comment: To appear in Proceeding of EMNLP 201

    Stability of Syntactic Dialect Classification Over Space and Time

    This paper analyses the degree to which dialect classifiers based on syntactic representations remain stable over space and time. While previous work has shown that the combination of grammar induction and geospatial text classification produces robust dialect models, we do not know what influence changing grammars and changing populations have on dialect models. This paper constructs a test set for 12 dialects of English that spans three years at monthly intervals with a fixed spatial distribution across 1,120 cities. Syntactic representations are formulated within the usage-based Construction Grammar paradigm (CxG). The decay rate of classification performance for each dialect over time allows us to identify regions undergoing syntactic change, and the distribution of classification accuracy within dialect regions allows us to identify the degree to which the grammar of a dialect is internally heterogeneous. The main contribution of this paper is to show that a rigorous evaluation of dialect classification models can be used to find both variation over space and change over time.

    Experiments in Language Variety Geolocation and Dialect Identification

    Peer reviewed

    Addressing flexibility in energy system models

    The present report summarises the discussions and conclusions of the international workshop on "Addressing flexibility in energy system models" held on December 4 and 5, 2014 at the premises of the JRC Institute for Energy and Transport in Petten. Around 40 energy modelling experts and researchers from universities, research centres, the power industry, international organisations, and the European Commission (DGs ENER and JRC) met to present and discuss their views on the modelling of flexibility issues, the linkage of energy system models and sector-detailed energy models, the integration of high shares of variable renewable energy sources, and the representation of flexibility needs in power system models. The discussions took into account modelling and data-related methodological aspects, with their limitations and uncertainties, as well as possible alternatives to be implemented within energy system models.
    JRC.F.6-Energy Technology Policy Outlook

    Contextualized Diachronic Word Representations

    Diachronic word embeddings play a key role in capturing interesting patterns about how language evolves over time. Most of the existing work focuses on studying corpora spanning across several decades, which is understandably still not a possibility when working on social media-based user-generated content. In this work, we address the problem of studying semantic changes in a large Twitter corpus collected over five years, a much shorter period than what is usually the norm in diachronic studies. We devise a novel attentional model, based on Bernoulli word embeddings, that are conditioned on contextual extra-linguistic (social) features such as network, spatial and socioeconomic variables, which are associated with Twitter users, as well as topic-based features. We posit that these social features provide an inductive bias that helps our model to overcome the narrow time-span regime problem. Our extensive experiments reveal that our proposed model is able to capture subtle semantic shifts without being biased towards frequency cues and also works well when certain contextual features are absent. Our model fits the data better than current state-of-the-art dynamic word embedding models and therefore is a promising tool to study diachronic semantic changes over small time periods.