114 research outputs found

    Register variation explains stylometric authorship analysis

    A statistical comparison of regional phonetic and lexical variation in American English

    This paper presents a statistical comparison of regional phonetic and lexical variation in American English. Both the phonetic and lexical datasets were first subjected to separate multivariate spatial analyses in order to identify the most common dimensions of spatial clustering in these two datasets. The dimensions of phonetic and lexical variation extracted by these two analyses were then correlated with each other, after being interpolated over a shared set of reference locations, in order to measure the similarity of regional phonetic and lexical variation in American English. This analysis shows that regional phonetic and lexical variation are remarkably similar in Modern American English.

    The Language of Fake News

    Stylistic variation on the Donald Trump Twitter account: a linguistic analysis of tweets posted between 2009 and 2018

    Twitter was an integral part of Donald Trump's communication platform during his 2016 campaign. Although its topical content has been examined by researchers and the media, we know relatively little about the style of the language used on the account or how this style changed over time. In this study, we present the first detailed description of stylistic variation on the Trump Twitter account based on a multivariate analysis of grammatical co-occurrence patterns in tweets posted between 2009 and 2018. We identify four general patterns of stylistic variation, which we interpret as representing the degree of conversational, campaigning, engaged, and advisory discourse. We then track how the use of these four styles changed over time, focusing on the period around the campaign, showing that the style of tweets shifts systematically depending on the communicative goals of Trump and his team. Based on these results, we propose a series of hypotheses about how the Trump campaign used social media during the 2016 elections.

    A statistical method for the identification and aggregation of regional linguistic variation

    This paper introduces a method for the analysis of regional linguistic variation. The method identifies individual and common patterns of spatial clustering in a set of linguistic variables measured over a set of locations based on a combination of three statistical techniques: spatial autocorrelation, factor analysis, and cluster analysis. To demonstrate how to apply this method, it is used to analyze regional variation in the values of 40 continuously measured, high-frequency lexical alternation variables in a 26-million-word corpus of letters to the editor representing 206 cities from across the United States.

    Using social media to infer the diffusion of an urban contact dialect: A case study of Multicultural London English

    Sociolinguistic research has demonstrated that ‘urban contact dialects’ tend to diffuse beyond the speech communities in which they first emerge. However, no research has attempted to explore the distribution of these varieties across an entire nation nor isolate the social mechanisms that propel their spread. In this paper, we use a corpus of 1.8 billion geo-tagged tweets to explore the spread of Multicultural London English (MLE) lexis across the UK. We find evidence for the diffusion of MLE lexis from East and North London into other ethnically and culturally diverse urban centres across England, particularly those in the South (e.g., Luton), but find lower frequencies of MLE lexis in the North of England (e.g., Manchester), and in Scotland and Wales. Concluding, we emphasise the role of demographic similarity in the diffusion of linguistic innovations by demonstrating that this variety originated in London and diffused into other urban areas in England through the social networks of Black and Asian users.

    Noun phrase modification

    The application of growth curve modeling for the analysis of diachronic corpora

    This paper introduces growth curve modeling for the analysis of language change in corpus linguistics. In addition to describing growth curve modeling, which is a regression-based method for studying the dynamics of a set of variables measured over time, we demonstrate the technique through an analysis of the relative frequencies of words that are increasing or decreasing over time in a multi-billion-word diachronic corpus of Twitter. This analysis finds that increasing words tend to follow a trajectory similar to the S-curve of language change, whereas decreasing words tend to follow a decelerated trajectory, thereby showing how growth curve modeling can be used to uncover and describe underlying patterns of language change in diachronic corpora.

    Dimensions of Abusive Language on Twitter

    In this paper, we use a new categorical form of multidimensional register analysis to identify the main dimensions of functional linguistic variation in a corpus of abusive language, consisting of racist and sexist Tweets. By analysing the use of a wide variety of parts of speech and grammatical constructions, as well as various features related to Twitter and computer-mediated communication, we discover three dimensions of linguistic variation in this corpus, which we interpret as being related to the degree of interactive, antagonistic and attitudinal language exhibited by individual Tweets. We then demonstrate that there is a significant functional difference between racist and sexist Tweets, with sexist Tweets tending to be more interactive and attitudinal than racist Tweets.