123 research outputs found

    Register variation explains stylometric authorship analysis


    A statistical comparison of regional phonetic and lexical variation in American English

    This paper presents a statistical comparison of regional phonetic and lexical variation in American English. Both the phonetic and lexical datasets were first subjected to separate multivariate spatial analyses in order to identify the most common dimensions of spatial clustering in these two datasets. The dimensions of phonetic and lexical variation extracted by these two analyses were then correlated with each other, after being interpolated over a shared set of reference locations, in order to measure the similarity of regional phonetic and lexical variation in American English. This analysis shows that regional phonetic and lexical variation are remarkably similar in Modern American English.

    The semantics, sociolinguistics, and origins of double modals in American English: New insights from social media

    In this paper, we analyze double modal use in American English based on a multi-billion-word corpus of geolocated posts from the social media platform Twitter. We identify and map 76 distinct double modals totaling 5,349 examples, many more types and tokens of double modals than have ever been observed. These descriptive results show that double modal structure and use in American English is far more complex than has generally been assumed. We then consider the relevance of these results to three current theoretical debates. First, we demonstrate that although there are various semantic tendencies in the types of modals that most often combine, there are no absolute constraints on double modal formation in American English. Most surprisingly, our results suggest that double modals are used productively across the US. Second, we argue that there is considerable dialect variation in double modal use in the southern US, with double modals generally being most strongly associated with African American Language, especially in the Deep South. This result challenges previous sociolinguistic research, which has often highlighted double modal use in White Southern English, especially in Appalachia. Third, we consider how these results can help us better understand the origins of double modals in American English: although it has generally been assumed that double modals were introduced by Scots-Irish settlers, we believe our results are more consistent with the hypothesis that double modals are an innovation of African American Language.

    The Language of Fake News


    Stylistic variation on the Donald Trump Twitter account: a linguistic analysis of tweets posted between 2009 and 2018

    Twitter was an integral part of Donald Trump's communication platform during his 2016 campaign. Although its topical content has been examined by researchers and the media, we know relatively little about the style of the language used on the account or how this style changed over time. In this study, we present the first detailed description of stylistic variation on the Trump Twitter account based on a multivariate analysis of grammatical co-occurrence patterns in tweets posted between 2009 and 2018. We identify four general patterns of stylistic variation, which we interpret as representing the degree of conversational, campaigning, engaged, and advisory discourse. We then track how the use of these four styles changed over time, focusing on the period around the campaign, showing that the style of tweets shifts systematically depending on the communicative goals of Trump and his team. Based on these results, we propose a series of hypotheses about how the Trump campaign used social media during the 2016 elections.

    A statistical method for the identification and aggregation of regional linguistic variation

    This paper introduces a method for the analysis of regional linguistic variation. The method identifies individual and common patterns of spatial clustering in a set of linguistic variables measured over a set of locations based on a combination of three statistical techniques: spatial autocorrelation, factor analysis, and cluster analysis. To demonstrate how to apply this method, it is used to analyze regional variation in the values of 40 continuously measured, high-frequency lexical alternation variables in a 26-million-word corpus of letters to the editor representing 206 cities from across the United States.

    Using social media to infer the diffusion of an urban contact dialect: A case study of multicultural London English

    Sociolinguistic research has demonstrated that ‘urban contact dialects’ tend to diffuse beyond the speech communities in which they first emerge. However, no research has attempted to explore the distribution of these varieties across an entire nation nor isolate the social mechanisms that propel their spread. In this paper, we use a corpus of 1.8 billion geo-tagged tweets to explore the spread of Multicultural London English lexis across the UK. We find evidence for the diffusion of MLE lexis from East and North London into other ethnically and culturally diverse urban centres across England, particularly those in the South (e.g., Luton), but find lower frequencies of MLE lexis in the North of England (e.g., Manchester), and in Scotland and Wales. Concluding, we emphasise the role of demographic similarity in the diffusion of linguistic innovations by demonstrating that this variety originated in London and diffused into other urban areas in England through the social networks of Black and Asian users.

    Noun phrase modification


    The application of growth curve modeling for the analysis of diachronic corpora

    This paper introduces growth curve modeling for the analysis of language change in corpus linguistics. In addition to describing growth curve modeling, which is a regression-based method for studying the dynamics of a set of variables measured over time, we demonstrate the technique through an analysis of the relative frequencies of words that are increasing or decreasing over time in a multi-billion-word diachronic corpus of Twitter. This analysis finds that increasing words tend to follow a trajectory similar to the s-curve of language change, whereas decreasing words tend to follow a decelerated trajectory, thereby showing how growth curve modeling can be used to uncover and describe underlying patterns of language change in diachronic corpora.

    Geographic structure of Chinese dialects: a computational dialectometric approach

    Dialect classification is a long-standing issue in Chinese dialectology. Although various theories of Chinese dialect regions have been proposed, most have been limited by similar methodological issues, especially due to their reliance on the subjective analysis of dialect maps both individually and in the aggregate, as well as their focus on phonology over syntax and vocabulary. Consequently, we know relatively little about the geolinguistic underpinnings of Chinese dialect variation. Following a review of previous research in this area, this article presents a theory of Chinese dialect regions based on the first large-scale quantitative analysis of the data from the Linguistic Atlas of Chinese Dialects, which was collected between 2000 and 2008, providing the most up-to-date picture of the full Chinese dialect landscape. We identify and map a hierarchy of 10 major Chinese dialect regions, challenging traditional accounts. In addition, we propose a new theory of Chinese dialect formation to account for our findings.