127 research outputs found

    Register variation in malicious forensic texts

    The study reported here examines a corpus of 104 authentic malicious forensic texts for register variation. A malicious forensic text is defined in this paper as a text that is threatening, abusive or defaming and that constitutes evidence for a forensic case. This corpus was firstly tagged with a set of situational parameters and then analysed using the same multidimensional model introduced in Biber (1988; 1989). The results of the study indicate that malicious forensic texts, similarly to non-malicious professional letters, are on average instances of the Involved Persuasion text type, which is characterised by linguistic features overtly expressing modality. The results also confirm that threatening texts tend to use more modal verbs than non-threatening texts. Furthermore, the personal knowledge between interactants was found to highly influence the level of information density of the texts, while the narrativity level of malicious texts was found to be affected by whether the text contains harmful content directed to the addressee or to a third party. These findings can inform and improve the authorship analysis of malicious texts and increase our understanding of the creation of language crimes.

    Authorship profiling in a forensic context

    There are several unresolved problems in forensic authorship profiling, including a lack of research focusing on the types of texts that are typically analysed in forensic linguistics (e.g. threatening letters, ransom demands) and a general disregard for the effect of register variation when testing linguistic variables for use in profiling. The aim of this dissertation is therefore to make a first step towards filling these gaps by testing whether established patterns of sociolinguistic variation appear in malicious forensic texts that are controlled for register. This dissertation begins with a literature review that highlights a series of correlations between language use and various social factors, including gender, age, level of education and social class. This dissertation then presents the primary data set used in this study, which consists of a corpus of 287 fabricated malicious texts from 3 different registers produced by 96 authors stratified across the 4 social factors listed above. Since this data set is fabricated, its validity was also tested through a comparison with another corpus consisting of 104 naturally occurring malicious texts, which showed that no important differences exist between the language of the fabricated malicious texts and the authentic malicious texts. The dissertation then reports the findings of the analysis of the corpus of fabricated malicious texts, which shows that the major patterns of sociolinguistic variation identified in previous research are valid for forensic malicious texts and that controlling register variation greatly improves the performance of profiling. In addition, it is shown that through regression analysis it is possible to use these patterns of linguistic variation to profile the demographic background of authors across the four social factors with an average accuracy of 70%. Overall, the present study therefore makes a first step towards developing a principled model of forensic authorship profiling.

    Register Variation Remains Stable Across 60 Languages

    This paper measures the stability of cross-linguistic register variation. A register is a variety of a language that is associated with extra-linguistic context. The relationship between a register and its context is functional: the linguistic features that make up a register are motivated by the needs and constraints of the communicative situation. This view hypothesizes that register should be universal, so that we expect a stable relationship between the extra-linguistic context that defines a register and the sets of linguistic features which the register contains. In this paper, the universality and robustness of register variation are tested by comparing variation within vs. between register-specific corpora in 60 languages using corpora produced in comparable communicative situations: tweets and Wikipedia articles. Our findings confirm the prediction that register variation is, in fact, universal.

    The application of growth curve modeling for the analysis of diachronic corpora

    This paper introduces growth curve modeling for the analysis of language change in corpus linguistics. In addition to describing growth curve modeling, which is a regression-based method for studying the dynamics of a set of variables measured over time, we demonstrate the technique through an analysis of the relative frequencies of words that are increasing or decreasing over time in a multi-billion word diachronic corpus of Twitter. This analysis finds that increasing words tend to follow a trajectory similar to the s-curve of language change, whereas decreasing words tend to follow a decelerated trajectory, thereby showing how growth curve modeling can be used to uncover and describe underlying patterns of language change in diachronic corpora.