19 research outputs found

    Application of psycholinguistic features to authorship profiling for first language, gender and age group

    Much of the fraud committed in cyberspace involves the misrepresentation of the demographic data of the perpetrator via the medium of seemingly anonymous text messages. One way to address this issue is to apply techniques from the field of authorship characterisation or profiling, which is the analysis of text to determine the demographic profile of the author. Most of the previous research into authorship characterisation has used counts and ratios of lexicographically based features that include words, parts of words and Parts Of Speech (POS) contained within the text. This study examines the effectiveness of classifying the first language, gender and age group of an author using a set of features developed in the psycholinguistic field (the Linguistic Inquiry and Word Count - LIWC), both as a single type feature set and in combination with the lexicographically based features used in previous studies (function words, character bigrams and POS unigrams and bigrams). This study also searched for the smallest, most effective subset of each feature set that was practical, by ranking the features using three feature selection algorithms and systematically reducing the number used. In addition, the study explored the effective lower word limit for accurate classification by reducing the text size by regular increments. LIWC was found to be more effective than a similar number of any of the lexicographic feature types, and to add insight rather than noise when combined with these feature types. This held true for both the full and reduced text sizes for all three demographic classes examined. In addition, it was found that the size of feature sets could be greatly reduced while still maintaining effective levels of classification accuracy. Doctor of Philosophy.
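As a rough illustration of the lexicographic features named in the abstract above (function words and character bigrams), the following minimal sketch computes relative frequencies for both. It is not the study's implementation: the function names and the tiny `FUNCTION_WORDS` list are hypothetical, and the whitespace tokenisation is a simplifying assumption.

```python
from collections import Counter

# Hypothetical, deliberately tiny function-word list (illustration only;
# the study's actual list is not given in the abstract).
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "a")


def char_bigram_freqs(text):
    """Return the relative frequency of each character bigram in text."""
    bigrams = [text[i:i + 2] for i in range(len(text) - 1)]
    total = len(bigrams)
    if total == 0:
        return {}
    counts = Counter(bigrams)
    return {bigram: n / total for bigram, n in counts.items()}


def function_word_ratios(text):
    """Return each function word's share of all whitespace-separated tokens."""
    tokens = text.lower().split()
    total = len(tokens)
    counts = Counter(tokens)
    return {w: (counts[w] / total if total else 0.0) for w in FUNCTION_WORDS}
```

Vectors of this kind could then be ranked by a feature selection algorithm and progressively truncated, which is the reduction step the abstract describes.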

    To cut a long story short: an analysis of formulaic sequences in short written narratives and their potential as markers of authorship

    Previous research into formulaic language has focussed on specialised groups of people (e.g. L1 acquisition by infants and adult L2 acquisition) with ordinary adult native speakers of English receiving less attention. Additionally, whilst some features of formulaic language have been used as evidence of authorship (e.g. the Unabomber’s use of you can’t eat your cake and have it too) there has been no systematic investigation into this as a potential marker of authorship. This thesis reports the first full-scale study into the use of formulaic sequences by individual authors. The theory of formulaic language hypothesises that formulaic sequences contained in the mental lexicon are shaped by experience combined with what each individual has found to be communicatively effective. Each author’s repertoire of formulaic sequences should therefore differ. To test this assertion, three automated approaches to the identification of formulaic sequences are tested on a specially constructed corpus containing 100 short narratives. The first approach explores a limited subset of formulaic sequences using recurrence across a series of texts as the criterion for identification. The second approach focuses on a word which frequently occurs as part of formulaic sequences and also investigates alternative non-formulaic realisations of the same semantic content. Finally, a reference list approach is used. Whilst claiming authority for any reference list can be difficult, the proposed method utilises internet examples derived from lists prepared by others, a procedure which, it is argued, is akin to asking large groups of judges to reach consensus about what is formulaic. The empirical evidence supports the notion that formulaic sequences have potential as a marker of authorship since in some cases a Questioned Document was correctly attributed. 
Although this marker of authorship is not universally applicable, it does promise to become a viable new tool in the forensic linguist’s tool-kit.
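The first approach in the abstract above uses recurrence across a series of texts as the criterion for identifying candidate formulaic sequences. A minimal sketch of that general idea follows, assuming whitespace tokenisation and fixed-length word n-grams; the function name `recurrent_ngrams` and its parameters are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def recurrent_ngrams(texts, n=3, min_texts=2):
    """Return word n-grams that occur in at least min_texts different texts."""
    seen_in = defaultdict(set)  # n-gram -> indices of the texts containing it
    for idx, text in enumerate(texts):
        tokens = text.lower().split()
        for i in range(len(tokens) - n + 1):
            seen_in[tuple(tokens[i:i + n])].add(idx)
    return {" ".join(gram) for gram, ids in seen_in.items() if len(ids) >= min_texts}
```

For example, calling `recurrent_ngrams(texts, n=4)` on two narratives that both contain "to cut a long story short" would return the four-word windows the two texts share, which a human analyst could then inspect as candidate sequences.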

    To cut a long story short : an analysis of formulaic sequences in short written narratives and their potential as markers of authorship

    Previous research into formulaic language has focussed on specialised groups of people (e.g. L1 acquisition by infants and adult L2 acquisition) with ordinary adult native speakers of English receiving less attention. Additionally, whilst some features of formulaic language have been used as evidence of authorship (e.g. the Unabomber’s use of you can’t eat your cake and have it too) there has been no systematic investigation into this as a potential marker of authorship. This thesis reports the first full-scale study into the use of formulaic sequences by individual authors. The theory of formulaic language hypothesises that formulaic sequences contained in the mental lexicon are shaped by experience combined with what each individual has found to be communicatively effective. Each author’s repertoire of formulaic sequences should therefore differ. To test this assertion, three automated approaches to the identification of formulaic sequences are tested on a specially constructed corpus containing 100 short narratives. The first approach explores a limited subset of formulaic sequences using recurrence across a series of texts as the criterion for identification. The second approach focuses on a word which frequently occurs as part of formulaic sequences and also investigates alternative non-formulaic realisations of the same semantic content. Finally, a reference list approach is used. Whilst claiming authority for any reference list can be difficult, the proposed method utilises internet examples derived from lists prepared by others, a procedure which, it is argued, is akin to asking large groups of judges to reach consensus about what is formulaic. The empirical evidence supports the notion that formulaic sequences have potential as a marker of authorship since in some cases a Questioned Document was correctly attributed. 
Although this marker of authorship is not universally applicable, it does promise to become a viable new tool in the forensic linguist’s tool-kit. EThOS - Electronic Theses Online Service, United Kingdom.

    Attentive Speaking. From Listener Feedback to Interactive Adaptation

    Buschmeier H. Attentive Speaking. From Listener Feedback to Interactive Adaptation. Bielefeld: Universität Bielefeld; 2018. Dialogue is an interactive endeavour in which participants jointly pursue the goal of reaching understanding. Since participants enter the interaction with their individual conceptualisation of the world and their idiosyncratic way of using language, understanding cannot, in general, be reached by exchanging messages that are encoded when speaking and decoded when listening. Instead, speakers need to design their communicative acts in such a way that listeners are likely able to infer what is meant. Listeners, in turn, need to provide evidence of their understanding in such a way that speakers can infer whether their communicative acts were successful. This is often an interactive and iterative process in which speakers and listeners work towards understanding by jointly coordinating their communicative acts through feedback and adaptation. Taking part in this interactive process requires dialogue participants to have ‘interactional intelligence’. This conceptualisation of dialogue is rather uncommon in formal or technical approaches to dialogue modelling. This thesis argues that it may, nevertheless, be a promising research direction for these fields, because it de-emphasises raw language processing performance and focusses on fundamental interaction skills. Interactionally intelligent artificial conversational agents may thus be able to reach understanding with their interlocutors by drawing upon such competences. This will likely make them more robust, more understandable, more helpful, more effective, and more human-like.
This thesis develops conceptual and computational models of interactional intelligence for artificial conversational agents that are limited to (1) the speaking role, and (2) evidence of understanding in the form of communicative listener feedback (short but expressive verbal/vocal signals, such as ‘okay’, ‘mhm’ and ‘huh’, head gestures, and gaze). This thesis argues that such ‘attentive speaker agents’ need to be able (1) to probabilistically reason about, infer, and represent their interlocutors’ listening-related mental states (e.g., their degree of understanding), based on their interlocutors’ feedback behaviour; (2) to interactively adapt their language and behaviour such that their interlocutors’ needs, derived from the attributed mental states, are taken into account; and (3) to decide when they need feedback from their interlocutors and how they can elicit it using behavioural cues. This thesis describes computational models for these three processes, their integration in an incremental behaviour generation architecture for embodied conversational agents, and a semi-autonomous interaction study in which the resulting attentive speaker agent is evaluated. The evaluation finds that the computational models of attentive speaking developed in this thesis enable conversational agents to interactively reach understanding with their human interlocutors (through feedback and adaptation) and that these interlocutors are willing to provide natural communicative listener feedback to such an attentive speaker agent. The thesis shows that computationally modelling interactional intelligence is generally feasible, and thereby raises many new research questions and engineering problems in the interdisciplinary fields of dialogue and artificial conversational agents.

    Rhyme and Rhyming in Verbal Art, Language, and Song

    This collection of thirteen chapters answers new questions about rhyme, with views from folklore, ethnopoetics, the history of literature, literary criticism and music criticism, psychology and linguistics. The book examines rhyme as practiced or as understood in English, Old English and Old Norse, German, Swedish, Norwegian, Finnish and Karelian, Estonian, Medieval Latin, Arabic, and the Central Australian language Kaytetye. Some authors examine written poetry, including modernist poetry, and others focus on various kinds of sung poetry, including rap, which now has a pioneering role in taking rhyme into new traditions. Some authors consider the relation of rhyme to other types of form, notably alliteration. An introductory chapter discusses approaches to rhyme, and ends with a list of languages whose literatures or song traditions are known to have rhyme.

    Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino

    On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

    Rhyme and Rhyming in Verbal Art, Language, and Song

    This collection of thirteen chapters answers new questions about rhyme, with views from folklore, ethnopoetics, the history of literature, literary criticism and music criticism, psychology and linguistics. The book examines rhyme as practiced or as understood in English, Old English and Old Norse, German, Swedish, Norwegian, Finnish and Karelian, Estonian, Medieval Latin, Arabic, and the Central Australian language Kaytetye. Some authors examine written poetry, including modernist poetry, and others focus on various kinds of sung poetry, including rap, which now has a pioneering role in taking rhyme into new traditions. Some authors consider the relation of rhyme to other types of form, notably alliteration. An introductory chapter discusses approaches to rhyme, and ends with a list of languages whose literatures or song traditions are known to have rhyme. Peer reviewed.

    Coordinating in dialogue: Using compound contributions to join a party

    PhD. Compound contributions (CCs) – dialogue contributions that continue or complete an earlier contribution – are an important and common device conversational participants use to extend their own and each other’s turns. The organisation of these cross-turn structures is one of the defining characteristics of natural dialogue, and cross-person CCs provide the paradigm case of coordination in dialogue. This thesis combines corpus analysis, experiments and theoretical modelling to explore how CCs are used, their effects on coordination and implications for dialogue models. The syntactic and pragmatic distribution of CCs is mapped using corpora of ordinary and task-oriented dialogues. This indicates that the principal factors conditioning the distribution of CCs are pragmatic and that same- and cross-person CCs tend to occur in different contexts. In order to test the impact of CCs on other conversational participants, two experiments are presented. These systematically manipulate, for the first time, the occurrence of CCs in live dialogue using text-based communication. The results suggest that syntax does not directly constrain the interpretation of CCs, and the primary effect of a cross-person CC on third parties is to suggest to them a strong form of coordination or coalition has formed between the people producing the two parts of the CC. A third experiment explores the conditions under which people will produce a completion for a truncated turn. Manipulations of the structural and contextual predictability of the truncated turn show that while syntax provides a resource for the construction of a CC it does not place significant constraints on where the split point may occur. It also shows that people are more likely to produce continuations when they share common ground. An analysis using the Dynamic Syntax framework is proposed, which extends previous work to account for these findings, and limitations and further research possibilities are outlined.

    Multilingualism across the Lifespan

    This innovative collection examines key questions on language diversity and multilingualism running through contemporary debates in psycholinguistics and sociolinguistics. Reinforcing interdisciplinary conversations on these themes, each chapter is co-authored by two different researchers, often those who have not written together before. The combined effect is a volume showcasing unique and dynamic perspectives on such topics as multilingualism across the lifespan, bilingual acquisition, family language policy, language and ageing, language shift, language and identity, and multilingualism and language impairment. The book builds on Elizabeth Lanza’s pioneering work on multilingualism across the lifespan, bringing together cutting-edge research exploring multilingualism as an evolving phenomenon at landmarks in individuals’, families’, and communities’ lives. Taken together, the book offers a rich portrait of the different facets of multilingualism as a lived reality for individuals, families, and communities. This ground-breaking volume will be of particular interest to students and scholars in multilingualism, applied linguistics, sociolinguistics, and psycholinguistics.