868 research outputs found

    Using Natural Language Processing to Categorize Fictional Literature in an Unsupervised Manner

    When following a plot in a story, categorization is something that humans do without even thinking; whether this is simple classification like “This is science fiction” or more complex trope recognition like recognizing a Chekhov's gun or a rags-to-riches storyline, humans group stories with other similar stories. Research has been done to categorize basic plots and acknowledge common story tropes on the literary side; however, there is no formula or set way to determine these plots in a storyline automatically. This paper explores multiple natural language processing techniques in an attempt to automatically compare and cluster fictional stories into categories in an unsupervised manner. The aim is to mimic how a human may look deeper into a plot and find similar concepts, such as the particular words being used, the types of words being used (for example, an adventure book may have more verbs), and the sentiment of the sentences, in order to group books into similar clusters.
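
    As a rough illustration of the idea in this abstract, the sketch below turns each text into two features (rate of action verbs, net sentiment) and groups the texts with a small k-means loop. The word lists, the toy texts, and the choice of k-means are all assumptions made for this example; the abstract does not say which features or clustering algorithm the paper actually uses.

```python
import math

# Hypothetical mini-lexicons, for illustration only (not from the paper).
ACTION_VERBS = {"run", "fight", "climb", "chase", "jump"}
POSITIVE = {"love", "happy", "joy", "smile"}
NEGATIVE = {"fear", "dark", "death", "cry"}


def features(text):
    """Return (action-verb rate, net sentiment rate) for a whitespace-tokenized text."""
    tokens = text.lower().split()
    n = len(tokens) or 1
    verb_rate = sum(t in ACTION_VERBS for t in tokens) / n
    sentiment = (sum(t in POSITIVE for t in tokens)
                 - sum(t in NEGATIVE for t in tokens)) / n
    return (verb_rate, sentiment)


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points]


def kmeans(points, k, iterations=10):
    """Tiny k-means with deterministic initialization (the first k points)."""
    centroids = list(points[:k])
    for _ in range(iterations):
        labels = assign(points, centroids)
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return assign(points, centroids)


# Toy "stories" (hypothetical data).
stories = [
    "they run and fight and climb the dark tower",
    "we love the happy smile and joy of spring",
    "run chase jump fight across the plain",
    "her smile brought joy and love",
]
labels = kmeans([features(s) for s in stories], k=2)
print(labels)
```

    On this toy data the two action-heavy texts end up in one cluster and the two positive-sentiment texts in the other; a real pipeline would need proper part-of-speech tagging and sentiment scoring in place of the word lists.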

    Predicting the age of social network users from user-generated texts with word embeddings

    © 2016 FRUCT. Many web-based applications, such as advertising or recommender systems, often critically depend on demographic information, which may be unavailable for new or anonymous users. We study the problem of predicting demographic information based on user-generated texts on a Russian-language dataset from a large social network. We evaluate the efficiency of age prediction algorithms based on word2vec word embeddings and conduct a comprehensive experimental evaluation, comparing these algorithms with each other and with classical baseline approaches.

    Cognitive Grammar in Literature

    This is the first book to present an account of literary meaning and effects drawing on our best understanding of mind and language in the form of a Cognitive Grammar. The contributors provide exemplary analyses of a range of literature from science fiction, dystopia, absurdism and graphic novels to the poetry of Wordsworth, Hopkins, Sassoon, Balassi, and Dylan Thomas, as well as Shakespeare, Chaucer, Barrett Browning, Whitman, Owen and others. The application of Cognitive Grammar allows the discussion of meaning, translation, ambience, action, reflection, multimodality, empathy, experience and literariness itself to be conducted in newly valid ways. With a Foreword by the creator of Cognitive Grammar, Ronald Langacker, and an Afterword by the cognitive scientist Todd Oakley, the book represents the latest advance in literary linguistics, cognitive poetics and literary critical practice.

    Anonymity and Imitation in Linguistic Identity Disguise

    Authorship attribution can be highly accurate, but most techniques are based on the assumption that authors have not attempted to disguise their writing style. Research has found that when writers had deliberately altered their style, commonly used authorship analysis techniques only performed at the level of random chance. This is problematic because many forensic authorship cases investigate documents where it is believed that an author has tried to impersonate somebody else for criminal purposes, and has attempted to adapt their writing style to do so. This study uses a corpus of scripts from the BBC drama, The Archers, to explore how authors write different characters’ voices. Scriptwriters need to adapt their writing style to create the different characters’ dialogues, and this fictional identity disguise is used as a proxy to examine authorship analysis techniques in forensic linguistics. The thesis begins with a literature review exploring the nature of linguistic identity and literary characterisation. It considers the advantages and disadvantages of using fictional data to address forensic problems. There are three main studies: the first is a quantitative analysis comparing inter-author consistency and variation of authorship analysis features; the second is a qualitative, stylistic analysis of characterisation, exploring lexical choice, use of dialect, and (im)politeness strategies; the third is a corpus analysis of the different pragmatic functions of shared lexical tokens. The studies showed that as writers adapted their linguistic style to create different characters, results for commonly used attribution techniques were observably affected. Some linguistic identities were more distinctive than others, and some authors were more clearly identifiable than others. At a pragmatic level, authors showed more inter-character consistency, and a reduced ability to anonymise their own linguistic traits. This reinforces the importance of investigating linguistic identity disguise at higher levels of language analysis, in addition to lower-level, structural features.

    Relation Extraction Datasets in the Digital Humanities Domain and their Evaluation with Word Embeddings

    In this research, we manually create high-quality datasets in the digital humanities domain for the evaluation of language models, specifically word embedding models. The first step comprises the creation of unigram and n-gram datasets for two fantasy novel book series for two task types each, analogy and doesn't-match. This is followed by the training of models on the two book series with various popular word embedding model types such as word2vec, GloVe, fastText, or LexVec. Finally, we evaluate the suitability of word embedding models for such specific relation extraction tasks in a situation of comparatively small corpus sizes. In the evaluations, we also investigate and analyze particular aspects such as the impact of corpus term frequencies and task difficulty on accuracy. The datasets, the underlying system, and the word embedding models are available on GitHub and can be easily extended with new datasets and tasks, used to reproduce the presented results, or transferred to other domains.
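
    The two task types named in this abstract, analogy and doesn't-match, can be sketched with plain vector arithmetic. The four-dimensional vectors below are invented toy values, not trained embeddings, and the scoring rules (vector offset plus cosine similarity for analogy; lowest similarity to the mean of the unit vectors for doesn't-match) are a common convention assumed here rather than the paper's stated procedure.

```python
import math

# Toy vectors with hypothetical values (axes loosely: royalty, male, female, object).
VECTORS = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.9, 0.1, 0.8, 0.0],
    "man":   [0.1, 0.9, 0.1, 0.0],
    "woman": [0.1, 0.1, 0.9, 0.0],
    "sword": [0.1, 0.1, 0.0, 0.9],
}


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))


def analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' using the offset b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def doesnt_match(vectors, words):
    """Return the word least similar to the mean of the unit-normalized vectors."""
    units = {w: [x / norm(vectors[w]) for x in vectors[w]] for w in words}
    mean = [sum(col) / len(words) for col in zip(*units.values())]
    return min(words, key=lambda w: cosine(units[w], mean))


answer = analogy(VECTORS, "man", "king", "woman")
outlier = doesnt_match(VECTORS, ["king", "queen", "man", "woman", "sword"])
print(answer, outlier)
```

    With trained embeddings, accuracy on such tasks would be the share of dataset items for which the predicted word equals the manually annotated answer.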

    Theatrical Genre Prediction Using Social Network Metrics

    With the emergence of digitization, large text corpora are now available online, providing humanities scholars with an opportunity to perform literary analysis using computational techniques. This work applies network theory concepts to literature to explore correlations between the mathematical properties of the social networks of plays and the plays’ dramatic genre, specifically how well social network metrics can identify genre without taking vocabulary into consideration. Almost no work has been done to study the ability of mathematical properties of network graphs to predict literary features. We generated character interaction networks of 36 Shakespeare plays and tried to differentiate plays based on social network features captured by the character network of each play. We were able to successfully predict the genre of Shakespeare’s plays with the help of social network metrics and hence establish that differences of dramatic genre are successfully captured by the local and global social network metrics of the plays. Since the technique is highly extensible, future work can extend it to fast and detailed literary analysis of larger groups of plays, including plays written in different languages and by different authors.
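
    A minimal sketch of what a character interaction network and its metrics might look like is given below. It assumes that two characters interact when they appear in the same scene and computes one global metric (density) and one local metric (degree centrality); the abstract does not specify how interactions were defined or which metrics were used, and the scene data is hypothetical.

```python
from itertools import combinations


def build_network(scenes):
    """Build an undirected graph: characters sharing a scene are linked."""
    nodes = {character for scene in scenes for character in scene}
    edges = set()
    for scene in scenes:
        for a, b in combinations(sorted(set(scene)), 2):
            edges.add((a, b))
    return nodes, edges


def density(nodes, edges):
    """Global metric: fraction of possible character pairs that are linked."""
    n = len(nodes)
    return 0.0 if n < 2 else 2 * len(edges) / (n * (n - 1))


def degree_centrality(nodes, edges):
    """Local metric: each character's links divided by the maximum possible."""
    degree = {node: 0 for node in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n = len(nodes)
    if n < 2:
        return {node: 0.0 for node in nodes}
    return {node: d / (n - 1) for node, d in degree.items()}


# Hypothetical scene lists for a toy play with characters A to D.
scenes = [["A", "B", "C"], ["A", "D"], ["C", "D"]]
nodes, edges = build_network(scenes)
play_density = density(nodes, edges)
centrality = degree_centrality(nodes, edges)
print(round(play_density, 3), centrality["A"])
```

    Per-play values like these could then serve as the input features of a genre classifier, which is the role the abstract assigns to social network metrics.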