8 research outputs found

    Analysis of POS-tagger errors in a corpus of German learner texts containing linguistic errors

    The aim of the study is to analyze how strongly different kinds of errors in non-authentic texts affect the output of an automatic part-of-speech tagger.

    How character limit affects language usage in tweets

    In November 2017 Twitter doubled the available character space from 140 to 280 characters. This provided an opportunity for researchers to investigate the linguistic effects of length constraints in online communication. We asked whether the character limit change (CLC) affected language usage in Dutch tweets and hypothesized that there would be a reduction in the need for character-conserving writing styles. Pre-CLC tweets were compared with post-CLC tweets. Three separate analyses were performed: (I) general analysis: the number of characters, words, and sentences per tweet, as well as the average word and sentence length; (II) token analysis: the relative frequency of tokens and bigrams; (III) part-of-speech analysis: the grammatical structure of the sentences in tweets (i.e., adjectives, adverbs, articles, conjunctions, interjections, nouns, prepositions, pronouns, and verbs). Pre-CLC tweets showed relatively more textisms, which are used to abbreviate and conserve character space. Consequently, they represent more informal language usage (e.g., internet slang). In turn, post-CLC tweets contained relatively more articles, conjunctions, and prepositions. The results show that online language producers adapt their texts to overcome limit constraints.

    Versification and Authorship Attribution

    The technique known as contemporary stylometry uses different methods, including machine learning, to discover a poem’s author based on features like the frequencies of words and character n-grams. However, there is one potential textual fingerprint stylometry tends to ignore: versification, or the very making of language into verse. Using poetic texts in three different languages (Czech, German, and Spanish), Petr Plecháč asks whether versification features like rhythm patterns and types of rhyme can help determine authorship. He then tests his findings on two unsolved literary mysteries. In the first, Plecháč distinguishes the parts of the Elizabethan verse play The Two Noble Kinsmen written by William Shakespeare from those written by his coauthor, John Fletcher. In the second, he seeks to solve a case of suspected forgery: how authentic was a group of poems first published as the work of the nineteenth-century Russian author Gavriil Stepanovich Batenkov? This book of poetic investigation should appeal to literary sleuths the world over.

    Traceability Links Recovery among Requirements and BPMN models

    Thesis by compendium of publications. Throughout the pages of this document, I present the results of the research that was carried out in the context of my PhD studies. During this research, I studied the process of Traceability Links Recovery between natural language requirements and industrial software models. More precisely, due to their popularity and extensive usage, I studied the process of Traceability Links Recovery between natural language requirements and Business Process Models, also known as BPMN models. In order to carry out the research, I focused my work on two main objectives: (1) the development of Traceability Links Recovery techniques between natural language requirements and BPMN models, and (2) the validation and analysis of the results obtained by the developed techniques in industrial domain case studies. The results of the research have been written up and published in forums, conferences, and journals specialized in the topics and context of the research. This thesis document introduces the topics, context, and objectives of the research, presents the academic publications that have been published as a result of the work, and then discusses the outcomes of the investigation. Lapeña Martí, R. (2020). Traceability Links Recovery among Requirements and BPMN models [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/149391

    Rational Creatures: Using Vector Space Models to Examine Independence in the Novels of Jane Austen, Maria Edgeworth, and Sydney Owenson (1800–1820)

    Recent trends in digital humanities have led to a proliferation of studies that apply ‘distant’ reading to textual data. There is an uneasy relationship between the increased use of computational methods and their application to literary studies. Much of the current literature has focused on the exploration of large corpora. However, the ability to work at this scale is often not within the power (financial or technical), or the interests, of researchers. As these large-scale studies often ignore smaller corpora, few have sought to define a clear theoretical framework within which to study small-scale text collections. In addition, while some research has been carried out on the application of term-document vector space models (topic models and frequency-based analysis) to nineteenth-century novels, no study exists which applies word-context models (word embeddings and semantic networks) to the novels of Austen, Edgeworth, and Owenson. This study, therefore, seeks to evaluate the use of vector space models when applied to these novels. This research first defines a theoretical framework, enhanced reading, which combines the use of close and distant reading. Using a corpus of twenty-eight nineteenth-century novels as its central focus, this study also demonstrates the practical application of this theoretical approach, with the additional aim of providing an insight into the authors’ representation of independence at a time of great political and social upheaval in Ireland and the UK. The use of term-document models was found to be, generally, more useful for gaining an overview of the corpora. However, the findings for word-context models reveal their ability to identify specific textual elements, some of which were not readily identified through close reading, and they were therefore useful for exploring texts at both corpus and individual text level.

    Authorship Attribution of Poetic Texts

    Title: Authorship Attribution of Poetic Texts. Author: Mgr. Petr Plecháč, Ph.D. Department: Institute of the Czech National Corpus, Faculty of Arts. Supervisor: doc. Mgr. Václav Cvrček, Ph.D. Abstract: Contemporary stylometry offers a number of methods for authorship recognition of poetic texts based on a variety of textual features (e.g. word frequencies, frequencies of character n-grams). However, one important aspect of these texts has been largely left aside: versification. The thesis uses four corpora of poetic texts (Czech, German, Spanish, and English) to analyze to what extent versification features, such as frequencies of rhythmic patterns or frequencies of various types of rhymes, may be used as an indicator of authorship. We show that (1) versification-based models significantly outperform the random baseline, (2) in some cases versification-based models even outperform the traditionally used lexical models, and (3) in most cases a combination of both types of models outperforms either model alone. Versification features are then employed to attribute two texts of doubted authorship: (1) the versified play The Famous History of the Life of King Henry the Eight, which was originally published under the name of William Shakespeare, but where...

    Talking about personal recovery in bipolar disorder: Integrating health research, natural language processing, and corpus linguistics to analyse peer online support forum posts

    Background: Personal recovery, ‘living a satisfying, hopeful and contributing life even with the limitations caused by the illness’ (Anthony, 1993), is of particular value in bipolar disorder, where symptoms often persist despite treatment. So far, personal recovery has only been studied in researcher-constructed environments (interviews, focus groups). Support forum posts can serve as a complementary naturalistic data source. Objective: The overarching aim of this thesis was to study personal recovery experiences that people living with bipolar disorder have shared in online support forums through integrating health research, NLP, and corpus linguistics in a mixed methods approach within a pragmatic research paradigm, while considering ethical issues and involving people with lived experience. Methods: This mixed-methods study analysed: 1) previous qualitative evidence on personal recovery in bipolar disorder from interviews and focus groups; 2) who self-reports a bipolar disorder diagnosis on the online discussion platform Reddit; 3) the relationship of mood and posting in mental health-specific Reddit forums (subreddits); 4) discussions of personal recovery in bipolar disorder subreddits. Results: A systematic review of qualitative evidence resulted in the first framework for personal recovery in bipolar disorder, POETIC (Purpose & meaning, Optimism & hope, Empowerment, Tensions, Identity, Connectedness). Mainly young or middle-aged US-based adults self-report a bipolar disorder diagnosis on Reddit. Of these, those experiencing more intense emotions appear to be more likely to post in mental health support subreddits. Their personal recovery-related discussions in bipolar disorder subreddits primarily focussed on three domains: Purpose & meaning (particularly reproductive decisions, work), Connectedness (romantic relationships, social support), Empowerment (self-management, personal responsibility). Support forum data highlighted personal recovery issues that exclusively or more frequently came up online compared to previous evidence from interviews and focus groups. Conclusion: This project is the first to analyse non-reactive data on personal recovery in bipolar disorder. Indicating the key areas that people focus on in personal recovery when posting freely and the language they use provides a helpful starting point for formal and informal carers to understand the concerns of people diagnosed with bipolar disorder and to consider how best to offer support.