949 research outputs found

    Profiling a set of personality traits of text author: what our words reveal about us

    Authorship profiling, i.e. revealing information about an unknown author by analyzing their text, is a task of growing importance. One of the most urgent problems of authorship profiling (AP) is selecting text parameters which may correlate to an author’s personality. Most researchers’ selection of these is not underpinned by any theory. This article proposes an approach to AP which applies neuroscience data. The aim of the study is to assess the probability of self-destructive behaviour of an individual via formal parameters of their texts. Here we have used the “Personality Corpus”, which consists of Russian-language texts. A set of correlations between scores on the Freiburg Personality Inventory scales that are known to be indicative of self-destructive behaviour (“Spontaneous Aggressiveness”, “Depressiveness”, “Emotional Lability”, and “Composedness”) and text variables (average sentence length, lexical diversity etc.) has been calculated. Further, a mathematical model which predicts the probability of self-destructive behaviour has been obtained

    An automatic part-of-speech tagger for Middle Low German

    Syntactically annotated corpora are highly important for enabling large-scale diachronic and diatopic language research. Such corpora have recently been developed for a variety of historical languages, or are still under development. One of those under development is the fully tagged and parsed Corpus of Historical Low German (CHLG), which is aimed at facilitating research into the highly under-researched diachronic syntax of Low German. The present paper reports on a crucial step in creating the corpus, viz. the creation of a part-of-speech tagger for Middle Low German (MLG). Having been transmitted in several non-standardised written varieties, MLG poses a challenge to standard POS taggers, which usually rely on normalized spelling. We outline the major issues faced in the creation of the tagger and present our solutions to them

    Synthetically generated text for supervised text analysis

    Supervised text models are a valuable tool for political scientists but present several obstacles to their use, including the expense of hand-labeling documents, the difficulty of retrieving rare relevant documents for annotation, and copyright and privacy concerns involved in sharing annotated documents. This article proposes a partial solution to these three issues, in the form of controlled generation of synthetic text with large language models. I provide a conceptual overview of text generation, guidance on when researchers should prefer different techniques for generating synthetic text, a discussion of ethics, and a simple technique for improving the quality of synthetic text. I demonstrate the usefulness of synthetic text with three applications: generating synthetic tweets describing the fighting in Ukraine, synthetic news articles describing specified political events for training an event detection system, and a multilingual corpus of populist manifesto statements for training a sentence-level populism classifier

    Performing ethos and digital behavior : the genealogy of WEeP

    Through the genealogical analysis of the theatrical performance ‘The Werther’s Effect e-Project’ (WEeP), to be premiered in the beginning of 2016, we will discuss new possibilities of creation made viable by digital media. WEeP puts together a group of international performers, living in different cities during the creation process, to discuss suicide and copycat phenomena influenced by online events. Taking as our first inspiration the classic “The Sorrows of Young Werther” by Goethe (1774), we want to approach the mysterious ways a (digital) event or a product influences suicide in contemporary times, sometimes leading to extreme cases of copycat suicide chains, a phenomenon known as the “Werther’s Effect”. By tackling this theme, we intend to raise questions about contemporary affects and ways of dealing with the ephemeral aspects of digital life, tracing the propagation of information and the influences on our analogical and online behavior. The conditions set by the international composition of the group led us to begin to experiment with rehearsals online and/or loaded on digital spaces: Skype, think tanks, multimodal platforms for notation, mobile apps. These experiments raised many questions regarding creation and the craft of the performer. What is this new ethos of the performer, driven by different creative and corporeal models, rooted in a digital environment? What is presence in this context? What are the new improvisational tools and how can analogical ones be actualized? What is worth recording, archiving, or posting? The great deal of information we have access to brought us to question our selective skills. So far, we are dealing with a great number of new dance models within this context, but there is very little regarding the specificities of theatre. Therefore, this paper intends to bring some directions to theatrical/performative creative processes, within WEeP’s nomadic and digital creation, by tracing new paradigms of the performer’s ethos

    Clinical Natural Language Processing in languages other than English: opportunities and challenges

    Background: Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area. Main Body: We envision three groups of intended readers: (1) NLP researchers leveraging experience gained in other languages, (2) NLP researchers faced with establishing clinical text processing in a language other than English, and (3) clinical informatics researchers and practitioners looking for resources in their languages in order to apply NLP techniques and tools to clinical practice and/or investigation. We review work in clinical NLP in languages other than English. We classify these studies into three groups: (i) studies describing the development of new NLP systems or components de novo, (ii) studies describing the adaptation of NLP architectures developed for English to another language, and (iii) studies focusing on a particular clinical application. Conclusion: We show the advantages and drawbacks of each method, and highlight the appropriate application context. Finally, we identify major challenges and opportunities that will affect the impact of NLP on clinical practice and public health studies in a context that encompasses English as well as other languages

    Neurocognitive Informatics Manifesto.

    Informatics studies all aspects of the structure of natural and artificial information systems. Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation and understanding. Neurocognitive informatics is a new, emerging field that should help to improve the matching of artificial and natural systems, and inspire better computational algorithms to solve problems that are still beyond the reach of machines. In this position paper examples of neurocognitive inspirations and promising directions in this area are given

    Multiword expressions at length and in depth

    The annual workshop on multiword expressions has taken place since 2001 in conjunction with major computational linguistics conferences and attracts the attention of an ever-growing community working on a variety of languages, linguistic phenomena and related computational processing issues. MWE 2017 took place in Valencia, Spain, and represented a vibrant panorama of the current research landscape on the computational treatment of multiword expressions, featuring many high-quality submissions. Furthermore, MWE 2017 included the first shared task on multilingual identification of verbal multiword expressions. The shared task, with extended communal work, has developed important multilingual resources and mobilised several research groups in computational linguistics worldwide. This book contains extended versions of selected papers from the workshop. Authors worked hard to include detailed explanations, broader and deeper analyses, and new exciting results, which were thoroughly reviewed by an internationally renowned committee. We hope that this distinctly joint effort will provide a meaningful and useful snapshot of the multilingual state of the art in multiword expressions modelling and processing, and will be a point of reference for future work

    On the Mono- and Cross-Language Detection of Text Re-Use and Plagiarism

    Barrón Cedeño, LA. (2012). On the Mono- and Cross-Language Detection of Text Re-Use and Plagiarism [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/16012

    The Genitive Ratio and its Applications

    The genitive ratio (GR) is a novel method of classifying nouns as animate, concrete or abstract. English has two genitive (possessive) constructions: possessive-s (the boy's head) and possessive-of (the head of the boy). There is compelling evidence that preference for possessive-s is strongly influenced by the possessor's animacy. A corpus analysis that counts each genitive construction in three conditions (definite, indefinite and no article) confirms that occurrences of possessive-s decline as the animacy hierarchy progresses from animate through concrete to abstract. A computer program (Animyser) is developed to obtain results-counts from phrase-searches of Wikipedia that provide multiple genitive ratios for any target noun. Key ratios are identified and algorithms developed, with specific applications achieving classification accuracies of over 80%. The algorithms, based on logistic regression, produce a score of relative animacy that can be applied to individual nouns or to texts. The genitive ratio is a tool with potential applications in any research domain where the relative animacy of language might be significant. Three such applications exemplify that. Combining GR analysis with other factors might enhance established co-reference (anaphora) resolution algorithms. In sentences formed from pairings of animate with concrete or abstract nouns, the animate noun is usually salient, more likely to be the grammatical subject or thematic agent, and to co-refer with a succeeding pronoun or noun-phrase. Two experiments, online sentence production and corpus-based, demonstrate that the GR algorithm reliably predicts the salient noun. Replication of the online experiment in Italian suggests that the GR might be applied to other languages by using English as a 'bridge'. In a mental health context, studies have indicated that Alzheimer's patients' language becomes progressively more concrete; depressed patients' language more abstract. Analysis of sample texts suggests that the GR might monitor the prognosis of both illnesses, facilitating timely clinical interventions