7,559 research outputs found

    Profiling a set of personality traits of text author: what our words reveal about us

    Get PDF
    Authorship profiling, i.e. revealing information about an unknown author by analyzing their text, is a task of growing importance. One of the most urgent problems of authorship profiling (AP) is selecting text parameters which may correlate to an author’s personality. Most researchers’ selection of these is not underpinned by any theory. This article proposes an approach to AP which applies neuroscience data. The aim of the study is to assess the probability of self-destructive behaviour of an individual via formal parameters of their texts. Here we have used the “Personality Corpus”, which consists of Russian-language texts. A set of correlations between scores on the Freiburg Personality Inventory scales that are known to be indicative of self-destructive behaviour (“Spontaneous Aggressiveness”, “Depressiveness”, “Emotional Lability”, and “Composedness”) and text variables (average sentence length, lexical diversity etc.) has been calculated. Further, a mathematical model which predicts the probability of self-destructive behaviour has been obtained

    Hierarchical Multiscale Recurrent Neural Networks for Detecting Suicide Notes

    Get PDF
    Recent statistics in suicide prevention show that people are increasingly posting their last words online and with the unprecedented availability of textual data from social media platforms researchers have the opportunity to analyse such data. Furthermore, psychological studies have shown that our state of mind can manifest itself in the linguistic features we use to communicate. In this paper, we investigate whether it is possible to automatically identify suicide notes from other types of social media blogs in two document-level classification tasks. The first task aims to identify suicide notes from depressed and blog posts in a balanced dataset, whilst the second experiment looks at how well suicide notes can be classified when there is a vast amount of neutral text data, which makes the task more applicable to real-world scenarios. Furthermore we perform a linguistic analysis using LIWC (Linguistic Inquiry and Word Count). We present a learning model for modelling long sequences in two experiment series. We achieve an f1-score of 88.26% over the baselines of 0.60 in experiment 1 and 96.1% over the baseline in experiment 2. Finally, we show through visualisations which features the learning model identifies, these include emotions such as love and personal pronouns

    Deep learning with knowledge graphs for fine-grained emotion classification in text

    Get PDF
    This PhD thesis investigates two key challenges in the area of fine-grained emotion detection in textual data. More specifically, this work focuses on (i) the accurate classification of emotion in tweets and (ii) improving the learning of representations from knowledge graphs using graph convolutional neural networks.The first part of this work outlines the task of emotion keyword detection in tweets and introduces a new resource called the EEK dataset. Tweets have previously been categorised as short sequences or sentence-level sentiment analysis, and it could be argued that this should no longer be the case, especially since Twitter increased its allowed character limit. Recurrent Neural Networks have become a well-established method to classify tweets over recent years, but have struggled with accurately classifying longer sequences due to the vanishing and exploding gradient descent problem. A common technique to overcome this problem has been to prune tweets to a shorter sequence length. However, this also meant that often potentially important emotion carrying information, which is often found towards the end of a tweet, was lost (e.g., emojis and hashtags). As such, tweets mostly face also problems with classifying long sequences, similar to other natural language processing tasks. To overcome these challenges, a multi-scale hierarchical recurrent neural network is proposed and benchmarked against other existing methods. The proposed learning model outperforms existing methods on the same task by up to 10.52%. Another key component for the accurate classification of tweets has been the use of language models, where more recent techniques such as BERT and ELMO have achieved great success in a range of different tasks. However, in Sentiment Analysis, a key challenge has always been to use language models that do not only take advantage of the context a word is used in but also the sentiment it carries. Therefore the second part of this work looks at improving representation learning for emotion classification by introducing both linguistic and emotion knowledge to language models. A new linguistically inspired knowledge graph called RELATE is introduced. Then a new language model is trained on a Graph Convolutional Neural Network and compared against several other existing language models, where it is found that the proposed embedding representations achieve competitive results to other LMs, whilst requiring less pre-training time and data. Finally, it is investigated how the proposed methods can be applied to document-level classification tasks. More specifically, this work focuses on the accurate classification of suicide notes and analyses whether sentiment and linguistic features are important for accurate classification

    Toward Online Linguistic Surveillance of Threatening Messages

    Get PDF
    Threats are communicative acts, but it is not always obvious what they communicate or when they communicate imminent credible and serious risk. This paper proposes a research- and theory-based set of over 20 potential linguistic risk indicators that may discriminate credible from non-credible threats within online threat message corpora. Two prongs are proposed: (1) Using expert and layperson ratings to validate subjective scales in relation to annotated known risk messages, and (2) Using the resulting annotated corpora for automated machine learning with computational linguistic analyses to classify non-threats, false threats, and credible threats. Rating scales are proposed, existing threat corpora are identified, and some prospective computational linguistic procedures are identified. Implications for ongoing threat surveillance and its applications are explored

    Suicidal Ideation and Mental Disorder Detection with Attentive Relation Networks

    Full text link
    Mental health is a critical issue in modern society, and mental disorders could sometimes turn to suicidal ideation without effective treatment. Early detection of mental disorders and suicidal ideation from social content provides a potential way for effective social intervention. However, classifying suicidal ideation and other mental disorders is challenging as they share similar patterns in language usage and sentimental polarity. This paper enhances text representation with lexicon-based sentiment scores and latent topics and proposes using relation networks to detect suicidal ideation and mental disorders with related risk indicators. The relation module is further equipped with the attention mechanism to prioritize more critical relational features. Through experiments on three real-world datasets, our model outperforms most of its counterparts
    corecore