Search CORE

6 research outputs found

Deep learning with knowledge graphs for fine-grained emotion classification in text

Author: Schoene Annika M.
Publication venue
Publication date: 23/04/2021
Field of study

This PhD thesis investigates two key challenges in the area of fine-grained emotion detection in textual data. More specifically, this work focuses on (i) the accurate classification of emotion in tweets and (ii) improving the learning of representations from knowledge graphs using graph convolutional neural networks.The first part of this work outlines the task of emotion keyword detection in tweets and introduces a new resource called the EEK dataset. Tweets have previously been categorised as short sequences or sentence-level sentiment analysis, and it could be argued that this should no longer be the case, especially since Twitter increased its allowed character limit. Recurrent Neural Networks have become a well-established method to classify tweets over recent years, but have struggled with accurately classifying longer sequences due to the vanishing and exploding gradient descent problem. A common technique to overcome this problem has been to prune tweets to a shorter sequence length. However, this also meant that often potentially important emotion carrying information, which is often found towards the end of a tweet, was lost (e.g., emojis and hashtags). As such, tweets mostly face also problems with classifying long sequences, similar to other natural language processing tasks. To overcome these challenges, a multi-scale hierarchical recurrent neural network is proposed and benchmarked against other existing methods. The proposed learning model outperforms existing methods on the same task by up to 10.52%. Another key component for the accurate classification of tweets has been the use of language models, where more recent techniques such as BERT and ELMO have achieved great success in a range of different tasks. However, in Sentiment Analysis, a key challenge has always been to use language models that do not only take advantage of the context a word is used in but also the sentiment it carries. Therefore the second part of this work looks at improving representation learning for emotion classification by introducing both linguistic and emotion knowledge to language models. A new linguistically inspired knowledge graph called RELATE is introduced. Then a new language model is trained on a Graph Convolutional Neural Network and compared against several other existing language models, where it is found that the proposed embedding representations achieve competitive results to other LMs, whilst requiring less pre-training time and data. Finally, it is investigated how the proposed methods can be applied to document-level classification tasks. More specifically, this work focuses on the accurate classification of suicide notes and analyses whether sentiment and linguistic features are important for accurate classification

Repository@Hull - Worktribe

Hierarchical Multiscale Recurrent Neural Networks for Detecting Suicide Notes

Author: De Mel Geeth
Dethlefs Nina
Schoene Annika M
Turner Alexander P
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/02/2021
Field of study

Recent statistics in suicide prevention show that people are increasingly posting their last words online and with the unprecedented availability of textual data from social media platforms researchers have the opportunity to analyse such data. Furthermore, psychological studies have shown that our state of mind can manifest itself in the linguistic features we use to communicate. In this paper, we investigate whether it is possible to automatically identify suicide notes from other types of social media blogs in two document-level classification tasks. The first task aims to identify suicide notes from depressed and blog posts in a balanced dataset, whilst the second experiment looks at how well suicide notes can be classified when there is a vast amount of neutral text data, which makes the task more applicable to real-world scenarios. Furthermore we perform a linguistic analysis using LIWC (Linguistic Inquiry and Word Count). We present a learning model for modelling long sequences in two experiment series. We achieve an f1-score of 88.26% over the baselines of 0.60 in experiment 1 and 96.1% over the baseline in experiment 2. Finally, we show through visualisations which features the learning model identifies, these include emotions such as love and personal pronouns

Repository@Hull - Worktribe

NERO: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding.

Author: Alachram Halima
Ambite José Luis
Ananiadou Sophia
Beißbarth Tim
Chambers Brendan
Christopoulou Fenia
Evans James A
Galstyan Aram
Gao Xin
Garg Sahil
Hermjakob Ulf
Khomtchouk Bohdan B
King Ross
Li Maolin
Li Yu
Marcu Daniel
Matthew Joel
Pan Weidi
Rzhetsky Andrey
Schoene Annika M
Sheng Emily
Soldatova Larisa
Stevens Robert
Wang Kanix
Wingender Edgar
Publication venue: NPJ Syst Biol Appl
Publication date: 01/01/2021
Field of study

Machine reading (MR) is essential for unlocking valuable knowledge contained in millions of existing biomedical documents. Over the last two decades1,2, the most dramatic advances in MR have followed in the wake of critical corpus development3. Large, well-annotated corpora have been associated with punctuated advances in MR methodology and automated knowledge extraction systems in the same way that ImageNet4 was fundamental for developing machine vision techniques. This study contributes six components to an advanced, named entity analysis tool for biomedicine: (a) a new, Named Entity Recognition Ontology (NERO) developed specifically for describing textual entities in biomedical texts, which accounts for diverse levels of ambiguity, bridging the scientific sublanguages of molecular biology, genetics, biochemistry, and medicine; (b) detailed guidelines for human experts annotating hundreds of named entity classes; (c) pictographs for all named entities, to simplify the burden of annotation for curators; (d) an original, annotated corpus comprising 35,865 sentences, which encapsulate 190,679 named entities and 43,438 events connecting two or more entities; (e) validated, off-the-shelf, named entity recognition (NER) automated extraction, and; (f) embedding models that demonstrate the promise of biomedical associations embedded within this corpus

Goldsmiths Research Online

Directory of Open Access Journals

Chalmers Research

Apollo (Cambridge)

Recommended from our members

NERO: A biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding

Author: Alachram Halima
Ambite José Luis
Ananiadou Sophia
Beißbarth Tim
Chambers Brendan
Christopoulou Fenia
Evans James A.
Galstyan Aram
Gao Xin
Garg Sahil
Hermjakob Ulf
Khomtchouk Bohdan B.
King Ross
Li Maolin
Li Yu
Marcu Daniel
Matthew Joel
Pan Weidi
Rzhetsky Andrey
Schoene Annika M.
Sheng Emily
Soldatova Larisa
Stevens Robert
Wang Kanix
Wingender Edgar
Publication venue
Publication date: 24/08/2023
Field of study

Machine reading (MR) is essential for unlocking valuable knowledge contained in millions of existing biomedical documents. Over the last two decades, the most dramatic advances in MR have followed in the wake of critical corpus development. Large, well-annotated corpora have been associated with punctuated advances in MR methodology and automated knowledge extraction systems in the same way that ImageNet4 was fundamental for developing machine vision techniques. This study contributes six components to an advanced, named entity analysis tool for biomedicine: (a) a new, Named Entity Recognition Ontology (NERO) developed specifically for describing textual entities in biomedical texts, which accounts for diverse levels of ambiguity, bridging the scientific sublanguages of molecular biology, genetics, biochemistry, and medicine; (b) detailed guidelines for human experts annotating hundreds of named entity classes; (c) pictographs for all named entities, to simplify the burden of annotation for curators; (d) an original, annotated corpus comprising 35,865 sentences, which encapsulate 190,679 named entities and 43,438 events connecting two or more entities; (e) validated, off-the-shelf, named entity recognition (NER) automated extraction, and; (f) embedding models that demonstrate the promise of biomedical associations embedded within this corpus

Knowledge UChicago