DDRel: A New Dataset for Interpersonal Relation Classification in Dyadic Dialogues
Interpersonal language style shifting in dialogues is an interesting and almost instinctive human ability. Understanding interpersonal relationships from language content is also a crucial step toward further understanding dialogues. Previous work mainly focuses on relation extraction between named entities in texts. In this paper, we propose the task of relation classification of interlocutors based on their dialogues. We crawled movie scripts from IMSDb and annotated the relation labels for each session according to 13 pre-defined relationships. The annotated dataset DDRel consists of 6,300 dyadic dialogue sessions between 694 pairs of speakers with 53,126 utterances in total. We also construct session-level and pair-level relation classification tasks with widely accepted baselines. The experimental results show that this task is challenging for existing models and that the dataset will be useful for future research.
Comment: This paper has been accepted by AAAI 2021
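The distinction between the session-level and pair-level tasks above can be sketched as follows. This is a hypothetical illustration, not the paper's method: the relation labels are a placeholder subset of the 13 relationships, and the majority-vote aggregation is just one simple baseline for deriving a pair-level label from per-session predictions.

```python
from collections import Counter

# Session-level: one relation label per dialogue session.
# Pair-level: aggregate the session-level predictions for all sessions
# shared by one speaker pair. Labels below are a placeholder subset.
RELATIONS = ["lovers", "colleagues", "siblings"]

def pair_level_label(session_predictions):
    """Aggregate per-session relation predictions for one speaker pair
    by majority vote (a simple baseline aggregation, not the paper's)."""
    counts = Counter(session_predictions)
    return counts.most_common(1)[0][0]

sessions = ["lovers", "lovers", "colleagues"]
print(pair_level_label(sessions))  # -> lovers
```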
Sentiment Analysis of Text Guided by Semantics and Structure
As moods and opinions play a pivotal role in various business and economic processes, keeping track of one's stakeholders' sentiment can be of crucial importance to decision makers. Today's abundance of user-generated content allows for the automated monitoring of the opinions of many stakeholders, like consumers. One challenge for such automated sentiment analysis systems is to identify whether pieces of natural language text are positive or negative.
Typical methods of identifying this polarity involve low-level linguistic analysis. Existing systems predominantly use morphological, lexical, and syntactic cues for polarity, like a text's words, their parts-of-speech, and negation or amplification of the conveyed sentiment. This dissertation argues that the polarity of text can be analysed more accurately when additionally accounting for semantics and structure.
Polarity classification performance can benefit from exploiting the interactions that emoticons have on a semantic level with words – emoticons can express, stress, or disambiguate sentiment. Furthermore, semantic relations between and within languages can help identify meaningful cues for sentiment in multi-lingual polarity classification.
An even better understanding of a text's conveyed sentiment can be obtained by guiding automated sentiment analysis by the rhetorical structure of the text, or at least of its most sentiment-carrying segments. Thus, the sentiment in, e.g., conclusions can be treated differently from the sentiment in background information.
The findings of this dissertation suggest that the polarity of natural language text should not be determined solely based on what is said. Instead, one should account for how this message is conveyed as well.
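The emoticon-word interaction idea can be sketched minimally as follows. This is an invented illustration, not the dissertation's actual system: the lexicons and weights are placeholders, meant only to show the three roles an emoticon can play (expressing, stressing, or disambiguating sentiment).

```python
# Toy polarity lexicons; all entries and weights are invented for illustration.
WORD_POLARITY = {"great": 1.0, "terrible": -1.0, "interesting": 0.0}
EMOTICON_POLARITY = {":)": 1.0, ":(": -1.0}

def polarity(tokens):
    word_score = sum(WORD_POLARITY.get(t, 0.0) for t in tokens)
    emo_score = sum(EMOTICON_POLARITY.get(t, 0.0) for t in tokens)
    if word_score == 0.0 and emo_score != 0.0:
        return emo_score                      # emoticon disambiguates neutral text
    if emo_score != 0.0:
        return word_score + 0.5 * emo_score   # emoticon stresses word sentiment
    return word_score                         # words alone carry the sentiment

print(polarity(["interesting", ":("]))  # -> -1.0 (emoticon resolves the ambiguity)
```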
On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model, Data, and Training
Aspect-based sentiment analysis (ABSA) aims to automatically infer the specific sentiment polarities toward certain aspects of products or services behind social media texts or reviews, and has become a fundamental real-world application. Since the early 2010s, ABSA has achieved
extraordinarily high accuracy with various deep neural models. However,
existing ABSA models with strong in-house performances may fail to generalize
to some challenging cases where the contexts are variable, i.e., low robustness
to real-world environments. In this study, we propose to enhance the ABSA
robustness by systematically rethinking the bottlenecks from all possible
angles, including model, data, and training. First, we strengthen the current most robust syntax-aware models by simultaneously incorporating rich external syntactic dependencies and aspect labels via a universal-syntax graph convolutional network. From the data perspective, we
propose to automatically induce high-quality synthetic training data with
various types, allowing models to learn sufficient inductive bias for better
robustness. Finally, based on the rich pseudo data, we perform adversarial training to enhance resistance to context perturbation, and meanwhile employ
contrastive learning to reinforce the representations of instances with
contrastive sentiments. Extensive robustness evaluations are conducted. The
results demonstrate that our enhanced syntax-aware model achieves better robustness than all state-of-the-art baselines. By additionally incorporating our synthetic corpus, robust-testing accuracy improves by around 10%, and is further improved by the advanced training strategies. In-depth analyses are presented to reveal the factors influencing ABSA robustness.
Comment: Accepted in ACM Transactions on Information Systems
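The kind of context perturbation that such robustness evaluations target can be illustrated with a toy example. Everything here is invented: the sentences, the sentiment-flip table, and the simplifying assumption that a sentiment word directly precedes its aspect. The point is only that a robust ABSA model should keep its prediction for the target aspect when the sentiment toward a non-target aspect is flipped.

```python
# Invented sentiment-flip table for illustration.
OPPOSITE = {"friendly": "rude", "rude": "friendly", "tasty": "bland", "bland": "tasty"}

def perturb_non_target(tokens, target_index):
    """Flip every sentiment word except the one modifying the target aspect
    (assumed, for this sketch, to be the token right before the aspect)."""
    out = list(tokens)
    for i, tok in enumerate(out):
        if i != target_index - 1 and tok in OPPOSITE:
            out[i] = OPPOSITE[tok]
    return out

sent = ["tasty", "food", "but", "friendly", "staff"]
# Target aspect = "food" (index 1): only the staff's sentiment is flipped,
# so a robust model should still predict positive sentiment toward "food".
print(perturb_non_target(sent, 1))  # -> ['tasty', 'food', 'but', 'rude', 'staff']
```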
Automated Classification of Argument Stance in Student Essays: A Linguistically Motivated Approach with an Application for Supporting Argument Summarization
This study describes a set of document- and sentence-level classification models designed to automate the task of determining the argument stance (for or against) of a student argumentative essay and the task of identifying any arguments in the essay that provide reasons in support of that stance. A suggested application utilizing these models is presented which involves the automated extraction of a single-sentence summary of an argumentative essay. This summary sentence indicates the overall argument stance of the essay from which the sentence was extracted and provides a representative argument in support of that stance.
A novel set of document-level stance classification features motivated by linguistic research involving stancetaking language is described. Several document-level classification models incorporating these features are trained and tested on a corpus of student essays annotated for stance. These models achieve accuracies significantly above those of two baseline models. High-accuracy features used by these models include a dependency subtree feature incorporating information about the targets of any stancetaking language in the essay text and a feature capturing the semantic relationship between the essay prompt text and stancetaking language in the essay text.
We also describe the construction of a corpus of essay sentences annotated for supporting argument stance. The resulting corpus is used to train and test two sentence-level classification models. The first model is designed to classify a given sentence as a supporting argument or as not a supporting argument, while the second model is designed to classify a supporting argument as holding a for or against stance. Features motivated by influential linguistic analyses of the lexical, discourse, and rhetorical features of supporting arguments are used to build these two models, both of which achieve accuracies above their respective baseline models.
An application illustrating an interesting use-case for the models presented in this dissertation is described. This application incorporates all three classification models to extract a single sentence summarizing both the overall stance of a given text along with a convincing reason in support of that stance
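The three-model pipeline described above can be sketched as follows. The three classifiers here are stub rules standing in for the trained models, and all names and example sentences are invented; only the control flow (document stance, then supporting-argument detection, then per-argument stance matching) reflects the described application.

```python
def essay_stance(sentences):
    """Stub for the document-level stance classifier."""
    return "for"

def is_supporting_argument(sentence):
    """Stub for the supporting-argument detector."""
    return sentence.startswith("Because")

def argument_stance(sentence):
    """Stub for the per-argument stance classifier."""
    return "for" if "benefit" in sentence else "against"

def summarize(sentences):
    """Return the essay's stance plus the first supporting argument
    whose stance matches it, as a one-sentence summary."""
    stance = essay_stance(sentences)
    for s in sentences:
        if is_supporting_argument(s) and argument_stance(s) == stance:
            return stance, s
    return stance, None

essay = ["Schools should adopt uniforms.",
         "Because uniforms benefit equality, they help.",
         "Because they cost money, some object."]
print(summarize(essay))  # -> ('for', 'Because uniforms benefit equality, they help.')
```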
Implicit emotion detection in text
In text, emotion can be expressed explicitly, using emotion-bearing words (e.g. happy, guilty), or implicitly, without emotion-bearing words. Existing approaches focus on the detection of explicitly expressed emotion in text. However, there are various ways to express and convey emotions without the use of these emotion-bearing words. For example, given the two sentences “The outcome of my exam makes me happy” and “I passed my exam”, both express happiness, the first explicitly and the second implicitly. In this thesis, we investigate implicit emotion detection in text. We propose a rule-based approach for implicit emotion detection, which can be used without labeled corpora for training. Our results show that our approach consistently outperforms the lexicon-matching method and gives competitive performance in comparison to supervised classifiers. Given that emotions such as guilt and admiration often require the identification of blameworthiness and praiseworthiness, we also propose an approach for the detection of blame and praise in text, using an adapted psychology model, the Path Model of Blame. The lack of a benchmark dataset led us to construct a corpus containing comments on individuals’ emotional experiences annotated as blame, praise, or others. Since implicit emotion detection might be useful for conflict-of-interest (CoI) detection in Wikipedia articles, we built a CoI corpus and explored various features, including linguistic and stylometric, presentation, bias, and emotion features. Our results show that emotion features are important when using Naïve Bayes, but the best performance is obtained with SVM on linguistic and stylometric features only.
Overall, we show that a rule-based approach can be used to detect implicit emotion in the absence of labelled data; that it is feasible to adopt the Path Model of Blame from psychology for blame/praise detection in text; and that implicit emotion detection is beneficial for CoI detection in Wikipedia articles.
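The core idea of rule-based implicit emotion detection, as motivated by the exam example above, can be sketched minimally. The event patterns and emotion mappings below are invented placeholders; the thesis's actual rules are not reproduced here. The sketch shows only the principle: emotion is inferred from the described event rather than from emotion-bearing words.

```python
# Invented event-pattern rules: if all words of a pattern occur in the
# sentence, the associated emotion is inferred.
RULES = [
    (("passed", "exam"), "happiness"),
    (("failed", "exam"), "sadness"),
    (("broke", "promise"), "guilt"),
]

def detect_implicit_emotion(sentence):
    tokens = set(sentence.lower().replace(".", "").split())
    for pattern, emotion in RULES:
        if all(word in tokens for word in pattern):
            return emotion
    return "none"

print(detect_implicit_emotion("I passed my exam."))  # -> happiness
```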
Acquiring and Harnessing Verb Knowledge for Multilingual Natural Language Processing
Advances in representation learning have enabled natural language processing models to derive non-negligible linguistic information directly from text corpora in an unsupervised fashion. However, this signal is underused in downstream tasks, where models tend to fall back on superficial cues and heuristics to solve the problem at hand. Further progress relies on identifying and filling the gaps in the linguistic knowledge captured in their parameters. The objective of this thesis is to address these challenges, focusing on the issues of resource scarcity, interpretability, and lexical knowledge injection, with an emphasis on the category of verbs.
To this end, I propose a novel paradigm for efficient acquisition of lexical knowledge leveraging native speakers’ intuitions about verb meaning to support development and downstream performance of NLP models across languages. First, I investigate the potential of acquiring semantic verb classes from non-experts through manual clustering. This subsequently informs the development of a two-phase semantic dataset creation methodology, which combines semantic clustering with fine-grained semantic similarity judgments collected through spatial arrangements of lexical stimuli. The method is tested on English and then applied to a typologically diverse sample of languages to produce the first large-scale multilingual verb dataset of this kind. I demonstrate its utility as a diagnostic tool by carrying out a comprehensive evaluation of state-of-the-art NLP models, probing representation quality across languages and domains of verb meaning, and shedding light on their deficiencies. Subsequently, I directly address these shortcomings by injecting lexical knowledge into large pretrained language models. I demonstrate that external manually curated information about verbs’ lexical properties can support data-driven models in tasks where accurate verb processing is key. Moreover, I examine the potential of extending these benefits from resource-rich to resource-poor languages through translation-based transfer. The results emphasise the usefulness of human-generated lexical knowledge in supporting NLP models and suggest that time-efficient construction of lexicons similar to those developed in this work, especially in under-resourced languages, can play an important role in boosting their linguistic capacity.
ESRC Doctoral Fellowship [ES/J500033/1], ERC Consolidator Grant LEXICAL [648909]
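The spatial-arrangement elicitation method mentioned above can be illustrated with a small sketch. The coordinates and the distance-to-similarity transform are invented for this example; the idea is only that annotators place verbs on a 2D canvas and pairwise similarity is read off as an inverse function of inter-item distance.

```python
import math

# Invented 2D canvas positions produced by a hypothetical annotator.
positions = {"walk": (0.1, 0.2), "stroll": (0.15, 0.25), "explode": (0.9, 0.8)}

def similarity(v1, v2):
    """Pairwise similarity from canvas distance (one simple transform)."""
    (x1, y1), (x2, y2) = positions[v1], positions[v2]
    dist = math.hypot(x1 - x2, y1 - y2)
    return 1.0 / (1.0 + dist)

# Nearby verbs come out more similar than distant ones.
print(similarity("walk", "stroll") > similarity("walk", "explode"))  # -> True
```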