The good, the bad and the implicit: a comprehensive approach to annotating explicit and implicit sentiment
We present a fine-grained scheme for the annotation of polar sentiment in text that accounts for explicit sentiment (so-called private states) as well as implicit expressions of sentiment (polar facts). Polar expressions are annotated below sentence level and classified according to their subjectivity status. Additionally, they are linked to one or more targets with a specific polar orientation and intensity. Other components of the annotation scheme include source attribution and the identification and classification of expressions that modify polarity. In previous research, little attention has been given to implicit sentiment, which represents a substantial amount of the polar expressions encountered in our data. An English and a Dutch corpus of financial newswire, each consisting of over 45,000 words, were annotated using our scheme. A subset of these corpora was used to conduct an inter-annotator agreement study, which demonstrated that the proposed scheme can be used to reliably annotate explicit and implicit sentiment in real-world textual data, making the created corpora a useful resource for sentiment analysis.
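As a rough illustration of what an inter-annotator agreement study measures, the sketch below computes Cohen's kappa for two annotators. The abstract does not say which agreement statistic was used, so the choice of kappa is an assumption. The label names and toy annotations are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        counts_a[label] * counts_b[label] for label in set(labels_a) | set(labels_b)
    ) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical polarity labels assigned to six expressions (not from the corpus).
annotator_1 = ["pos", "neg", "pos", "neu", "neg", "pos"]
annotator_2 = ["pos", "neg", "neu", "neu", "neg", "neg"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label dominates the data.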
Automatic Detection of Cyberbullying in Social Media Text
While social media offer great communication opportunities, they also
increase the vulnerability of young people to threatening situations online.
Recent studies report that cyberbullying constitutes a growing problem among
youngsters. Successful prevention depends on the adequate detection of
potentially harmful messages and the information overload on the Web requires
intelligent systems to identify potential risks automatically. The focus of
this paper is on automatic cyberbullying detection in social media text by
modelling posts written by bullies, victims, and bystanders of online bullying.
We describe the collection and fine-grained annotation of a training corpus for
English and Dutch and perform a series of binary classification experiments to
determine the feasibility of automatic cyberbullying detection. We make use of
linear support vector machines exploiting a rich feature set and investigate
which information sources contribute the most for this particular task.
Experiments on a holdout test set reveal promising results for the detection of
cyberbullying-related posts. After optimisation of the hyperparameters, the
classifier yields an F1-score of 64% and 61% for English and Dutch
respectively, and considerably outperforms baseline systems based on keywords
and word unigrams.
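For readers unfamiliar with the reported metric, here is a minimal sketch of how an F1-score is computed for a binary task. It is not the authors' evaluation code. The gold labels and predictions are made-up toy data, and the function name is hypothetical.

```python
def f1_score_binary(gold, predicted, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g == positive and p == positive)
    false_pos = sum(1 for g, p in pairs if g != positive and p == positive)
    false_neg = sum(1 for g, p in pairs if g == positive and p != positive)
    if true_pos == 0:
        # No correct positive predictions: precision and recall are 0 or undefined.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumed): 1 marks a bullying-related post, 0 any other post.
gold_labels = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 1, 0, 1, 0]
score = f1_score_binary(gold_labels, predictions)
print(f"F1 = {score:.2f}")
```

F1 suits this kind of task better than accuracy because the positive class is typically rare, so a classifier that never flags anything would still score high on accuracy.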
Exploring the fine-grained analysis and automatic detection of irony on Twitter
To push the state of the art in text mining applications, research in natural language
processing has increasingly been investigating automatic irony detection, but manually
annotated irony corpora are scarce. We present the construction of a manually
annotated irony corpus based on a fine-grained annotation scheme for irony that
allows different irony types to be identified. We conduct a series of binary classification
experiments for automatic irony recognition using a support vector machine exploiting
a varied feature set and a deep learning approach making use of an LSTM network
and (pre-trained) word embeddings. Evaluation on a held-out corpus shows that the
SVM model outperforms the neural network approach and benefits from combining
lexical, semantic and syntactic information sources. A qualitative analysis of the
classification output reveals that the classifier performance may be further enhanced
by integrating implicit sentiment information and context- and user-based features.
Using frame-based resources for sentiment analysis within the financial domain
User-generated data in blogs and social networks have recently become a valuable resource for sentiment analysis in the financial domain, since they have been shown to be extremely significant to marketing research companies and public opinion organizations. In order to identify bullish and bearish sentiments associated with companies and stocks, we propose a fine-grained approach that returns a continuous score in the [-1,+1] range. Our supervised approach leverages a frame-based ontological resource which produces feature sets such as lexical features, semantic features and their combination. One outcome of our analysis suggests that the frame-based ontological resource we have used might be successfully applied to sentiment analysis within the financial domain, achieving better results than traditional sentiment analysis methods that do not embody semantics. We also show the higher performance of a fine-grained approach based solely on the evaluation of specific substrings of the message, rather than on features extracted from the whole text of a financial microblog message through the frame-based ontological resource. We have also compared our system with semi-supervised and unsupervised approaches, and the results indicate that our approach outperforms the others. Last but not least, our approach is general and can be applied on top of any existing supervised method of polarity detection.