16 research outputs found

    The good, the bad and the implicit: a comprehensive approach to annotating explicit and implicit sentiment

    We present a fine-grained scheme for the annotation of polar sentiment in text that accounts for explicit sentiment (so-called private states) as well as implicit expressions of sentiment (polar facts). Polar expressions are annotated below sentence level and classified according to their subjectivity status. Additionally, they are linked to one or more targets with a specific polar orientation and intensity. Other components of the annotation scheme include source attribution and the identification and classification of expressions that modify polarity. Previous research has paid little attention to implicit sentiment, which represents a substantial amount of the polar expressions encountered in our data. An English and Dutch corpus of financial newswire, consisting of over 45,000 words each, was annotated using our scheme. A subset of this corpus was used to conduct an inter-annotator agreement study, which demonstrated that the proposed scheme can be used to reliably annotate explicit and implicit sentiment in real-world textual data, making the created corpora a useful resource for sentiment analysis.
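The components named in this abstract (polar expression, subjectivity status, targets with orientation and intensity, source, polarity modifiers) could be held in a record like the one below. This is only an illustrative sketch: the class names, field names, label strings, and example annotations are all hypothetical and are not taken from the authors' scheme or corpus.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Target:
    """A target linked to a polar expression (field names are hypothetical)."""
    text: str
    orientation: str  # assumed labels: "positive" or "negative"
    intensity: str    # assumed scale, e.g. "low", "medium", "high"


@dataclass
class PolarExpression:
    """One annotated span below sentence level (hypothetical layout)."""
    text: str
    subjectivity: str  # assumed labels: "private_state" (explicit) or "polar_fact" (implicit)
    targets: List[Target] = field(default_factory=list)
    source: Optional[str] = None                         # who holds the sentiment, if attributed
    modifiers: List[str] = field(default_factory=list)   # expressions that modify polarity


def implicit_share(expressions: List[PolarExpression]) -> float:
    """Fraction of expressions labelled as implicit sentiment (polar facts)."""
    if not expressions:
        return 0.0
    implicit = sum(1 for e in expressions if e.subjectivity == "polar_fact")
    return implicit / len(expressions)


# Invented toy annotations, for illustration only.
annotations = [
    PolarExpression(
        text="profits fell sharply",
        subjectivity="polar_fact",
        targets=[Target("the company", "negative", "high")],
        modifiers=["sharply"],
    ),
    PolarExpression(
        text="analysts are optimistic",
        subjectivity="private_state",
        targets=[Target("the outlook", "positive", "medium")],
        source="analysts",
    ),
]
share = implicit_share(annotations)
```

A helper such as `implicit_share` is one way to quantify how much of an annotated sample consists of polar facts, which is the property the abstract highlights.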

    Automatic Detection of Cyberbullying in Social Media Text

    While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages, and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a training corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We make use of linear support vector machines exploiting a rich feature set and investigate which information sources contribute the most to this particular task. Experiments on a holdout test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields an F1-score of 64% and 61% for English and Dutch, respectively, and considerably outperforms baseline systems based on keywords and word unigrams. Comment: 21 pages, 9 tables, under revie
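To make the evaluation terms in this abstract concrete, the sketch below computes an F1-score for a keyword-matching baseline on a binary task. It is not the authors' system or data: the posts, gold labels, and keyword list are invented toy examples, and the helper names `keyword_baseline` and `f1_score` are hypothetical.

```python
def keyword_baseline(posts, keywords):
    """Flag a post as positive if any whitespace-separated token is a keyword."""
    return [any(tok in keywords for tok in post.lower().split()) for post in posts]


def f1_score(gold, predicted):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy data, for illustration only.
posts = [
    "you are such a loser",
    "nobody likes you",
    "see you at practice tomorrow",
    "that exam was stupid hard",
]
gold = [True, True, False, False]
keywords = {"loser", "stupid"}  # hypothetical keyword list

predictions = keyword_baseline(posts, keywords)
score = f1_score(gold, predictions)
```

On this toy data the baseline misses an insult that contains no listed keyword and flags a harmless post that does, which illustrates why a keyword list alone can be a weak baseline for this kind of task.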

    Exploring the fine-grained analysis and automatic detection of irony on Twitter

    To push the state of the art in text mining applications, research in natural language processing has increasingly been investigating automatic irony detection, but manually annotated irony corpora are scarce. We present the construction of a manually annotated irony corpus based on a fine-grained annotation scheme for irony that makes it possible to identify different irony types. We conduct a series of binary classification experiments for automatic irony recognition using a support vector machine exploiting a varied feature set and a deep learning approach making use of an LSTM network and (pre-trained) word embeddings. Evaluation on a held-out corpus shows that the SVM model outperforms the neural network approach and benefits from combining lexical, semantic and syntactic information sources. A qualitative analysis of the classification output reveals that the classifier performance may be further enhanced by integrating implicit sentiment information and context- and user-based features.

    Using frame-based resources for sentiment analysis within the financial domain

    User-generated data in blogs and social networks have recently become a valuable resource for sentiment analysis in the financial domain, since they have been shown to be extremely significant to marketing research companies and public opinion organizations. In order to identify bullish and bearish sentiments associated with companies and stocks, we propose a fine-grained approach that returns a continuous score in the [-1,+1] range. Our supervised approach leverages a frame-based ontological resource which produces feature sets such as lexical features, semantic features and their combination. One outcome of our analysis suggests that the frame-based ontological resource we have used might be successfully applied for sentiment analysis within the financial domain, achieving better results than traditional sentiment analysis methods that do not embody semantics. We also show the higher performance of a fine-grained approach based solely on the evaluation of specific substrings of the message, rather than on features extracted from the whole text of a financial microblog message through the frame-based ontological resource. We have also compared our system with semi-supervised and unsupervised approaches, and the results indicate that our approach outperforms the others. Last but not least, our approach is general and can be applied on top of any existing supervised method of polarity detection.