31,667 research outputs found
Cognitive level classification on information communication technology skills for blog
Learners can study and update their knowledge continually due to the rapid growth of online content. The Medium blog is a well-known open platform that encourages authors who want to share their experiences to publish content on various topics in multiple languages. Meanwhile, readers can query interesting content by searching for a related topic. However, finding suitable content is still challenging for learners, especially for information communication technology (ICT) content in Thai, which needs to be classified into beginner, intermediate, and advanced cognitive levels. Moreover, ICT blog content is usually a mix of Thai language and technical terms in English. To overcome the challenge of content classification, a deep neural network (DNN) classification model was constructed to classify the ICT content from the Medium blog into three levels based on cognition. We examined and compared the classification results with strong baseline models, including logistic regression, multinomial naïve Bayes, support vector machine (SVM), and multilayer perceptron (MLP). The experimental results indicate that the proposed DNN model attained the highest accuracy (0.878), precision (0.882), recall (0.878), and F1-score (0.875).
Topic-dependent sentiment analysis of financial blogs
While most work in sentiment analysis in the financial domain has focused on the use of content from traditional finance news, in this work we concentrate on more subjective sources of information: blogs. We aim to automatically determine the sentiment of financial bloggers towards companies and their stocks. To do this we develop a corpus of financial blogs, annotated with polarity of sentiment with respect to a number of companies. We conduct an analysis of the annotated corpus, from which we show there is a significant level of topic shift within this collection, and also illustrate the difficulty that human annotators have when annotating certain sentiment categories. To deal with the problem of topic shift within blog articles, we propose text extraction techniques to create topic-specific sub-documents, which we use to train a sentiment classifier. We show that such approaches provide a substantial improvement over full document classification and that word-based approaches perform better than sentence-based or paragraph-based approaches.
Exploring the use of paragraph-level annotations for sentiment analysis of financial blogs
In this paper we describe our work in the area of topic-based sentiment analysis in the domain of financial blogs. We explore the use of paragraph-level and document-level annotations, examining how additional information from paragraph-level annotations can be used to increase the accuracy of document-level sentiment classification. We acknowledge the additional effort required to provide these paragraph-level annotations, and so we compare these findings against an automatic means of generating topic-specific sub-documents.
Beyond Sentiment: The Manifold of Human Emotions
Sentiment analysis predicts the presence of positive or negative emotions in
a text document. In this paper we consider higher dimensional extensions of the
sentiment concept, which represent a richer set of human emotions. Our approach
goes beyond previous work in that our model contains a continuous manifold
rather than a finite set of human emotions. We investigate the resulting model,
compare it to psychological observations, and explore its predictive
capabilities. Besides obtaining significant improvements over a baseline
without manifold, we are also able to visualize different notions of positive
sentiment in different domains.
Comment: 15 pages, 7 figures
Argumentation Mining in User-Generated Web Discourse
The goal of argumentation mining, an evolving research field in computational
linguistics, is to design methods capable of analyzing people's argumentation.
In this article, we go beyond the state of the art in several ways. (i) We deal
with actual Web data and take up the challenges given by the variety of
registers, multiple domains, and unrestricted noisy user-generated Web
discourse. (ii) We bridge the gap between normative argumentation theories and
argumentation phenomena encountered in actual data by adapting an argumentation
model tested in an extensive annotation study. (iii) We create a new gold
standard corpus (90k tokens in 340 documents) and experiment with several
machine learning methods to identify argument components. We offer the data,
source codes, and annotation guidelines to the community under free licenses.
Our findings show that argumentation mining in user-generated Web discourse is
a feasible but challenging task.
Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in
User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17
Noise or music? Investigating the usefulness of normalisation for robust sentiment analysis on social media data
In the past decade, sentiment analysis research has thrived, especially on social media. While this data genre is suitable to extract opinions and sentiment, it is known to be noisy. Complex normalisation methods have been developed to transform noisy text into its standard form, but their effect on tasks like sentiment analysis remains underinvestigated. Sentiment analysis approaches mostly include spell checking or rule-based normalisation as preprocessing and rarely investigate its impact on the task performance. We present an optimised sentiment classifier and investigate to what extent its performance can be enhanced by integrating SMT-based normalisation as preprocessing. Experiments on a test set comprising a variety of user-generated content genres revealed that normalisation improves sentiment classification performance on tweets and blog posts, showing the model's ability to generalise to other data genres.