6,587 research outputs found
A Hybrid Approach to Sentiment Analysis with Benchmarking Results
The objective of this article is two-fold. Firstly, a hybrid approach to Sentiment Analysis encompassing the use of Semantic Rules, Fuzzy Sets and an enriched Sentiment Lexicon, improved with the support of SentiWordNet is described. Secondly, the proposed hybrid method is compared against two well established Supervised Learning techniques,
Naïve Bayes and Maximum Entropy. Using the well known and publicly available Movie Review Dataset, the proposed hybrid system achieved higher accuracy and precision than Naïve Bayes (NB) and Maximum Entropy (ME)
A Hybrid Approach to Domain-Specific Entity Linking
The current state-of-the-art Entity Linking (EL) systems are geared towards
corpora that are as heterogeneous as the Web, and therefore perform
sub-optimally on domain-specific corpora. A key open problem is how to
construct effective EL systems for specific domains, as knowledge of the local
context should in principle increase, rather than decrease, effectiveness. In
this paper we propose the hybrid use of simple specialist linkers in
combination with an existing generalist system to address this problem. Our
main findings are the following. First, we construct a new reusable benchmark
for EL on a corpus of domain-specific conversations. Second, we test the
performance of a range of approaches under the same conditions, and show that
specialist linkers obtain high precision in isolation, and high recall when
combined with generalist linkers. Hence, we can effectively exploit local
context and get the best of both worlds.Comment: SEM'1
Sentiment Analysis for Words and Fiction Characters From The Perspective of Computational (Neuro-)Poetics
Two computational studies provide different sentiment analyses for text segments (e.g., ‘fearful’ passages) and figures (e.g., ‘Voldemort’) from the Harry Potter books (Rowling, 1997 - 2007) based on a novel simple tool called SentiArt. The tool uses vector space models together with theory-guided, empirically validated label lists to compute the valence of each word in a text by locating its position in a 2d emotion potential space spanned by the > 2 million words of the vector space model. After testing the tool’s accuracy with empirical data from a neurocognitive study, it was applied to compute emotional figure profiles and personality figure profiles (inspired by the so-called ‚big five’ personality theory) for main characters from the book series. The results of comparative analyses using different machine-learning classifiers (e.g., AdaBoost, Neural Net) show that SentiArt performs very well in predicting the emotion potential of text passages. It also produces plausible predictions regarding the emotional and personality profile of fiction characters which are correctly identified on the basis of eight character features, and it achieves a good cross-validation accuracy in classifying 100 figures into ‘good’ vs. ‘bad’ ones. The results are discussed with regard to potential applications of SentiArt in digital literary, applied reading and neurocognitive poetics studies such as the quantification of the hybrid hero potential of figures
Crowdbreaks: Tracking Health Trends using Public Social Media Data and Crowdsourcing
In the past decade, tracking health trends using social media data has shown
great promise, due to a powerful combination of massive adoption of social
media around the world, and increasingly potent hardware and software that
enables us to work with these new big data streams. At the same time, many
challenging problems have been identified. First, there is often a mismatch
between how rapidly online data can change, and how rapidly algorithms are
updated, which means that there is limited reusability for algorithms trained
on past data as their performance decreases over time. Second, much of the work
is focusing on specific issues during a specific past period in time, even
though public health institutions would need flexible tools to assess multiple
evolving situations in real time. Third, most tools providing such capabilities
are proprietary systems with little algorithmic or data transparency, and thus
little buy-in from the global public health and research community. Here, we
introduce Crowdbreaks, an open platform which allows tracking of health trends
by making use of continuous crowdsourced labelling of public social media
content. The system is built in a way which automatizes the typical workflow
from data collection, filtering, labelling and training of machine learning
classifiers and therefore can greatly accelerate the research process in the
public health domain. This work introduces the technical aspects of the
platform and explores its future use cases
The Development of a Temporal Information Dictionary for Social Media Analytics
Dictionaries have been used to analyze text even before the emergence of social media and the use of dictionaries for sentiment analysis there. While dictionaries have been used to understand the tonality of text, so far it has not been possible to automatically detect if the tonality refers to the present, past, or future. In this research, we develop a dictionary containing time-indicating words in a wordlist (T-wordlist). To test how the dictionary performs, we apply our T-wordlist on different disaster related social media datasets. Subsequently we will validate the wordlist and results by a manual content analysis. So far, in this research-in-progress, we were able to develop a first dictionary and will also provide some initial insight into the performance of our wordlist
- …