Harnessing Twitter for Automatic Sentiment Identification
Sentiment analysis is a motivating area of research because of its applications in many fields. Gathering people's opinions about products, social and political events, and problems through the web is becoming increasingly prevalent. People's opinions are valuable to the public and to stakeholders when making certain decisions. Opinion mining is a way to retrieve such information through search engines, web blogs, micro-blogs, Twitter, and social networks. User-generated content on Twitter provides an ample source for gathering individuals' opinions. Owing to the enormous number of tweets as unstructured text, it is difficult to summarize the information manually. Accordingly, efficient computational methods are required for mining and condensing tweets from corpora, which requires knowledge of sentiment-bearing words. Many computational methods, models, and algorithms exist for identifying sentiment in unstructured text. Most of them rely on machine-learning techniques, using a Bag-of-Words (BoW) representation as their basis. In this study, we used a lexicon-based approach for automatic sentiment identification of tweets collected from the Twitter public domain. We also applied three different machine-learning algorithms (Naive Bayes (NB), Maximum Entropy (ME), and Support Vector Machines (SVM)) for sentiment identification of tweets, to examine the effectiveness of various feature combinations. Our experiments demonstrate that both NB with Laplace smoothing and SVM are effective in classifying the tweets. The features used for NB are unigrams and Part-of-Speech (POS) tags, whereas unigrams are used for SVM.
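The lexicon-based approach the abstract describes can be sketched in a few lines: score a tweet by counting sentiment-bearing words against positive and negative word lists. The tiny word lists below are illustrative stand-ins, not the lexicon the study actually used.

```python
# Minimal sketch of a lexicon-based sentiment classifier.
# The word sets are toy examples, not the study's real lexicon.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}

def lexicon_sentiment(tweet: str) -> str:
    """Label a tweet by counting sentiment-bearing words."""
    tokens = [t.strip(".,!?") for t in tweet.lower().split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_sentiment("I love this great phone"))  # positive
```

A real system would add negation handling ("not good") and a much larger, weighted lexicon, which is where the machine-learning feature combinations the authors evaluate come in.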
Predicting the Effects of News Sentiments on the Stock Market
Stock market forecasting is very important in the planning of business
activities. Stock price prediction has attracted many researchers in multiple
disciplines including computer science, statistics, economics, finance, and
operations research. Recent studies have shown that the vast amount of online
information in the public domain such as Wikipedia usage pattern, news stories
from the mainstream media, and social media discussions can have an observable
effect on investors' opinions towards financial markets. The reliability of the
computational models on stock market prediction is important as it is very
sensitive to the economy and can directly lead to financial loss. In this
paper, we retrieved, extracted, and analyzed the effects of news sentiments on
the stock market. Our main contributions include the development of a sentiment
analysis dictionary for the financial sector, the development of a
dictionary-based sentiment analysis model, and the evaluation of the model for
gauging the effects of news sentiments on stocks for the pharmaceutical market.
Using only news sentiments, we achieved a directional accuracy of 70.59% in
predicting the trends in short-term stock price movement.
Comment: 4 pages
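A dictionary-based sentiment model of the kind the paper describes can be sketched as: score each headline against a domain lexicon, aggregate the scores per day, and map the sign to a directional call. The lexicon entries and threshold below are invented for illustration, not the authors' actual financial-sector dictionary.

```python
# Hypothetical sketch of a dictionary-based news-sentiment model.
# Lexicon weights are made up for illustration only.
FINANCIAL_LEXICON = {
    "approval": 1.0, "breakthrough": 1.0, "growth": 0.5,
    "recall": -1.0, "lawsuit": -1.0, "decline": -0.5,
}

def headline_score(headline: str) -> float:
    """Sum the lexicon weights of the words in one headline."""
    return sum(FINANCIAL_LEXICON.get(w, 0.0) for w in headline.lower().split())

def predict_direction(headlines: list[str]) -> str:
    """Aggregate day-level news sentiment into an up/down call."""
    total = sum(headline_score(h) for h in headlines)
    return "up" if total > 0 else "down"

print(predict_direction(["FDA approval fuels growth"]))  # up
```

The paper's contribution lies in the financial-sector dictionary itself and in evaluating such a model against actual pharmaceutical-stock movements.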
Connecting the dots: information visualization and text analysis of the Searchlight Project newsletters
This report is the product of the Pardee Center's work on the Searchlight: Visualization and Analysis of Trend Data project sponsored by the Rockefeller Foundation. As part of a larger effort to analyze and disseminate on-the-ground information about important societal trends, as reported in a large number of regional newsletters developed in Asia, Africa, and the Americas specifically for the Foundation, the Pardee Center developed sophisticated methods to systematically review, categorize, analyze, visualize, and draw conclusions from the information in the newsletters.
The Impact of Crowds on News Engagement: A Reddit Case Study
Today, users are reading the news through social platforms. These platforms
are built to facilitate crowd engagement, but not necessarily disseminate
useful news to inform the masses. Hence, the news that is highly engaged with
may not be the news that best informs. While predicting news popularity has
been well studied, it has not been studied in the context of crowd
manipulations. In this paper, we provide some preliminary results to a longer
term project on crowd and platform manipulations of news and news popularity.
In particular, we choose to study known features for predicting news popularity
and how those features may change on reddit.com, a social platform used
commonly for news aggregation. Along with this, we explore ways in which users
can alter the perception of news through changing the title of an article. We
find that news on reddit is predictable using previously studied sentiment and
content features and that posts with titles changed by reddit users tend to be
more popular than posts with the original article title.
Comment: Published at The 2nd International Workshop on News and Public Opinion at ICWSM 201
CausaLM: Causal Model Explanation Through Counterfactual Language Models
Understanding predictions made by deep neural networks is notoriously
difficult, but also crucial to their dissemination. Like all ML-based methods,
they are only as good as their training data, and can also capture unwanted biases.
While there are tools that can help understand whether such biases exist, they
do not distinguish between correlation and causation, and might be ill-suited
for text-based models and for reasoning about high level language concepts. A
key problem of estimating the causal effect of a concept of interest on a given
model is that this estimation requires the generation of counterfactual
examples, which is challenging with existing generation technology. To bridge
that gap, we propose CausaLM, a framework for producing causal model
explanations using counterfactual language representation models. Our approach
is based on fine-tuning of deep contextualized embedding models with auxiliary
adversarial tasks derived from the causal graph of the problem. Concretely, we
show that by carefully choosing auxiliary adversarial pre-training tasks,
language representation models such as BERT can effectively learn a
counterfactual representation for a given concept of interest, and be used to
estimate its true causal effect on model performance. A byproduct of our method
is a language representation model that is unaffected by the tested concept,
which can be useful in mitigating unwanted bias ingrained in the data.
Comment: Our code and data are available at https://amirfeder.github.io/CausaLM/. Under review for the Computational Linguistics journal.
Automatically detecting open academic review praise and criticism
This is an accepted manuscript of an article published by Emerald in Online Information Review on 15 June 2020.
The accepted version of the publication may differ from the final published version, accessible at https://doi.org/10.1108/OIR-11-2019-0347.
Purpose: Peer reviewer evaluations of academic papers are known to be variable in content and overall judgements but are important academic publishing safeguards. This article introduces a sentiment analysis program, PeerJudge, to detect praise and criticism in peer evaluations. It is designed to support editorial management decisions and reviewers in the scholarly publishing process and for grant funding decision workflows. The initial version of PeerJudge is tailored for reviews from F1000Research’s open peer review publishing platform.
Design/methodology/approach: PeerJudge uses a lexical sentiment analysis approach with a human-coded initial sentiment lexicon and machine learning adjustments and additions. It was built with an F1000Research development corpus and evaluated on a different F1000Research test corpus using reviewer ratings.
Findings: PeerJudge can predict F1000Research judgements from negative evaluations in reviewers’ comments more accurately than baseline approaches, although not from positive reviewer comments, which seem to be largely unrelated to reviewer decisions. Within the F1000Research mode of post-publication peer review, the absence of any detected negative comments is a reliable indicator that an article will be ‘approved’, but the presence of moderately negative comments could lead to either an approved or approved with reservations decision.
Originality/value: PeerJudge is the first transparent AI approach to peer review sentiment detection. It may be used to identify anomalous reviews whose text potentially does not match judgements, for individual checks or systematic bias assessments.
The applications of social media in sports marketing
In the era of big data, sports consumers' activities in social media have become valuable assets to sports marketers. In this paper, the authors review extant literature on how to effectively use social media to promote sports and how to effectively analyze social media data to support business decisions. Methods: The literature review method. Results: Our findings suggest that sports marketers can use social media to achieve goals such as facilitating marketing communication campaigns, adding value to sports products and services, creating two-way communication between sports brands and consumers, supporting sports sponsorship programs, and forging brand communities. As to how to effectively analyze social media data to support business decisions, extant literature suggests that sports marketers undertake traffic and engagement analysis on their social media sites as well as conduct sentiment analysis to probe customers' opinions. These insights can support various aspects of business decisions, such as marketing communication management, probing consumers' voices, and sales predictions. Conclusion: Social media are ubiquitous in sports marketing and consumption practices. In the era of big data, these "footprints" can now be effectively analyzed to generate insights that support business decisions. Recommendations for both sports marketing practice and research are also provided.
Active learning in annotating micro-blogs dealing with e-reputation
Elections unleash strong political views on Twitter, but what do people
really think about politics? Opinion and trend mining on micro-blogs dealing
with politics has recently attracted researchers in several fields, including
Information Retrieval and Machine Learning (ML). Since the performance of ML
and Natural Language Processing (NLP) approaches is limited by the amount and
quality of data available, one promising alternative for some tasks is the
automatic propagation of expert annotations. This paper develops a
so-called active learning process for automatically annotating French-language
tweets that deal with the image (i.e., representation, web reputation) of
politicians. Our main focus is on the methodology followed to build an original
annotated dataset expressing opinion about two French politicians over time. We
therefore review state-of-the-art NLP-based ML algorithms to automatically
annotate tweets, using a manual initiation step as bootstrap. This paper focuses
on key issues in active learning while building a large annotated dataset from
noise, which is introduced by human annotators, the abundance of data, and the
label distribution across data and entities. In turn, we show that Twitter
characteristics such as the author's name or hashtags can be used as bearing
points not only to improve automatic systems for Opinion Mining (OM) and
Topic Classification but also to reduce noise in human annotations. However, a
later thorough analysis shows that reducing noise might induce the loss of
crucial information.
Comment: Journal of Interdisciplinary Methodologies and Issues in Science -
Vol 3 - Contextualisation digitale - 201
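The core of an active-learning process like the one described above is pool-based uncertainty sampling: the current model scores the unlabeled pool, and the examples it is least sure about are sent to human experts for annotation. The sketch below assumes a classifier exposing a `predict_proba`-style function; the toy model and threshold are invented, not the paper's actual pipeline.

```python
# Illustrative pool-based active-learning round using
# least-confidence uncertainty sampling. The toy probability
# model below is a stand-in, not a real tweet classifier.

def uncertainty(probs):
    """Least-confidence score: 1 minus the top class probability."""
    return 1.0 - max(probs)

def active_learning_round(pool, predict_proba, budget=5):
    """Pick the `budget` most uncertain tweets for expert annotation."""
    scored = sorted(pool, key=lambda t: uncertainty(predict_proba(t)),
                    reverse=True)
    return scored[:budget]

# Toy model: pretend longer tweets are easier to classify,
# so the shortest tweets come out most uncertain.
def toy_proba(tweet):
    p = min(0.99, 0.5 + len(tweet) / 200)
    return [p, 1 - p]
```

Each round, the newly annotated tweets are added to the training set and the model is retrained, so the manual bootstrap step the authors describe gradually propagates expert labels through the corpus.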