Statistical Inferences for Polarity Identification in Natural Language
Information forms the basis for all human behavior, including the ubiquitous
decision-making that people constantly perform in their everyday lives. It is
thus the mission of researchers to understand how humans process information to
reach decisions. In order to facilitate this task, this work proposes a novel
method of studying the reception of granular expressions in natural language.
The approach utilizes LASSO regularization as a statistical tool to extract
decisive words from textual content and draw statistical inferences based on
the correspondence between the occurrences of words and an exogenous response
variable. Accordingly, the method immediately suggests significant implications
for social sciences and Information Systems research: everyone can now identify
text segments and word choices that are statistically relevant to authors or
readers and, based on this knowledge, test hypotheses from behavioral research.
We demonstrate the contribution of our method by examining how authors
communicate subjective information through narrative materials. This allows us
to answer the question of which words to choose when communicating negative
information. On the other hand, we show that investors trade not only upon
facts in financial disclosures but are distracted by filler words and
non-informative language. Practitioners - for example those in the fields of
investor communications or marketing - can exploit our insights to enhance
their writings based on the true perception of word choice.
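The word-selection step described in this abstract can be illustrated with a minimal sketch. It assumes a bag-of-words count matrix, a toy six-document corpus, a made-up response variable, and a plain coordinate-descent LASSO without intercept; none of these choices come from the paper, and the inference step the authors describe is not covered here.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1, no intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            norm_sq = sum(c * c for c in col)
            if norm_sq == 0:
                continue
            # Correlation of column j with the residual that ignores w[j].
            rho = sum(col[i] * (residual[i] + col[i] * w[j]) for i in range(n))
            new_w = soft_threshold(rho / n, alpha) / (norm_sq / n)
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
            w[j] = new_w
    return w


def word_counts(docs, vocabulary):
    """Build a document-by-word count matrix."""
    return [[doc.split().count(word) for word in vocabulary] for doc in docs]


# Hypothetical toy corpus and response (e.g. a market reaction per document).
docs = [
    "loss loss the",
    "growth the",
    "growth growth the the",
    "loss the the",
    "the",
    "growth loss the",
]
response = [-2.0, 1.0, 2.0, -1.0, 0.0, 0.0]

vocabulary = sorted({word for doc in docs for word in doc.split()})
X = word_counts(docs, vocabulary)
weights = dict(zip(vocabulary, lasso_coordinate_descent(X, response, alpha=0.1)))
decisive_words = [word for word, weight in weights.items() if weight != 0.0]
print(weights, decisive_words)
```

In this toy setting the filler word receives a coefficient of exactly zero while the two content words keep shrunken, non-zero coefficients, which is the sparsity property that makes LASSO usable for word selection.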
Thumbs up? Sentiment Classification using Machine Learning Techniques
We consider the problem of classifying documents not by topic, but by overall
sentiment, e.g., determining whether a review is positive or negative. Using
movie reviews as data, we find that standard machine learning techniques
definitively outperform human-produced baselines. However, the three machine
learning methods we employed (Naive Bayes, maximum entropy classification, and
support vector machines) do not perform as well on sentiment classification as
on traditional topic-based categorization. We conclude by examining factors
that make the sentiment classification problem more challenging.
Comment: To appear in EMNLP-200
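As a rough illustration of one of the three methods named in this abstract, the following is a minimal multinomial Naive Bayes sentiment classifier with add-one smoothing. The four training reviews, the whitespace tokenisation, and the function names are assumptions made for this sketch; they do not reproduce the features or data used in the paper.

```python
import math
from collections import Counter


def train_naive_bayes(labeled_docs):
    """Count documents per class and word occurrences per class."""
    class_doc_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        class_doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocabulary.update(tokens)
    return class_doc_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log posterior for the text."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_doc_counts):
        score = math.log(class_doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy reviews, invented for this sketch.
training_reviews = [
    ("a wonderful and moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("a dull and boring plot", "neg"),
    ("terrible acting and a boring film", "neg"),
]
model = train_naive_bayes(training_reviews)
print(classify("a great and moving story", model))
print(classify("boring and terrible", model))
```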
Combining Thesaurus Knowledge and Probabilistic Topic Models
In this paper we present an approach for introducing thesaurus knowledge into
probabilistic topic models. The main idea is based on the assumption that the
frequencies of semantically related words and phrases that occur in the same
texts should be increased, which gives them a larger contribution to the topics
found in these texts. We conducted experiments with several thesauri and found
that domain-specific knowledge is useful for improving topic models. If a
general thesaurus, such as WordNet, is used, the thesaurus-based improvement of
topic models can be achieved by excluding hyponymy relations from the combined
topic models.
Comment: Accepted to AIST-2017 conference (http://aistconf.ru/). The final
publication will be available at link.springer.co
Textual Information and IPO Underpricing: A Machine Learning Approach
This study examines the predictive power of textual information from S-1 filings in explaining IPO underpricing. Our empirical approach differs from previous research, as we utilize several machine learning algorithms to predict whether or not an IPO will be underpriced. We analyze a large sample of 2,481 U.S. IPOs from 1997 to 2016, and we find that textual information can effectively complement traditional financial variables in terms of prediction accuracy. In fact, models that use both textual data and financial variables as inputs have superior performance compared to models using a single type of input. We attribute our findings to the fact that textual information can reduce the ex-ante valuation uncertainty of IPO firms, thus leading to more accurate estimates.
A linguistically-driven methodology for detecting impending and unfolding emergencies from social media messages
Natural disasters have demonstrated the crucial role of social media before, during and after emergencies
(Haddow & Haddow 2013). Within our EU project Slándáil, we aim to ethically improve
the use of social media in enhancing the response of disaster-related agencies. To this end, we
have collected corpora of social and formal media to study newsroom communication of emergency
management organisations in English and Italian. Currently, emergency management agencies
in English-speaking countries use social media in different measure and different degrees,
whereas Italian National Protezione Civile only uses Twitter at the moment. Our method is developed
with a view to identifying communicative strategies and detecting sentiment in order to
distinguish warnings from actual disasters and major from minor disasters. Our linguistic analysis
uses humans to classify alert/warning messages or emergency response and mitigation ones based
on the terminology used and the sentiment expressed. Results of linguistic analysis are then used
to train an application by tagging messages and detecting disaster- and/or emergency-related terminology
and emotive language to simulate human rating and forward information to an emergency
management system.
Firm‐level climate change exposure
We develop a method that identifies the attention paid by earnings call participants to firms' climate change exposures. The method adapts a machine learning keyword discovery algorithm and captures exposures related to opportunity, physical, and regulatory shocks associated with climate change. The measures are available for more than 10,000 firms from 34 countries between 2002 and 2020. We show that the measures are useful in predicting important real outcomes related to the net-zero transition, in particular, job creation in disruptive green technologies and green patenting, and that they contain information that is priced in options and equity markets.