Econometrics meets sentiment : an overview of methodology and applications
The advent of massive amounts of textual, audio, and visual data has spurred the development of econometric methodology to transform qualitative sentiment data into quantitative sentiment variables, and to use those variables in an econometric analysis of the relationships between sentiment and other variables. We survey this emerging research field and refer to it as sentometrics, a portmanteau of sentiment and econometrics. We provide a synthesis of the relevant methodological approaches, illustrate with empirical results, and discuss useful software.
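As a rough illustration of what turning qualitative sentiment data into a quantitative sentiment variable can look like, the sketch below scores documents against a small word lexicon and averages the scores per date. The lexicon, the function names (`document_sentiment`, `daily_sentiment`), and the toy corpus are all hypothetical and are not taken from the surveyed paper or its software; real applications would use a validated lexicon or a trained classifier and a more careful aggregation scheme.

```python
from collections import defaultdict

# Hypothetical toy lexicon (word -> polarity); not from the surveyed paper.
LEXICON = {
    "gain": 1.0, "growth": 1.0, "strong": 0.5,
    "loss": -1.0, "weak": -0.5, "crisis": -1.0,
}


def document_sentiment(text):
    """Average lexicon polarity over all whitespace-separated tokens."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(LEXICON.get(tok, 0.0) for tok in tokens) / len(tokens)


def daily_sentiment(corpus):
    """Aggregate document scores into one value per date (simple mean)."""
    by_date = defaultdict(list)
    for date, text in corpus:
        by_date[date].append(document_sentiment(text))
    return {date: sum(s) / len(s) for date, s in sorted(by_date.items())}


# Made-up example documents, each tagged with a date.
corpus = [
    ("2020-01-01", "strong growth and gain"),
    ("2020-01-01", "weak quarter"),
    ("2020-01-02", "crisis and loss"),
]
series = daily_sentiment(corpus)
```

The resulting `series` maps each date to a single number, which is the kind of quantitative variable that could then enter a regression alongside other time series.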
Social Bots: Human-Like by Means of Human Control?
Social bots are currently regarded as an influential but also somewhat
mysterious factor in public discourse and opinion making. They are considered
to be capable of massively distributing propaganda in social and online media
and their application is even suspected to be partly responsible for recent
election results. Astonishingly, the term 'Social Bot' is not well defined and
different scientific disciplines use divergent definitions. This work starts
with a balanced definition attempt, before providing an overview of how social
bots actually work (taking the example of Twitter) and what their current
technical limitations are. Despite recent research progress in Deep Learning
and Big Data, there are many activities bots cannot handle well. We then
discuss how bot capabilities can be extended and controlled by integrating
humans into the process and reason that this is currently the most promising
way to go in order to realize effective interactions with other humans.
Comment: 36 pages, 13 figures
Identification of monolingual and code-switch information from English-Kannada code-switch data
Code-switching, also called code-mixing, is the use of more than one language in a single communication. It is very common in social media communication, particularly in multilingual countries such as India. Important applications of code-switch processing include machine translation (MT), shallow parsing, dialog systems, and semantic parsing. Identifying code-switch and monolingual information is useful for better communication on online networking websites. In this paper, we apply a character-level n-gram approach to identify monolingual and code-switch information in English-Kannada social media data. We compare several machine learning techniques, namely naïve Bayes (NB), support vector classifier (SVC), logistic regression (LR), and neural network (NN), on English-Kannada code-switch (EKCS) data. The character-level n-gram approach yields an improvement of 1.8% to 4.1% in accuracy and 1.6% to 3.8% in F1-score. We also observe that the SVC and NN techniques perform best in terms of accuracy (97.9%) and F1-score (98%) with character-level n-grams.
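A character-level n-gram classifier of the kind described above can be sketched in a few lines. The example below is a minimal illustration, not the authors' implementation: it covers only a multinomial naïve Bayes model with add-one smoothing (the paper also evaluates SVC, LR, and NN), the n-gram range of 1 to 3 is an assumption, and the class name `NgramNaiveBayes`, the labels, and the romanized training strings are hypothetical placeholders rather than samples from the EKCS data.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of length n_min..n_max."""
    text = f" {text.lower()} "  # pad so word boundaries become features
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-gram counts (add-one smoothing)."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)   # label -> n-gram counts
        self.label_counts = Counter(labels)  # label -> number of documents
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            total = sum(self.counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for gram in char_ngrams(text):
                # Smoothed log likelihood of each n-gram under this label.
                score += math.log(
                    (self.counts[label][gram] + 1) / (total + len(self.vocab))
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: "en" = English, "kn" = romanized Kannada, "cs" = code-switched.
texts = [
    "what is your name", "this movie is very good",
    "ninna hesaru enu", "naanu mane ge hogtini",
    "this movie thumba chennagide", "naanu office ge going",
]
labels = ["en", "en", "kn", "kn", "cs", "cs"]
model = NgramNaiveBayes().fit(texts, labels)
prediction = model.predict("your name is good")
```

Character n-grams are a natural fit for social media text because they tolerate the spelling variation of romanized words, which word-level features would treat as unrelated tokens.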
Effective Feature Representation for Clinical Text Concept Extraction
Crucial information about the practice of healthcare is recorded only in
free-form text, which creates an enormous opportunity for high-impact NLP.
However, annotated healthcare datasets tend to be small and expensive to
obtain, which raises the question of how to make maximally efficient uses of
the available data. To this end, we develop an LSTM-CRF model for combining
unsupervised word representations and hand-built feature representations
derived from publicly available healthcare ontologies. We show that this
combined model yields superior performance on five datasets of diverse kinds of
healthcare text (clinical, social, scientific, commercial). Each involves the
labeling of complex, multi-word spans that pick out different healthcare
concepts. We also introduce a new labeled dataset for identifying the treatment
relations between drugs and diseases.
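One simple way to combine unsupervised word representations with hand-built ontology features is to concatenate them into a single per-token input vector before it reaches a sequence model. The sketch below shows only that input step and assumes plain concatenation; the abstract does not say how the two representations are merged, so this is not the paper's method. The embedding values, the ontology lookup, and the names `token_features` and `sentence_features` are hypothetical.

```python
# Hypothetical 3-dimensional vectors standing in for unsupervised word embeddings.
EMBEDDINGS = {
    "aspirin": [0.2, 0.7, 0.1],
    "treats": [0.5, 0.1, 0.3],
    "headache": [0.1, 0.6, 0.8],
}
UNKNOWN = [0.0, 0.0, 0.0]

# Hypothetical ontology lookup: term -> set of semantic types.
ONTOLOGY = {"aspirin": {"drug"}, "headache": {"disease"}}
TYPES = ["drug", "disease"]  # fixed order of the indicator features


def token_features(token):
    """Concatenate the dense embedding with binary ontology indicators."""
    key = token.lower()
    dense = EMBEDDINGS.get(key, UNKNOWN)
    types = ONTOLOGY.get(key, set())
    indicators = [1.0 if t in types else 0.0 for t in TYPES]
    return dense + indicators


def sentence_features(tokens):
    """Build one combined feature vector per token in the sentence."""
    return [token_features(tok) for tok in tokens]


features = sentence_features(["Aspirin", "treats", "headache"])
```

Each row of `features` could then be fed to a sequence labeler such as an LSTM-CRF, which is outside the scope of this sketch.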