Semi-supervised Text Regression with Conditional Generative Adversarial Networks
The enormous volume of online text provides intriguing opportunities for
understanding social and economic semantics. In this paper, we propose a
novel text regression model based on a conditional generative adversarial
network (GAN), with the aim of associating textual data with social outcomes in
a semi-supervised manner. Besides its promising predictive capability, the
model has two advantages: (i) it works with unbalanced datasets with limited
labelled data, which reflects real-world scenarios; and (ii) predictions are
obtained through an end-to-end framework, without explicitly selecting
high-level representations. Finally, we point out related datasets for
experiments and directions for future research.
YouTube AV 50K: An Annotated Corpus for Comments in Autonomous Vehicles
With one billion monthly viewers, and millions of users discussing and
sharing opinions, comments below YouTube videos are rich sources of data for
opinion mining and sentiment analysis. We introduce the YouTube AV 50K dataset,
a freely available collection of more than 50,000 YouTube comments and
metadata from below autonomous vehicle (AV)-related videos. We describe its
creation process, its content and data format, and discuss its possible uses.
In particular, we conduct a case study of the first self-driving car fatality
to evaluate the dataset, and show how it can be used to better understand
public attitudes toward self-driving cars and public reactions to the accident.
Future developments of the dataset are also discussed.
Comment: in Proceedings of the Thirteenth International Joint Symposium on
Artificial Intelligence and Natural Language Processing (iSAI-NLP 2018)
Volatility Prediction using Financial Disclosures Sentiments with Word Embedding-based IR Models
Volatility prediction, an essential concept in financial markets, has
recently been addressed using sentiment analysis methods. We investigate the
sentiment of companies' annual disclosures in stock markets to forecast
volatility. We specifically explore the use of recent Information Retrieval
(IR) term weighting models that are effectively extended by related terms using
word embeddings. Alongside textual information, factual market data have
been widely used as the mainstream approach to forecasting market risk. We
therefore study different fusion methods to combine text and market data
resources. Our word embedding-based approach significantly outperforms
state-of-the-art methods. In addition, we investigate the characteristics of
the reports of companies in different financial sectors.
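The abstract does not state which term weighting model is extended or how. The following is a minimal sketch, assuming a log-scaled term frequency in which each sentiment-lexicon term's count is extended by the counts of document words whose embedding cosine similarity to it reaches a threshold, followed by early fusion (feature concatenation) with market features. The function names `extended_tf`, `text_features` and `early_fusion`, the 0.7 threshold, and any lexicon or vectors passed in are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def extended_tf(term, doc_counts, embeddings, threshold=0.7):
    """Term frequency of `term`, extended by similar words (hypothetical scheme).

    The term itself contributes its raw count; every other document word w
    contributes count(w) * sim(term, w) when sim >= threshold.
    """
    total = 0.0
    for word, count in doc_counts.items():
        if word == term:
            total += count
        elif term in embeddings and word in embeddings:
            sim = cosine(embeddings[term], embeddings[word])
            if sim >= threshold:
                total += count * sim
    return total


def text_features(tokens, lexicon, embeddings, threshold=0.7):
    """One log-scaled, embedding-extended weight per lexicon term."""
    counts = Counter(tokens)
    return [math.log1p(extended_tf(t, counts, embeddings, threshold))
            for t in lexicon]


def early_fusion(text_vec, market_vec):
    """Early fusion: concatenate text and market features into one vector."""
    return list(text_vec) + list(market_vec)
```

With toy vectors where "losses" lies close to "loss", a report mentioning "losses" twice raises the weight of the lexicon term "loss" even though "loss" itself appears once. A late-fusion alternative would instead train separate text and market predictors and combine their outputs, for example by a weighted average; which of these the paper uses is not stated in the abstract.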