Predicting risk from financial reports with regression

Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2009

Semi-supervised Text Regression with Conditional Generative Adversarial Networks

Authors: Li Tao, Liu Xudong, Su Shihan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 11/11/2018

The enormous volume of online text offers intriguing opportunities for understanding social and economic semantics. In this paper, we propose a novel text regression model based on a conditional generative adversarial network (GAN) that attempts to associate textual data with social outcomes in a semi-supervised manner. Beyond its promising predictive potential, the model has two advantages: (i) it works with unbalanced datasets containing limited labelled data, which matches real-world scenarios; and (ii) predictions are obtained by an end-to-end framework, without explicitly selecting high-level representations. Finally, we point out related datasets for experiments and future research directions.

arXiv.org e-Print Archive

Crossref

Caltech Authors

YouTube AV 50K: An Annotated Corpus for Comments in Autonomous Vehicles

Authors: Choi Minsoo, Fu Kaiming, Gong Siyuan, Li Tao, Lin Lei, Wang Jian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 15/10/2018

With one billion monthly viewers and millions of users discussing and sharing opinions, the comments below YouTube videos are a rich source of data for opinion mining and sentiment analysis. We introduce the YouTube AV 50K dataset, a freely available collection of more than 50,000 YouTube comments and metadata from below autonomous vehicle (AV)-related videos. We describe its creation process, content, and data format, and discuss its possible uses. In particular, we present a case study of the first self-driving car fatality to evaluate the dataset, and show how the dataset can be used to better understand public attitudes toward self-driving cars and public reactions to the accident. Future developments of the dataset are also discussed. Comment: in Proceedings of the Thirteenth International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP 2018).

arXiv.org e-Print Archive

Crossref

Volatility Prediction using Financial Disclosures Sentiments with Word Embedding-based IR Models

Authors: Anderson Linda, Baklanov Artem, Duer Alexander, Hanbury Allan, Lupu Mihai, Rekabsaz Navid
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2017

Volatility prediction, an essential concept in financial markets, has recently been addressed using sentiment analysis methods. We investigate the sentiment of companies' annual disclosures in stock markets to forecast volatility. We specifically explore the use of recent Information Retrieval (IR) term weighting models that are effectively extended by related terms using word embeddings. In parallel to textual information, factual market data have been widely used as the mainstream approach to forecasting market risk. We therefore study different fusion methods to combine text and market data resources. Our word embedding-based approach significantly outperforms state-of-the-art methods. In addition, we investigate the characteristics of the reports of companies in different financial sectors.
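As a rough illustration of what "term weighting extended by related terms using word embeddings" could mean, the sketch below adds similarity-weighted counts of related terms to a term's raw frequency. It is not the authors' implementation: the function names, the toy two-dimensional embeddings, and the 0.7 similarity threshold are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extended_tf(term, doc_tokens, embeddings, threshold=0.7):
    """Raw count of `term` plus similarity-weighted counts of related terms.

    `threshold` is a hypothetical cutoff: only terms whose embedding is at
    least this similar to `term` contribute to the extended frequency.
    """
    counts = Counter(doc_tokens)
    score = float(counts[term])
    if term not in embeddings:
        return score  # no vector available, fall back to the plain count
    for other, n in counts.items():
        if other == term or other not in embeddings:
            continue
        sim = cosine(embeddings[term], embeddings[other])
        if sim >= threshold:
            score += sim * n
    return score


# Toy embeddings, invented for illustration only.
embeddings = {
    "loss": [1.0, 0.0],
    "deficit": [0.9, 0.1],
    "growth": [0.0, 1.0],
}
doc = ["loss", "deficit", "growth", "loss"]
score = extended_tf("loss", doc, embeddings)
print(round(score, 3))
```

Here "deficit" contributes to the weight of "loss" because their toy vectors are close, while "growth" does not. In a real setting the vectors would come from a trained embedding model, and the resulting weights would feed a regression model, possibly alongside market data features.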

arXiv.org e-Print Archive

Crossref

International Institute for Applied Systems Analysis (IIASA)