18 research outputs found

    Exploring the use of paragraph-level annotations for sentiment analysis of financial blogs

    In this paper we describe our work in the area of topic-based sentiment analysis in the domain of financial blogs. We explore the use of paragraph-level and document-level annotations, examining how additional information from paragraph-level annotations can be used to increase the accuracy of document-level sentiment classification. We acknowledge the additional effort required to provide these paragraph-level annotations, and so we compare these findings against an automatic means of generating topic-specific sub-documents

    Topic-dependent sentiment analysis of financial blogs

    While most work on sentiment analysis in the financial domain has focused on content from traditional finance news, in this work we concentrate on a more subjective source of information: blogs. We aim to automatically determine the sentiment of financial bloggers towards companies and their stocks. To do this we develop a corpus of financial blogs, annotated with polarity of sentiment with respect to a number of companies. We conduct an analysis of the annotated corpus, from which we show there is a significant level of topic shift within this collection, and also illustrate the difficulty that human annotators have when annotating certain sentiment categories. To deal with the problem of topic shift within blog articles, we propose text extraction techniques to create topic-specific sub-documents, which we use to train a sentiment classifier. We show that such approaches provide a substantial improvement over full-document classification and that word-based approaches perform better than sentence-based or paragraph-based approaches.

    Overview of quantitative news interpretation methods applied in financial market predictions

    This paper describes currently known methods of quantitative news interpretation applied in financial market predictions. Each listed method of automatic news interpretation is briefly summarised, some commercial applications are mentioned, and finally a conclusion is drawn about the usability and prospects of quantitative news analysis with statistical machine learning methods. The aim of this paper is to provide an overview of the related research activities performed so far and to explore further research directions to improve the predictive capability of currently known methods.

    Engineering social media driven intelligent systems through crowdsourcing: Insights from a financial news summarisation system

    Purpose: The purpose of this paper is to explore implicit crowdsourcing, leveraging social media in real-time scenarios for intelligent systems. Design/methodology/approach: A case study using an illustrative example system, which systematically employed a custom social media platform for automated financial news analysis and summarisation, was developed, evaluated and discussed. A literature review related to crowdsourcing and collective intelligence in intelligent systems was also conducted to provide context and to further explore the case study. Findings: It was shown that, and how, useful intelligent systems can be constructed from appropriately engineered custom social media platforms which are integrated with intelligent automated processes. A recent inter-rater agreement measure for evaluating the quality of implicit crowd contributions was also explored and found to be of value. Practical implications: This paper argues that when social media platforms are closely integrated with other automated processes into a single system, this may provide a highly worthwhile online and real-time approach to intelligent systems through implicit crowdsourcing. Key practical issues, such as achieving high-quality crowd contributions, challenges of efficient workflows and real-time crowd integration into intelligent systems, were discussed. Important ethical and related considerations were also covered. Originality/value: A contribution to existing theory was made by proposing how social media web platforms may benefit crowdsourcing. As opposed to traditional crowdsourcing platforms, the presented approach and example system have a set of social elements that encourage implicit crowdsourcing. Instances of crowdsourcing with existing social media, such as Twitter, often also called crowd piggybacking, have been used in the past; however, employing an entirely custom-built social media system for implicit crowdsourcing is relatively novel and has several advantages. Some of the discussion in the context of intelligent systems construction is novel and contributes to the existing body of literature in this field.

    Predicting risk from financial reports with regression

    Using collaborative tagging for text classification: from text classification to opinion mining

    Numerous initiatives have allowed users to share knowledge or opinions using collaborative platforms. In most cases, the users provide a textual description of their knowledge, following very limited or no constraints. Here, we tackle the classification of documents written in such an environment. As a use case, our study is made in the context of text mining evaluation campaign material, related to the classification of cooking recipes tagged by users from a collaborative website. This context makes some of the corpus specificities difficult to model for machine-learning-based systems and keyword or lexical-based systems. In particular, different authors might have different opinions on how to classify a given document. The systems presented hereafter were submitted to the Défi Fouille de Textes 2013 evaluation campaign, where they obtained the best overall results, ranking first on task 1 and second on task 2. In this paper, we explain our approach for building relevant and effective systems dealing with such a corpus.

    Comparing hierarchical approaches to enhance supervised emotive text classification

    The performance of emotive text classification using affective hierarchical schemes (e.g. WordNet-Affect) is often evaluated using the same traditional measures that are used when a finite set of isolated classes is involved. However, applying such measures means the full characteristics and structure of the emotive hierarchical scheme are not considered. Thus, the overall performance of emotive text classification using emotion hierarchical schemes is often inaccurately reported and may lead to ineffective information retrieval and decision making. This paper provides a comparative investigation into how methods used in hierarchical classification problems in other domains, which extend traditional evaluation metrics to consider the characteristics of the hierarchical classification scheme, can be applied and subsequently improve the classification of emotive texts. This study investigates the classification performance of three widely used classifiers, Naive Bayes, J48 Decision Tree, and SVM, following the application of the aforementioned methods. The results demonstrated that all methods improved the emotion classification. However, the most notable improvement was recorded when a depth-based method was applied to both the testing and validation data, where the precision, recall, and F1-score were significantly improved by around 70 percentage points for each classifier.

    Applying text timing in corporate spin-off disclosure statement analysis: understanding the main concerns and recommendation of appropriate term weights

    Text mining helps in extracting knowledge and useful information from unstructured data. It detects and extracts information from mountains of documents and allows the selection of data related to a particular subject. In this study, text mining is applied to the 10-12b filings made by companies during corporate spin-offs. The main purposes are (1) to investigate potential and/or major concerns found in these financial statements filed for corporate spin-offs and (2) to identify appropriate text mining methods which can be used to reveal these major concerns. 10-12b filings from thirty-four companies were taken, and only the Risk Factors category was analysed. Term weights such as Entropy, IDF, GF-IDF, Normal and None were applied to the input data; of these, Entropy and GF-IDF were found to be the appropriate term weights, providing acceptable results. These term weights gave results that met a human expert's expectations. The document distribution from these term weights created a pattern which reflected the mood or focus of the input documents. In addition to the analysis, this study also provides a pilot study for future work in predictive text mining for the analysis of similar financial documents. For example, the descriptive terms found in this study provide a starting word list, which eliminates the trial-and-error method of framing an initial start list. --Abstract, page iii