4 research outputs found

    An Analysis of Speculative Language in SEC 10-K Filings

    This study applies sentiment analysis techniques to model the use of speculation within a collection of financial documents. The model is trained on the MPQA corpus to extract features that correlate with speculative sentences, and is then applied to a collection of SEC 10-K filings from a five-year period. The documents with the most speculation contained a different concentration of terms than the collection as a whole, and their speculative sentences mostly explained potential risks concerning projects, taxes, and pensions.
    Master of Science in Information Science
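    As a rough illustration of ranking filings by how much speculation they contain, the sketch below scores each document by the share of its sentences flagged as speculative. It is a simplified stand-in, not the thesis's method: the abstract describes a model trained on the MPQA corpus, whereas this sketch assumes a small, hypothetical hedge-cue word list (`HEDGE_CUES`) and a naive regex sentence splitter. The filing texts are invented, not real 10-K excerpts.

```python
import re

# Hypothetical hedge-cue lexicon: a stand-in for the MPQA-trained model.
HEDGE_CUES = {"may", "might", "could", "possibly", "potential",
              "uncertain", "believe", "expect", "anticipate"}

# Invented example texts, not real 10-K excerpts.
SAMPLE_FILINGS = {
    "filing_a": "Revenue increased in 2019. Pension costs may rise. "
                "Tax changes could affect results.",
    "filing_b": "Revenue increased in 2019. The company opened two plants.",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_speculative(sentence):
    """Flag a sentence if any lower-cased word is in the cue lexicon."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return any(w in HEDGE_CUES for w in words)


def speculation_ratio(text):
    """Share of sentences flagged as speculative (0.0 for empty text)."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(is_speculative(s) for s in sentences) / len(sentences)


def rank_by_speculation(docs):
    """Return (doc_id, ratio) pairs, most speculative first."""
    scored = [(doc_id, speculation_ratio(text)) for doc_id, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for doc_id, ratio in rank_by_speculation(SAMPLE_FILINGS):
        print(f"{doc_id}: {ratio:.2f}")
```

    Here `filing_a` ranks first with a ratio of 2/3 (two of its three sentences contain a cue word) and `filing_b` scores 0.0. A trained sentence classifier could replace `is_speculative` without changing the ranking logic.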

    Text Mining for Big Data Analysis in Financial Sector: A Literature Review

    Big data technologies have had a strong impact on many industries since the last decade, an impact that continues today and tends toward becoming omnipresent. Like most other sectors, the financial sector has concentrated its operating activities mostly on the investigation of structured data. With the support of big data technologies, however, information stored in diverse sources of semi-structured and unstructured data can be harvested. Recent research and practice indicate that such information can be valuable for decision-making. How, and to what extent, research on data mining in the financial sector has developed, and which tools are used for these purposes, remain largely unexplored questions. This study aims to answer three research questions: (i) What is the intellectual core of the field? (ii) Which techniques are used in the financial sector for text mining, especially in the era of the Internet, big data, and social media? (iii) Which data sources are most often used for text mining in the financial sector, and for what purposes? To answer these questions, a qualitative analysis of the literature is carried out using a systematic literature review together with citation and co-citation analysis.
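    Co-citation analysis, one of the methods named above, counts how often two works appear together in the reference lists of later documents; frequently co-cited pairs are commonly read as belonging to a field's intellectual core. The sketch below shows only that standard counting step, assuming each citing document's references are already available as a list of identifiers. The review's actual tooling is not described in the abstract, and the paper and reference IDs here are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical reference lists: citing document -> works it cites.
SAMPLE_REFERENCES = {
    "paper1": ["A", "B", "C"],
    "paper2": ["A", "B"],
    "paper3": ["B", "C", "C"],  # duplicate entry is counted once
}


def citation_counts(reference_lists):
    """Number of citing documents that cite each work."""
    counts = Counter()
    for refs in reference_lists.values():
        counts.update(set(refs))
    return counts


def cocitation_counts(reference_lists):
    """Number of citing documents that cite each unordered pair of works together."""
    counts = Counter()
    for refs in reference_lists.values():
        # Deduplicate and sort so each pair is counted once, in a canonical order
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    print(citation_counts(SAMPLE_REFERENCES))
    for pair, count in cocitation_counts(SAMPLE_REFERENCES).most_common():
        print(pair, count)
```

    In this toy data, ("A", "B") and ("B", "C") are each co-cited by two documents and ("A", "C") by one; a co-citation network is typically built by treating these counts as edge weights between works.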

    In Search of Meaning: Lessons, Resources and Next Steps for Computational Analysis of Financial Discourse

    We critically assess mainstream accounting and finance (AF) research that applies methods from computational linguistics (CL) to study financial discourse. We also review common themes and innovations in the literature and assess the incremental contribution of work applying CL methods over manual content analysis. Key conclusions emerging from our analysis are: (a) AF research is behind the curve in terms of CL methods generally, and word sense disambiguation in particular; (b) implementation issues mean the proposed benefits of CL are often less pronounced than proponents suggest; (c) structural issues limit practical relevance; and (d) CL methods and high-quality manual analysis represent complementary approaches to analyzing financial discourse. We describe four CL tools that have yet to gain traction in mainstream AF research but which we believe offer promising ways to enhance the study of meaning in financial discourse: named entity recognition, summarization, semantics, and corpus linguistics.

    Extracting business performance signals from Twitter news

    Social media and social networks underpin a revolution in communication between people, with the particular feature that much of that communication is open to all. This provides a massive pool of data that researchers can exploit for a wide variety of applications. Data from Twitter is of particular interest in this sense, given its high global usage and the availability of APIs and other tools that enable easy access to the public stream of tweets. Owing to Twitter's wide public penetration, many businesses use it to share their latest news, effectively treating it as a gateway to end-users, consumers, and/or investors. In this thesis, we focus on the potential for extracting information from Twitter that is relevant to the financial and competitive status of a business. We consider a collection of well-regarded Twitter accounts known for communicating recent business news, and we investigate the automated analysis of the stream of tweets from these sources, with a view to learning business-relevant information about specific companies. A key aspect of our approach is extracting signals for specific areas of business performance; we explore three such areas: productivity, competitiveness, and industrial risk. We propose a two-step model that first classifies a tweet into one of these areas and then assigns it a sentiment value on a positive/negative scale. The resulting sentiment values across specific aspects represent novel business indicators that could add significant value to the toolset used by business analysts. Our experiments are based on a new, manually pre-classified data set (available from a URL provided). Additionally, we propose n-grams made from non-contiguous words as a novel feature to enhance performance in this context.
    Experiments involving a range of feature selection methods show that these new features provide valuable benefits compared with standard n-gram features. We also introduce an extra layer in front of the primary classifier whose role is to filter out noisy tweets before they enter the system; we use a One-Class SVM for this purpose. Broadly, we show that the methods developed in this thesis achieve promising results in both topic and sentiment classification in the business performance context, suggesting that Twitter can indeed be a useful source of signals related to different aspects of business performance. We also find that our system can provide valuable insight into unseen test data. However, more research is needed to extract robust signals for industrial risk, and there appears to be considerable promise for further development.
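    The abstract does not define exactly how its non-contiguous n-grams are built. One common formulation, assumed here, is the skip-gram: ordered word combinations drawn from a short window in which up to `max_skip` intervening tokens may be skipped. The function name and parameters are hypothetical, and the example tokens are invented rather than taken from the thesis's data set.

```python
from itertools import combinations


def noncontiguous_ngrams(tokens, n=2, max_skip=2):
    """Return ordered n-grams whose words may be separated by gaps.

    The first word is fixed at each position; the remaining n-1 words are
    chosen in order from the next n-1+max_skip tokens, so at most max_skip
    tokens are skipped in total. max_skip=0 gives ordinary contiguous n-grams.
    """
    grams = []
    for start in range(len(tokens)):
        # Window ends (exclusively) n + max_skip positions after start
        window_end = min(len(tokens), start + n + max_skip)
        for rest in combinations(range(start + 1, window_end), n - 1):
            grams.append(tuple(tokens[i] for i in (start,) + rest))
    return grams


if __name__ == "__main__":
    words = ["profits", "fell", "sharply", "today"]
    print(noncontiguous_ngrams(words, n=2, max_skip=1))
```

    For these four words with `n=2` and `max_skip=1`, the function returns the three contiguous bigrams plus `('profits', 'sharply')` and `('fell', 'today')`. Such features could be counted alongside standard n-grams before feature selection; widening `max_skip` captures looser word associations at the cost of a larger, noisier feature space.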