2 research outputs found

    Mining Unstructured Financial News to Forecast Intraday Stock Price Movements

    In this thesis, we develop a system that analyzes unstructured financial news using text classification in order to forecast stock price trends. We review similar systems to build on successful ideas and combine them with novel approaches. We discuss the different types of news that are potentially relevant to stock prices and choose news sources for the system accordingly. To eliminate irrelevant news, we present suitable filtering approaches such as the implementation of a rule-based thesaurus. We develop an automatic labeling approach and compare it to a manual labeling approach. We evaluate the influence of different automatic labeling approaches on the prediction performance. In a data training phase, we introduce a set of features that are novel for the price forecasting task. We compare different text mining techniques, such as feature vector dimensionality reduction, and different classifiers. Finally, we investigate the influence of trading costs on potential profits and run a market simulation that can support or reject the practical profitability of the system.

    Towards Real Time Discovery from Distributed Information Sources

    Many successful knowledge discovery or data mining techniques and systems have been developed. These techniques usually apply to centralized databases with less restrictive requirements on learning and response time. Not much effort has yet been put into mining of distributed databases and real-time issues. In this paper, we investigate issues of fast distributed data mining. We assume that merging the distributed database into a single one would either be too costly (distributed case), or that the individual fragments are non-uniform, so that mining only one fragment would bias the result (fragmented case). The goal is to classify objects O of the database into one of several mutually exclusive classes C_i. Our approach to make mining fast and feasible is as follows. From each data site or fragment db_k, only a single rule r_ik is generated for each category C_i. A small subset {r_i1, ..., r_ih} of these individual rules is selected to form a rule set R for each category C_i. Thes..