97 research outputs found

    Augmented trading: From news articles to stock price predictions using semantics

    This thesis addresses the question of how to predict the reaction of the stock market to news articles using the latest suitable developments in Natural Language Processing. This is done using text classification, where a new article is matched to a category of articles that have a certain influence on the stock price. The thesis first discusses why analysis of news articles is a feasible approach to predicting the stock market and why analysis of past prices should not be built upon. From related work in this domain, two main design choices are extracted: what to take as features for news articles and how to couple them with the changes in stock price. The thesis then suggests which features can be extracted from articles, resulting in a feature template that can deal with negation and favorability, abstracts from companies, and uses domain knowledge and synonyms for generalization. To couple the features to changes in stock price, a survey is given of several text classification techniques, from which it is concluded that Support Vector Machines are very suitable for the domain of stock prices and extensive features. The system has been tested with a unique data set of news articles, for which results are reported that are significantly better than random. The results improve even more when only headlines of news articles are taken into account. Because the system was only tested with closing prices, it cannot be concluded that it will work in practice, but this can easily be tested if stock prices during the day are available. The main suggestions for future work are to test the system with such data and to improve the filling of the template so that it can also be used in other areas of favorability analysis, or perhaps even to extract interesting information from texts.

    Constructing event evolution graphs from large news corpora.

    Shi Xiaodong. Thesis (M.Phil.)--Chinese University of Hong Kong, 2007. Includes bibliographical references (leaves 114-117). Abstracts in English and Chinese.
    Front matter: Abstract (in Chinese); Abstract; Acknowledgement; Table of Contents; List of Tables; List of Figures.
    Chapter 1. Introduction: 1.1. Background; 1.2. Research Motivation; 1.3. Research Objective; 1.4. Organization of Thesis.
    Chapter 2. Problem Analysis and Definition: 2.1. Definitions of Story, Event and Topic; 2.2. Characteristics of News Stories.
    Chapter 3. Literature Review: 3.1. Topic Detection and Tracking (TDT); 3.2. Document Clustering Techniques; 3.3. Event Evolution.
    Chapter 4. System Architecture.
    Chapter 5. Event Evolution: 5.1. Event Evolution; 5.2. Event Timestamp and Temporal Relationship; 5.3. Event Evolution Graph; 5.4. Event Threading and Event Joining.
    Chapter 6. Extracting News Events: 6.1. Clustering Approach; 6.2. Utilizing Clustered Stories from News Infomediaries.
    Chapter 7. Modeling Event Evolution Relationships: 7.1. Measuring the Confidences of Event Evolution Relationships.
    Chapter 8. Constructing Event Evolution Graphs: 8.1. Static Thresholding; 8.2. Static Pruning; 8.3. Dynamic Pruning.
    Chapter 9. Experimental Evaluation: 9.1. Evaluation Measure; 9.2. Data Set; 9.3. Experimental Results and Analysis.
    Chapter 10. Case Study.
    Chapter 11. Story Segmentation and Its Effects: 11.1. Story Segmentation; 11.2. Event Generalization; 11.3. Experimental Evaluation.
    Chapter 12. Conclusions and Future Work: 12.1. Conclusions; 12.2. Future Work.
    References.

    Finding groups of people in Google news

    In this paper, we study the problem of content-based social network discovery among people who frequently appear in world news. Google news is used as the source of data. We describe a probabilistic framework for associating people with groups. A low-dimensional topic-based representation is first obtained for news stories via probabilistic latent semantic analysis (PLSA). This is followed by construction of semantic groups by clustering such representations. Unlike many existing social network analysis approaches, which discover groups based only on binary relations (e.g. co-occurrence of people in a news article), our model clusters people using their topic distribution, which introduces contextual information in the group formation process (e.g. some people belong to several groups depending on the specific subject). The model has been used to study evolution of people with respect to topics over time. We also illustrate the advantages of our approach over a simple co-occurrence-based social network extraction method
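
    The contrast the abstract draws between binary co-occurrence and topic-based grouping can be illustrated with a small sketch. This is not the paper's implementation: the topic distributions below are made-up numbers standing in for PLSA output, and the function name `soft_groups` and the threshold rule are hypothetical simplifications. The point is only that a person with weight on several topics can belong to several groups.

```python
# Hypothetical P(topic | person) values; in the paper these would be
# derived from a PLSA model of news stories, which is not reproduced here.
person_topics = {
    "person_a": [0.70, 0.25, 0.05],
    "person_b": [0.65, 0.05, 0.30],
    "person_c": [0.05, 0.50, 0.45],
}


def soft_groups(person_topics, threshold=0.2):
    """Put each person in every topic group whose probability meets the threshold.

    This is an assumed, simplified grouping rule, not the clustering
    procedure described in the paper.
    """
    groups = {}
    for person, distribution in person_topics.items():
        for topic, probability in enumerate(distribution):
            if probability >= threshold:
                groups.setdefault(topic, set()).add(person)
    return groups


groups = soft_groups(person_topics)
for topic in sorted(groups):
    print(topic, sorted(groups[topic]))
```

    With these illustrative numbers every person falls into two groups, which a single binary co-occurrence link between two people could not express.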

    Modeling Anticipatory Event Transitions


    Paving the way for next generation data-stream clustering: towards a unique and statistically valid cluster structure at any time step

    In the domain of data-stream clustering, with dynamic text mining as our application domain, our goal is two-fold and long-term: (1) at each data input, the resulting cluster structure has to be unique, independent of the order in which the input vectors are presented; (2) this structure has to be meaningful for an expert, e.g., not composed of a huge 'catch-all' cluster in a cloud of tiny specific ones, as is often the case with large sparse data tables. The first, preliminary condition is satisfied by our Germen density-mode seeking algorithm, but the relevance of the clusters vis-à-vis expert judgment relies on the definition of a data density, which itself relies on the type of graph chosen for embedding the similarities between text inputs. Having already demonstrated the dynamic behaviour of the Germen algorithm, we focus here on appending a Monte-Carlo method for extracting statistically valid inter-text links, which looks promising when applied both to an excerpt of the Pascal bibliographic database and to the Reuters-RCV1 news test collection. Although it is not a central issue here, the time complexity of our algorithms is also discussed.

    Multiple perspectives on innovation: insights from corporate communications, innovation policy and the disclosure of R&D

    Cumulative doctoral thesis comprising eight different papers in the following fields: Corporate Communications, Supply Chain Management, Lexical Evolution, Merger Review, Corporate Entrepreneurship, Solar Industry and Organizational Innovativeness.

    Multidimensional opinion mining from social data

    The popularity and importance of social media are increasing as people use it for various types of social interaction across multiple channels. This thesis focuses on the evolving research area of Social Opinion Mining, tasked with the identification of multiple opinion dimensions, such as subjectivity, sentiment polarity, emotion, affect, sarcasm, and irony, from user-generated content represented across multiple social media platforms and in various media formats, such as textual, visual, and audio. Mining people’s social opinions from social sources, such as social media platforms and newswire comment sections, is a valuable business asset that can be utilised in many ways and in multiple domains, such as Politics, Finance, and Government. The main objective of this research is to investigate how a multidimensional approach to Social Opinion Mining affects fine-grained opinion search and summarisation at an aspect-based level, and whether such a multidimensional approach outperforms single-dimension approaches in an extrinsic human evaluation conducted in a real-world context: the Malta Government Budget, where five social opinion dimensions are taken into consideration, namely subjectivity, sentiment polarity, emotion, irony, and sarcasm. This human evaluation determines whether the multidimensional opinion summarisation results provide added value to potential end-users, such as policy-makers and decision-takers, thereby providing a nuanced voice to the general public on their social opinions on topics of national importance. Results obtained indicate that a more fine-grained aspect-based opinion summary based on the combined dimensions of subjectivity, sentiment polarity, emotion, and sarcasm or irony is more informative and more useful than one based on sentiment polarity only. This research contributes towards the advancement of intelligent search and information retrieval from social data and impacts entities utilising Social Opinion Mining results towards effective policy formulation, policy-making, decision-making, and decision-taking at a strategic level.