1,622 research outputs found

    Topic extraction in words networks

    High quality topic extraction from business news explains abnormal financial market volatility

    Understanding the mutual relationships between information flows and social activity in society today is one of the cornerstones of the social sciences. In financial economics, the key issue in this regard is understanding and quantifying how news of all possible types (geopolitical, environmental, social, financial, economic, etc.) affects trading and the pricing of firms in organized stock markets. In this article, we seek to address this issue by performing an analysis of more than 24 million news records provided by Thomson Reuters and of their relationship with trading activity for 206 major stocks in the S&P US stock index. We show that the whole landscape of news that affects stock price movements can be automatically summarized via simple regularized regressions between trading activity and news information pieces decomposed, with the help of simple topic modeling techniques, into their "thematic" features. Using these methods, we are able to estimate and quantify the impacts of news on trading. We introduce network-based visualization techniques to represent the whole landscape of news information associated with a basket of stocks. The examination of the words that are representative of the topic distributions confirms that our method is able to extract the significant pieces of information influencing the stock market. Our results show that one of the most puzzling stylized facts in financial economics, namely that at certain times trading volumes appear to be "abnormally large," can be partially explained by the flow of news. In this sense, our results prove that there is no "excess trading" when restricting to times when news is genuinely novel and provides relevant financial information. Comment: The previous version of this article included an error. This is a revised version.
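
The abstract above mentions "simple regularized regressions" between trading activity and topic features but does not say which penalty is used. As a minimal sketch, assuming an L1 (lasso) penalty and entirely hypothetical toy data, the idea of regressing a trading-activity series on per-window topic intensities might look like this. The names `lasso`, `soft_threshold`, `topic_intensity`, and `volume` are illustrative and do not come from the paper.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso(X, y, alpha, n_iter=200):
    """Minimize (1/2n) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            z = sum(X[i][j] ** 2 for i in range(n))
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z else 0.0
    return w


# Hypothetical data: rows are time windows, columns are topic intensities.
topic_intensity = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.2],
    [2.0, 1.0, 0.1],
    [3.0, 0.0, 0.9],
    [1.0, 2.0, 0.4],
    [0.0, 0.0, 0.7],
]
# Toy trading activity that depends only on the first topic.
volume = [2.0 * row[0] for row in topic_intensity]

weights = lasso(topic_intensity, volume, alpha=0.05)
print(weights)
```

On this toy data the penalty keeps only the first topic's weight away from zero, which illustrates how a sparse regression could single out the topics associated with trading activity.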

    Bipartite graph for topic extraction

    This article presents a bipartite graph propagation method that can be applied to different unsupervised machine learning tasks, such as topic extraction and clustering. We introduce the objectives and hypotheses that motivate the use of graph-based methods, and we give the intuition behind the proposed Bipartite Graph Propagation Algorithm. The contribution of this study is the development of a new method that allows heuristic knowledge to be used to discover topics in textual data more easily than is possible in the traditional mathematical formalism based on Latent Dirichlet Allocation (LDA). Initial experiments demonstrate that our Bipartite Graph Propagation algorithm returns good results in a static context (offline algorithm). Our research now focuses on large amounts of data and a dynamic context (online algorithm). São Paulo Research Foundation (FAPESP) (proj. number 2011/23689-9)

    Trending topic extraction from social media

    Social media has become the first source of information for many people. The amount of information posted on social media daily has become so vast that it is difficult to track. One of the most popular social media applications is Twitter. Users follow many news accounts, public figures, and friends so that they can keep up with the latest events around them. Since dialect and writing style differ from one region to another, our objective in this research is to extract trending topics for an Egyptian Twitter user. In this way, the user can see at a glance the trending topics discussed by the people he follows. To find the best approach for achieving our objective, we investigate the document pivot and the feature pivot approaches. By applying the document pivot approach to the baseline data, using the tf-itf (term frequency-inverse tweet frequency) representation, the repeated bisecting k-means clustering technique, and extraction of the most frequent n-grams from each cluster, we achieved a recall of 100% and an F1 measure of 0.8. Applying the feature pivot approach to the baseline data, using the content similarity algorithm to group related unigrams together, achieved a recall of 100% and an F1 measure of 0.923. To validate our results, we collected 12 data sets of different sizes (200, 400, 600, and 1200) from three different domains (sports, entertainment, and news) and applied both approaches to them. The average recall, precision, and F1 measure values obtained with the feature pivot approach are larger than those achieved with the document pivot approach.
To make sure this difference in results is statistically significant, we applied a two-sample, one-tailed paired t-test, which showed that the results are significantly better at a confidence level of 90%. The results showed that the document pivot approach could extract the trending topics for an Egyptian Twitter user with an average recall of 0.714, an average precision of 0.521, and an average F1 measure of 0.556, versus average recall, precision, and F1 measure values of 0.981, 0.754, and 0.833, respectively, when applying the feature pivot approach.
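
The tf-itf representation named in the abstract above can be illustrated with a short sketch. The abstract does not give the exact weighting formula, so this assumes the standard tf-idf form with each tweet treated as a document, whitespace tokenization, and a natural logarithm. The function name `tf_itf` and the sample tweets are hypothetical.

```python
import math
from collections import Counter


def tf_itf(tweets):
    """Return one {term: weight} dict per tweet, with weight = tf * log(N / tweet_frequency)."""
    tokenized = [tweet.lower().split() for tweet in tweets]
    n_tweets = len(tokenized)
    # Tweet frequency: number of tweets that contain each term at least once.
    tweet_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_tweets / tweet_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical sample tweets, not taken from the paper's data sets.
tweets = [
    "ahly wins the derby",
    "zamalek loses the derby",
    "new film opens the festival",
]
vectors = tf_itf(tweets)
for tweet, vector in zip(tweets, vectors):
    print(tweet, vector)
```

A term that appears in every tweet (here "the") receives a weight of zero, while terms confined to a single tweet receive the largest weights. Vectors of this kind are what a clustering step such as bisecting k-means would then group.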

    Topic extraction from microblog posts using conversation structures

    Topic Extraction Analysis for Sidoardjo Mudflow Disaster Impacts

    In this paper, we present our work on analyzing the impact of the Mudflow Disaster in Sidoardjo, Indonesia, based on text mining technologies. We conducted topic extraction using the Latent Dirichlet Allocation model. To handle difficult expressions and capture the main points, we use various techniques, such as bigram segmentation, for English-language documents related to the mudflow. TreeTagger is used as the morphological analysis tool. The extracted topics clearly showed the impact of the Sidoardjo Mudflow. The most widely discussed topic was the resettlement conditions and the compensation for the victims under the presidential regulation. We also found other frequently mentioned topics, such as the payment of resettlement, water pollution, and the verification process for the households.

    Topic Extraction and Interactive Knowledge Graphs for Learning Resources

    Human development through education is an important means of sustainable development. It supports community development in the present without negative effects in the future, and it also provides prosperity for future generations. E-learning is a natural development of educational tools in the current era and circumstances. Thanks to the rapid development of computer science and telecommunication technologies, it has evolved impressively. While facilitating the educational process, this development has also produced a massive amount of learning resources, which makes searching for and extracting useful learning resources difficult. Therefore, new tools are needed to support this development. In this paper we present a new algorithm that extracts the main topics from textual learning resources, links related resources, and generates interactive dynamic knowledge graphs. The algorithm accomplishes these tasks accurately and efficiently regardless of the size of the texts. We used Wikipedia Miner, TextRank, and Gensim within our algorithm. Evaluated against Gensim, our algorithm showed a large improvement in accuracy. This could be a step towards strengthening self-learning and supporting the sustainable development of communities, and more broadly of humanity, across different generations. The researcher was partially funded by the Egyptian Ministry of Higher Education and Minia University in the Arab Republic of Egypt. [Joint supervision mission from the fourth year missions (2015–2016) of the seventh five-year plan (2012–2017)]

    Enhancing Topic Extraction in Recommender Systems with Entropy Regularization

    In recent years, many recommender systems have utilized textual data for topic extraction to enhance interpretability. However, our findings reveal a noticeable deficiency in the coherence of keywords within topics, resulting in low explainability of the model. This paper introduces a novel approach called entropy regularization to address the issue, leading to more interpretable topics extracted from recommender systems while keeping the performance of the primary task competitive. The effectiveness of the strategy is validated through experiments on a variation of the probabilistic matrix factorization model that utilizes textual data to extract item embeddings. The experimental results show a significant improvement in topic coherence, which is quantified by cosine similarity on word embeddings.
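
The abstract above quantifies topic coherence by cosine similarity on word embeddings. A minimal sketch of one common way to do this, assuming coherence is the mean pairwise cosine similarity over a topic's top words, is given below. The 2-dimensional embeddings are hypothetical toy values, and the function names are illustrative. The `entropy` helper only shows the quantity an entropy penalty would be built on; the abstract does not specify which distribution the paper regularizes or how the term enters the loss.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def topic_coherence(top_words, embeddings):
    """Mean pairwise cosine similarity over a topic's top words (needs at least two words)."""
    pairs = list(combinations(top_words, 2))
    return sum(cosine(embeddings[a], embeddings[b]) for a, b in pairs) / len(pairs)


def entropy(probs):
    """Shannon entropy (natural log) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical toy embeddings; real ones would come from a pretrained model.
embeddings = {
    "guitar": [0.90, 0.10],
    "drums": [0.80, 0.20],
    "piano": [0.85, 0.15],
    "tax": [0.10, 0.90],
}

coherent = topic_coherence(["guitar", "drums", "piano"], embeddings)
mixed = topic_coherence(["guitar", "drums", "tax"], embeddings)
print(coherent, mixed)

# A peaked word distribution has lower entropy than a uniform one.
print(entropy([0.97, 0.01, 0.01, 0.01]), entropy([0.25] * 4))
```

In this toy setup the topic whose top words point in similar embedding directions scores higher than the topic containing an unrelated word, which is the behavior a coherence metric of this kind is meant to capture.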