
18 research outputs found

Temporal and Cross Correlations in Business News

Author: Mizuno Takayuki
Ohnishi Takaaki
Takei Kazumasa
Watanabe Tsutomu
Publication venue
Publication date
Field of study

We empirically investigated temporal and cross correlations in the frequency of news reports on companies using a unique dataset of more than 100 million English-language news articles from around 500 press agencies worldwide for the period 2003-2009. Our main findings are as follows. First, the frequency of news reports on a company does not follow a Poisson process; instead, it is characterized by long memory, with positive autocorrelation persisting for more than a year. Second, there are significant correlations in the frequency of news across companies. Specifically, on time scales of a day or longer, the frequency of news is governed by external dynamics, such as a rise in the number of news reports triggered by, for example, the outbreak of an economic crisis, whereas on time scales of minutes it is governed by internal dynamics. Together, these findings indicate that the frequency of news on companies has statistical properties similar to those of trading activity in stock markets, as measured by trading volume or price volatility, suggesting that the flow of information through company news plays an important role in stock price dynamics.

Research Papers in Economics
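The Poisson-versus-long-memory contrast in the abstract above can be illustrated with a sample autocorrelation function. The Python sketch below is not the authors' analysis: the daily counts are simulated, and the helper names (`autocorrelation`, `poisson_draw`, `simulate_daily_counts`) and the regime-switching rate are hypothetical choices made only for illustration. A constant-rate Poisson series should give autocorrelations near zero at every nonzero lag, while a rate that shifts between quiet and busy periods keeps daily counts positively correlated across many days.

```python
import math
import random
from statistics import fmean


def autocorrelation(counts, lag):
    """Sample autocorrelation of a count series at the given lag (0 <= lag < len)."""
    n = len(counts)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(counts)")
    mu = fmean(counts)
    var = sum((x - mu) ** 2 for x in counts)
    if var == 0:
        raise ValueError("series has zero variance")
    cov = sum((counts[t] - mu) * (counts[t + lag] - mu) for t in range(n - lag))
    return cov / var


def poisson_draw(rng, lam):
    """One Poisson(lam) variate via Knuth's product-of-uniforms method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_daily_counts(rates, seed=0):
    """HYPOTHETICAL news counts: day t receives a Poisson(rates[t]) number of articles."""
    rng = random.Random(seed)
    return [poisson_draw(rng, lam) for lam in rates]


if __name__ == "__main__":
    days = 2000
    # Constant rate: a pure Poisson process, so autocorrelation at lag >= 1 is near 0.
    flat = simulate_daily_counts([5.0] * days, seed=1)
    # Hypothetical rate switching between quiet and busy 100-day regimes,
    # a crude stand-in for external shocks: counts stay positively correlated.
    regimes = [2.0 if (t // 100) % 2 == 0 else 20.0 for t in range(days)]
    bursty = simulate_daily_counts(regimes, seed=2)
    for name, series in (("poisson", flat), ("regimes", bursty)):
        print(name, [round(autocorrelation(series, lag), 3) for lag in (1, 7, 30)])
```

The regime-switching series only mimics persistence up to its 100-day block length; long memory in the sense used for the news data usually means an autocorrelation that decays slowly with the lag rather than vanishing after a fixed horizon.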

Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics

Author: Funk Burkhardt
Junginger Christian
Lommel Lasse
Riebling Meike
Publication venue: AIS Electronic Library (AISeL)
Publication date: 28/02/2019
Field of study

Traditional unsupervised topic modeling approaches such as Latent Dirichlet Allocation (LDA) cannot classify documents into a predefined set of topics. Supervised methods, on the other hand, require large amounts of labeled data to perform well on such tasks. We develop a new unsupervised method based on word embeddings to classify documents into predefined topics. We evaluate the predictive performance of this approach and compare it with seeded LDA, using a real-world dataset from online advertising that consists of very short documents. Our results indicate that the two methods may complement each other well, yielding remarkable sensitivity and precision scores for ensemble learners built on both.

AIS Electronic Library (AISeL)
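One simple way to use word embeddings for assigning short documents to predefined topics is to compare a document's average word vector with an average vector built from a few seed words per topic. The sketch below assumes exactly that centroid-and-cosine scheme; it illustrates the general idea and is not the method of Funk et al. The three-dimensional `EMBEDDINGS` table and the `TOPIC_SEEDS` dictionary are hypothetical toy data; a real system would load pretrained vectors with hundreds of dimensions.

```python
import math

# HYPOTHETICAL toy embeddings (3 dimensions); real systems load pretrained vectors.
EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "engine": [0.8, 0.2, 0.1],
    "tyres": [0.85, 0.05, 0.1],
    "holiday": [0.1, 0.9, 0.1],
    "beach": [0.05, 0.95, 0.0],
    "hotel": [0.2, 0.8, 0.1],
    "loan": [0.1, 0.1, 0.9],
    "bank": [0.0, 0.2, 0.9],
    "cheap": [0.3, 0.3, 0.4],
}

# HYPOTHETICAL predefined topics, each described only by a few seed words.
TOPIC_SEEDS = {
    "automotive": ["car", "engine"],
    "travel": ["holiday", "beach"],
    "finance": ["loan", "bank"],
}


def mean_vector(vectors):
    """Component-wise average of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(tokens, topic_seeds, embeddings):
    """Assign a short document to the topic whose seed centroid is most similar.

    Returns (topic, similarity), or (None, 0.0) when no token or seed is known.
    """
    known = [embeddings[w] for w in tokens if w in embeddings]
    if not known:
        return None, 0.0
    doc_vec = mean_vector(known)
    scores = {}
    for topic, seeds in topic_seeds.items():
        seed_vecs = [embeddings[w] for w in seeds if w in embeddings]
        if seed_vecs:
            scores[topic] = cosine(doc_vec, mean_vector(seed_vecs))
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    for ad in (["cheap", "tyres"], ["beach", "hotel", "deals"]):
        print(ad, classify(ad, TOPIC_SEEDS, EMBEDDINGS))
```

No labeled documents are needed here; the only supervision is the choice of seed words per topic, which is also the kind of input seeded LDA takes.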

Tuning Latent Dirichlet Allocation Parameters using Ant Colony Optimization

Author: Kanarkard Wanida
Yarnguy Thanakorn
Publication venue: Journal of Telecommunication, Electronic and Computer Engineering (JTEC)
Publication date: 19/02/2018
Field of study

Latent Dirichlet Allocation (LDA) is a well-known and widely used model for discovering hidden topics, and it has been applied in many text-analysis studies. The performance of LDA depends on its two Dirichlet prior parameters, α and β, so they must be set to appropriate values. Ant colony optimization (ACO) can be used to solve such parameter-tuning problems. We therefore propose an approach that uses ACO to find optimal values of α and β for LDA. We evaluated the approach on datasets from the UCI repository (KOS, NIPS, ENRON), which are standard benchmarks for topic models. The experimental results show that LDA with parameters tuned by ACO performs better as measured by perplexity.

Universiti Teknikal Malaysia Melaka: UTeM Open Journal System
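The ACO tuning idea in the abstract above can be sketched for a discrete grid of candidate α and β values. In this Python sketch each candidate value carries a pheromone level, each ant picks one value per parameter with probability proportional to pheromone, all trails evaporate every iteration, and the iteration's best ant reinforces its choices. The evaporation rate, the deposit rule, the grid values, and especially `fake_perplexity` are assumptions rather than the authors' settings: a real run would train LDA for each (α, β) pair and compute held-out perplexity, which is far more expensive than this stand-in function.

```python
import math
import random


def fake_perplexity(alpha, beta):
    """HYPOTHETICAL stand-in for held-out LDA perplexity (lower is better).

    A smooth bowl with its minimum at alpha=0.1, beta=0.01; a real objective
    would train LDA with these priors and score a held-out set.
    """
    return 1000.0 + 500.0 * (math.log10(alpha) + 1) ** 2 + 500.0 * (math.log10(beta) + 2) ** 2


def aco_tune(objective, alpha_values, beta_values, n_ants=10, n_iters=30,
             evaporation=0.3, seed=0):
    """Minimise a positive objective(alpha, beta) over two discrete grids with a basic ant colony."""
    rng = random.Random(seed)
    grids = [list(alpha_values), list(beta_values)]
    # One pheromone level per candidate value, per parameter.
    pheromone = [[1.0] * len(g) for g in grids]
    best_idx, best_score = None, float("inf")
    for _ in range(n_iters):
        iter_idx, iter_score = None, float("inf")
        for _ in range(n_ants):
            # Each ant picks one value per parameter, proportionally to pheromone.
            idx = [rng.choices(range(len(g)), weights=tau)[0]
                   for g, tau in zip(grids, pheromone)]
            score = objective(grids[0][idx[0]], grids[1][idx[1]])
            if score < iter_score:
                iter_idx, iter_score = idx, score
        if iter_score < best_score:
            best_idx, best_score = iter_idx, iter_score
        # Evaporate every trail, then let the iteration-best ant deposit
        # best_score / iter_score (at most 1); assumes a positive objective.
        for p, tau in enumerate(pheromone):
            for i in range(len(tau)):
                tau[i] *= 1.0 - evaporation
            tau[iter_idx[p]] += best_score / iter_score
    return (grids[0][best_idx[0]], grids[1][best_idx[1]]), best_score


if __name__ == "__main__":
    alphas = [0.01, 0.05, 0.1, 0.5, 1.0]
    betas = [0.001, 0.01, 0.1, 0.5]
    (alpha, beta), score = aco_tune(fake_perplexity, alphas, betas)
    print(f"alpha={alpha}, beta={beta}, stand-in perplexity={score:.1f}")
```

Because every objective evaluation would mean training a topic model, the product of ants and iterations is the main cost to budget for; scaling the deposit by `best_score / iter_score` keeps the update independent of the absolute size of perplexity values.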

Modeling and diffusion of news topics in social media: characteristics and factors in the emergence of news on a Twitter news channel

Author: Aguaded I.
Calderón C.A.
Caro E.B.
Publication venue: Universidad de Guadalajara
Publication date: 26/05/2020
Field of study

This study aims to characterize the modeling and diffusion of news topics in social media and to determine the factors that influence them. Using Big Data analysis methods such as topic modeling and sentiment analysis, we analyzed one year of tweets from the Colombian newspaper El Tiempo. We found that the appearance of long-term topics was related to the attributes of the messages. Theoretical implications and contributions are discussed in light of the Diffusion of Innovations model. © 2018 Universidad de Guadalajara. All rights reserved.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Modeling Statistical Properties of Written Text

Author: M. Ángeles Serrano
Alessandro Flammini
Filippo Menczer
Publication venue: Public Library of Science
Publication date: 29/04/2009
Field of study

Written text is one of the fundamental manifestations of human language, and the study of its universal regularities can give clues about how our brains process information and how we, as a society, organize and share it. Among these regularities, only Zipf's law has been explored in depth. Other basic properties, such as the existence of bursts of rare words in specific documents, have only been studied independently of each other and mainly through descriptive models. As a consequence, linguistic processes are poorly understood as complex emergent phenomena. Beyond Zipf's law for word frequencies, here we focus on burstiness, Heaps' law (which describes the sublinear growth of vocabulary size with document length), and the topicality of document collections, which encode correlations within and across documents that are absent in random null models. We introduce and validate a generative model that explains the simultaneous emergence of all these patterns from simple rules. As a result, we find a connection between the bursty nature of rare words and the topical organization of texts, and we identify dynamic word ranking and memory across documents as key mechanisms explaining the nontrivial organization of written text. Our research may have broad implications and practical applications in computer science, cognitive science, and linguistics.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Dipòsit Digital de la Universitat de Barcelona
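Heaps' law states that the vocabulary size V(n) grows roughly as K·n^β, with β < 1, in the number of tokens n. The sketch below measures that growth for a token stream and estimates β with a least-squares fit of log V(n) against log n; it checks the empirical pattern only and does not implement the generative model of Serrano et al. The Zipf-weighted corpus in the `__main__` block is synthetic, and its vocabulary size and length are hypothetical.

```python
import math
import random


def vocabulary_growth(tokens):
    """growth[n-1] = number of distinct words among the first n tokens."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth


def fit_heaps_exponent(growth):
    """Least-squares slope of log V(n) on log n, i.e. beta in V(n) ~ K * n**beta.

    Needs at least two points.
    """
    xs = [math.log(n) for n in range(1, len(growth) + 1)]
    ys = [math.log(v) for v in growth]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    rng = random.Random(0)
    vocab_size = 10_000
    # Synthetic corpus: word ranks drawn with Zipf-like weights 1/rank.
    weights = [1.0 / r for r in range(1, vocab_size + 1)]
    tokens = rng.choices(range(vocab_size), weights=weights, k=20_000)
    growth = vocabulary_growth(tokens)
    print("distinct words:", growth[-1])
    print("Heaps exponent estimate:", round(fit_heaps_exponent(growth), 3))
```

Fitting every prefix length with equal weight puts most of the weight on large n, where points are dense in log space; binning n logarithmically before fitting is a common alternative.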


Measuring the Interestingness of News Articles

Author: Buttler D J
Cardenas A F
Pon R K
Publication venue: Lawrence Livermore National Laboratory
Publication date: 24/09/2007
Field of study

Online news has grown explosively. Users are inundated with thousands of news articles, only some of which are interesting. A system that filters out uninteresting articles would help users who need to read and analyze many articles daily, such as financial analysts and government officials. The most obvious approach to reducing information overload is to learn a user's keywords of interest (Carreira et al., 2004). Although keyword-based filtering removes many irrelevant articles, many uninteresting articles remain that are highly relevant to keyword searches. A relevant article may be uninteresting for various reasons, such as its age or because it discusses an event the user has already read about in other articles. Although collaborative filtering has been shown to aid personalized recommendation systems (Wang et al., 2006), it requires a large number of users. In a limited-user environment, such as a small group of analysts monitoring news events, collaborative filtering would be ineffective. The definition of what makes an article interesting (its 'interestingness') varies from user to user and evolves continually, calling for adaptable user personalization. Furthermore, because of the nature of news, most articles are uninteresting, since many are similar or report events outside the scope of an individual's concerns. There has been much work on news recommendation systems, but none has yet addressed the question of what makes an article interesting.

UNT Digital Library

Software Newsroom – an approach to automation of news search and editing

Author: Gross Oskar
Huovelin Juhani
Linden Krister
Maisala Sami Petri Tapio
Niemi Jyrki
Oittinen Tero
Silfverberg Miikka
Solin Otto
Toivonen Hannu
Publication venue
Publication date: 07/11/2013
Field of study

We have developed tools and applied methods for automatically identifying potential news in textual data for an automated news search system called Software Newsroom. The purpose of the tools is to analyze data collected from the internet and to identify information that has a high probability of containing new information. The identified information is summarized to help in understanding the semantic content of the data and to assist the news editing process. It has been demonstrated that words with a certain set of syntactic and semantic properties are effective when building topic models for English. We demonstrate that words with the same properties in Finnish are useful as well. Extracting such words requires knowledge of the special characteristics of the Finnish language, which are taken into account in our analysis. Two different methodological approaches have been applied to news search. The first is based on topic analysis and applies Multinomial Principal Component Analysis (MPCA) for topic model creation and data profiling. The second is based on word association analysis and applies the log-likelihood ratio (LLR). For topic mining, we created English and Finnish language corpora from Wikipedia and Finnish corpora from several Finnish news archives, and we used bag-of-words representations of these corpora as training data for the topic model. We performed topic analysis experiments both with the training data itself and with arbitrary text parsed from internet sources. The results suggest that the effectiveness of news search strongly depends on the quality of the training data and its linguistic analysis. In the association analysis, we use a combined methodology for detecting novel word associations in text. To detect novel associations, we use a background corpus from which we extract common word associations. In parallel, we collect word co-occurrence statistics from the documents of interest and search for associations that are more likely in these documents than in the background. We have demonstrated the applicability of these methods for Software Newsroom. The results indicate that the background-foreground model has significant potential in news search. The experiments also indicate great promise in employing background-foreground word associations for other applications. A combined application of the two methods is planned, as is the application of the methods to social media using a pre-translator of social media language.

Peer reviewed

Helsingin yliopiston digitaalinen arkisto
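The background-foreground association analysis in the Software Newsroom abstract relies on a log-likelihood ratio. A common formulation for word co-occurrence, assumed here, is Dunning's G² statistic on a 2×2 table of document counts. The sketch computes G² for every word pair in a set of foreground documents and keeps pairs that are positively associated, exceed a threshold, and score higher than in the background corpus. The selection rule in `novel_associations`, the 3.84 threshold (the 95% critical value of a chi-square distribution with one degree of freedom), and the toy documents are hypothetical; the Software Newsroom implementation may differ.

```python
import math
from itertools import combinations


def g2(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio G^2 for the 2x2 table [[k11, k12], [k21, k22]]."""
    table = [[k11, k12], [k21, k22]]
    n = k11 + k12 + k21 + k22
    rows = [k11 + k12, k21 + k22]
    cols = [k11 + k21, k12 + k22]
    total = 0.0
    for i in range(2):
        for j in range(2):
            k = table[i][j]
            if k > 0:  # empty cells contribute 0 (limit of k*log k as k -> 0)
                expected = rows[i] * cols[j] / n
                total += k * math.log(k / expected)
    return 2.0 * total


def pair_table(docs, a, b):
    """Document-level counts (both, a only, b only, neither); docs are sets of words."""
    both = sum(1 for d in docs if a in d and b in d)
    only_a = sum(1 for d in docs if a in d and b not in d)
    only_b = sum(1 for d in docs if b in d and a not in d)
    return both, only_a, only_b, len(docs) - both - only_a - only_b


def novel_associations(foreground, background, min_g2=3.84):
    """HYPOTHETICAL rule: word pairs positively associated in the foreground whose
    G^2 exceeds both min_g2 and the same pair's G^2 in the background corpus."""
    vocab = sorted(set().union(*foreground))
    found = []
    for a, b in combinations(vocab, 2):
        both, only_a, only_b, neither = pair_table(foreground, a, b)
        # Skip pairs that co-occur no more often than chance would predict.
        if both * len(foreground) <= (both + only_a) * (both + only_b):
            continue
        fg = g2(both, only_a, only_b, neither)
        bg = g2(*pair_table(background, a, b))
        if fg > min_g2 and fg > bg:
            found.append((a, b, round(fg, 2)))
    return sorted(found, key=lambda item: -item[2])


if __name__ == "__main__":
    foreground = [{"bank", "merger"}] * 5 + [{"rain", "weather"}] * 5
    background = [{"bank"}, {"merger"}, {"bank", "merger"}, {"rain", "weather"}] * 5
    print(novel_associations(foreground, background))
```

In this toy run only the bank-merger pair is reported: rain and weather co-occur just as strongly in the foreground, but the pair is already a common association in the background corpus, so it is not treated as novel.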

The Resonance and Residue of the First African American Newspaper: How Freedom's Journal Created Space in the Early 19th Century

Author: Kasper Valerie
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2018
Field of study

The first African American newspaper, Freedom's Journal, has a historical, rhetorical, and spatial purpose. It not only showed the impact African Americans made in the fight for their civil rights in the early 19th century, but as an artifact it also illustrated and preserved that history, allowing it to be studied centuries after the newspaper ceased printing. The purpose of The Resonance and Residue of the First African American Newspaper: How Freedom's Journal Created Space in the Early 19th Century is to provide an interdisciplinary approach to historical newspapers that illustrates an alternative history in this country: a history of and by African Americans. By combining print and digital research methods, new historical, rhetorical, and spatial information can be discovered that illustrates how the first African American newspaper fought against the influences of white society in the early 19th century and created a space for the black community meaningful enough to transform America into a place in which African Americans identified as Americans. The research therefore combines traditional research and close reading with digital analysis (machine reading), using different digital tools to illustrate how Freedom's Journal used text to combat the influences and powers shaping the early 19th century, and to create a new and different kind of historical narrative about how one oppressed community successfully fought a dominant community through the use of text.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)