On the distribution of time-to-proof of mathematical conjectures
What is the productivity of Science? Can we measure an evolution of the
production of mathematicians over history? Can we predict the waiting time till
the proof of a challenging conjecture such as the P-versus-NP problem?
Motivated by these questions, we revisit a suggestion published recently and
debated in the "New Scientist" that the historical distribution of
times-to-proof, i.e., of waiting times between the formulation of a mathematical
conjecture and its proof, can be quantified and gives meaningful insight into
the future development of still-open conjectures. We find evidence, however, that
the mathematical process of creation is too non-stationary, with too
little data and too few constraints, to allow for a meaningful conclusion. In
particular, the approximate unsteady exponential growth of human population,
and arguably that of mathematicians, essentially hides the true distribution.
Another issue is the incompleteness of the available dataset. In conclusion, we
cannot reject the simplest model, in which conjectures are proved at a constant
rate of 0.01/year, for the dataset that we have studied,
which translates into an average waiting time to proof of 100 years. We hope that
the presented methodology, combining the mathematics of recurrent processes,
linking proved and still open conjectures, with different empirical
constraints, will be useful for other similar investigations probing the
productivity associated with mankind's growth and creativity.
Comment: 10 pages + 6 figures
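The link between the quoted rate of 0.01/year and the 100-year average waiting time can be checked with a few lines of Python. This is only a minimal sketch of a constant-rate (exponential) waiting-time model; the function names are hypothetical and the code is not taken from the paper.

```python
import math

def exponential_waiting_time(rate_per_year):
    """Return (mean, median) waiting time for a constant-rate model.

    Assumption: proofs arrive at a constant hazard rate, so waiting
    times are exponentially distributed.
    """
    mean = 1.0 / rate_per_year
    median = math.log(2) / rate_per_year
    return mean, median

def prob_proved_within(years, rate_per_year):
    """Probability that a conjecture is proved within `years` years."""
    return 1.0 - math.exp(-rate_per_year * years)

# Rate quoted in the abstract: 0.01 proofs per conjecture per year.
mean_wait, median_wait = exponential_waiting_time(0.01)
print(f"mean wait: {mean_wait:.1f} years, median wait: {median_wait:.1f} years")
print(f"P(proved within 100 years) = {prob_proved_within(100, 0.01):.3f}")
```

Under this model the median wait is shorter than the mean, because the exponential distribution is skewed toward long waits.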
High quality topic extraction from business news explains abnormal financial market volatility
Understanding the mutual relationships between information flows and social
activity in society today is one of the cornerstones of the social sciences. In
financial economics, the key issue in this regard is understanding and
quantifying how news of all possible types (geopolitical, environmental,
social, financial, economic, etc.) affects trading and the pricing of firms in
organized stock markets. In this article, we seek to address this issue by
performing an analysis of more than 24 million news records provided by
Thomson Reuters and of their relationship with trading activity for 206 major
stocks in the S&P US stock index. We show that the whole landscape of news that
affects stock price movements can be automatically summarized via simple
regularized regressions between trading activity and news information pieces
decomposed, with the help of simple topic modeling techniques, into their
"thematic" features. Using these methods, we are able to estimate and quantify
the impacts of news on trading. We introduce network-based visualization
techniques to represent the whole landscape of news information associated with
a basket of stocks. The examination of the words that are representative of the
topic distributions confirms that our method is able to extract the significant
pieces of information influencing the stock market. Our results show that one
of the most puzzling stylized facts in financial economics, namely that at
certain times trading volumes appear to be "abnormally large," can be partially
explained by the flow of news. In this sense, our results prove that there is
no "excess trading" when restricting to times when news is genuinely novel
and provides relevant financial information.
Comment: The previous version of this article included an error. This is a
revised version
Prediction of ESG Compliance using a Heterogeneous Information Network
Negative screening is one method to avoid interactions with inappropriate
entities. For example, financial institutions keep investment exclusion lists
of inappropriate firms that have environmental, social, and governance (ESG)
problems. They create their investment exclusion lists by gathering information
from various news sources to keep their portfolios profitable as well as green.
International organizations also maintain smart sanctions lists that are used
to prohibit trade with entities that are involved in illegal activities. In the
present paper, we focus on the prediction of investment exclusion lists in the
finance domain. We construct a vast heterogeneous information network that
covers the necessary information surrounding each firm, which is assembled
using seven professionally curated datasets and two open datasets, which
results in approximately 50 million nodes and 400 million edges in total.
Exploiting these vast datasets and motivated by how professional investigators
and journalists undertake their daily investigations, we propose a model that
can learn to predict firms that are more likely to be added to an investment
exclusion list in the near future. Our approach is tested using the negative
news investment exclusion list data of more than 35,000 firms worldwide from
January 2012 to May 2018. Comparing against state-of-the-art methods with and
without the network, we show that the predictive accuracy is
substantially improved when using the vast information stored in the
heterogeneous information network. This work suggests new ways to consolidate
the diffuse information contained in big data to monitor dominant firms on a
global scale for better risk management and more socially responsible
investment.
Predicted and Verified Deviations from Zipf's law in Ecology of Competing Products
Zipf's power-law distribution is a generic empirical statistical regularity
found in many complex systems. However, rather than universality with a single
power-law exponent (equal to 1 for Zipf's law), there are many reported
deviations that remain unexplained. A recently developed theory finds that the
interplay between (i) one of the most universal ingredients, namely stochastic
proportional growth, and (ii) birth and death processes, leads to a generic
power-law distribution with an exponent that depends on the characteristics of
each ingredient. Here, we report the first complete empirical test of the
theory and its application, based on the empirical analysis of the dynamics of
market shares in the product market. We estimate directly the average growth
rate of market shares and its standard deviation, the birth rates and the
"death" (hazard) rate of products. We find that temporal variations and product
differences of the observed power-law exponents can be fully captured by the
theory with no adjustable parameters. Our results can be generalized to many
systems for which the statistical properties revealed by power law exponents
are directly linked to the underlying generating mechanism.
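For readers unfamiliar with power-law exponents, the following minimal sketch estimates the tail exponent of synthetic Pareto data whose true exponent is 1 (the Zipf case). It uses the standard maximum-likelihood (Hill) estimator as an illustrative assumption; it is not the estimation procedure of the paper, and the names `hill_exponent` and `x_min` are hypothetical.

```python
import math
import random

def hill_exponent(samples, x_min):
    """Maximum-likelihood (Hill) estimate of a power-law tail exponent.

    Only samples at or above the threshold x_min are used.
    """
    tail = [x for x in samples if x >= x_min]
    return len(tail) / sum(math.log(x / x_min) for x in tail)

# Synthetic data: Pareto samples with exponent 1 and minimum value 1.
rng = random.Random(0)
samples = [rng.paretovariate(1.0) for _ in range(100_000)]

estimate = hill_exponent(samples, x_min=1.0)
print(f"estimated exponent: {estimate:.3f}")
```

On real data the result depends strongly on the choice of `x_min`, so the threshold should be varied to check that the estimate is stable.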
Sales Distribution of Consumer Electronics
Using the uniformly most powerful unbiased test, we examined the daily sales
distribution of consumer electronics in Japan and report that
it follows either a lognormal distribution or a power-law distribution,
depending on the state of the market. We show that switches between the two occur quite
often. The underlying sales dynamics found in both periods nicely matched
a multiplicative process. However, even though the multiplicative term in the
process displays a size-dependent relationship when a steady lognormal
distribution holds, it shows a size-independent relationship when the power-law
distribution holds. This difference in the underlying dynamics is responsible
for the difference in the two observed distributions.
Predicting Adverse Media Risk using a Heterogeneous Information Network
The media plays a central role in monitoring powerful institutions and identifying any activities harmful to the public interest. In the investing sphere, comprising 46,583 officially listed domestic firms on stock exchanges worldwide, there is a growing interest “to do the right thing”, i.e., to put pressure on companies to improve their environmental, social and governance (ESG) practices. However, how can one overcome the sparsity of ESG data from non-reporting firms, and how can one identify the relevant information in the annual reports of this large universe? Here, we construct a vast heterogeneous information network that covers the necessary information surrounding each firm, which is assembled using seven professionally curated datasets and two open datasets, resulting in about 50 million nodes and 400 million edges in total. Exploiting this heterogeneous information network, we propose a model that can learn from past adverse media coverage patterns and predict the occurrence of future adverse media coverage events on the whole universe of firms. Our approach is tested using the adverse media coverage data of more than 35,000 firms worldwide from January 2012 to May 2018. Comparing against state-of-the-art methods with and without the network, we show that the predictive accuracy is substantially improved when using the heterogeneous information network. This work suggests new ways to consolidate the diffuse information contained in big data in order to monitor dominant institutions on a global scale for more socially responsible investment, better risk management, and the surveillance of powerful institutions.
Publisher's another name: JSPS Grants-in-Aid for Scientific Research (S) Central Bank Communication Desig