Dynamics on expanding spaces: modeling the emergence of novelties
Novelties are part of our daily lives. We constantly adopt new technologies,
conceive new ideas, meet new people, experiment with new situations.
Occasionally, we as individuals, in a complicated cognitive and sometimes
fortuitous process, come up with something that is not only new to us, but to
our entire society, so that a personal novelty can turn into an
innovation at a global level. Innovations occur throughout social, biological
and technological systems and, though we perceive them as a very natural
ingredient of our human experience, little is known about the processes
determining their emergence. Still, the statistical occurrence of innovations
shows striking regularities that represent a starting point for gaining deeper
insight into the whole phenomenology. This paper represents a small step in that
direction, focusing on reviewing the scientific attempts to effectively model
the emergence of the new and its regularities, with an emphasis on more recent
contributions: from the plain Simon model, dating back to the 1950s, to the
most recent model of Polya's urn with triggering, in which the occurrence of one novelty triggers others. What
seems to be key in the successful modelling schemes proposed so far is the idea
of looking at evolution as a path in a complex space (physical, conceptual,
biological, technological) whose structure and topology are continuously
reshaped and expanded by the occurrence of the new. Mathematically, it is very
interesting to look at the consequences of the interplay between the "actual"
and the "possible", and this is the aim of this short review.
On the Representability of Complete Genomes by Multiple Competing Finite-Context (Markov) Models
A finite-context (Markov) model of order k yields the probability distribution of the next symbol in a sequence of symbols, given the recent past up to depth k. Markov modeling has long been applied to DNA sequences, for example to find gene-coding regions. With the first studies came the discovery that DNA sequences are non-stationary: distinct regions require distinct model orders. Since then, Markov and hidden Markov models have been extensively used to describe the gene structure of prokaryotes and eukaryotes. However, to our knowledge, a comprehensive study of the potential of Markov models to describe complete genomes is still lacking. We address this gap in this paper. Our approach relies on (i) multiple competing Markov models of different orders; (ii) careful programming techniques that allow orders as large as sixteen; (iii) adequate handling of inverted repeats; and (iv) probability estimates suited to the wide range of context depths used. To measure how well a model fits the data at a particular position in the sequence, we use the negative logarithm of the probability estimate at that position. This measure yields information profiles of the sequence, which are of independent interest. Its average over the entire sequence, which amounts to the average number of bits per base needed to describe the sequence, is used as a global performance measure. Our main conclusion is that, from the probabilistic or information-theoretic point of view and according to this performance measure, multiple competing Markov models explain entire genomes almost as well as or even better than state-of-the-art DNA compression methods, such as XM, which rely on very different statistical models. This is surprising because Markov models are local (short-range), in contrast with the statistical models underlying the other methods, which exploit the extensive repetitions in DNA sequences and therefore have a non-local character.
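
As a rough illustration of the competing-model idea, the Python sketch below computes a per-base information profile, -log2 of the probability estimate, using a few finite-context models of different orders. The additive-smoothing estimator, the winner-takes-all combination rule, and all names are assumptions made for the illustration; the paper's actual scheme differs (and additionally handles inverted repeats).

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

class FiniteContextModel:
    """Order-k Markov model with additive smoothing (an illustrative estimator)."""
    def __init__(self, k, alpha=1.0):
        self.k, self.alpha = k, alpha
        self.counts = defaultdict(lambda: defaultdict(int))

    def prob(self, context, symbol):
        ctx = context[-self.k:] if self.k else ""
        c = self.counts[ctx]
        total = sum(c.values())
        return (c[symbol] + self.alpha) / (total + self.alpha * len(ALPHABET))

    def update(self, context, symbol):
        ctx = context[-self.k:] if self.k else ""
        self.counts[ctx][symbol] += 1

def information_profile(seq, orders=(2, 5, 8)):
    """Bits needed to encode each base: at every position, the competing
    model assigning the highest probability wins (a simplification)."""
    models = [FiniteContextModel(k) for k in orders]
    profile = []
    for i, sym in enumerate(seq):
        ctx = seq[:i]
        p = max(m.prob(ctx, sym) for m in models)   # best competing model
        profile.append(-math.log2(p))
        for m in models:
            m.update(ctx, sym)
    return profile

prof = information_profile("ACGTACGTACGGACGTACGT" * 50)
print(f"average bits per base: {sum(prof) / len(prof):.3f}")
```

Averaging the profile, as in the last line, gives the bits-per-base figure used as the global performance measure in the abstract.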
A frequentist framework of inductive reasoning
Reacting against the limitation of statistics to decision procedures, R. A.
Fisher proposed for inductive reasoning the use of the fiducial distribution, a
parameter-space distribution of epistemological probability transferred
directly from limiting relative frequencies rather than computed according to
the Bayes update rule. The proposal is developed as follows using the
confidence measure of a scalar parameter of interest. (With the restriction to
one-dimensional parameter space, a confidence measure is essentially a fiducial
probability distribution free of complications involving ancillary statistics.)
A betting game establishes a sense in which confidence measures are the only
reliable inferential probability distributions. The equality between the
probabilities encoded in a confidence measure and the coverage rates of the
corresponding confidence intervals ensures that the measure's rule for
assigning confidence levels to hypotheses is uniquely minimax in the game.
Although a confidence measure can be computed without any prior distribution,
previous knowledge can be incorporated into confidence-based reasoning. To
adjust a p-value or confidence interval for prior information, the confidence
measure from the observed data can be combined with one or more independent
confidence measures representing previous agent opinion. (The former confidence
measure may correspond to a posterior distribution with frequentist matching of
coverage probabilities.) The representation of subjective knowledge in terms of
confidence measures rather than prior probability distributions preserves
approximate frequentist validity.
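
To illustrate the coverage-matching property mentioned above, here is a small Python sketch of a confidence measure for a normal mean with known variance; the normal example, the one-sided 90% bound, and all names are illustrative assumptions, not the paper's construction.

```python
import math
import random

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def confidence_measure(theta, xbar, sigma, n):
    """C(theta) = confidence that the mean is <= theta: for a normal mean with
    known sigma, the confidence (fiducial) distribution is N(xbar, sigma^2/n)."""
    return phi((theta - xbar) / (sigma / math.sqrt(n)))

random.seed(1)
true_mean, sigma, n = 0.0, 1.0, 25
z90 = 1.2815515655446004            # 90% quantile of the standard normal
hits, trials = 0, 20_000
for _ in range(trials):
    xbar = random.gauss(true_mean, sigma / math.sqrt(n))
    # one-sided upper bound with confidence_measure(bound, xbar, sigma, n) == 0.9
    bound = xbar + z90 * sigma / math.sqrt(n)
    hits += true_mean <= bound
# the 0.9 encoded by the confidence measure matches the long-run coverage
print(f"empirical coverage of the 90% upper bound: {hits / trials:.3f}")
```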
The Fake News Vaccine - A Content-Agnostic System for Preventing Fake News from Becoming Viral.
While spreading fake news is an old phenomenon, today social media enables misinformation to instantaneously reach millions of people. Content-based approaches to detecting fake news, typically based on automatic text checking, are limited: it is difficult to come up with general checking criteria, and once the criteria are known to an adversary, the checking can easily be bypassed. On the other hand, it is practically impossible for humans to check every news item, let alone prevent it from going viral. We present Credulix, the first content-agnostic system to prevent fake news from going viral. Credulix is implemented as a plugin on top of a social media platform and acts as a vaccine. Human fact-checkers review a small number of popular news items, which helps us estimate the inclination of each user to share fake news. Using the resulting information, we automatically estimate the probability that an unchecked news item is fake. We use a Bayesian approach, reminiscent of Condorcet's Theorem, to compute this probability, and we show how the computation can be performed in an incremental, and hence fast, manner.
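
The flavor of the aggregation can be conveyed with a small naive-Bayes sketch in Python; the per-user sharing rates, the independence assumption, and every name below are illustrative assumptions rather than Credulix's actual algorithm.

```python
import math

def posterior_fake(prior_fake, user_rates, shared):
    """Naive-Bayes estimate of P(item is fake | who shared it).

    user_rates: one (p_share_if_fake, p_share_if_true) pair per exposed user,
                estimated from that user's behavior on fact-checked items
    shared:     whether each exposed user shared the item
    The independence assumption across users gives the Condorcet-like flavor.
    """
    log_fake = math.log(prior_fake)
    log_true = math.log(1.0 - prior_fake)
    for (p_f, p_t), s in zip(user_rates, shared):
        log_fake += math.log(p_f if s else 1.0 - p_f)
        log_true += math.log(p_t if s else 1.0 - p_t)
    m = max(log_fake, log_true)       # normalize in log space for stability
    return math.exp(log_fake - m) / (math.exp(log_fake - m) + math.exp(log_true - m))

# three exposed users: the first two share fake news readily, the third rarely
rates = [(0.6, 0.2), (0.5, 0.25), (0.1, 0.3)]
print(f"P(fake): {posterior_fake(0.01, rates, shared=[True, True, False]):.4f}")
```

Since each exposure contributes a single additive term to the two log-likelihoods, the estimate can be updated as users view or share the item, matching the incremental computation claimed in the abstract.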