A Beta-splitting model for evolutionary trees
In this article, we construct a generalization of the Blum-François
Beta-splitting model for evolutionary trees, which was itself inspired by
Aldous' Beta-splitting model on cladograms. The novelty of our approach allows
for asymmetric shares of diversification rates (or diversification 'potential')
between two sister species in an evolutionarily interpretable manner, as well
as the addition of extinction to the model in a natural way. We describe the
incremental evolutionary construction of a tree with n leaves by splitting or
freezing extant lineages through the Generating, Organizing and Deleting
processes. We then give the probability of any (binary rooted) tree under this
model with no extinction, at several resolutions: ranked planar trees giving
asymmetric roles to the first and second offspring species of a given species
and keeping track of the order of the speciation events occurring during the
creation of the tree, unranked planar trees, ranked non-planar trees and
finally (unranked non-planar) trees. We also describe a continuous-time
equivalent of the Generating, Organizing and Deleting processes where tree
topology and branch-lengths are jointly modeled and provide code in
SageMath/Python for these algorithms.

Comment: 23 pages, 3 figures, 1 table
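The abstract names the Generating, Organizing and Deleting processes without reproducing them, so the following is only a minimal Python sketch of one plausible reading of the incremental construction: each extant lineage carries a diversification 'potential', a lineage is chosen to split proportionally to that potential, and a Beta(a, b) draw shares the parent's potential asymmetrically between the two daughters. All names (`Lineage`, `beta_split_tree`) and the weighting scheme are illustrative assumptions, not the authors' SageMath/Python code; extinction (the freezing of lineages) is omitted.

```python
import random

class Lineage:
    """An extant lineage carrying a diversification 'potential' (weight)."""
    def __init__(self, weight, label):
        self.weight = weight
        self.label = label
        self.children = None  # None while extant; (left, right) after splitting

def beta_split_tree(n, a=1.0, b=1.0, seed=None):
    """Grow a binary tree to n leaves: repeatedly pick an extant lineage
    with probability proportional to its weight, split it, and share the
    parent's weight between the daughters via a Beta(a, b) draw, giving
    asymmetric diversification potential to the two sister species."""
    rng = random.Random(seed)
    root = Lineage(1.0, "0")
    extant = [root]
    while len(extant) < n:
        weights = [lin.weight for lin in extant]
        parent = rng.choices(extant, weights=weights, k=1)[0]
        frac = rng.betavariate(a, b)  # asymmetric share for the first daughter
        left = Lineage(parent.weight * frac, parent.label + "L")
        right = Lineage(parent.weight * (1 - frac), parent.label + "R")
        parent.children = (left, right)
        extant.remove(parent)
        extant.extend([left, right])
    return root
```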
Scalable Probabilistic Similarity Ranking in Uncertain Databases (Technical Report)
This paper introduces a scalable approach for probabilistic top-k similarity
ranking on uncertain vector data. Each uncertain object is represented by a set
of vector instances that are assumed to be mutually exclusive. The objective is
to rank the uncertain data according to their distance to a reference object.
We propose a framework that incrementally computes, for each object instance
and ranking position, the probability of the object falling at that ranking
position. The resulting rank probability distribution can serve as input for
several state-of-the-art probabilistic ranking models. Existing approaches
compute this probability distribution by applying a dynamic programming
approach of quadratic complexity. In this paper we show, both theoretically and
experimentally, that our framework reduces this to linear-time complexity
with the same memory requirements, enabled by incremental access to
the uncertain vector instances in increasing order of their distance to the
reference object. Furthermore, we show how the output of our method can be used
to apply probabilistic top-k ranking for the objects, according to different
state-of-the-art definitions. We conduct an experimental evaluation on
synthetic and real data, which demonstrates the efficiency of our approach.
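The abstract contrasts the proposed linear-time method with the existing quadratic dynamic program; only the latter is standard enough to sketch from the text. A common form of that baseline is a Poisson-binomial recurrence: given, for each competing object, the probability that it lies closer to the reference than the object under consideration, it yields the object's distribution over ranking positions. The function name and interface below are assumptions for illustration, not the paper's code.

```python
def rank_probabilities(probs):
    """Poisson-binomial DP: probs[j] is the probability that competing
    object j lies closer to the reference object. Returns dist, where
    dist[k] is the probability that exactly k competitors are closer,
    i.e. the object falls at ranking position k (0-based).
    Quadratic in the number of competitors."""
    dist = [1.0]  # with no competitors, rank 0 with certainty
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1 - p)   # competitor j not closer: rank unchanged
            new[k + 1] += mass * p     # competitor j closer: rank shifts by one
        dist = new
    return dist

print(rank_probabilities([0.9, 0.1]))  # [0.09, 0.82, 0.09]
```

With one near-certain and one unlikely closer competitor, rank 1 is most probable. The paper's contribution, as described, is to avoid re-running such a recurrence from scratch by processing the uncertain instances in increasing order of their distance to the reference object.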
The Life-Cycle Income Analysis Model (LIAM): a study of a flexible dynamic microsimulation modelling computing framework
This paper describes a flexible computing framework designed to create a dynamic microsimulation model, the Life-cycle Income Analysis Model (LIAM). The principal computing characteristics include the degree of modularisation, parameterisation, generalisation and robustness. The paper describes the decisions taken with regard to the type of dynamic model used. The LIAM framework has been used to create a number of different microsimulation models, including an Irish dynamic cohort model, a spatial dynamic microsimulation model for Ireland, an indirect tax and consumption model for EU15 as part of EUROMOD and a prototype EU dynamic population microsimulation model for 5 EU countries. Particular consideration is given to issues of parameterisation, alignment and computational efficiency.

Keywords: flexible; modular; dynamic; alignment; parameterisation; computational efficiency
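The abstract flags alignment as a key issue but does not describe LIAM's implementation. As a rough illustration of what the term usually means in dynamic microsimulation (calibrating simulated transitions to an external aggregate target by selecting the highest-propensity individuals), here is a minimal Python sketch; the function name, interface and tie-breaking rule are assumptions, not LIAM's code.

```python
import random

def align(population, propensities, target_rate, rng=random):
    """Alignment by sorting: force the number of simulated transitions
    to match an external aggregate target by selecting the individuals
    with the highest propensity scores (random tie-breaking).
    Returns the set of individuals that experience the event."""
    n_events = round(target_rate * len(population))
    ranked = sorted(population,
                    key=lambda ind: (propensities[ind], rng.random()),
                    reverse=True)
    return set(ranked[:n_events])

# Illustration: a 60% transition target applied to five individuals.
people = ["a", "b", "c", "d", "e"]
scores = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.4, "e": 0.1}
print(align(people, scores, 0.6))  # {'a', 'c', 'd'}
```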
Gamma-based clustering via ordered means with application to gene-expression analysis
Discrete mixture models provide a well-known basis for effective clustering
algorithms, although technical challenges have limited their scope. In the
context of gene-expression data analysis, a model is presented that mixes over
a finite catalog of structures, each one representing equality and inequality
constraints among latent expected values. Computations depend on the
probability that independent gamma-distributed variables attain each of their
possible orderings. Each ordering event is equivalent to an event in
independent negative-binomial random variables, and this finding guides a
dynamic-programming calculation. The structuring of mixture-model components
according to constraints among latent means leads to strict concavity of the
mixture log likelihood. In addition to its beneficial numerical properties, the
clustering method shows promising results in an empirical study.

Comment: Published at http://dx.doi.org/10.1214/10-AOS805 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
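The central computational object in this abstract is the probability that independent gamma-distributed variables attain a given ordering, which the paper evaluates exactly via an equivalence with negative-binomial variables and dynamic programming; the abstract does not spell that recurrence out. The sketch below merely illustrates the quantity itself with a Monte Carlo estimate; the function name and the unit rate are assumptions.

```python
import random

def ordering_probability(shapes, n_draws=100_000, seed=0):
    """Monte Carlo estimate of P(X_1 < X_2 < ... < X_k) for independent
    X_i ~ Gamma(shapes[i], rate=1) -- the ordering probability that the
    paper computes exactly through its negative-binomial dynamic program."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        draws = [rng.gammavariate(a, 1.0) for a in shapes]
        if all(x < y for x, y in zip(draws, draws[1:])):
            hits += 1
    return hits / n_draws

# Sanity check: with equal shapes all k! orderings are equally likely,
# so for k = 3 the estimate should be close to 1/6 ~ 0.1667.
print(ordering_probability([2.0, 2.0, 2.0]))
```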
A Common Protocol for Agent-Based Social Simulation
Traditional (i.e. analytical) modelling practices in the social sciences rely on a very well established, although implicit, methodological protocol, both with respect to the way models are presented and to the kinds of analysis that are performed. Unfortunately, computer-simulated models often lack such a reference to an accepted methodological standard. This is one of the main reasons for the scepticism among mainstream social scientists that results in low acceptance of papers with agent-based methodology in the top journals. We identify some methodological pitfalls that, in our view, are common in papers employing agent-based simulations, and propose appropriate solutions. We discuss each issue with reference to a general characterization of dynamic micro models, which encompasses both analytical and simulation models. Along the way, we also clarify some confusing terminology. We then propose a three-stage process that could lead to the establishment of methodological standards in social and economic simulations.

Keywords: agent-based, simulations, methodology, calibration, validation, sensitivity analysis