The Sample Complexity of Search over Multiple Populations
This paper studies the sample complexity of searching over multiple
populations. We consider a large number of populations, each corresponding to
either distribution P0 or P1. The goal of the search problem studied here is to
find one population corresponding to distribution P1 with as few samples as
possible. The main contribution is to quantify the number of samples needed to
correctly find one such population. We consider two general approaches:
non-adaptive sampling methods, which sample each population a predetermined
number of times until a population following P1 is found, and adaptive sampling
methods, which employ sequential sampling schemes for each population. We first
derive a lower bound on the number of samples required by any sampling scheme.
We then consider an adaptive procedure consisting of a series of sequential
probability ratio tests, and show it comes within a constant factor of the
lower bound. We give explicit expressions for this constant when samples of the
populations follow Gaussian and Bernoulli distributions. An alternative
adaptive scheme is discussed which does not require full knowledge of P1, and
comes within a constant factor of the optimal scheme. For comparison, a lower
bound on the sampling requirements of any non-adaptive scheme is presented.

Comment: To appear, IEEE Transactions on Information Theory
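The adaptive procedure described above can be sketched for Bernoulli populations: a series of sequential probability ratio tests, abandoning a population once its test accepts P0 and stopping once a test accepts P1. The thresholds below use Wald's classical approximations and illustrative error rates, not the paper's constants.

```python
import math
import random

def sprt_search(populations, p0, p1, alpha=1e-3, beta=0.05):
    """Search for a population following Bernoulli(p1) with a series of
    sequential probability ratio tests. `populations` is a list of
    callables, each drawing one sample. Wald's threshold approximations
    are assumed here; the paper's constants are not reproduced.
    """
    upper = math.log((1 - beta) / alpha)   # accept H1: population ~ P1
    lower = math.log(beta / (1 - alpha))   # accept H0: move on
    total = 0
    for idx, draw in enumerate(populations):
        llr = 0.0
        while lower < llr < upper:
            x = draw()
            total += 1
            llr += math.log(p1 / p0) if x == 1 else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return idx, total
    return None, total

# toy instance: 50 Bernoulli populations, one of which follows P1
rng = random.Random(7)
p0, p1 = 0.3, 0.7
pops = [lambda: int(rng.random() < p0) for _ in range(49)]
pops.insert(25, lambda: int(rng.random() < p1))
found, samples = sprt_search(pops, p0, p1)
```

Because each test stops as soon as the log-likelihood ratio leaves the interval, most P0 populations are dismissed after only a handful of samples.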
Compositional Model Repositories via Dynamic Constraint Satisfaction with Order-of-Magnitude Preferences
The predominant knowledge-based approach to automated model construction,
compositional modelling, employs a set of models of particular functional
components. Its inference mechanism takes a scenario describing the constituent
interacting components of a system and translates it into a useful mathematical
model. This paper presents a novel compositional modelling approach aimed at
building model repositories. It furthers the field in two respects. Firstly, it
expands the application domain of compositional modelling to systems that
cannot be easily described in terms of interacting functional components, such as
ecological systems. Secondly, it enables the incorporation of user preferences
into the model selection process. These features are achieved by casting the
compositional modelling problem as an activity-based dynamic preference
constraint satisfaction problem, where the dynamic constraints describe the
restrictions imposed over the composition of partial models and the preferences
correspond to those of the user of the automated modeller. In addition, the
preference levels are represented through the use of symbolic values that
differ in orders of magnitude.
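The order-of-magnitude preference idea can be sketched as follows: a single preference at level k outweighs any number of preferences at level k-1, so an assignment's score is a count vector over levels compared lexicographically. The model fragments, levels, and constraint below are invented stand-ins, not the paper's repository.

```python
from itertools import product

LEVELS = 3  # symbolic magnitudes 2 (highest), 1, 0

def score(prefs):
    # count vector over magnitudes; index 0 holds the highest level, so
    # lexicographic comparison makes one level-2 preference dominate any
    # number of level-1 preferences
    vec = [0] * LEVELS
    for level in prefs:
        vec[LEVELS - 1 - level] += 1
    return tuple(vec)

# candidate model fragments per component of a toy ecological scenario,
# each paired with the user's preference level for it (invented)
domains = {
    "growth":    [("logistic", 2), ("exponential", 0)],
    "predation": [("lotka_volterra", 1), ("none", 0)],
}

def consistent(assignment):
    # dynamic constraint: the predation fragment is only active together
    # with a logistic growth fragment
    if assignment["predation"][0] == "lotka_volterra":
        return assignment["growth"][0] == "logistic"
    return True

candidates = [dict(zip(domains, c)) for c in product(*domains.values())]
best = max((a for a in candidates if consistent(a)),
           key=lambda a: score([lvl for _, lvl in a.values()]))
```

Brute-force enumeration stands in for the constraint-satisfaction search; the point is only the lexicographic treatment of symbolic preference magnitudes.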
Modeling cumulative biological phenomena with Suppes-Bayes Causal Networks
Several diseases related to cell proliferation are characterized by the
accumulation of somatic DNA changes, with respect to wildtype conditions.
Cancer and HIV are two common examples of such diseases, where the mutational
load in the cancerous/viral population increases over time. In these cases,
selective pressures are often observed along with competition, cooperation and
parasitism among distinct cellular clones. Recently, we presented a
mathematical framework to model these phenomena, based on a combination of
Bayesian inference and Suppes' theory of probabilistic causation, depicted in
graphical structures dubbed Suppes-Bayes Causal Networks (SBCNs). SBCNs are
generative probabilistic graphical models that recapitulate the potential
ordering of accumulation of such DNA changes during the progression of the
disease. Such models can be inferred from data by exploiting likelihood-based
model-selection strategies with regularization. In this paper we discuss the
theoretical foundations of our approach and we investigate in depth the
influence on the model-selection task of: (i) the poset based on Suppes' theory
and (ii) different regularization strategies. Furthermore, we provide an
example of application of our framework to HIV genetic data highlighting the
valuable insights provided by the inferred models.
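The prima facie part of Suppes' conditions can be illustrated on binary mutation profiles, using the common cross-sectional approximation of temporal priority by marginal frequency. This is a sketch of the edge-candidate test only; the likelihood-based model selection and regularisation the paper studies are omitted, and the gene names are invented.

```python
def suppes_edges(data, genes):
    """Candidate SBCN edges from binary mutation profiles (rows = samples,
    columns = genes), via Suppes' two conditions as approximated from
    cross-sectional data:
      temporal priority:   P(c) > P(e)
      probability raising: P(e | c) > P(e | not c)
    """
    n = len(data)
    p = [sum(row[j] for row in data) / n for j in range(len(genes))]
    edges = []
    for c in range(len(genes)):
        for e in range(len(genes)):
            if c == e or p[c] <= p[e]:
                continue  # fails temporal priority
            with_c = [row[e] for row in data if row[c] == 1]
            without_c = [row[e] for row in data if row[c] == 0]
            if not with_c or not without_c:
                continue  # a conditional probability is undefined
            if sum(with_c) / len(with_c) > sum(without_c) / len(without_c):
                edges.append((genes[c], genes[e]))
    return edges

# toy profiles in which mutation A tends to precede mutation B
profiles = [[1, 1], [1, 1], [1, 0], [0, 0], [1, 0], [0, 0]]
edges = suppes_edges(profiles, ["A", "B"])
```

Only the A → B direction survives: A is more frequent than B (priority) and B is more likely given A (raising), while the reverse edge fails the priority check.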
Digital Ecosystems: Ecosystem-Oriented Architectures
We view Digital Ecosystems to be the digital counterparts of biological
ecosystems. Here, we are concerned with the creation of these Digital
Ecosystems, exploiting the self-organising properties of biological ecosystems
to evolve high-level software applications. Therefore, we created the Digital
Ecosystem, a novel optimisation technique inspired by biological ecosystems,
where the optimisation works at two levels: a first optimisation, migration of
agents which are distributed in a decentralised peer-to-peer network, operating
continuously in time; this process feeds a second optimisation based on
evolutionary computing that operates locally on single peers and is aimed at
finding solutions to satisfy locally relevant constraints. The Digital
Ecosystem was then measured experimentally through simulations, with measures
originating from theoretical ecology, evaluating its likeness to biological
ecosystems. This included its responsiveness to requests for applications from
the user base, as a measure of the ecological succession (ecosystem maturity).
Overall, we have advanced the understanding of Digital Ecosystems, creating
Ecosystem-Oriented Architectures where the word ecosystem is more than just a
metaphor.

Comment: 39 pages, 26 figures, journal
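The two-level optimisation can be caricatured in a few lines: agents migrate at random between peers (the continuous, decentralised level), and each peer applies a local evolutionary step against its own objective. The bit-string agents, mutation rate, and target patterns are invented stand-ins for the paper's agents and locally relevant constraints.

```python
import random

def fitness(agent, target):
    return sum(a == t for a, t in zip(agent, target))

def local_step(pool, target, rng):
    # second level: a (1+1)-style evolutionary step on one peer, keeping
    # whichever of agent/mutant better satisfies the local target
    out = []
    for agent in pool:
        mutant = [1 - b if rng.random() < 0.1 else b for b in agent]
        out.append(max(agent, mutant, key=lambda a: fitness(a, target)))
    return out

def migrate(peers, rng):
    # first level: move a random agent to a random peer
    src, dst = rng.sample(range(len(peers)), 2)
    if peers[src]:
        peers[dst].append(peers[src].pop(rng.randrange(len(peers[src]))))

def run(n_peers=4, pool=6, length=12, steps=200, seed=0):
    rng = random.Random(seed)
    targets = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n_peers)]
    peers = [[[rng.randint(0, 1) for _ in range(length)] for _ in range(pool)]
             for _ in range(n_peers)]
    for _ in range(steps):
        migrate(peers, rng)
        for i in range(n_peers):
            peers[i] = local_step(peers[i], targets[i], rng)
    # best local fitness per peer at the end of the run
    return [max((fitness(a, t) for a in pool_i), default=0)
            for pool_i, t in zip(peers, targets)]
```

Migration lets solutions evolved against one peer's constraint seed the search at another, which is the interaction the ecological measures in the paper are designed to probe.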
Scalable Population Synthesis with Deep Generative Modeling
Population synthesis is concerned with the generation of synthetic yet
realistic representations of populations. It is a fundamental problem in the
modeling of transport where the synthetic populations of micro-agents represent
a key input to most agent-based models. In this paper, a new methodological
framework for how to 'grow' pools of micro-agents is presented. The model
framework adopts a deep generative modeling approach from machine learning
based on a Variational Autoencoder (VAE). Compared to the previous population
synthesis approaches, including Iterative Proportional Fitting (IPF), Gibbs
sampling and traditional generative models such as Bayesian Networks or Hidden
Markov Models, the proposed method allows fitting the full joint distribution
for high dimensions. The proposed methodology is compared with a conventional
Gibbs sampler and a Bayesian Network by using a large-scale Danish trip diary.
It is shown that, while these two methods outperform the VAE in the
low-dimensional case, they both suffer from scalability issues when the number
of modeled attributes increases. It is also shown that the Gibbs sampler
essentially replicates the agents from the original sample when the required
conditional distributions are estimated as frequency tables. In contrast, the
VAE allows addressing the problem of sampling zeros by generating agents that
are virtually different from those in the original data but have similar
statistical properties. The presented approach can support agent-based modeling
at all levels by enabling richer synthetic populations with smaller zones and
more detailed individual characteristics.

Comment: 27 pages, 15 figures, 4 tables
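The replication behaviour of the frequency-table Gibbs sampler is easy to demonstrate: starting from an observed attribute combination, every state with nonzero transition probability is itself observed, so sampling zeros are never generated. The three binary attributes and the data below are invented; the VAE contrast is not reproduced here.

```python
import random
from collections import Counter

# Toy data over three invented binary attributes; the combinations
# (1, 0, 1) and (0, 1, 0) never occur, i.e. they are sampling zeros.
data = [
    (0, 0, 0), (0, 0, 1), (0, 1, 1),
    (1, 0, 0), (1, 1, 0), (1, 1, 1),
]

def conditional(data, i, rest):
    """Empirical P(x_i = 1 | x_-i = rest) read off a frequency table."""
    rows = [r for r in data
            if tuple(v for j, v in enumerate(r) if j != i) == rest]
    return sum(r[i] for r in rows) / len(rows) if rows else 0.5

def gibbs(data, steps=2000, seed=0):
    # systematic-scan Gibbs sampler seeded at an observed combination
    rng = random.Random(seed)
    state = list(rng.choice(data))
    seen = Counter()
    for _ in range(steps):
        for i in range(len(state)):
            rest = tuple(v for j, v in enumerate(state) if j != i)
            state[i] = int(rng.random() < conditional(data, i, rest))
        seen[tuple(state)] += 1
    return seen
```

Every visited state lies in the observed data because each single-attribute update can only assign values whose combination with the remaining attributes has nonzero empirical frequency.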
A rapid and scalable method for multilocus species delimitation using Bayesian model comparison and rooted triplets
Multilocus sequence data provide far greater power to resolve species limits than the single-locus data typically used for broad surveys of clades. However, current statistical methods based on a multispecies coalescent framework are computationally demanding because of the number of possible delimitations that must be compared and the time-consuming likelihood calculations. New methods are therefore needed to open up the power of multilocus approaches to larger systematic surveys. Here, we present a rapid and scalable method that introduces two innovations. First, the method reduces the complexity of likelihood calculations by decomposing the tree into rooted triplets. The distribution of topologies for a triplet across multiple loci has a uniform trinomial distribution when the 3 individuals belong to the same species, but a skewed distribution if they belong to separate species, with a form that is specified by the multispecies coalescent. A Bayesian model comparison framework was developed and the best delimitation found by comparing the product of posterior probabilities of all triplets. The second innovation is a new dynamic programming algorithm for finding the optimum delimitation from all those compatible with a guide tree by successively analyzing subtrees defined by each node. This algorithm removes the need for the heuristic searches used by current methods, guarantees that the best solution is found, and could potentially be used in other systematic applications. We assessed the performance of the method with simulated, published and newly generated data. Analyses of simulated data demonstrate that the combined method has favourable statistical properties and scalability with increasing sample sizes. Analyses of empirical data from both eukaryotes and prokaryotes demonstrate its potential for delimiting species in real cases.
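The triplet-level comparison can be illustrated with a toy Bayes factor: uniform trinomial topology counts under "one species" against a skewed trinomial under "split". Here the skewed distribution is fixed at an assumed q = 0.8 rather than derived from the multispecies coalescent, and point likelihoods stand in for the paper's full Bayesian treatment.

```python
import math

def log_bayes_factor(counts, q=0.8):
    """log Bayes factor of 'split' (skewed trinomial) versus 'one
    species' (uniform trinomial) for one triplet's topology counts
    across loci. n1 counts the topology consistent with the candidate
    split; the multinomial coefficient cancels between hypotheses.
    q = 0.8 is an illustrative assumption, not a coalescent-derived value.
    """
    n1, n2, n3 = counts
    skewed = n1 * math.log(q) + (n2 + n3) * math.log((1 - q) / 2)
    uniform = (n1 + n2 + n3) * math.log(1 / 3)
    return skewed - uniform

# 30 loci with one dominant topology: evidence favouring a split
lbf_split = log_bayes_factor((24, 3, 3))
# evenly spread topologies: evidence favouring a single species
lbf_same = log_bayes_factor((10, 10, 10))
```

Multiplying such per-triplet posterior ratios over all triplets, as the abstract describes, then scores a whole candidate delimitation.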