Collective Phenomena and Non-Finite State Computation in a Human Social System
We investigate the computational structure of a paradigmatic example of
distributed social interaction: that of the open-source Wikipedia community. We
examine the statistical properties of its cooperative behavior, and perform
model selection to determine whether this aspect of the system can be described
by a finite-state process, or whether reference to an effectively unbounded
resource allows for a more parsimonious description. We find strong evidence,
in a majority of the most-edited pages, in favor of a collective-state model,
where the probability of a "revert" action declines as the square root of the
number of non-revert actions seen since the last revert. We provide evidence
that the emergence of this social counter is driven by collective interaction
effects, rather than properties of individual users.
Comment: 23 pages, 4 figures, 3 tables; to appear in PLoS ONE
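The collective-state model described above can be sketched as a simple simulation: the probability of a revert declines as the square root of the number of non-revert edits seen since the last revert, and the counter resets whenever a revert occurs. The constant `c`, the sequence length, and the function names are illustrative assumptions, not values from the paper.

```python
import random

def revert_probability(n, c=0.5):
    """Probability of a revert after n non-revert edits since the last
    revert, declining as the square root of n (collective-state model;
    the constant c is an illustrative assumption)."""
    return min(1.0, c / (n + 1) ** 0.5)

def simulate_edits(num_edits, c=0.5, seed=0):
    """Simulate an edit stream under the model and return the number
    of reverts observed."""
    rng = random.Random(seed)
    since_last_revert = 0
    reverts = 0
    for _ in range(num_edits):
        if rng.random() < revert_probability(since_last_revert, c):
            reverts += 1
            since_last_revert = 0   # the social counter resets on a revert
        else:
            since_last_revert += 1
    return reverts
```

A finite-state alternative would cap the counter at some fixed value; the paper's model selection asks whether the data prefer the unbounded counter above.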
Average-case analysis of perfect sorting by reversals (Journal Version)
Perfect sorting by reversals, a problem originating in computational
genomics, is the process of sorting a signed permutation to either the identity
or to the reversed identity permutation, by a sequence of reversals that do not
break any common interval. Bérard et al. (2007) make use of strong interval
trees to describe an algorithm for sorting signed permutations by reversals.
Combinatorial properties of this family of trees are essential to the algorithm
analysis. Here, we use the expected value of certain tree parameters to prove
that the average run-time of the algorithm is at worst polynomial, and
additionally, for sufficiently long permutations, the sorting algorithm runs in
polynomial time with probability one. Furthermore, our analysis of the subclass
of commuting scenarios yields precise results on the average length of a
reversal, and the average number of reversals.
Comment: A preliminary version of this work appeared in the proceedings of
Combinatorial Pattern Matching (CPM) 2009. See arXiv:0901.2847; Discrete
Mathematics, Algorithms and Applications, vol. 3(3), 201
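The basic operation underlying this problem, a reversal on a signed permutation, reverses a contiguous segment and flips the sign of each element in it. A minimal sketch of that primitive (this is only the elementary step, not Bérard et al.'s strong-interval-tree algorithm; the function names are illustrative):

```python
def apply_reversal(perm, i, j):
    """Reverse the segment perm[i..j] (inclusive) and negate its
    elements -- the elementary step in sorting signed permutations
    by reversals."""
    segment = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + segment + perm[j + 1:]

def is_sorted_identity(perm):
    """Check whether perm is the identity permutation +1, +2, ..., +n,
    the target of the sorting process."""
    return perm == list(range(1, len(perm) + 1))
```

For example, a single reversal over positions 1..2 sorts `[1, -3, -2, 4]` to the identity; a perfect sorting scenario is a sequence of such reversals that never breaks a common interval.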
Restricted Covariance Priors with Applications in Spatial Statistics
We present a Bayesian model for area-level count data that uses Gaussian
random effects with a novel type of G-Wishart prior on the inverse
variance--covariance matrix. Specifically, we introduce a new distribution
called the truncated G-Wishart distribution that has support over precision
matrices that lead to positive associations between the random effects of
neighboring regions while preserving conditional independence of
non-neighboring regions. We describe Markov chain Monte Carlo sampling
algorithms for the truncated G-Wishart prior in a disease mapping context and
compare our results to Bayesian hierarchical models based on intrinsic
autoregression priors. A simulation study illustrates that using the truncated
G-Wishart prior improves over the intrinsic autoregressive priors when there
are discontinuities in the disease risk surface. The new model is applied to an
analysis of cancer incidence data in Washington State.
Comment: Published at http://dx.doi.org/10.1214/14-BA927 in Bayesian
Analysis (http://projecteuclid.org/euclid.ba) by the International Society of
Bayesian Analysis (http://bayesian.org/)
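The support restriction described above can be sketched as a membership check: in a Gaussian Markov random field, a negative off-diagonal entry of the precision matrix implies a positive partial correlation between the corresponding random effects, and a zero entry implies conditional independence. The helper below is an illustrative sketch of that constraint, not the paper's MCMC sampler, and its name and signature are assumptions.

```python
def respects_truncation(Q, neighbors):
    """Check that a symmetric precision matrix Q (nested lists) lies in
    the support of the truncated G-Wishart described above: strictly
    negative off-diagonal entries (hence positive partial correlations)
    for neighboring regions, and zeros for non-neighboring regions."""
    n = len(Q)
    for i in range(n):
        for j in range(i + 1, n):
            if (i, j) in neighbors or (j, i) in neighbors:
                if Q[i][j] >= 0:
                    return False   # neighbors must associate positively
            elif Q[i][j] != 0:
                return False       # conditional independence otherwise
    return True
```

The sign convention follows from the partial correlation formula, -Q[i][j] / sqrt(Q[i][i] * Q[j][j]), so Q[i][j] < 0 yields a positive association.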
Software Engineering and Complexity in Effective Algebraic Geometry
We introduce the notion of a robust parameterized arithmetic circuit for the
evaluation of algebraic families of multivariate polynomials. Based on this
notion, we present a computation model, adapted to Scientific Computing, which
captures all known branching parsimonious symbolic algorithms in effective
Algebraic Geometry. We justify this model by arguments from Software
Engineering. Finally we exhibit a class of simple elimination problems of
effective Algebraic Geometry which require exponential time to be solved by
branching parsimonious algorithms of our computation model.
Comment: 70 pages. arXiv admin note: substantial text overlap with
arXiv:1201.434
Constrained Optimization for a Subset of the Gaussian Parsimonious Clustering Models
The expectation-maximization (EM) algorithm is an iterative method for
finding maximum likelihood estimates when data are incomplete or are treated as
being incomplete. The EM algorithm and its variants are commonly used for
parameter estimation in applications of mixture models for clustering and
classification. This is despite the fact that even the Gaussian mixture model
likelihood surface contains many local maxima and is riddled with singularities.
Previous work has focused on circumventing this problem by constraining the
smallest eigenvalue of the component covariance matrices. In this paper, we
consider constraining the smallest eigenvalue, the largest eigenvalue, and both
the smallest and largest within the family setting. Specifically, a subset of
the GPCM family is considered for model-based clustering, where we use a
re-parameterized version of the famous eigenvalue decomposition of the
component covariance matrices. Our approach is illustrated using various
experiments with simulated and real data.
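The eigenvalue constraint described above amounts to projecting each component covariance matrix onto the set whose spectrum lies in a chosen interval, which keeps the EM iterates away from singularities. A minimal sketch for the 2x2 case, using the closed-form spectral decomposition (the function name and the bounds are illustrative assumptions, not the paper's implementation):

```python
import math

def clip_eigenvalues_2x2(a, b, d, lo, hi):
    """Project the symmetric 2x2 covariance [[a, b], [b, d]] onto the
    set of matrices whose eigenvalues lie in [lo, hi], by clipping the
    eigenvalues in the spectral decomposition."""
    mean = (a + d) / 2.0
    r = math.hypot((a - d) / 2.0, b)
    lam1, lam2 = mean + r, mean - r          # lam1 >= lam2
    c1 = min(max(lam1, lo), hi)              # clip the largest eigenvalue
    c2 = min(max(lam2, lo), hi)              # clip the smallest eigenvalue
    if r == 0.0:                             # matrix is a multiple of I
        return c1, 0.0, c2
    # Unit eigenvector for lam1: proportional to (b, lam1 - a),
    # or an axis vector when b == 0.
    if b != 0.0:
        vx, vy = b, lam1 - a
    elif a >= d:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    # Reconstruct c1 * v v^T + c2 * w w^T with w perpendicular to v.
    na = c1 * vx * vx + c2 * vy * vy
    nb = (c1 - c2) * vx * vy
    nd = c1 * vy * vy + c2 * vx * vx
    return na, nb, nd
```

Constraining only `lo` corresponds to bounding the smallest eigenvalue, only `hi` the largest, and both together gives the combined constraint considered in the paper.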
Inference of Ancestral Recombination Graphs through Topological Data Analysis
The recent explosion of genomic data has underscored the need for
interpretable and comprehensive analyses that can capture complex phylogenetic
relationships within and across species. Recombination, reassortment and
horizontal gene transfer constitute examples of pervasive biological phenomena
that cannot be captured by tree-like representations. Starting from hundreds of
genomes, we are interested in the reconstruction of potential evolutionary
histories leading to the observed data. Ancestral recombination graphs
represent potential histories that explicitly accommodate recombination and
mutation events across orthologous genomes. However, they are computationally
costly to reconstruct, usually being infeasible for more than a few tens of
genomes. Recently, Topological Data Analysis (TDA) methods have been proposed
as robust and scalable methods that can capture the genetic scale and frequency
of recombination. We build upon previous TDA developments for detecting and
quantifying recombination, and present a novel framework that can be applied to
hundreds of genomes and can be interpreted in terms of minimal histories of
mutation and recombination events, quantifying the scales and identifying the
genomic locations of recombinations. We implement this framework in a software
package, called TARGet, and apply it to several examples, including small
migration between different populations, human recombination, and horizontal
evolution in finches inhabiting the Galápagos Islands.
Comment: 33 pages, 12 figures. The accompanying software, instructions and
example files used in the manuscript can be obtained from
https://github.com/RabadanLab/TARGe
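A key idea behind TDA approaches to recombination is that tree-like evolution produces no loops, so independent cycles in a genotype graph signal reticulate events. For a graph, the first Betti number reduces to E - V + C, where C is the number of connected components. A minimal sketch of that count using union-find (this illustrates the graph-level special case only, not the TARGet pipeline; the function name is an assumption):

```python
def betti_1(num_vertices, edges):
    """First Betti number of an undirected graph: E - V + C, where C is
    the number of connected components. A nonzero value means the graph
    contains independent cycles, which a tree-like (recombination-free)
    history cannot produce."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    components = num_vertices
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv                 # merge two components
            components -= 1
    return len(edges) - num_vertices + components
```

A tree yields 0, while each independent cycle contributes 1; persistent homology extends this count across scales to quantify the genetic scale of recombination events.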