Network Archaeology: Uncovering Ancient Networks from Present-day Interactions
Often questions arise about old or extinct networks. What proteins interacted
in a long-extinct ancestor species of yeast? Who were the central players in
the Last.fm social network 3 years ago? Our ability to answer such questions
has been limited by the unavailability of past versions of networks. To
overcome these limitations, we propose several algorithms for reconstructing a
network's history of growth given only the network as it exists today and a
generative model by which the network is believed to have evolved. Our
likelihood-based method finds a probable previous state of the network by
reversing the forward growth model. This approach retains node identities so
that the history of individual nodes can be tracked. We apply these algorithms
to uncover older, non-extant biological and social networks believed to have
grown via several models, including duplication-mutation with complementarity,
forest fire, and preferential attachment. Through experiments on both synthetic
and real-world data, we find that our algorithms can estimate node arrival
times, identify anchor nodes from which new nodes copy links, and reveal
significant features of networks that have long since disappeared.
Comment: 16 pages, 10 figures
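To make the likelihood-based reversal concrete, here is a minimal sketch that greedily peels off the node judged most likely to have been the latest arrival under a plain preferential-attachment model. The scoring rule, the helper names, and the use of networkx are assumptions of this sketch, not the paper's implementation.

```python
import networkx as nx

def peel_last_node(G, m):
    """Return the node most plausibly added last under preferential attachment.

    A node added last with m edges should still have degree m (a simplifying
    assumption that ignores later attachments to it). Among such candidates,
    score each by the product of its neighbors' degrees, approximating the
    preferential-attachment probability of its neighborhood.
    """
    candidates = [v for v in G.nodes if G.degree(v) == m] or list(G.nodes)

    def score(v):
        s = 1.0
        for u in G.neighbors(v):
            s *= max(G.degree(u) - 1, 1)
        return s

    return max(candidates, key=score)

def reconstruct_history(G, m):
    """Greedy reversal: repeatedly remove the most probable latest arrival.

    Returns node identities in estimated reverse order of arrival.
    """
    H = G.copy()
    order = []
    while H.number_of_nodes() > m:
        v = peel_last_node(H, m)
        order.append(v)
        H.remove_node(v)
    return order

if __name__ == "__main__":
    G = nx.barabasi_albert_graph(200, m=3, seed=0)
    print(reconstruct_history(G, m=3)[:10])  # estimated ten most recent arrivals
```

Because node identities are preserved at every peeling step, the estimated arrival order can be compared directly against the true order on synthetic data.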
DM-PhyClus: A Bayesian phylogenetic algorithm for infectious disease transmission cluster inference
Background. Conventional phylogenetic clustering approaches rely on arbitrary
cutpoints applied a posteriori to phylogenetic estimates. Although in practice,
Bayesian and bootstrap-based clustering tend to lead to similar estimates, they
often produce conflicting measures of confidence in clusters. The current study
proposes a new Bayesian phylogenetic clustering algorithm, which we refer to as
DM-PhyClus, that identifies sets of sequences resulting from quick transmission
chains, thus yielding easily interpretable clusters, without using any ad hoc
distance or confidence requirement. Results. Simulations reveal that DM-PhyClus
can outperform conventional clustering methods, as well as the Gap procedure, a
pure distance-based algorithm, in terms of mean cluster recovery. We apply
DM-PhyClus to a sample of real HIV-1 sequences, producing a set of clusters
whose inference is in line with the conclusions of a previous thorough
analysis. Conclusions. DM-PhyClus, by eliminating the need for cutpoints and
producing sensible inference for cluster configurations, can facilitate
transmission cluster detection. Future efforts to reduce incidence of
infectious diseases, like HIV-1, will need reliable estimates of transmission
clusters. It follows that algorithms like DM-PhyClus could serve to better
inform public health strategies.
Machine-Assisted Map Editing
Mapping road networks today is labor-intensive. As a result, road maps have
poor coverage outside urban centers in many countries. Systems to automatically
infer road network graphs from aerial imagery and GPS trajectories have been
proposed to improve coverage of road maps. However, because of high error
rates, these systems have not been adopted by mapping communities. We propose
machine-assisted map editing, where automatic map inference is integrated into
existing, human-centric map editing workflows. To realize this, we build
Machine-Assisted iD (MAiD), where we extend the web-based OpenStreetMap editor,
iD, with machine-assistance functionality. We complement MAiD with a novel
approach for inferring road topology from aerial imagery that combines the
speed of prior segmentation approaches with the accuracy of prior iterative
graph construction methods. We design MAiD to tackle the addition of major,
arterial roads in regions where existing maps have poor coverage, and the
incremental improvement of coverage in regions where major roads are already
mapped. We conduct two user studies and find that, when participants are given
a fixed time to map roads, they are able to add as much as 3.5x more roads with
MAiD.
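As a rough illustration of iterative graph construction guided by a segmentation output, the toy routine below grows a road polyline from a seed point by repeatedly stepping toward the direction with the highest road probability. The step size, angle sampling, and stopping rule are assumptions of this sketch, not the MAiD system itself.

```python
import numpy as np

def trace_road(mask, seed, step=5.0, n_angles=16, stop_thresh=0.3, max_steps=200):
    """Grow a road polyline from `seed` on a per-pixel road-probability mask.

    At each step, probe `n_angles` candidate directions a fixed distance away
    and move toward the one with the highest probability, stopping when the
    best candidate falls below `stop_thresh` or leaves the image.
    """
    h, w = mask.shape
    path = [np.asarray(seed, dtype=float)]
    prev_angle = None
    for _ in range(max_steps):
        y, x = path[-1]
        best_p, best_pt, best_ang = -1.0, None, None
        for ang in np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False):
            if prev_angle is not None and np.cos(ang - prev_angle) < 0:
                continue  # do not double back on the previous direction
            ny, nx = y + step * np.sin(ang), x + step * np.cos(ang)
            if not (0 <= ny < h and 0 <= nx < w):
                continue
            p = mask[int(ny), int(nx)]
            if p > best_p:
                best_p, best_pt, best_ang = p, np.array([ny, nx]), ang
        if best_pt is None or best_p < stop_thresh:
            break  # ran off the image or off the road
        path.append(best_pt)
        prev_angle = best_ang
    return np.array(path)
```

In a machine-assisted workflow, polylines produced this way would be offered to the human editor as suggestions to accept, adjust, or reject rather than committed to the map automatically.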
Genome-wide inference of ancestral recombination graphs
The complex correlation structure of a collection of orthologous DNA
sequences is uniquely captured by the "ancestral recombination graph" (ARG), a
complete record of coalescence and recombination events in the history of the
sample. However, existing methods for ARG inference are computationally
intensive, highly approximate, or limited to small numbers of sequences, and,
as a consequence, explicit ARG inference is rarely used in applied population
genomics. Here, we introduce a new algorithm for ARG inference that is
efficient enough to apply to dozens of complete mammalian genomes. The key idea
of our approach is to sample an ARG of n chromosomes conditional on an ARG of
n-1 chromosomes, an operation we call "threading." Using techniques based on
hidden Markov models, we can perform this threading operation exactly, up to
the assumptions of the sequentially Markov coalescent and a discretization of
time. An extension allows for threading of subtrees instead of individual
sequences. Repeated application of these threading operations results in highly
efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these
methods in a computer program called ARGweaver. Experiments with simulated data
indicate that ARGweaver converges rapidly to the true posterior distribution
and is effective in recovering various features of the ARG for dozens of
sequences generated under realistic parameters for human populations. In
applications of ARGweaver to 54 human genome sequences from Complete Genomics,
we find clear signatures of natural selection, including regions of unusually
ancient ancestry associated with balancing selection and reductions in allele
age in sites under directional selection. Preliminary results also indicate
that our methods can be used to gain insight into complex features of human
population structure, even with a noninformative prior distribution.
Comment: 88 pages, 7 main figures, 22 supplementary figures. This version
contains a substantially expanded genomic data analysis.
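The abstract does not spell out the threading operation, but its HMM core can be illustrated with a generic forward-filtering, backward-sampling routine over discretized hidden states (standing in here for per-site coalescence times). This is a sketch only; in ARGweaver the transition and emission terms come from the sequentially Markov coalescent and the mutation model, and the state space is far more structured.

```python
import numpy as np

def forward_filter_backward_sample(log_init, log_trans, log_emit, rng=None):
    """Draw one hidden-state path from its exact posterior in a discrete HMM.

    log_init : (K,)   log prior over hidden states at the first site
    log_trans: (K, K) log transition matrix between adjacent sites
    log_emit : (L, K) per-site log emission likelihoods
    Returns an integer path of length L sampled from P(path | observations).
    """
    rng = rng or np.random.default_rng()
    L, K = log_emit.shape

    # Forward pass in log space.
    alpha = np.zeros((L, K))
    alpha[0] = log_init + log_emit[0]
    for t in range(1, L):
        prev = alpha[t - 1][:, None] + log_trans          # (K, K): from j to k
        alpha[t] = log_emit[t] + np.logaddexp.reduce(prev, axis=0)

    # Backward sampling: draw the last state, then condition on the next state.
    path = np.empty(L, dtype=int)
    p = np.exp(alpha[-1] - np.logaddexp.reduce(alpha[-1]))
    path[-1] = rng.choice(K, p=p)
    for t in range(L - 2, -1, -1):
        logw = alpha[t] + log_trans[:, path[t + 1]]
        p = np.exp(logw - np.logaddexp.reduce(logw))
        path[t] = rng.choice(K, p=p)
    return path
```

Repeating such an exact conditional draw for one chromosome (or subtree) at a time, while holding the rest of the ARG fixed, is what makes the overall MCMC sampler efficient.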
Evolutionary Inference via the Poisson Indel Process
We address the problem of the joint statistical inference of phylogenetic
trees and multiple sequence alignments from unaligned molecular sequences. This
problem is generally formulated in terms of string-valued evolutionary
processes along the branches of a phylogenetic tree. The classical evolutionary
process, the TKF91 model, is a continuous-time Markov chain model comprised of
insertion, deletion and substitution events. Unfortunately this model gives
rise to an intractable computational problem---the computation of the marginal
likelihood under the TKF91 model is exponential in the number of taxa. In this
work, we present a new stochastic process, the Poisson Indel Process (PIP), in
which the complexity of this computation is reduced to linear. The new model is
closely related to the TKF91 model, differing only in its treatment of
insertions, but the new model has a global characterization as a Poisson
process on the phylogeny. Standard results for Poisson processes allow key
computations to be decoupled, which yields the favorable computational profile
of inference under the PIP model. We present illustrative experiments in which
Bayesian inference under the PIP model is compared to separate inference of
phylogenies and alignments.
Comment: 33 pages, 6 figures
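To illustrate the generative structure of a Poisson process on a phylogeny, the toy simulation below draws insertion events with intensity proportional to branch length plus an atom of mass λ/μ at the root, then lets each inserted character survive to a leaf only if it escapes deletion on every branch along the way. The hard-coded tree, the helper names, and the survival approximation (which ignores deletion on the partial branch where the insertion occurs) are assumptions of this sketch, not the paper's formulation or its inference machinery.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy rooted tree: child -> (parent, branch_length). The root has no entry.
TREE = {
    "A": ("anc", 0.3), "B": ("anc", 0.3),
    "anc": ("root", 0.5), "C": ("root", 0.8),
}
LEAVES = ["A", "B", "C"]

def path_to_root(node):
    """Branches (node, length) on the path from `node` up to the root."""
    out = []
    while node in TREE:
        parent, blen = TREE[node]
        out.append((node, blen))
        node = parent
    return out

def simulate_pip_leaf_counts(insert_rate, delete_rate):
    """Count, per leaf, the inserted characters that survive deletion."""
    # Poisson number of insertions on each branch, plus the root atom.
    events = []
    for node, (_, blen) in TREE.items():
        events += [node] * rng.poisson(insert_rate * blen)
    events += ["root"] * rng.poisson(insert_rate / delete_rate)

    counts = {leaf: 0 for leaf in LEAVES}
    for origin in events:
        origin_nodes = {n for n, _ in path_to_root(origin)}
        for leaf in LEAVES:
            leaf_path = path_to_root(leaf)
            if origin != "root" and origin not in {n for n, _ in leaf_path}:
                continue  # this leaf does not descend from the insertion branch
            # Branches strictly below the insertion node, on the way to the leaf.
            below = [b for n, b in leaf_path if n not in origin_nodes]
            survives = all(rng.random() < np.exp(-delete_rate * b) for b in below)
            counts[leaf] += survives
    return counts

print(simulate_pip_leaf_counts(insert_rate=2.0, delete_rate=1.0))
```

The point of the construction is that, because insertions form a Poisson process over the whole phylogeny, key computations decouple, which is what reduces marginal-likelihood evaluation from exponential to linear in the number of taxa.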
Network-provider-independent overlays for resilience and quality of service (PhD thesis)
Overlay networks are viewed as one of the solutions addressing the inefficiency and slow
evolution of the Internet and have been the subject of significant research. Most existing
overlays providing resilience and/or Quality of Service (QoS) need cooperation among
different network providers, but this raises an inter-provider trust issue that cannot be easily resolved.
In this thesis, we mainly focus on network-provider-independent overlays and investigate
their performance in providing two different types of service. Specifically, this thesis
addresses the following problems:
Provider-independent overlay architecture: A provider-independent overlay
framework named Resilient Overlay for Mission-Critical Applications (ROMCA)
is proposed. We elaborate its structure, including component composition and
functions, and also provide several operational examples.
Overlay topology construction for providing resilience service: We investigate the topology design problem of provider-independent overlays aiming to provide resilience service. To be more specific, based on the ROMCA framework, we
formulate this problem mathematically and prove its NP-hardness. Three heuristics are proposed and extensive simulations are carried out to verify their effectiveness.
Application mapping with resilience and QoS guarantees: Assuming application mapping is the targeted service for ROMCA, we formulate this problem as
an Integer Linear Program (ILP); a simplified sketch of such a formulation appears after this list. Moreover, a simple but effective heuristic is
proposed to address this issue in a time-efficient manner. Simulations with both
synthetic and real networks prove the superiority of both solutions over existing
ones.
Substrate topology information availability and the impact of its accuracy on overlay performance: Based on our survey that summarizes the methodologies available for inferring the selective substrate topology formed among a group
of nodes through active probing, we find that such information is usually inaccurate
and additional mechanisms are needed to obtain a more accurate inferred topology. Therefore, we examine the impact of inferred substrate topology accuracy on overlay
performance given only inferred substrate topology information.
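For illustration only, a much-simplified application-mapping ILP of the kind mentioned above can be written down with binary placement variables, a delay objective, and node-capacity constraints. The data, variable names, and use of the PuLP library are assumptions of this sketch, not the thesis's actual formulation, which also encodes resilience and QoS requirements.

```python
import pulp

# Hypothetical instance: place application components onto overlay nodes.
apps = ["web", "db", "cache"]
nodes = ["n1", "n2", "n3", "n4"]
delay = {("web", "n1"): 5, ("web", "n2"): 9, ("web", "n3"): 4, ("web", "n4"): 7,
         ("db", "n1"): 8, ("db", "n2"): 3, ("db", "n3"): 6, ("db", "n4"): 5,
         ("cache", "n1"): 2, ("cache", "n2"): 6, ("cache", "n3"): 3, ("cache", "n4"): 4}
demand = {"web": 2, "db": 3, "cache": 1}          # resource units required
capacity = {"n1": 3, "n2": 3, "n3": 4, "n4": 2}   # resource units available

prob = pulp.LpProblem("application_mapping", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (apps, nodes), cat="Binary")  # x[a][n] = 1 if a sits on n

# Objective: total placement delay, a crude QoS proxy.
prob += pulp.lpSum(delay[a, n] * x[a][n] for a in apps for n in nodes)

# Each application component is placed on exactly one overlay node.
for a in apps:
    prob += pulp.lpSum(x[a][n] for n in nodes) == 1

# Overlay node capacities are respected.
for n in nodes:
    prob += pulp.lpSum(demand[a] * x[a][n] for a in apps) <= capacity[n]

prob.solve()
for a in apps:
    for n in nodes:
        if x[a][n].value() > 0.5:
            print(f"{a} -> {n}")
```

An exact ILP solve like this scales poorly with network size, which is why the thesis pairs the formulation with a simple but effective heuristic for time-efficient operation.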
Alpha, Betti and the Megaparsec Universe: on the Topology of the Cosmic Web
We study the topology of the Megaparsec Cosmic Web in terms of the
scale-dependent Betti numbers, which formalize the topological information
content of the cosmic mass distribution. While the Betti numbers do not fully
quantify topology, they extend the information beyond conventional cosmological
studies of topology in terms of genus and Euler characteristic. The richer
information content of Betti numbers is accompanied by the availability of fast
algorithms to compute them.
For continuous density fields, we determine the scale-dependence of Betti
numbers by invoking the cosmologically familiar filtration of sublevel or
superlevel sets defined by density thresholds. For the discrete galaxy
distribution, however, the analysis is based on the alpha shapes of the
particles. These simplicial complexes constitute an ordered sequence of nested
subsets of the Delaunay tessellation, a filtration defined by the scale
parameter, α. As they are homotopy equivalent to the sublevel sets of
the distance field, they are an excellent tool for assessing the topological
structure of a discrete point distribution. In order to develop an intuitive
understanding of the behavior of Betti numbers as a function of α, and
their relation to the morphological patterns in the Cosmic Web, we first study
them within the context of simple heuristic Voronoi clustering models.
Subsequently, we address the topology of structures emerging in the standard
LCDM scenario and in cosmological scenarios with alternative dark energy
content. The evolution and scale-dependence of the Betti numbers are shown to
reflect the hierarchical evolution of the Cosmic Web and yield a promising
measure of cosmological parameters. We also discuss the expected Betti numbers
as a function of the density threshold for superlevel sets of a Gaussian random
field.
Comment: 42 pages, 14 figures
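As a minimal illustration of scale-dependent Betti numbers for a discrete point sample, the routine below tracks β0, the number of connected components, as the scale parameter grows, using a simple distance-threshold filtration and union-find. This is a crude stand-in for the alpha-shape filtration of the Delaunay tessellation used in the paper, and all names are invented for this sketch.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def betti0_curve(points, alphas):
    """Number of connected components (beta_0) across a distance filtration.

    Two points are connected once their distance is at most 2*alpha (balls of
    radius alpha overlap); components are tracked with union-find.
    """
    n = len(points)
    dist = squareform(pdist(points))
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Process edges in order of increasing length, recording beta_0 at each alpha.
    edges = sorted((dist[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    curve, k, components = [], 0, n
    for a in sorted(alphas):
        while k < len(edges) and edges[k][0] <= 2 * a:
            _, i, j = edges[k]
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                components -= 1
            k += 1
        curve.append((a, components))
    return curve

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three Gaussian blobs as a toy "clustered" point distribution.
    pts = np.concatenate([rng.normal(c, 0.1, size=(50, 2)) for c in (0.0, 1.0, 2.0)])
    for a, b0 in betti0_curve(pts, np.linspace(0.01, 0.6, 12)):
        print(f"alpha={a:.2f}  beta0={b0}")
```

Higher Betti numbers (β1 for loops, β2 for voids), which carry the signature of filaments and voids in the Cosmic Web, require a full simplicial-complex computation rather than this component count.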