2,961 research outputs found
Impossibility results on stability of phylogenetic consensus methods
We answer two questions raised by Bryant, Francis and Steel in their work on
consensus methods in phylogenetics. Consensus methods apply to every practical
instance where it is desired to aggregate a set of given phylogenetic trees
(say, gene evolution trees) into a resulting, "consensus" tree (say, a species
tree). Various stability criteria have been explored in this context, seeking
to model desirable consistency properties of consensus methods as the
experimental data is updated (e.g., more taxa, or more trees, are mapped).
However, such stability conditions can be incompatible with some basic
regularity properties that are widely accepted to be essential in any
meaningful consensus method. Here, we prove that such an incompatibility does
arise in the case of extension stability on binary trees and in the case of
associative stability. Our methods combine general theoretical considerations
with the use of computer programs tailored to the given stability requirements
On the scaling limits of planar percolation
We prove Tsirelson's conjecture that any scaling limit of the critical planar
percolation is a black noise. Our theorems apply to a number of percolation
models, including site percolation on the triangular grid and any subsequential
scaling limit of bond percolation on the square grid. We also suggest a natural
construction for the scaling limit of planar percolation, and more generally of
any discrete planar model describing connectivity properties.Comment: With an Appendix by Christophe Garban. Published in at
http://dx.doi.org/10.1214/11-AOP659 the Annals of Probability
(http://www.imstat.org/aop/) by the Institute of Mathematical Statistics
(http://www.imstat.org
Universality at the edge of the spectrum in Wigner random matrices
We prove universality at the edge for rescaled correlation functions of
Wigner random matrices in the limit . As a corollary, we show
that, after proper rescaling, the 1st, 2nd, 3rd, etc. eigenvalues of Wigner
random hermitian (resp. real symmetric) matrix weakly converge to the
distributions established by Tracy and Widom in G.U.E. (G.O.E.) cases.Comment: We corrected several misprints and little mistakes (the most
important one is formula (1.15)). We also reformulated two auxiliary theorems
(Theorem 2 and Theorem 3) to emphasize the asymptotic nature of the result
Tracing evolutionary links between species
The idea that all life on earth traces back to a common beginning dates back
at least to Charles Darwin's {\em Origin of Species}. Ever since, biologists
have tried to piece together parts of this `tree of life' based on what we can
observe today: fossils, and the evolutionary signal that is present in the
genomes and phenotypes of different organisms. Mathematics has played a key
role in helping transform genetic data into phylogenetic (evolutionary) trees
and networks. Here, I will explain some of the central concepts and basic
results in phylogenetics, which benefit from several branches of mathematics,
including combinatorics, probability and algebra.Comment: 18 pages, 6 figures (Invited review paper (draft version) for AMM
Community detection and stochastic block models: recent developments
The stochastic block model (SBM) is a random graph model with planted
clusters. It is widely employed as a canonical model to study clustering and
community detection, and provides generally a fertile ground to study the
statistical and computational tradeoffs that arise in network and data
sciences.
This note surveys the recent developments that establish the fundamental
limits for community detection in the SBM, both with respect to
information-theoretic and computational thresholds, and for various recovery
requirements such as exact, partial and weak recovery (a.k.a., detection). The
main results discussed are the phase transitions for exact recovery at the
Chernoff-Hellinger threshold, the phase transition for weak recovery at the
Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial
recovery, the learning of the SBM parameters and the gap between
information-theoretic and computational thresholds.
The note also covers some of the algorithms developed in the quest of
achieving the limits, in particular two-round algorithms via graph-splitting,
semi-definite programming, linearized belief propagation, classical and
nonbacktracking spectral methods. A few open problems are also discussed
Discretization provides a conceptually simple tool to build expression networks
Biomarker identification, using network methods, depends on finding regular co-expression patterns; the overall connectivity is of greater importance than any single relationship. A second requirement is a simple algorithm for ranking patients on how relevant a gene-set is. For both of these requirements discretized data helps to first identify gene cliques, and then to stratify patients.We explore a biologically intuitive discretization technique which codes genes as up- or down-regulated, with values close to the mean set as unchanged; this allows a richer description of relationships between genes than can be achieved by positive and negative correlation. We find a close agreement between our results and the template gene-interactions used to build synthetic microarray-like data by SynTReN, which synthesizes "microarray" data using known relationships which are successfully identified by our method.We are able to split positive co-regulation into up-together and down-together and negative co-regulation is considered as directed up-down relationships. In some cases these exist in only one direction, with real data, but not with the synthetic data. We illustrate our approach using two studies on white blood cells and derived immortalized cell lines and compare the approach with standard correlation-based computations. No attempt is made to distinguish possible causal links as the search for biomarkers would be crippled by losing highly significant co-expression relationships. This contrasts with approaches like ARACNE and IRIS.The method is illustrated with an analysis of gene-expression for energy metabolism pathways. For each discovered relationship we are able to identify the samples on which this is based in the discretized sample-gene matrix, along with a simplified view of the patterns of gene expression; this helps to dissect the gene-sample relevant to a research topic--identifying sets of co-regulated and anti-regulated genes and the samples or patients in which this relationship occurs
Discretization Provides a Conceptually Simple Tool to Build Expression Networks
Biomarker identification, using network methods, depends on finding regular co-expression patterns; the overall connectivity is of greater importance than any single relationship. A second requirement is a simple algorithm for ranking patients on how relevant a gene-set is. For both of these requirements discretized data helps to first identify gene cliques, and then to stratify patients
- …