2,961 research outputs found

    Impossibility results on stability of phylogenetic consensus methods

    We answer two questions raised by Bryant, Francis and Steel in their work on consensus methods in phylogenetics. Consensus methods apply to every practical instance where it is desired to aggregate a set of given phylogenetic trees (say, gene evolution trees) into a resulting "consensus" tree (say, a species tree). Various stability criteria have been explored in this context, seeking to model desirable consistency properties of consensus methods as the experimental data is updated (e.g., more taxa, or more trees, are mapped). However, such stability conditions can be incompatible with some basic regularity properties that are widely accepted to be essential in any meaningful consensus method. Here, we prove that such an incompatibility does arise in the case of extension stability on binary trees and in the case of associative stability. Our methods combine general theoretical considerations with the use of computer programs tailored to the given stability requirements.

    On the scaling limits of planar percolation

    We prove Tsirelson's conjecture that any scaling limit of critical planar percolation is a black noise. Our theorems apply to a number of percolation models, including site percolation on the triangular grid and any subsequential scaling limit of bond percolation on the square grid. We also suggest a natural construction for the scaling limit of planar percolation, and more generally of any discrete planar model describing connectivity properties. Comment: With an Appendix by Christophe Garban. Published at http://dx.doi.org/10.1214/11-AOP659 in the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Universality at the edge of the spectrum in Wigner random matrices

    We prove universality at the edge for rescaled correlation functions of Wigner random matrices in the limit n → +∞. As a corollary, we show that, after proper rescaling, the 1st, 2nd, 3rd, etc. eigenvalues of a Wigner random Hermitian (resp. real symmetric) matrix weakly converge to the distributions established by Tracy and Widom in the G.U.E. (resp. G.O.E.) cases. Comment: We corrected several misprints and minor mistakes (the most important one is formula (1.15)). We also reformulated two auxiliary theorems (Theorem 2 and Theorem 3) to emphasize the asymptotic nature of the result.

    Tracing evolutionary links between species

    The idea that all life on earth traces back to a common beginning dates back at least to Charles Darwin's Origin of Species. Ever since, biologists have tried to piece together parts of this "tree of life" based on what we can observe today: fossils, and the evolutionary signal that is present in the genomes and phenotypes of different organisms. Mathematics has played a key role in helping transform genetic data into phylogenetic (evolutionary) trees and networks. Here, I will explain some of the central concepts and basic results in phylogenetics, which benefit from several branches of mathematics, including combinatorics, probability and algebra. Comment: 18 pages, 6 figures (Invited review paper (draft version) for AMM

    Community detection and stochastic block models: recent developments

    The stochastic block model (SBM) is a random graph model with planted clusters. It is widely employed as a canonical model to study clustering and community detection, and generally provides fertile ground to study the statistical and computational tradeoffs that arise in network and data sciences. This note surveys the recent developments that establish the fundamental limits for community detection in the SBM, both with respect to information-theoretic and computational thresholds, and for various recovery requirements such as exact, partial and weak recovery (a.k.a. detection). The main results discussed are the phase transition for exact recovery at the Chernoff-Hellinger threshold, the phase transition for weak recovery at the Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial recovery, the learning of the SBM parameters and the gap between information-theoretic and computational thresholds. The note also covers some of the algorithms developed in the quest to achieve these limits, in particular two-round algorithms via graph-splitting, semi-definite programming, linearized belief propagation, and classical and nonbacktracking spectral methods. A few open problems are also discussed.

    Discretization provides a conceptually simple tool to build expression networks

    Biomarker identification, using network methods, depends on finding regular co-expression patterns; the overall connectivity is of greater importance than any single relationship. A second requirement is a simple algorithm for ranking patients on how relevant a gene-set is. For both of these requirements discretized data helps to first identify gene cliques, and then to stratify patients. We explore a biologically intuitive discretization technique which codes genes as up- or down-regulated, with values close to the mean set as unchanged; this allows a richer description of relationships between genes than can be achieved by positive and negative correlation. We find a close agreement between our results and the template gene-interactions used to build synthetic microarray-like data by SynTReN, which synthesizes "microarray" data using known relationships which are successfully identified by our method. We are able to split positive co-regulation into up-together and down-together and negative co-regulation is considered as directed up-down relationships. In some cases these exist in only one direction, with real data, but not with the synthetic data. We illustrate our approach using two studies on white blood cells and derived immortalized cell lines and compare the approach with standard correlation-based computations. No attempt is made to distinguish possible causal links as the search for biomarkers would be crippled by losing highly significant co-expression relationships. This contrasts with approaches like ARACNE and IRIS. The method is illustrated with an analysis of gene-expression for energy metabolism pathways. For each discovered relationship we are able to identify the samples on which this is based in the discretized sample-gene matrix, along with a simplified view of the patterns of gene expression; this helps to dissect the gene-sample relevant to a research topic, identifying sets of co-regulated and anti-regulated genes and the samples or patients in which this relationship occurs.
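The discretization idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of the population standard deviation, and the band of half a standard deviation around the mean are all assumptions made for illustration, since the abstract only says that values close to the mean are set as unchanged.

```python
from statistics import mean, pstdev


def discretize(values, width=0.5):
    """Code each expression value as +1 (up), -1 (down) or 0 (unchanged).

    Assumption: "close to the mean" is taken to be within `width`
    population standard deviations of the gene's mean. The abstract
    does not specify the actual threshold.
    """
    mu = mean(values)
    band = width * pstdev(values)
    codes = []
    for v in values:
        if v > mu + band:
            codes.append(1)
        elif v < mu - band:
            codes.append(-1)
        else:
            codes.append(0)
    return codes


def count_relationships(codes_a, codes_b):
    """Count, per sample, how two discretized genes co-vary.

    Positive co-regulation is split into up-together and down-together;
    negative co-regulation is kept directed (a up / b down vs. a down / b up).
    Samples where either gene is unchanged are ignored.
    """
    counts = {"up-together": 0, "down-together": 0, "up-down": 0, "down-up": 0}
    for a, b in zip(codes_a, codes_b):
        if a == 1 and b == 1:
            counts["up-together"] += 1
        elif a == -1 and b == -1:
            counts["down-together"] += 1
        elif a == 1 and b == -1:
            counts["up-down"] += 1
        elif a == -1 and b == 1:
            counts["down-up"] += 1
    return counts
```

Because the codes are kept per sample, the indices that contribute to each count identify the samples supporting a relationship, which is the property the abstract relies on for stratifying patients.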
