28,600 research outputs found

    Quantitative Comparison of Abundance Structures of Generalized Communities: From B-Cell Receptor Repertoires to Microbiomes

    Full text link
    The \emph{community}, the assemblage of organisms co-existing in a given space and time, has the potential to become one of the unifying concepts of biology, especially with the advent of high-throughput sequencing experiments that reveal genetic diversity exhaustively. In this spirit we show that a tool from community ecology, the Rank Abundance Distribution (RAD), can be turned by the new MaxRank normalization method into a generic, expressive descriptor for quantitative comparison of communities in many areas of biology. To illustrate the versatility of the method, we analyze RADs from various \emph{generalized communities}, i.e.\ assemblages of genetically diverse cells or organisms, including human B cells, gut microbiomes under antibiotic treatment and of different ages and countries of origin, and other human and environmental microbial communities. We show that normalized RADs enable novel quantitative approaches that help to understand structures and dynamics of complex generalize communities

    Interactions between species introduce spurious associations in microbiome studies

    Full text link
    Microbiota contribute to many dimensions of host phenotype, including disease. To link specific microbes to specific phenotypes, microbiome-wide association studies compare microbial abundances between two groups of samples. Abundance differences, however, reflect not only direct associations with the phenotype, but also indirect effects due to microbial interactions. We found that microbial interactions could easily generate a large number of spurious associations that provide no mechanistic insight. Using techniques from statistical physics, we developed a method to remove indirect associations and applied it to the largest dataset on pediatric inflammatory bowel disease. Our method corrected the inflation of p-values in standard association tests and showed that only a small subset of associations is directly linked to the disease. Direct associations had a much higher accuracy in separating cases from controls and pointed to immunomodulation, butyrate production, and the brain-gut axis as important factors in the inflammatory bowel disease.Comment: 4 main text figures, 15 supplementary figures (i.e appendix) and 6 supplementary tables. Overall 49 pages including reference

    Detection of recombination in DNA multiple alignments with hidden markov models

    Get PDF
    CConventional phylogenetic tree estimation methods assume that all sites in a DNA multiple alignment have the same evolutionary history. This assumption is violated in data sets from certain bacteria and viruses due to recombination, a process that leads to the creation of mosaic sequences from different strains and, if undetected, causes systematic errors in phylogenetic tree estimation. In the current work, a hidden Markov model (HMM) is employed to detect recombination events in multiple alignments of DNA sequences. The emission probabilities in a given state are determined by the branching order (topology) and the branch lengths of the respective phylogenetic tree, while the transition probabilities depend on the global recombination probability. The present study improves on an earlier heuristic parameter optimization scheme and shows how the branch lengths and the recombination probability can be optimized in a maximum likelihood sense by applying the expectation maximization (EM) algorithm. The novel algorithm is tested on a synthetic benchmark problem and is found to clearly outperform the earlier heuristic approach. The paper concludes with an application of this scheme to a DNA sequence alignment of the argF gene from four Neisseria strains, where a likely recombination event is clearly detected

    How multiplicity determines entropy and the derivation of the maximum entropy principle for complex systems

    Get PDF
    The maximum entropy principle (MEP) is a method for obtaining the most likely distribution functions of observables from statistical systems, by maximizing entropy under constraints. The MEP has found hundreds of applications in ergodic and Markovian systems in statistical mechanics, information theory, and statistics. For several decades there exists an ongoing controversy whether the notion of the maximum entropy principle can be extended in a meaningful way to non-extensive, non-ergodic, and complex statistical systems and processes. In this paper we start by reviewing how Boltzmann-Gibbs-Shannon entropy is related to multiplicities of independent random processes. We then show how the relaxation of independence naturally leads to the most general entropies that are compatible with the first three Shannon-Khinchin axioms, the (c,d)-entropies. We demonstrate that the MEP is a perfectly consistent concept for non-ergodic and complex statistical systems if their relative entropy can be factored into a generalized multiplicity and a constraint term. The problem of finding such a factorization reduces to finding an appropriate representation of relative entropy in a linear basis. In a particular example we show that path-dependent random processes with memory naturally require specific generalized entropies. The example is the first exact derivation of a generalized entropy from the microscopic properties of a path-dependent random process.Comment: 6 pages, 1 figure. To appear in PNA

    Frame Permutation Quantization

    Full text link
    Frame permutation quantization (FPQ) is a new vector quantization technique using finite frames. In FPQ, a vector is encoded using a permutation source code to quantize its frame expansion. This means that the encoding is a partial ordering of the frame expansion coefficients. Compared to ordinary permutation source coding, FPQ produces a greater number of possible quantization rates and a higher maximum rate. Various representations for the partitions induced by FPQ are presented, and reconstruction algorithms based on linear programming, quadratic programming, and recursive orthogonal projection are derived. Implementations of the linear and quadratic programming algorithms for uniform and Gaussian sources show performance improvements over entropy-constrained scalar quantization for certain combinations of vector dimension and coding rate. Monte Carlo evaluation of the recursive algorithm shows that mean-squared error (MSE) decays as 1/M^4 for an M-element frame, which is consistent with previous results on optimal decay of MSE. Reconstruction using the canonical dual frame is also studied, and several results relate properties of the analysis frame to whether linear reconstruction techniques provide consistent reconstructions.Comment: 29 pages, 5 figures; detailed added to proof of Theorem 4.3 and a few minor correction
    corecore