42 research outputs found

    Shannon Meets Carnot: Generalized Second Thermodynamic Law

    The classical thermodynamic laws fail to capture the behavior of systems whose energy Hamiltonian is an explicit function of the temperature. Such a Hamiltonian arises, for example, in modeling information processing systems, like communication channels, as thermal systems. Here we generalize the second thermodynamic law to encompass systems with temperature-dependent energy levels, $dQ = T\,dS + \langle d\mathcal{E}/dT \rangle\, dT$, where $\langle \cdot \rangle$ denotes averaging over the Boltzmann distribution, and reveal a new definition of the basic notion of temperature. This generalization makes it possible to express, for instance, the mutual information of the Gaussian channel as a consequence of the fundamental laws of nature: the laws of thermodynamics.
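
For orientation, the mutual information of the Gaussian channel mentioned in the abstract has a well-known closed form, I = (1/2) log2(1 + SNR) bits per channel use, when the input is Gaussian. The sketch below only evaluates that textbook expression; it does not reproduce the thermodynamic derivation of the paper, and the function name is an illustrative assumption.

```python
import math


def gaussian_channel_mutual_information(snr):
    """Mutual information in bits per channel use for a Gaussian channel.

    Assumes a Gaussian input and a linear (not dB) signal-to-noise ratio.
    """
    if snr < 0:
        raise ValueError("snr must be non-negative")
    return 0.5 * math.log2(1.0 + snr)


# An SNR of 3 gives log2(4) / 2 = 1 bit per channel use.
print(gaussian_channel_mutual_information(3.0))  # 1.0
```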

    Parallel vs. Sequential Belief Propagation Decoding of LDPC Codes over GF(q) and Markov Sources

    A sequential updating scheme (SUS) for belief propagation (BP) decoding of LDPC codes over Galois fields, GF(q), and correlated Markov sources is proposed and compared with the standard parallel updating scheme (PUS). A thorough experimental study of various transmission settings indicates that the convergence rate, in iterations, of the BP algorithm (and subsequently its complexity) for the SUS is about one half of that for the PUS, independent of the finite field size q. Moreover, this 1/2 factor appears regardless of the correlations of the source and the channel's noise model, while the error correction performance remains unchanged. These results may imply the 'universality' of the one-half convergence speed-up of SUS decoding.
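
The difference between the two schedules can be illustrated on a much simpler setting than the one in the abstract. The sketch below is a toy assumption, not the paper's GF(q) decoder: it runs message passing for binary parity checks on an erasure channel, where a check resolves an erased bit once all its other bits are known. The parallel schedule reads a frozen snapshot of the previous sweep, while the sequential schedule reuses values resolved earlier in the same sweep. The chain-shaped code and the function name are hypothetical.

```python
def decode_erasures(checks, known, sequential):
    """Resolve erased bits by message passing on a binary erasure channel.

    checks: list of lists of variable indices; the bits of each check XOR to 0.
    known: dict {variable index: bit} holding the unerased bits.
    Returns (values, number of sweeps that resolved at least one bit).
    """
    values = dict(known)
    iterations = 0
    while True:
        # Parallel: read a snapshot taken at the start of the sweep.
        # Sequential: read the live dict, so fresh values are reused at once.
        source = values if sequential else dict(values)
        progress = False
        for check in checks:
            erased = [v for v in check if v not in source]
            if len(erased) != 1:
                continue
            parity = 0
            for v in check:
                if v in source:
                    parity ^= source[v]
            if erased[0] not in values:
                values[erased[0]] = parity
                progress = True
        if not progress:
            return values, iterations
        iterations += 1


# Toy chain: bit 0 is received, bits 1..4 are erased.
checks = [[0, 1], [1, 2], [2, 3], [3, 4]]
_, parallel_sweeps = decode_erasures(checks, {0: 1}, sequential=False)
_, sequential_sweeps = decode_erasures(checks, {0: 1}, sequential=True)
print(parallel_sweeps, sequential_sweeps)  # 4 1
```

The size of the gain in this toy depends entirely on the chain structure and the order in which checks are visited; the factor of about one half reported in the abstract is an empirical result for the authors' setting and is not reproduced here.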

    High-resolution microbial community reconstruction by integrating short reads from multiple 16S rRNA regions

    The emergence of massively parallel sequencing technology has revolutionized microbial profiling, allowing the unprecedented comparison of microbial diversity across time and space in a wide range of host-associated and environmental ecosystems. Although the high-throughput nature of such methods enables the detection of low-frequency bacteria, these advances come at the cost of sequencing read length, limiting the phylogenetic resolution possible by current methods. Here, we present a generic approach for integrating short reads from large genomic regions, thus enabling phylogenetic resolution far exceeding current methods. The approach is based on a mapping to a statistical model that is later solved as a constrained optimization problem. We demonstrate the utility of this method by analyzing human saliva and Drosophila samples, using Illumina single-end sequencing of a 750 bp amplicon of the 16S rRNA gene. Phylogenetic resolution is significantly extended while reducing the number of falsely detected bacteria, as compared with standard single-region Roche 454 Pyrosequencing. Our approach can be seamlessly applied to simultaneous sequencing of multiple genes providing a higher resolution view of the composition and activity of complex microbial communities

    An information theoretic approach to statistical dependence: copula information

    We discuss the connection between information and copula theories by showing that a copula can be employed to decompose the information content of a multivariate distribution into marginal and dependence components, with the latter quantified by the mutual information. We define the information excess as a measure of deviation from a maximum entropy distribution. The idea of marginal invariant dependence measures is also discussed and used to show that empirical linear correlation underestimates the amplitude of the actual correlation in the case of non-Gaussian marginals. The mutual information is shown to provide an upper bound for the asymptotic empirical log-likelihood of a copula. An analytical expression for the information excess of T-copulas is provided, allowing for simple model identification within this family. We illustrate the framework in a financial data set. Comment: to appear in Europhysics Letters.
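
A small numerical illustration of the marginal-invariance point, under assumptions that are not taken from the paper: two variables share a Gaussian copula with parameter rho but have lognormal marginals, X = exp(sigma Z1) and Y = exp(sigma Z2). The mutual information depends only on the copula, I = -(1/2) ln(1 - rho^2), while the Pearson correlation of X and Y has the closed form (exp(rho sigma^2) - 1) / (exp(sigma^2) - 1), whose magnitude is smaller than |rho| in the example shown. The function names are illustrative.

```python
import math


def gaussian_copula_mutual_information(rho):
    """Mutual information (in nats) of a bivariate Gaussian copula."""
    return -0.5 * math.log(1.0 - rho * rho)


def lognormal_pearson_correlation(rho, sigma):
    """Pearson correlation of exp(sigma*Z1) and exp(sigma*Z2).

    Z1 and Z2 are standard normal with correlation rho.
    """
    return math.expm1(rho * sigma ** 2) / math.expm1(sigma ** 2)


rho, sigma = 0.5, 1.0
# The linear correlation of the transformed variables falls below rho,
# while the mutual information is unchanged by the marginal transform.
print(lognormal_pearson_correlation(rho, sigma))
print(gaussian_copula_mutual_information(rho))
```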

    Optimal Location of Sources in Transportation Networks

    We consider the problem of optimizing the locations of source nodes in transportation networks. A reduction of the fraction of surplus nodes induces a glassy transition. In contrast to most constraint satisfaction problems involving discrete variables, our problem involves continuous variables which lead to cavity fields in the form of functions. The one-step replica symmetry breaking (1RSB) solution involves solving a stable distribution of functionals, which is in general infeasible. In this paper, we obtain small closed sets of functional cavity fields and demonstrate how functional recursions are converted to simple recursions of probabilities, which make the 1RSB solution feasible. The physical results in the replica symmetric (RS) and the 1RSB frameworks are thus derived and the stability of the RS and 1RSB solutions is examined. Comment: 38 pages, 18 figures.

    Identification of rare alleles and their carriers using compressed se(que)nsing

    Identification of rare variants by resequencing is important both for detecting novel variations and for screening individuals for known disease alleles. New technologies enable low-cost resequencing of target regions, although it is still prohibitive to test more than a few individuals. We propose a novel pooling design that enables the recovery of novel or known rare alleles and their carriers in groups of individuals. The method is based on a Compressed Sensing (CS) approach, which is general, simple and efficient. CS allows the use of generic algorithmic tools for simultaneous identification of multiple variants and their carriers. We model the experimental procedure and show via computer simulations that it enables the recovery of rare alleles and their carriers in larger groups than were possible before. Our approach can also be combined with barcoding techniques to provide a feasible solution based on current resequencing costs. For example, when targeting a small enough genomic region (∼100 bp) and using only ∼10 sequencing lanes and ∼10 distinct barcodes per lane, one recovers the identity of 4 rare allele carriers out of a population of over 4000 individuals. We demonstrate the performance of our approach over several publicly available experimental data sets
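
The pooling idea can be sketched on a toy scale. The example below is an assumption-laden illustration, not the authors' method: it uses a hypothetical noise-free readout (each pool reports the exact number of carriers it contains), a hypothetical binary-pattern design of 3 pools for 7 individuals, and an exhaustive search for the sparsest consistent carrier set in place of the L1-type optimization normally used in Compressed Sensing.

```python
from itertools import combinations


def pool_counts(design, carriers):
    """Number of carriers in each pool (idealized, noise-free readout)."""
    return [sum(1 for i in pool if i in carriers) for pool in design]


def sparsest_carrier_sets(design, counts, n_individuals, max_carriers):
    """Return every smallest carrier set consistent with the pooled counts."""
    for size in range(max_carriers + 1):
        matches = [
            set(candidate)
            for candidate in combinations(range(n_individuals), size)
            if pool_counts(design, set(candidate)) == counts
        ]
        if matches:
            return matches
    return []


# Hypothetical design: individual i joins pool j when bit j of (i + 1) is set,
# so each of the 7 individuals has a distinct, non-empty pool pattern.
design = [[i for i in range(7) if ((i + 1) >> j) & 1] for j in range(3)]

counts = pool_counts(design, {4})
print(sparsest_carrier_sets(design, counts, 7, max_carriers=2))  # [{4}]
```

With only three pools this toy design identifies a single carrier but not two: the counts produced by carriers {0, 1} are also explained by the single individual 2, so the sparsest-set search returns the wrong answer. Recovering several carriers requires more pools.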

    PepDist: A New Framework for Protein-Peptide Binding Prediction based on Learning Peptide Distance Functions

    BACKGROUND: Many different aspects of cellular signalling, trafficking and targeting mechanisms are mediated by interactions between proteins and peptides. Representative examples are MHC-peptide complexes in the immune system. Developing computational methods for protein-peptide binding prediction is therefore an important task with applications to vaccine and drug design. METHODS: Previous learning approaches address the binding prediction problem using traditional margin-based binary classifiers. In this paper we propose PepDist: a novel approach for predicting binding affinity. Our approach is based on learning peptide-peptide distance functions. Moreover, we suggest learning a single peptide-peptide distance function over an entire family of proteins (e.g. MHC class I). This distance function can be used to compute the affinity of a novel peptide to any of the proteins in the given family. In order to learn these peptide-peptide distance functions, we formalize the problem as a semi-supervised learning problem with partial information in the form of equivalence constraints. Specifically, we propose to use DistBoost [1,2], which is a semi-supervised distance learning algorithm. RESULTS: We compare our method to various state-of-the-art binding prediction algorithms on MHC class I and MHC class II datasets. In almost all cases, our method outperforms all of its competitors. One of the major advantages of our novel approach is that it can also learn an affinity function over proteins for which only small amounts of labeled peptides exist. In these cases, our method's performance gain, when compared to other computational methods, is even more pronounced. We have recently uploaded the PepDist webserver which provides binding prediction of peptides to 35 different MHC class I alleles. The webserver which can be found at is powered by a prediction engine which was trained using the framework presented in this paper. CONCLUSION: The results obtained suggest that learning a single distance function over an entire family of proteins achieves higher prediction accuracy than learning a set of binary classifiers for each of the proteins separately. We also show the importance of obtaining information on experimentally determined non-binders. Learning with real non-binders generalizes better than learning with randomly generated peptides that are assumed to be non-binders. This suggests that information about non-binding peptides should also be published and made publicly available.

    Importance of Post-Translational Modifications for Functionality of a Chloroplast-Localized Carbonic Anhydrase (CAH1) in Arabidopsis thaliana

    Background: The Arabidopsis CAH1 alpha-type carbonic anhydrase is one of the few plant proteins known to be targeted to the chloroplast through the secretory pathway. CAH1 is post-translationally modified at several residues by the attachment of N-glycans, resulting in a mature protein harbouring complex-type glycans. The reason why trafficking through this non-canonical pathway is beneficial for certain chloroplast resident proteins is not yet known. Therefore, to elucidate the significance of glycosylation in trafficking and the effect of glycosylation on the stability and function of the protein, epitope-labelled wild type and mutated versions of CAH1 were expressed in plant cells. Methodology/Principal Findings: Transient expression of mutant CAH1 with disrupted glycosylation sites showed that the protein harbours four, or in certain cases five, N-glycans. While the wild type protein trafficked through the secretory pathway to the chloroplast, the non-glycosylated protein formed aggregates and associated with the ER chaperone BiP, indicating that glycosylation of CAH1 facilitates folding and ER-export. Using cysteine mutants we also assessed the role of disulphide bridge formation in the folding and stability of CAH1. We found that a disulphide bridge between cysteines at positions 27 and 191 in the mature protein was required for correct folding of the protein. Using a mass spectrometric approach we were able to measure the enzymatic activity of CAH1 protein. Under circumstances where protein N-glycosylation is blocked in vivo, the activity of CAH1 is completely inhibited. Conclusions/Significance: We show for the first time the importance of post-translational modifications such as N-glycosylation and intramolecular disulphide bridge formation in folding and trafficking of a protein from the secretory pathway to the chloroplast in higher plants. Requirements for these post-translational modifications for a fully functional native protein explain the need for an alternative route to the chloroplast. This work was supported by the Swedish Research Council (VR), the Kempe Foundations and Carl Tryggers Foundation to GS, and grant numbers BIO2006-08946 and BIO2009-11340 from the Spanish Ministerio de Ciencia e Innovación (MICINN) to A

    Quality and Productivity Improvement of Wax Flowers

    Rosana G. Moreira, Editor-in-Chief; Texas A&M University. This is a paper from the International Commission of Agricultural Engineering (CIGR, Commission Internationale du Genie Rural) E-Journal, Volume 9 (2007): Quality and Productivity Improvement of Wax Flowers. Manuscript CIOSTA 07 004. Vol. IX. December, 2007.