290 research outputs found

    Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores

    Background: Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high-scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte-Carlo simulation. Z-values allow the deduction of an upper bound of the P-value (1/Z-value²) following the TULIP theorem. Simulated Z-value distributions are known to fit a Gumbel law. This remarkable property was not demonstrated and had no obvious biological support. Results: We built a model of evolution of sequences based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the system's longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to the ability to conserve a sufficient functional level at the folded and maturated protein level (positive selection pressure). Homologous sequences were considered as systems 1) having a high redundancy of information, reflected by the magnitude of their alignment scores, and 2) whose components are the amino acids, which can be independently damaged by random DNA mutations. From these assumptions, we deduced that information shared at each amino acid position evolved with a constant rate, corresponding to the information hazard rate, and that pairwise sequence alignment scores should follow a Gumbel distribution, whose parameters could be given a theoretical rationale. In particular, one parameter corresponds to the information hazard rate. Conclusion: Extreme value distribution of alignment scores, assessed from high-scoring segment pairs following the Karlin-Altschul model, can also be deduced from the Reliability Theory applied to molecular sequences. It reflects the redundancy of information between homologous sequences, under functional conservative pressure. This model also provides a link between concepts of biological sequence analysis and of systems biology.
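    The Z-value idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy scoring scheme (match, mismatch and linear gap values), the choice to shuffle only the second sequence, the number of shuffles and all function names are assumptions made for illustration. The sketch scores a pair with a small Smith-Waterman routine, builds a random score distribution by Monte-Carlo shuffling, and applies the 1/Z² upper bound on the P-value quoted above.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are toy assumptions, not a real substitution matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def z_value(a, b, n_shuffles=200, seed=0):
    """Z-value of the observed score against scores of shuffled copies of b."""
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # shuffling preserves the composition of b
        null_scores.append(local_alignment_score(a, "".join(letters)))
    mean = statistics.mean(null_scores)
    sd = statistics.stdev(null_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance; Z-value undefined")
    return (observed - mean) / sd


def p_value_upper_bound(z):
    """Upper bound 1/Z^2 on the P-value, as quoted in the abstract."""
    if z <= 0:
        return 1.0  # the bound carries no information for non-positive Z
    return min(1.0, 1.0 / z ** 2)


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    z = z_value(seq, seq)
    print(f"Z-value: {z:.2f}, P-value upper bound: {p_value_upper_bound(z):.4f}")
```

    The bound is deliberately conservative: it holds without assuming any particular shape for the random score distribution, whereas fitting a Gumbel law to the shuffled scores would give a tighter, model-dependent estimate.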

    Glycerolipid transfer for the building of membranes in plant cells.

    Membranes of plant organelles have specific glycerolipid compositions. Selective distribution of lipids at the levels of subcellular organelles, membrane leaflets and membrane domains reflects a complex and finely tuned lipid homeostasis. Glycerolipid neosynthesis occurs mainly in plastid envelope and endoplasmic reticulum membranes. Since most lipids are not only present in the membranes where they are synthesized, one cannot explain membrane-specific lipid distribution by metabolic processes confined to each membrane compartment. In this review, we present our current understanding of glycerolipid trafficking in plant cells. We examine the potential mechanisms involved in lipid transport inside bilayers and from one membrane to another. We survey lipid transfers going through vesicular membrane flow and those dependent on lipid transfer proteins at membrane contact sites. By introducing recently described membrane lipid reorganization during phosphate deprivation and recent developments arising from mutant analyses, we detail the specific lipid transfers towards or away from the chloroplast envelope.

    Detection of new protein domains using co-occurrence: application to Plasmodium falciparum

    Motivation: Hidden Markov Models (HMMs) have proved to be a powerful tool for protein domain identification in newly sequenced organisms. However, numerous domains may be missed in highly divergent proteins. This is the case for the proteins of Plasmodium falciparum, the main causal agent of human malaria. Results: We propose a method to improve the sensitivity of HMM domain detection by exploiting the tendency of domains to appear preferentially with a few other favorite domains in a protein. When sequence information alone is not sufficient to warrant the presence of a particular domain, our method enables its detection on the basis of the presence of other Pfam or InterPro domains. Moreover, a shuffling procedure allows us to estimate the false discovery rate associated with the results. Applied to P. falciparum, our method identifies 585 new Pfam domains (versus the 3683 already known domains in the Pfam database) with an estimated error rate below 20%. These new domains provide 387 new Gene Ontology annotations to the P. falciparum proteome. Analogous and congruent results are obtained when applying the method to related Plasmodium species, P. vivax and P. yoelii. Availability: Supplementary Material and a database of the new domains and GO predictions achieved on Plasmodium proteins are available at http://www.lirmm.fr/~terrapon/codd
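    A shuffling-based false discovery rate estimate of the kind mentioned in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the published procedure: the data layout (dictionaries mapping a protein to its known and candidate domains), the set of favoured domain pairs, the shuffling scheme (candidate lists are permuted across proteins) and the function names are all hypothetical.

```python
import random


def count_certified(known, candidates, favoured_pairs):
    """Count candidate domains that co-occur with a favoured known domain.

    known:          dict protein -> set of confidently detected domains
    candidates:     dict protein -> list of weak (sub-threshold) domain hits
    favoured_pairs: set of (domain, domain) tuples that tend to co-occur
    """
    certified = 0
    for protein, cands in candidates.items():
        present = known.get(protein, set())
        for cand in cands:
            if any((cand, d) in favoured_pairs or (d, cand) in favoured_pairs
                   for d in present):
                certified += 1
    return certified


def estimate_fdr(known, candidates, favoured_pairs, n_shuffles=1000, seed=0):
    """Estimate the FDR as (mean certified count after shuffling) / observed."""
    rng = random.Random(seed)
    observed = count_certified(known, candidates, favoured_pairs)
    if observed == 0:
        return 0.0  # nothing was certified, so there is nothing to be false
    proteins = list(candidates)
    cand_lists = [candidates[p] for p in proteins]
    total = 0
    for _ in range(n_shuffles):
        # Reassign candidate lists to random proteins (hypothetical null model).
        rng.shuffle(cand_lists)
        shuffled = dict(zip(proteins, cand_lists))
        total += count_certified(known, shuffled, favoured_pairs)
    return min(1.0, (total / n_shuffles) / observed)


if __name__ == "__main__":
    known = {"p1": {"A"}, "p2": {"B"}}
    candidates = {"p1": ["X"], "p2": ["Y"]}
    favoured = {("X", "A"), ("Y", "B")}
    print(estimate_fdr(known, candidates, favoured))
```

    The idea is that candidates certified after shuffling can only be supported by chance, so their average count approximates the number of false discoveries among the real certifications.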

    A configuration space of homologous proteins conserving mutual information and allowing a phylogeny inference based on pair-wise Z-score probabilities

    BACKGROUND: Popular methods to reconstruct molecular phylogenies are based on multiple sequence alignments, in which addition or removal of data may change the resulting tree topology. We have sought a representation of homologous proteins that would conserve the information of pair-wise sequence alignments, respect probabilistic properties of Z-scores (Monte Carlo methods applied to pair-wise comparisons) and be the basis for a novel method of consistent and stable phylogenetic reconstruction. RESULTS: We have built up a spatial representation of protein sequences using concepts from particle physics (configuration space) and respecting a frame of constraints deduced from pair-wise alignment score properties in information theory. The obtained configuration space of homologous proteins (CSHP) allows the representation of real and shuffled sequences, and thereupon an expression of the TULIP theorem for Z-score probabilities. Based on the CSHP, we propose a phylogeny reconstruction using Z-scores. Deduced trees, called TULIP trees, are consistent with multiple-alignment based trees. Furthermore, the TULIP tree reconstruction method provides a solution for some previously reported incongruent results, such as the apicomplexan enolase phylogeny. CONCLUSION: The CSHP is a unified model that conserves mutual information between proteins in the way physical models conserve energy. Applications include the reconstruction of evolutionarily consistent and robust trees, the topology of which is based on a spatial representation that is not reordered after addition or removal of sequences. The CSHP and its assigned phylogenetic topology provide a powerful and easily updated representation for massive pair-wise genome comparisons based on Z-score computations.

    Science and dance collective motions

    Modelling collective movements of animals, which is a field of current research in physics, was used to support educational innovation involving scientists (students and researchers) and professional dancers at the end of their initial training process. The basic elements needed to understand this model and the original association between science and art are described by showing their contributions, and their limits, to the informal teaching of science and dance.

    Phosphate deprivation induces transfer of DGDG galactolipid from chloroplast to mitochondria

    In many soils, plants have to grow with a shortage of phosphate, leading to the development of phosphate-saving mechanisms. At the cellular level, these mechanisms include conversion of phospholipids into glycolipids, mainly digalactosyldiacylglycerol (DGDG). The lipid changes are not restricted to plastid membranes, where DGDG is synthesized and resides under normal conditions. In plant cells deprived of phosphate, mitochondria contain a high concentration of DGDG, whereas mitochondria have no glycolipids in control cells. Mitochondria do not synthesize this pool of DGDG, whose structure is shown to be characteristic of a DGD-type enzyme present in the plastid envelope. The transfer of DGDG between plastid and mitochondria is investigated and detected between envelope vesicles closely associated with mitochondria and the mitochondria themselves. This transfer does not apparently involve the endomembrane system and would rather be dependent upon contacts between plastids and mitochondria. Contact sites are favored at early stages of phosphate deprivation, when DGDG cell content is just starting to respond to phosphate deprivation.

    Clustering Libraries of Compounds into Families: Asymmetry-Based Similarity Measure to Categorize Small Molecules


    Successes and challenges in multiscale modelling of artificial metalloenzymes: the case study of POP-Rh2 cyclopropanase

    Molecular modelling applications in metalloenzyme design are still scarce due to a series of challenges. In particular, the simulation of metal-mediated binding and the identification of catalytically competent geometries require both large conformational exploration and simulation of fine electronic properties. Here, we demonstrate how the incorporation of new tools in multiscale strategies, namely substrate diffusion exploration, allows taking a step further. As a showcase, the enantioselective profiles of the most outstanding variants of an artificial Rh2-based cyclopropanase (GSH, HFF and RFY) developed by Lewis and co-workers (Nat. Commun., 2015, 6, 7789 and Nat. Chem., 2018, 10, 318-324) have been rationalized. DFT calculations on the free-cofactor-mediated process identify the carbene insertion and the cyclopropanoid formation as crucial events, the latter being the enantiodetermining step, which displays up to 8 competitive orientations easily altered by the protein environment. The key intermediates of the reaction were docked into the protein scaffold, showing that some mutated residues have direct interaction with the cofactor and/or the co-substrate. These interactions take the form of a direct coordination of Rh in GSH and HFF and a strong hydrophobic patch with the carbene moiety in RFY. Subsequent molecular dynamics simulations indicate that the cofactor induces global re-arrangements of the protein. Finally, massive exploration of substrate diffusion, based on the GPathFinder approach, defines this event as the origin of the enantioselectivity in GSH and RFY. For HFF, fine molecular dockings suggest that it is likely related to local interactions upon diffusion. This work shows how modelling of long-range mutations on the catalytic profiles of metalloenzymes may be unavoidable and software simulating substrate diffusion should be applied.

    Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

    The organization and mining of malaria genomic and post-genomic data is highly motivated by the necessity to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space designed from the genomic data from Plasmodium falciparum, but also using the millions of genomic data from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences, including compositionally atypical malaria sequences, 2) the high-throughput reconstruction of molecular phylogenies, 3) the representation of biological processes, particularly metabolic pathways, 4) the versatile methods to integrate genomic data, biological representations and functional profiling obtained from X-omic experiments after drug treatments and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progress toward a grid-enabled chemogenomic knowledge space is discussed. Comment: 43 pages, 4 figures, to appear in Malaria Journal