52 research outputs found

    Probabilistic grammatical model of protein language and its application to helix-helix contact site classification

    BACKGROUND: Hidden Markov Models power many state-of-the-art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not directly convey information on medium- and long-range residue-residue interactions. This requires an expressive power of at least context-free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited. RESULTS: In this work, we present a probabilistic grammatical framework for problem-specific protein languages and apply it to classification of transmembrane helix-helix pair configurations. The core of the model consists of a probabilistic context-free grammar, automatically inferred by a genetic algorithm from only a generic set of expert-based rules and positive training samples. The model was applied to produce sequence-based descriptors of four classes of transmembrane helix-helix contact site configurations. The highest performance of the classifiers reached an AUC ROC of 0.70. The analysis of grammar parse trees revealed the ability to represent structural features of helix-helix contact sites. CONCLUSIONS: We demonstrated that our probabilistic context-free framework for analysis of protein sequences outperforms the state of the art in the task of helix-helix contact site classification. However, this is achieved without necessarily requiring modeling of long-range dependencies between interacting residues. A significant feature of our approach is that grammar rules and parse trees are human-readable. Thus they could provide biologically meaningful information for molecular biologists.

    Metastable Pores at the Onset of Constant-Current Electroporation

    Single metastable nanopores, appearing before the actual electroporation under constant-current conditions, are used to characterize the onset of electroporation. Unlike the long-lived electropores typical of the current-controlled methods, these pores survive for milliseconds, and observing them is possible due to the slow development of electroporation, provided by the gradual accumulation of charges on a planar membrane. Analysis of the metastable pore appearance frequency and lifetime reveals a first, introductory stage of electroporation. During this stage two species of metastable pores open, the majority of which have very low conductance and seem not to be fully developed as hydrophilic electropores. The experiments reveal that the voltage value defines the electroporation onset while the current value affects the rate of electroporation. Membrane capacitance, which is related to membrane thickness and integrity, has a great impact on the membrane's susceptibility to pore appearance. Pores of nonperfect membranes appear more easily, but they do not live any longer than others.

    Probabilistic context-free grammars for classification of helix-helix contact sites and recognition of amyloidogenic peptides

    Hidden Markov Models power many state-of-the-art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not directly convey information on medium- and long-range residue-residue interactions. This requires an expressive power of at least context-free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited. We have developed a probabilistic grammatical framework for problem-specific protein languages, which has already been successfully applied to recognition of ligand binding sites. The core of the model consists of a probabilistic context-free grammar (PCFG), automatically inferred by a genetic algorithm from only a generic set of expert-based rules and positive training sequences. Here, we show that the PCFG approach matches state-of-the-art performance in two other tasks: classification of transmembrane helix-helix pairs and recognition of amyloidogenic peptides. First, the framework was applied to produce grammar descriptors of four classes of transmembrane helix-helix contact sites. The highest performance of the classifiers reached an AUC ROC of 0.70. Second, the analogous approach was used to distinguish between amyloidogenic and non-amyloidogenic protein fragments. It yielded good results whether these fragments were isolated or within an entire protein (AUC ROC up to 0.80). Finally, an attempt to model pairing amyloidogenic fragments resulted in classifiers reaching an AUC ROC of 0.70. A significant feature of the PCFG method is that grammar rules and parse trees are human-readable, and thus could provide biologically meaningful information.

    Tuberous sclerosis complex neuropathology requires glutamate-cysteine ligase

    Introduction: Tuberous sclerosis complex (TSC) is a genetic disease resulting from mutation in TSC1 or TSC2 and subsequent hyperactivation of mammalian Target of Rapamycin (mTOR). Common TSC features include brain lesions, such as cortical tubers and subependymal giant cell astrocytomas (SEGAs). However, the current treatment with mTOR inhibitors has critical limitations. We aimed to identify new targets for TSC pharmacotherapy. Results: The results of our shRNA screen point to glutamate-cysteine ligase catalytic subunit (GCLC), a key enzyme in glutathione synthesis, as a contributor to the TSC-related phenotype. GCLC inhibition increased cellular stress and reduced mTOR hyperactivity in TSC2-depleted neurons and SEGA-derived cells. Moreover, patients' brain tubers showed elevated expression of GCLC and stress markers. Finally, GCLC inhibition led to growth arrest and death of SEGA-derived cells. Conclusions: We describe GCLC as a part of redox adaptation in TSC, needed for overgrowth and survival of mutant cells, and provide a potential novel target for SEGA treatment. Electronic supplementary material: The online version of this article (doi:10.1186/s40478-015-0225-z) contains supplementary material, which is available to authorized users.

    Molecular EPISTOP, a comprehensive multi-omic analysis of blood from Tuberous Sclerosis Complex infants age birth to two years

    We present a comprehensive multi-omic analysis of the EPISTOP prospective clinical trial of early intervention with vigabatrin for pre-symptomatic epilepsy treatment in Tuberous Sclerosis Complex (TSC), in which 93 infants with TSC were followed from birth to age 2 years, seeking biomarkers of epilepsy development. Vigabatrin had profound effects on many metabolites, increasing serum deoxycytidine monophosphate (dCMP) levels 52-fold. Most serum proteins and metabolites, and blood RNA species showed significant change with age. Thirty-nine proteins, metabolites, and genes showed significant differences between age-matched control and TSC infants. Six also showed a progressive difference in expression between control, TSC without epilepsy, and TSC with epilepsy groups. A multivariate approach using enrollment samples identified multiple 3-variable predictors of epilepsy, with the best having a positive predictive value of 0.987. This rich dataset will enable further discovery and analysis of developmental effects, and associations with seizure development in TSC.

    Racial differences in systemic sclerosis disease presentation: a European Scleroderma Trials and Research group study

    Objectives. Racial factors play a significant role in SSc. We evaluated differences in SSc presentations between white patients (WP), Asian patients (AP) and black patients (BP) and analysed the effects of geographical locations. Methods. SSc characteristics of patients from the EUSTAR cohort were cross-sectionally compared across racial groups using survival and multiple logistic regression analyses. Results. The study included 9162 WP, 341 AP and 181 BP. AP developed the first non-RP feature faster than WP but slower than BP. AP were less frequently anti-centromere (ACA; odds ratio (OR) = 0.4, P < 0.001) and more frequently anti-topoisomerase-I autoantibodies (ATA) positive (OR = 1.2, P = 0.068), while BP were less likely to be ACA and ATA positive than were WP [OR(ACA) = 0.3, P < 0.001; OR(ATA) = 0.5, P = 0.020]. AP had less often (OR = 0.7, P = 0.06) and BP more often (OR = 2.7, P < 0.001) diffuse skin involvement than had WP. AP and BP were more likely to have pulmonary hypertension [OR(AP) = 2.6, P < 0.001; OR(BP) = 2.7, P = 0.03 vs WP] and a reduced forced vital capacity [OR(AP) = 2.5, P < 0.001; OR(BP) = 2.4, P < 0.004] than were WP. AP more often had an impaired diffusing capacity of the lung than had BP and WP [OR(AP vs BP) = 1.9, P = 0.038; OR(AP vs WP) = 2.4, P < 0.001]. After RP onset, AP and BP had a higher hazard to die than had WP [hazard ratio (HR) (AP) = 1.6, P = 0.011; HR(BP) = 2.1, P < 0.001]. Conclusion. Compared with WP, and mostly independent of geographical location, AP have a faster and earlier disease onset with high prevalences of ATA, pulmonary hypertension and forced vital capacity impairment and higher mortality. BP had the fastest disease onset, a high prevalence of diffuse skin involvement and nominally the highest mortality.

    Quality assessment of protein model-structures based on structural and functional similarities

    BACKGROUND: Experimental determination of protein 3D structures is expensive, time-consuming and sometimes impossible. The gap between the number of protein structures deposited in the Worldwide Protein Data Bank and the number of sequenced proteins is constantly broadening. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model a structure from a sequence or from partial structural information, e.g. contact maps. Consequently, biologists have the ability to automatically generate a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology. RESULTS: GOBA (Gene Ontology-Based Assessment) is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high-quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using the standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests. CONCLUSIONS: The validation shows that the method is able to discriminate between good and bad model-structures. The best of the tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. GOBA also obtained the best result for two targets of CASP8, and one of CASP9, compared to the contest participants. Consequently, GOBA offers a novel single model quality assessment program that addresses the practical needs of biologists. In conjunction with other Model Quality Assessment Programs (MQAPs), it would prove useful for the evaluation of single protein models.