8,840 research outputs found

    Statistical methods for clinical genome interpretation with specific application to inherited cardiac conditions

    Background: While next-generation sequencing has enabled us to rapidly identify sequence variants, clinical application is limited by our ability to determine which rare variants impact disease risk. Aim: Developing computational methods to identify clinically important variants. Methods and Results: (1) I built a disease-specific variant classifier for inherited cardiac conditions (ICCs), which outperforms genome-wide tools across a wide range of benchmarks. It discriminates pathogenic variants from benign variants with global accuracy improved by 4-24% over existing tools. Variants classified with >90% confidence are significantly associated with both disease status and clinical outcomes. (2) To better interpret missense variants, I examined evolutionarily equivalent residues across protein domain families to identify positions intolerant of variation. Homologous residue constraint is a strong predictor of variant pathogenicity. It can identify a subset of de novo missense variants whose impact on developmental disorders is comparable to that of protein-truncating variants. Independent of existing approaches, it can also improve the prioritisation of disease-relevant genes for both developmental disorders and inherited hypertrophic cardiomyopathy. (3) TTN-truncating variants are known to cause dilated cardiomyopathy (DCM), but the effect of missense variants is poorly understood. Using the approach in (2), I studied the role of TTN missense variants in DCM. Our prioritised residues are enriched for known pathogenic variants, including the two known to cause DCM and others involved in skeletal myopathies. I also found a significant association between constrained variants of TTN I-set domains and DCM in a case-control burden test of Caucasian samples (OR=3.2, 95%CI=1.3-9.4). Within subsets of DCM, the association is replicated in alcoholic cardiomyopathy. (4) Finally, I also developed a tool to annotate 5’UTR variants that create or disrupt upstream open reading frames (uORFs). 
Its utility is demonstrated by detecting high-impact uORF-disturbing variants in ClinVar, gnomAD and Genomics England. Conclusion: These studies established broadly applicable methods and improved understanding of ICCs.
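
The odds ratio and confidence interval quoted for the case-control burden test can be illustrated with a minimal sketch. The abstract does not say how its interval was computed, so the Wald (log-scale normal approximation) interval below is an assumption, and the function name and the carrier counts are hypothetical, chosen only to show the arithmetic.

```python
import math


def odds_ratio_wald_ci(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 burden table.

    Assumption: a Wald interval on the log odds ratio; the abstract does
    not state which interval method was actually used.
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for a Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the normal approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not data from the thesis.
or_, lower, upper = odds_ratio_wald_ci(10, 90, 5, 95)
print(f"OR={or_:.2f}, 95% CI={lower:.2f}-{upper:.2f}")
```

With small carrier counts, an exact method such as Fisher's exact test is often preferred over the Wald approximation.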

    Modelling functional and structural impact of non-synonymous single nucleotide polymorphisms of the DQA1 gene of three Nigerian goat breeds

    The DQA1 gene is a member of the highly polymorphic MHC class II locus that is responsible for the differences among individuals in immune response to infectious agents. In this study, the authors performed a comprehensive computational analysis of the functional and structural impact of non-synonymous or amino acid-changing single nucleotide polymorphisms (SNPs) (nsSNPs) that are deleterious to the DQA1 protein in Nigerian goats. A 310-bp fragment of exon 2 of the DQA1 gene was amplified and sequenced in 27 unrelated animals that are representative of three major Nigerian goat breeds (nine each of West African Dwarf, Red Sokoto, and Sahel of both sexes) using genomic DNA. Forty-two nsSNPs were identified from the alignment of the deduced amino acid sequences. Based on the PANTHER, PROVEAN and PolyPhen-2 algorithms, there was consensus in identifying the mutants I26D, E114V and V115F as being deleterious. Further, differences between the native and the mutant proteins in the subsequent molecular trajectory analysis (stabilizing and flexible residue composition, total grid energy, solvation energy, coulombic energy, solvent accessibility, and protein-protein interaction properties) revealed E114V and V115F to be highly deleterious. Combined mutational analysis comparing the mutant (I26D, E114V and V115F mutations collectively) with the native protein also showed changes that could affect protein function and structure. Further wet-lab confirmatory analysis in a pathological association study involving a larger population of goats is required at the DQA1 locus. This would lay a sound foundation for breeding disease-resistant individuals in the future. Keywords: Goats, in silico, mutants, protein, tropic
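
The consensus step described above, keeping only substitutions that every predictor flags as deleterious, can be sketched as a set intersection. This is a minimal illustration, not the authors' pipeline: the function name is hypothetical, and the per-tool call lists are invented placeholders (the `PLACEHOLDER_*` entries are not real variants). Only the three consensus mutants come from the abstract.

```python
def consensus_deleterious(calls_by_tool):
    """Return the variants flagged as deleterious by every tool, sorted."""
    call_sets = [set(calls) for calls in calls_by_tool.values()]
    if not call_sets:
        return []
    return sorted(set.intersection(*call_sets))


# Hypothetical per-tool calls; only the shared three are named in the abstract.
calls = {
    "PANTHER": ["I26D", "E114V", "V115F", "PLACEHOLDER_1"],
    "PROVEAN": ["I26D", "E114V", "V115F", "PLACEHOLDER_2"],
    "PolyPhen-2": ["I26D", "E114V", "V115F", "PLACEHOLDER_1"],
}
print(consensus_deleterious(calls))
```

Requiring agreement across all tools trades sensitivity for specificity; a majority vote would be a looser alternative.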

    The loss and gain of functional amino acid residues is a common mechanism causing human inherited disease

    Elucidating the precise molecular events altered by disease-causing genetic variants represents a major challenge in translational bioinformatics. To this end, many studies have investigated the structural and functional impact of amino acid substitutions. Most of these studies, however, were either limited in scope to individual molecular functions or concerned with functional effects (e.g. deleterious vs. neutral) without specifically considering possible molecular alterations. The recent growth of structural, molecular and genetic data presents an opportunity for more comprehensive studies to consider the structural environment of a residue of interest, to hypothesize specific molecular effects of sequence variants and to statistically associate these effects with genetic disease. In this study, we analyzed data sets of disease-causing and putatively neutral human variants mapped to protein 3D structures as part of a systematic study of the loss and gain of various types of functional attribute potentially underlying pathogenic molecular alterations. We first propose a formal model to probabilistically assess function-impacting variants. We then develop an array of structure-based functional residue predictors, evaluate their performance, and use them to quantify the impact of disease-causing amino acid substitutions on catalytic activity, metal binding, macromolecular binding, ligand binding, allosteric regulation and post-translational modifications. We show that our methodology generates actionable biological hypotheses for up to 41% of disease-causing genetic variants mapped to protein structures, suggesting that it can be reliably used to guide experimental validation. Our results suggest that a significant fraction of disease-causing human variants mapping to protein structures are function-altering both in the presence and absence of stability disruption.

    In-Depth Analysis of Zero-Length Crosslinking for Structural Mass Spectrometry

    The completion of the Human Genome Project revealed the sequence identity of essentially every human protein. However, in most cases, amino acid sequences alone convey little about a protein's static structure, its dynamic conformational changes, and, most importantly, its functions. To fully understand the behaviors and properties of macromolecular complexes, solving their 3D structures is necessary and highly critical. Under this rationale, structural genomics collaborations were initiated aiming to determine high-resolution structures of as many proteins and protein folds as possible, relying mostly on X-ray crystallography and NMR spectroscopy. Yet, very large, highly flexible or disordered, and dynamic protein complexes can exceed the capabilities of these high-resolution techniques. Although computational molecular modeling can be utilized, such structures are highly speculative and often inaccurate unless supported by actual experimental data. Structural mass spectrometry recently emerged as an alternative method which can provide medium-resolution spatial information capable of complementing computational approaches, and is applicable to heterogeneous samples with potentially no limit on complex sizes. In particular, chemical crosslinking coupled with mass spectrometry has recently received considerable interest. Most recent progress focused on developing crosslinkers with special properties such as enrichment tags, isotopic labeling sites, or MS-cleavable bonds along with accompanying data analysis strategies and software packages. These crosslinkers insert their spacer arm between proximal amino acid residues, greatly reducing the stringency of the derived distance constraints. In contrast, zero-length crosslinkers do not add any extra atoms to the product crosslinked peptides, therefore providing the tightest possible spatial constraints but rendering enrichment and isotopic labeling strategies inapplicable. 
As a result, zero-length crosslinking received limited attention and no software tools have previously been specifically developed for it. In this thesis project, we developed a multi-tiered mass spectrometry data acquisition and computational data analysis strategy along with a dedicated software tool to enhance identification of zero-length crosslinks in complex samples. Label-free comparison and targeted high-resolution mass spectrometry were utilized to filter out the vast majority of non-crosslinked peptides and increase confidence of crosslink identification, compensating for the lack of enrichment techniques and characteristic MS patterns employed by non-zero-length crosslinking methods. Each step from mass spectrometer acquisition parameters to MS/MS spectra evaluation functions was optimized based on zero-length crosslinking datasets of proteins with known crystal structures. Our pipeline was then applied to probe structures and conformational changes of mini-spectrin, a 90 kDa recombinant protein that closely mimics erythrocyte spectrin's dynamic dimer-tetramer equilibrium. Compared to previous analyses performed in our laboratory, the current strategy more than doubled the number of identified crosslinks and significantly reduced analysis time per experiment from months to just several days. Distance constraints derived from mini-spectrin crosslinks were used as inputs in subsequent homology modeling, allowing development of experimentally-verified medium-resolution structures for wild-type mini-spectrin tetramer and both wild-type and hereditary elliptocytosis (HE) mutant mini-spectrin dimers. The structure models, in combination with independent biophysical experiments, illustrated how such distal HE-related mutations destabilized spectrin dimer-tetramer equilibrium by simultaneously lowering thermal stability of tetramer and giving rise to a more-compact, more-stable closed dimer conformation.

    Analyzing Effects of Naturally Occurring Missense Mutations

    A single-point mutation in the genome, for example a single-nucleotide polymorphism (SNP) or a rare genetic mutation, is the change of a single nucleotide for another in the genome sequence. Some of these mutations produce an amino acid substitution in the corresponding protein sequence (missense mutations); others do not. This paper focuses on genetic mutations resulting in a change in the amino acid sequence of the corresponding protein and how to assess their effects on protein wild-type characteristics. The existing methods and approaches for predicting the effects of mutation on protein stability, structure, and dynamics are outlined and discussed with respect to their underlying principles. Available resources, either as stand-alone applications or webservers, are pointed out as well. It is emphasized that understanding the molecular mechanisms behind the effects of these missense mutations is of critical importance for detecting disease-causing mutations. The paper also provides several examples of the application of 3D structure-based methods to model the effects of missense mutations on protein stability and protein-protein interactions.

    Modeling the Tripartite Role of Cyclin C in Cellular Stress Response Coordination

    For normal cellular function, exogenous signals must be interpreted and careful coordination must take place to ensure desired fates are achieved. Mitochondria are key regulatory nodes of cellular fate, undergoing fission/fusion cycles depending on the needs of the cell, and help mediate cell death fates. The Cdk8 kinase module (CKM) is composed of cyclin C (CC), Cdk8, Med12/12L, and Med13/13L. The CKM controls RNA polymerase II, acting as a regulator of stress-response and growth-control genes. Following stress, CC translocates to the mitochondria and interacts with both fission and iRCD apoptotic mediators. We hypothesize that CC represents a key mediator, linking transcription to mitochondrial fission and RCD. A more in-depth analysis of the roles of CC and the protein interactions that mediate them is the focus of this dissertation. To mediate individual functions, CC uses distinct binding partners. We revealed the presence of two separable/discrete cyclin box domains. To determine the residues mediating these functions, rigid body protein-protein docking simulations were performed using human cyclin C, Drp1, and Bax. These analyses revealed specific residues which support distinct functions of the CB1 and CB2 domains. Results indicate that modeled Bax-interacting residues are concentrated in the first half of the CB2 domain, while Drp1-interacting residues span the entirety of the CB2 domain. Interestingly, we determined that CC contains a unique BH2-like domain, normally only found in Bcl-2 protein family members, which appears to mediate interactions with Bax. Results from human protein modeling simulations were then applied to yeast homologous proteins. As presented here, yeast studies have confirmed residues that mediate interaction between CC and fission machinery. The results support the model that CB1 and CB2 are distinct, mediating independent functionalities. 
We suggest a model in which CC possesses three distinct interaction domains and acts to bridge fission and apoptotic machinery, either in a mutually exclusive or trimeric manner. In conclusion, CC is shown to mediate each of its unique functions through distinct interacting residues and interfaces. With CC implicated in many human disorders, this will serve as a tool to study disease pathogeneses and treatments, taking into account unique interfaces governing the tripartite functions.

    A Review of Fifteen Years Developing Computational Tools to Study Protein Aggregation

    The presence of insoluble protein deposits in tissues and organs is a hallmark of many human pathologies. In addition, the formation of protein aggregates is considered one of the main bottlenecks to producing protein-based therapeutics. Thus, there is a high interest in rationalizing and predicting protein aggregation. For almost two decades, our laboratory has been working to provide solutions for these needs. We have traditionally combined the core tenets of both bioinformatics and wet lab biophysics to develop algorithms and databases to study protein aggregation and its functional implications. Here, we review the computational toolbox developed by our lab, including programs for identifying sequential or structural aggregation-prone regions at the individual protein and proteome levels, engineering protein solubility, finding and evaluating prion-like domains, studying disorder-to-order protein transitions, or categorizing non-conventional amyloid regions of polar nature, among others. In perspective, the succession of the tools we describe illustrates how our understanding of the protein aggregation phenomenon has evolved over the last fifteen years.

    Cancer signaling networks and their implications for personalized medicine
