Extensive disruption of protein interactions by genetic variants across the allele frequency spectrum in human populations
Each human genome carries tens of thousands of coding variants. The extent to which this variation is functional, and the mechanisms by which it exerts its influence, remain largely unexplored. To address this gap, we leverage the ExAC database of 60,706 human exomes to investigate experimentally the impact of 2,009 missense single nucleotide variants (SNVs) across 2,185 protein-protein interactions, generating interaction profiles for 4,797 SNV-interaction pairs; 421 of these SNVs segregate at > 1% allele frequency in human populations. We find that interaction-disruptive SNVs are prevalent at both rare and common allele frequencies. Furthermore, these results suggest that 10.5% of the missense variants carried per individual are disruptive, a higher proportion than previously reported, which indicates that each individual’s genetic makeup may be significantly more complex than expected. Finally, we demonstrate that candidate disease-associated mutations can be identified through shared interaction perturbations between variants of interest and known disease mutations.
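To make the rare-versus-common comparison concrete, the sketch below collapses SNV-interaction pairs into one call per variant and reports the disruptive fraction on each side of the 1% allele-frequency cutoff. It assumes, for illustration only, that a variant counts as disruptive if it disrupts at least one tested interaction; the records and the helper names (`variant_calls`, `disruptive_fraction_by_bin`) are hypothetical and are not the study's own pipeline.

```python
from collections import defaultdict

# Hypothetical records, one row per SNV-interaction pair:
# (variant_id, allele_frequency, interaction_id, disrupted?)
# All values are invented for illustration.
PAIRS = [
    ("v1", 0.00002, "A-B", True),
    ("v1", 0.00002, "A-C", False),
    ("v2", 0.15, "D-E", False),
    ("v3", 0.0004, "F-G", False),
    ("v3", 0.0004, "F-H", False),
    ("v4", 0.03, "I-J", True),
]

COMMON_AF = 0.01  # the >1% allele-frequency cutoff mentioned in the abstract


def variant_calls(pairs):
    """Collapse SNV-interaction pairs to one (allele_frequency, disruptive) call per variant.

    Assumption: a variant is disruptive if it disrupts at least one interaction.
    """
    af = {}
    disrupted = defaultdict(bool)
    for vid, freq, _interaction, is_disrupted in pairs:
        af[vid] = freq
        disrupted[vid] = disrupted[vid] or is_disrupted
    return {vid: (af[vid], disrupted[vid]) for vid in af}


def disruptive_fraction_by_bin(pairs, cutoff=COMMON_AF):
    """Return the fraction of disruptive variants in the rare and common bins."""
    counts = {"rare": [0, 0], "common": [0, 0]}  # bin -> [disruptive, total]
    for freq, is_disruptive in variant_calls(pairs).values():
        key = "common" if freq > cutoff else "rare"
        counts[key][0] += int(is_disruptive)
        counts[key][1] += 1
    return {k: d / n for k, (d, n) in counts.items() if n}


print(disruptive_fraction_by_bin(PAIRS))
```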
INTERACTOME-SCALE INTERROGATIONS OF HUMAN GENOMIC VARIATION
Coding variants segregating in human populations are expected to be largely benign, with deleterious variation occurring principally at rare allele frequencies and limited to conserved genomic sites. The extent to which this deleterious variation burdens human genomes, and the mechanisms by which these mutations exert their effects, however, remain largely unexplored. To help address this gap, I have contributed to the development of interactome-scale tools for interrogating missense variation in human disease and have experimentally measured the impact of thousands of human missense variants on protein interactions and stability. Together, these efforts have helped characterize the molecular mechanisms of disease-associated mutations and have yielded new insights into the extent to which functional variation segregates across human populations. First, the development of Clone-seq, a massively parallel, site-directed mutagenesis platform for cloning DNA variants, is discussed. A study of the impact of 204 disease-associated mutations on protein interactions and stability is then detailed to demonstrate the utility of Clone-seq in genomic studies. Next, an extensive study of >2,000 missense mutations is presented, revealing widespread protein interaction perturbations by both rare and common human population variants. Disruptive variants were found to be enriched at conserved genomic sites and to occur at increasingly higher rates as allele frequency decreased. Evidence suggesting that disruptive variants persist primarily in less essential regions of the genome is then presented, followed by a demonstration of how shared interaction perturbation profiles between population variants and disease-associated mutations can be used to identify candidate disease-associated mutations from sequencing data.
Lastly, the development of an integrated computational and experimental platform for prioritizing de novo missense mutations in developmental disorders is discussed. Although protein interaction perturbations represent only one of many ways in which DNA variants can alter cellular function, the genetic, protein interaction, and population-level insights presented here should represent an important step toward a better understanding of the evolutionary forces that shape the human genome and protein function.
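One simple way to compare shared interaction perturbation profiles is to treat each mutation's profile as the set of partners whose interaction it disrupts and to score overlap with the Jaccard index. The thesis does not specify this metric; Jaccard is swapped in here as an assumption, and the profiles and the names `jaccard` and `rank_candidates` are hypothetical. Profiles are only comparable for mutations on the same protein, so all inputs are assumed to share a gene.

```python
def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(variant_profiles, disease_profiles):
    """Score each variant by its best profile match to any known disease mutation.

    Both arguments map a mutation name to the set of interaction partners it
    disrupts (assumed to be mutations on the same protein).
    Returns (variant, best_score, best_matching_disease_mutation), best first.
    """
    ranked = []
    for variant, disrupted in variant_profiles.items():
        best_score, best_match = 0.0, None
        for disease_mut, d_disrupted in disease_profiles.items():
            score = jaccard(disrupted, d_disrupted)
            if score > best_score:
                best_score, best_match = score, disease_mut
        ranked.append((variant, best_score, best_match))
    return sorted(ranked, key=lambda t: t[1], reverse=True)


# Invented example profiles
DISEASE = {"D1": {"P1", "P2"}, "D2": {"P3"}}
VARIANTS = {"varA": {"P1", "P2"}, "varB": {"P2", "P4"}, "varC": set()}
print(rank_candidates(VARIANTS, DISEASE))
```

A variant with no disrupted interactions scores 0.0 and has no match, so it falls to the bottom of the ranking.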
SAAMBE-3D: Predicting Effect of Mutations on Protein–Protein Interactions
Maintaining wild-type protein–protein interactions is essential for the normal function of the cell, and any mutation that alters their characteristics can cause disease. Therefore, the ability to predict the effect of amino acid mutations correctly and quickly is crucial for understanding disease effects and for carrying out genome-wide studies. Here, we report a new development of the SAAMBE method, SAAMBE-3D, a machine learning-based approach that delivers accurate predictions and is extremely fast. It achieves Pearson correlation coefficients ranging from 0.78 to 0.82, depending on the training protocol, in a five-fold cross-validation benchmark against the SKEMPI v2.0 database, and it outperforms currently existing algorithms on various blind tests. Furthermore, optimized and tested via five-fold cross-validation on the Cornell University dataset, SAAMBE-3D achieves AUCs of 1.0 and 0.96 on the homodimer and heterodimer test datasets, respectively. Each prediction takes only a fraction of a second to complete. SAAMBE-3D is available both as a web server and as stand-alone code; the latter allows other researchers to download the code directly and run it on their local computers. Taken together, SAAMBE-3D is an accurate and fast tool applicable to genome-wide studies of the effect of amino acid mutations on protein–protein interactions. The web server and the stand-alone codes (SAAMBE-3D for predicting the change of binding free energy and SAAMBE-3D-DN for predicting whether a mutation is disruptive or non-disruptive) are available
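For readers unfamiliar with the reported metrics, the following sketch shows how a five-fold cross-validated Pearson correlation and an AUC can be computed with the standard library alone. It is not SAAMBE-3D: a one-feature least-squares line stands in for the real model, out-of-fold predictions are pooled before correlating (SAAMBE-3D's exact protocol may differ), and all function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_line(x, y):
    """Ordinary least squares for y = a*x + b (a stand-in for the real predictor)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    a = sum((p - mx) * (q - my) for p, q in zip(x, y)) / sum((p - mx) ** 2 for p in x)
    return a, my - a * mx


def kfold_pearson(x, y, k=5, seed=0):
    """Pool out-of-fold predictions from k-fold CV and correlate them with the truth."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [0.0] * len(x)
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = a * x[i] + b
    return pearson(preds, y)


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.uniform(0, 10) for _ in range(100)]  # synthetic feature
    y = [0.5 * v + rng.gauss(0, 1) for v in x]    # synthetic "binding energy change"
    print(round(kfold_pearson(x, y), 2))
```

Pooling out-of-fold predictions gives a single correlation over all data; averaging per-fold correlations is a common alternative.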
Elucidating common structural features of human pathogenic variations using large-scale atomic-resolution protein networks
With the rapid growth of structural genomics, numerous protein crystal structures have become available. However, the parallel increase in knowledge of the functional principles underlying biological processes, and more specifically the underlying molecular mechanisms of disease, has been less dramatic. This notwithstanding, the study of complex cellular networks has made possible the inference of protein functions on a large scale. Here, we combine the scale of network systems biology with the resolution of traditional structural biology to generate a large-scale atomic-resolution interactome network comprising 3,398 interactions between 2,890 proteins, with a well-defined interaction interface and interface residues for each interaction. Within the framework of this atomic-resolution network, we have explored the structural principles underlying variations causing human-inherited disease. We find that in-frame pathogenic variations are enriched both at the interface and in the interacting domain, suggesting that variations not only at interface “hot-spots” but throughout the entire interacting domain can alter interactions. Further, the sites of pathogenic variations are closely related to the biophysical strength of the interactions they perturb. Finally, we show that the biochemical alterations resulting from these variations are considerably more disruptive than evolutionary changes, with the most significant alterations at the protein interaction interface.
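Enrichment of pathogenic variations at the interface can be summarized with an odds ratio from a 2x2 table. The sketch below assumes a background set of variants to compare against (the abstract does not say which set or which statistic was used), computes a Woolf log-scale 95% confidence interval, and applies a 0.5 Haldane correction when a cell is zero. The function name and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    a: pathogenic variants at the interface   b: pathogenic variants elsewhere
    c: background variants at the interface   d: background variants elsewhere
    If any cell is zero, 0.5 is added to every cell (Haldane correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (v + 0.5 for v in (a, b, c, d))
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


# Invented counts for illustration only
print(odds_ratio_ci(30, 70, 10, 90))
```

With these invented counts the odds ratio is 27/7 (about 3.86), and a lower bound above 1 would indicate enrichment at the interface.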
Examples of disease mutations in different structural loci of protein-protein interactions and examples of our GFP assay results.
<p>(a) Crystal structure (PDB id: 3W4U) depicting a D100Y mutation (on Hbb) at an interface residue and an F104L mutation in the interface domain for the Hbb-Hbz interaction. (b) Crystal structure (PDB id: 1G3N) depicting a V31L mutation (on Cdkn2c) away from the Cdkn2c-Cdk6 interaction interface. (c) GFP assays that determine the stability of wild-type Rrm2b and of its R41P and L317V mutants, which lie at an interface residue and away from the interface, respectively, of the Rrm2b-Rrm2b interaction; GFP assays that determine the stability of wild-type Hprt1 and of its C206Y mutant, which lies away from the interface of the Hprt1-Hprt1 interaction. Empty vector was used as a negative control.</p>
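The three structural categories used in these captions (interface residue, interface domain, away from the interface) can be assigned with a simple lookup once interface residues and interacting-domain boundaries are known. The function `structural_locus` and its inputs are hypothetical, and the residue numbers in the example do not come from the structures cited above.

```python
def structural_locus(position, interface_residues, domain_ranges):
    """Assign a mutated residue to one of three structural categories.

    interface_residues: set of residue numbers at the interaction interface.
    domain_ranges: list of inclusive (start, end) ranges of the interacting domain(s).
    Interface residues are checked first because they usually lie inside the domain.
    """
    if position in interface_residues:
        return "interface residue"
    if any(start <= position <= end for start, end in domain_ranges):
        return "interface domain"
    return "away from the interface"


# Invented interface and domain boundaries
IFACE = {100, 101, 102}
DOMAINS = [(60, 150)]
for pos in (100, 120, 30):
    print(pos, structural_locus(pos, IFACE, DOMAINS))
```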
Effect of disease mutations on protein stability and protein-protein interactions.
<p>(a) Western blotting with anti-GFP antibody confirming the protein expression levels of wild-type Rrm2b, Actn2, Hprt1, Pnp, Tpk1, Gnmt, Gale, Fbp1, Klhl3, Tp53, Pnp, Smad4, and corresponding mutant alleles. β-tubulin and γ-tubulin were used as loading controls. Red denotes “interface residue” mutations, orange denotes “interface domain” mutations and blue denotes “away from the interface” mutations. (b) Likelihood of disruption of interactions by “interface residue”, “interface domain” and “away from the interface” mutations – overall and for stable mutants only; likelihood of a disease mutation disrupting a given interaction in the absence of structural information. Error bars indicate +SE. (<i>N</i> = 204 mutations).</p>
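Panel (b) reports a disruption likelihood per structural category, overall and for stable mutants only. A minimal sketch of that tally, assuming each record is one mutation-interaction pair and that the error bars are binomial standard errors, sqrt(p(1 - p)/n); the caption does not state how SE was computed. The records and the name `disruption_rates` are hypothetical.

```python
import math
from collections import defaultdict


def disruption_rates(records, stable_only=False):
    """Fraction of mutation-interaction pairs disrupted per category, with binomial SE.

    records: iterable of (category, is_stable, is_disrupted) tuples.
    Returns {category: (fraction, standard_error, n)}.
    """
    tally = defaultdict(lambda: [0, 0])  # category -> [disrupted, total]
    for category, is_stable, is_disrupted in records:
        if stable_only and not is_stable:
            continue  # skip destabilized mutants when only stable ones are requested
        tally[category][0] += int(is_disrupted)
        tally[category][1] += 1
    out = {}
    for category, (k, n) in tally.items():
        p = k / n
        out[category] = (p, math.sqrt(p * (1 - p) / n), n)
    return out


# Invented records for illustration
RECORDS = [
    ("interface residue", True, True),
    ("interface residue", True, True),
    ("interface residue", False, True),
    ("interface residue", True, False),
    ("away from the interface", True, False),
    ("away from the interface", True, True),
    ("away from the interface", False, False),
    ("away from the interface", True, False),
]
print(disruption_rates(RECORDS))
print(disruption_rates(RECORDS, stable_only=True))
```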
Identifying interactions of Mlh1 that are affected by the I107R mutation using SILAC-based mass spectrometry.
<p>(a) Schematic illustrating criteria used to identify interactions that are lost/weakened, unchanged, and gained/enhanced due to the I107R mutation on Mlh1. Blue denotes samples cultured in light media and black denotes samples cultured in heavy media. (b) Scatter plot illustrating fold change (<i>FC</i>; log scale) in the amount of protein pulled down by wild-type Mlh1 and mutant Mlh1 (I107R). Values are computed based on the wild-type (heavy) vs. mutant (light) (X-axis) and mutant (heavy) vs. wild-type (light) (Y-axis) experiments. Green denotes enhancement of interaction, red denotes weakening of interaction, and gold denotes no change. Mlh1 is shown in grey. (c) Fold changes and read counts (<i>r</i>) for interactors of Mlh1 that can be reliably identified as weakened, unchanged, and enhanced due to the I107R mutation. (d) Anti-HA immunoprecipitation followed by Western blotting with anti-V5 antibody confirming that the Mlh1-Brip1 interaction remains unchanged and that the Mlh1-Hspa8 interaction is dramatically enhanced due to the I107R mutation.</p>
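The label-swap design in panels (a) and (b) can be expressed as a small rule: convert both experiments to mutant/wild-type ratios and require them to agree before calling a change. The 2-fold cutoff and the minimum read count below are hypothetical placeholders, not the published criteria, which appear only in the schematic; `classify_interactor` is likewise a hypothetical name.

```python
def classify_interactor(fc_forward, fc_reverse, reads, fold=2.0, min_reads=2):
    """Call an interactor as lost/weakened, unchanged, or gained/enhanced.

    fc_forward: heavy/light ratio in the wild-type (heavy) vs mutant (light) run.
    fc_reverse: heavy/light ratio in the mutant (heavy) vs wild-type (light) run.
    Both are converted to mutant/wild-type ratios so the two runs are comparable.
    fold and min_reads are hypothetical thresholds.
    """
    if reads < min_reads:
        return "not reliably identified"
    mut_over_wt = (1.0 / fc_forward, fc_reverse)
    if all(r <= 1.0 / fold for r in mut_over_wt):
        return "lost/weakened"
    if all(r >= fold for r in mut_over_wt):
        return "gained/enhanced"
    if all(1.0 / fold < r < fold for r in mut_over_wt):
        return "unchanged"
    return "not reliably identified"  # the two runs disagree


print(classify_interactor(4.0, 0.25, reads=10))  # less protein pulled down by the mutant
print(classify_interactor(0.2, 5.0, reads=10))   # more protein pulled down by the mutant
```

Requiring agreement between the forward and reverse runs guards against artifacts that depend on which sample carries the heavy label.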
Relationships between molecular phenotypes and disease phenotypes.
<p>(a) Fraction of mutation pairs on the same gene that cause the same disease: for the same and different effects on protein stability. (b) Fraction of mutation pairs on the same gene that cause the same disease: for the same and different interaction disruption profiles. Error bars indicate +SE. (c) Crystal structure (PDB id: 1U7F) depicting the Y353S and R361C mutations (on Smad4) at interface residues for the Smad4-Smad3 interaction. (d) Y2H analysis of the effects of the Smad4 Y353S, R361C, and N13S mutations on its interactions with Smad3, Lmo4, Rassf5, and Smad9. Western blotting with anti-GFP antibody confirming the protein expression levels of wild-type Smad4 and its three mutant alleles (Y353S, R361C, and N13S). γ-tubulin was used as a loading control.</p>
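Panels (a) and (b) compare same-gene mutation pairs with matching versus differing molecular phenotypes. The sketch below performs that tally on invented records; the function name `same_disease_fraction` is hypothetical, and the `phenotype` field can hold a stability call or a frozenset of disrupted interactions.

```python
from itertools import combinations


def same_disease_fraction(mutations):
    """Fraction of same-gene mutation pairs that cause the same disease.

    mutations: list of dicts with 'gene', 'disease', and 'phenotype' keys, where
    'phenotype' is any hashable molecular phenotype.
    Returns {'same phenotype': fraction, 'different phenotype': fraction}.
    """
    tally = {"same phenotype": [0, 0], "different phenotype": [0, 0]}
    for m1, m2 in combinations(mutations, 2):
        if m1["gene"] != m2["gene"]:
            continue  # only pairs on the same gene are compared
        same = m1["phenotype"] == m2["phenotype"]
        key = "same phenotype" if same else "different phenotype"
        tally[key][0] += int(m1["disease"] == m2["disease"])
        tally[key][1] += 1
    return {k: s / n for k, (s, n) in tally.items() if n}


# Invented records for illustration
MUTATIONS = [
    {"gene": "G1", "disease": "X", "phenotype": "stable"},
    {"gene": "G1", "disease": "X", "phenotype": "stable"},
    {"gene": "G1", "disease": "Y", "phenotype": "unstable"},
    {"gene": "G2", "disease": "Z", "phenotype": "stable"},
]
print(same_disease_fraction(MUTATIONS))
```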
Schematic of our comparative interactome-scanning pipeline.
<p>Our pipeline begins with Clone-seq (a), a massively parallel, low-cost site-directed mutagenesis pipeline leveraging next-generation sequencing. This is followed by a high-throughput GFP assay (b) to determine protein stability, and by a high-throughput Y2H assay (c) together with SILAC-based mass spectrometry (d) to determine the impact of DNA coding variants on protein interactions.</p>
A Massively Parallel Pipeline to Clone DNA Variants and Examine Molecular Phenotypes of Human Disease Mutations
<div><p>Understanding the functional relevance of DNA variants is essential for all exome and genome sequencing projects. However, current mutagenesis cloning protocols require Sanger sequencing, and thus are prohibitively costly and labor-intensive. We describe a massively parallel site-directed mutagenesis approach, “Clone-seq”, leveraging next-generation sequencing to rapidly and cost-effectively generate a large number of mutant alleles. Using Clone-seq, we further develop a comparative interactome-scanning pipeline integrating high-throughput GFP, yeast two-hybrid (Y2H), and mass spectrometry assays to systematically evaluate the functional impact of mutations on protein stability and interactions. We use this pipeline to show that disease mutations on protein-protein interaction interfaces are significantly more likely than those away from interfaces to disrupt corresponding interactions. We also find that mutation pairs with similar molecular phenotypes in terms of both protein stability and interactions are significantly more likely to cause the same disease than those with different molecular phenotypes, validating the <i>in vivo</i> biological relevance of our high-throughput GFP and Y2H assays, and indicating that both assays can be used to determine candidate disease mutations in the future. The general scheme of our experimental pipeline can be readily expanded to other types of interactome-mapping methods to comprehensively evaluate the functional relevance of all DNA variants, including those in non-coding regions.</p></div>
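The claim that interface mutations are significantly more likely to disrupt interactions is a comparison of two proportions. The abstract does not name the test; a one-sided Fisher's exact test is one common choice and is assumed in this stdlib-only sketch (it needs Python 3.8+ for `math.comb`). The function name `fisher_greater` is hypothetical.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for the 2x2 table

                     disrupted   not disrupted
        interface        a             b
        away             c             d

    i.e. P(X >= a) under the hypergeometric null with fixed margins, testing
    whether interface mutations disrupt interactions more often than others.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    upper = min(row1, col1)
    return sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1)) / denom


# Invented counts: 3/3 interface mutations disrupt vs 0/3 away from the interface
print(fisher_greater(3, 0, 0, 3))
```

With only three mutations per group, even a perfect split gives p = 1/20, which shows why large mutation panels are needed to reach significance.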