128 research outputs found

    Establishing the precise evolutionary history of a gene improves prediction of disease-causing missense mutations

    PURPOSE: Predicting the phenotypic effects of mutations has become an important application in clinical genetic diagnostics. Computational tools evaluate the behavior of the variant over evolutionary time and assume that variations seen during the course of evolution are probably benign in humans. However, current tools do not take into account orthologous/paralogous relationships. Paralogs have dramatically different roles in Mendelian diseases. For example, whereas inactivating mutations in the NPC1 gene cause the neurodegenerative disorder Niemann-Pick C, inactivating mutations in its paralog NPC1L1 are not disease-causing and, moreover, are implicated in protection from coronary heart disease. METHODS: We identified major events in NPC1 evolution and revealed and compared orthologs and paralogs of the human NPC1 gene through phylogenetic and protein sequence analyses. We predicted whether an amino acid substitution affects protein function by reducing the organism’s fitness. RESULTS: Removing the paralogs and distant homologs improved the overall performance of categorizing disease-causing and benign amino acid substitutions. CONCLUSION: The results show that a thorough evolutionary analysis followed by identification of orthologs improves the accuracy in predicting disease-causing missense mutations. We anticipate that this approach will be used as a reference in the interpretation of variants in other genetic diseases as well. Genet Med 18 10, 1029–1036

    Statistical Guidance for Experimental Design and Data Analysis of Mutation Detection in Rare Monogenic Mendelian Diseases by Exome Sequencing

    Recently, whole-genome sequencing, especially exome sequencing, has successfully led to the identification of causal mutations for rare monogenic Mendelian diseases. However, it is unclear whether this approach can be generalized and effectively applied to other Mendelian diseases with high locus heterogeneity. Moreover, the current exome sequencing approach has limitations such as false positive and false negative rates of mutation detection due to sequencing errors and other artifacts, but the impact of these limitations on experimental design has not been systematically analyzed. To address these questions, we present a statistical modeling framework to calculate the power, the probability of identifying truly disease-causing genes, under various inheritance models and experimental conditions, providing guidance for both proper experimental design and data analysis. Based on our model, we found that the exome sequencing approach is well-powered for mutation detection in recessive, but not dominant, Mendelian diseases with high locus heterogeneity. A disease gene responsible for as low as 5% of the disease population can be readily identified by sequencing just 200 unrelated patients. Based on these results, for identifying rare Mendelian disease genes, we propose that a viable approach is to combine, sequence, and analyze patients with the same disease together, leveraging the statistical framework presented in this work
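
The 200-patient figure in the abstract above can be made concrete with a rough calculation. The sketch below is not the authors' statistical framework. It assumes a plain binomial model in which each unrelated patient independently has disease caused by the gene of interest with probability equal to the gene's share of the disease population, and it ignores sequencing errors and background variation. The function name and the threshold of five carriers are hypothetical choices for illustration.

```python
from math import comb


def prob_at_least(n_patients, gene_fraction, min_carriers):
    """Probability that at least `min_carriers` of `n_patients` carry
    causal mutations in one gene, under a simple binomial assumption
    (illustrative only, not the model from the paper)."""
    # Sum the probabilities of seeing fewer than `min_carriers` carriers.
    p_fewer = sum(
        comb(n_patients, k)
        * gene_fraction ** k
        * (1 - gene_fraction) ** (n_patients - k)
        for k in range(min_carriers)
    )
    return 1.0 - p_fewer


# Values taken from the abstract: 200 patients, gene explains 5% of cases.
expected_carriers = 200 * 0.05
print(f"expected carriers: {expected_carriers:.1f}")
# Hypothetical threshold: at least 5 patients share the same gene.
print(f"P(at least 5 carriers): {prob_at_least(200, 0.05, 5):.3f}")
```

Under these simplified assumptions, about ten of the 200 patients are expected to share the gene, which is why recurrence across patients is a usable signal in a cohort of that size.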

    Cataloging Coding Sequence Variations in Human Genome Databases

    BACKGROUND: With the recent growth of information on sequence variations in the human genome, predictions regarding the functional effects and relevance to disease phenotypes of coding sequence variations are becoming increasingly important. The aims of this study were to catalog protein-coding sequence variations (CVs) occurring in genetic variation databases and to use bioinformatic programs to analyze CVs. In addition, we aim to provide insight into the functionality of the reference databases. METHODOLOGY AND FINDINGS: To catalog CVs on a genome-wide scale with regard to protein function and disease, we investigated three representative databases: the Human Gene Mutation Database (HGMD), the Single Nucleotide Polymorphisms database (dbSNP), and the Haplotype Map (HapMap). Using these three databases, we analyzed CVs at the protein function level with bioinformatic programs. We proposed a combinatorial approach using the Support Vector Machine (SVM) to increase the performance of the prediction programs. By cataloging the coding sequence variations using these databases, we found that 4.36% of CVs from HGMD are concurrently registered in dbSNP (8.11% of CVs from dbSNP are concurrent in HGMD). The pattern of substitutions and functional consequences predicted by three bioinformatic programs was significantly different among concurrent CVs, and CVs occurring solely in HGMD or in dbSNP. The experimental results showed that the proposed SVM combination noticeably outperformed the individual prediction programs. CONCLUSIONS: This is the first study to compare human sequence variations in HGMD, dbSNP and HapMap at the genome-wide level. We found that a significant proportion of CVs in HGMD and dbSNP overlap, and we emphasize the need to use caution when interpreting the phenotypic relevance of these concurrent CVs. Combining bioinformatic programs can be helpful in predicting the functional consequences of CVs because doing so improved the performance of functional predictions

    Challenges in Whole Exome Sequencing: An Example from Hereditary Deafness

    Whole exome sequencing provides unprecedented opportunities to identify causative DNA variants in rare Mendelian disorders. Finding the responsible mutation via traditional methods in families with hearing loss is difficult due to a high degree of genetic heterogeneity. In this study we combined autozygosity mapping and whole exome sequencing in a family with 3 affected children having nonsyndromic hearing loss born to consanguineous parents. Two novel missense homozygous variants, c.508C>A (p.H170N) in GIPC3 and c.1328C>T (p.T443M) in ZNF57, were identified in the same ∼6 Mb autozygous region on chromosome 19 in affected members of the family. Both variants co-segregated with the phenotype and were absent in 335 ethnicity-matched controls. Biallelic GIPC3 mutations have recently been reported to cause autosomal recessive nonsyndromic sensorineural hearing loss. Thus we conclude that the hearing loss in the family described in this report is caused by a novel missense mutation in GIPC3. The identified GIPC3 variant had a low read depth and was initially filtered out during the analysis, leaving ZNF57 as the only potential causative gene. This study highlights some of the challenges in the analysis of whole exome data in the bid to establish the true causative variant in Mendelian disease

    Distribution and Effects of Nonsense Polymorphisms in Human Genes

    BACKGROUND: A great amount of data has been accumulated on genetic variations in the human genome, but we still do not know much about how the genetic variations affect gene function. In particular, little is known about the distribution of nonsense polymorphisms in human genes despite their drastic effects on gene products. METHODOLOGY/PRINCIPAL FINDINGS: To detect polymorphisms affecting gene function, we analyzed all publicly available polymorphisms in a database for single nucleotide polymorphisms (dbSNP build 125) located in the exons of 36,712 known and predicted protein-coding genes that were defined in an annotation project of all human genes and transcripts (H-InvDB ver3.8). We found a total of 252,555 single nucleotide polymorphisms (SNPs) and 8,479 insertion and deletions in the representative transcripts in these genes. The SNPs located in ORFs include 40,484 synonymous and 53,754 nonsynonymous SNPs, and 1,258 SNPs that were predicted to be nonsense SNPs or read-through SNPs. We estimated the density of nonsense SNPs to be 0.85 × 10⁻³ per site, which is lower than that of nonsynonymous SNPs (2.1 × 10⁻³ per site). On average, nonsense SNPs were located 250 codons upstream of the original termination codon, with the substitution occurring most frequently at the first codon position. Of the nonsense SNPs, 581 were predicted to cause nonsense-mediated decay (NMD) of transcripts that would prevent translation. We found that nonsense SNPs causing NMD were more common in genes involving kinase activity and transport. The remaining 602 nonsense SNPs are predicted to produce truncated polypeptides, with an average truncation of 75 amino acids. In addition, 110 read-through SNPs at termination codons were detected. CONCLUSION/SIGNIFICANCE: Our comprehensive exploration of nonsense polymorphisms showed that nonsense SNPs exist at a lower density than nonsynonymous SNPs, suggesting that nonsense mutations have more severe effects than amino acid changes. The correspondence of nonsense SNPs to known pathological variants suggests that phenotypic effects of nonsense SNPs have been reported for only a small fraction of nonsense SNPs, and that nonsense SNPs causing NMD are more likely to be involved in phenotypic variations. These nonsense SNPs may include pathological variants that have not yet been reported. These data are available from Transcript View of H-InvDB and VarySysDB (http://h-invitational.jp/varygene/)

    Improved Detection of Rare Genetic Variants for Diseases

    Technology advances have promoted gene-based sequencing studies with the aim of identifying rare mutations responsible for complex diseases. A complication in these types of association studies is that the vast majority of non-synonymous mutations are believed to be neutral to phenotypes. It is thus critical to distinguish potential causative variants from neutral variation before performing association tests. In this study, we used existing predicting algorithms to predict functional amino acid substitutions, and incorporated that information into association tests. Using simulations, we comprehensively studied the effects of several influential factors, including the sensitivity and specificity of functional variant predictions, number of variants, and proportion of causative variants, on the performance of association tests. Our results showed that incorporating information regarding functional variants obtained from existing prediction algorithms improves statistical power under certain conditions, particularly when the proportion of causative variants is moderate. The application of the proposed tests to a real sequencing study confirms our conclusions. Our work may help investigators who are planning to pursue gene-based sequencing studies
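
The abstract above names the sensitivity and specificity of functional predictions and the proportion of causative variants as influential factors. One way to see why they matter is to ask what share of the variants retained by a predictor are actually causal. The sketch below is a back-of-the-envelope illustration, not the association tests or simulations from the study. The function name and the example numbers are hypothetical.

```python
def causal_fraction_after_filter(prop_causal, sensitivity, specificity):
    """Fraction of variants predicted 'functional' that are truly causal.

    prop_causal: proportion of all variants that are causal
    sensitivity: P(predicted functional | causal)
    specificity: P(predicted neutral | neutral)
    (Illustrative calculation only; all inputs are hypothetical.)
    """
    true_pos = prop_causal * sensitivity
    false_pos = (1 - prop_causal) * (1 - specificity)
    kept = true_pos + false_pos
    return true_pos / kept if kept > 0 else 0.0


# Hypothetical scenario: 20% of variants are causal and the predictor
# has 80% sensitivity and 80% specificity.
before = 0.2
after = causal_fraction_after_filter(before, 0.8, 0.8)
print(f"causal share before filtering: {before:.2f}")
print(f"causal share after filtering:  {after:.2f}")
```

In this hypothetical scenario, filtering raises the causal share among tested variants at the cost of discarding some true causal variants, which is the trade-off an association test that uses predictions has to balance.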

    Basal-like phenotype is not associated with patient survival in estrogen-receptor-negative breast cancers

    INTRODUCTION: Basal-phenotype or basal-like breast cancers are characterized by basal epithelium cytokeratin (CK5/14/17) expression, negative estrogen receptor (ER) status and distinct gene expression signature. We studied the clinical and biological features of the basal-phenotype tumors determined by immunohistochemistry (IHC) and cDNA microarrays especially within the ER-negative subgroup. METHODS: IHC was used to evaluate the CK5/14 status of 445 stage II breast cancers. The gene expression signature of the CK5/14 immunopositive tumors was investigated within a subset (100) of the breast tumors (including 50 ER-negative tumors) with a cDNA microarray. Survival for basal-phenotype tumors as determined by CK5/14 IHC and gene expression signature was assessed. RESULTS: From the 375 analyzable tumor specimens, 48 (13%) were immunohistochemically positive for CK5/14. We found adverse distant disease-free survival for the CK5/14-positive tumors during the first years (3 years hazard ratio (HR) 2.23, 95% confidence interval (CI) 1.17 to 4.24, p = 0.01; 5 years HR 1.80, 95% CI 1.02 to 3.15, p = 0.04) but the significance was lost at the end of the follow-up period (10 years HR 1.43, 95% CI 0.84 to 2.43, p = 0.19). Gene expression profiles of immunohistochemically determined CK5/14-positive tumors within the ER-negative tumor group implicated 1,713 differently expressed genes (p < 0.05). Hierarchical clustering analysis with the top 500 of these genes formed one basal-like and a non-basal-like cluster also within the ER-negative tumor entity. A highly concordant classification could be constructed with a published gene set (Sorlie's intrinsic gene set, concordance 90%). Both gene sets identified a basal-like cluster that included most of the CK5/14-positive tumors, but also immunohistochemically CK5/14-negative tumors. Within the ER-negative tumor entity there was no survival difference between the non-basal and basal-like tumors as identified by immunohistochemical or gene-expression-based classification. CONCLUSION: Basal cytokeratin-positive tumors have a biologically distinct gene expression signature from other ER-negative tumors. Even if basal cytokeratin expression predicts early relapse among non-selected tumors, the clinical outcome of basal tumors is similar to non-basal ER-negative tumors. Immunohistochemically basal cytokeratin-positive tumors almost always belong to the basal-like gene expression profile, but this cluster also includes a few basal cytokeratin-negative tumors

    Stability of domain structures in multi-domain proteins

    Multi-domain proteins have many advantages with respect to stability and folding inside cells. Here we attempt to understand the intricate relationship between the domain-domain interactions and the stability of domains in isolation. We provide quantitative treatment and proof for prevailing intuitive ideas on the strategies employed by nature to stabilize otherwise unstable domains. We find that domains incapable of independent stability are stabilized by favourable interactions with tethered domains in the multi-domain context. Stability of such folds to exist independently is optimized by evolution. Specific residue mutations in the sites equivalent to inter-domain interface enhance the overall solvation, thereby stabilizing these domain folds independently. A few naturally occurring variants at these sites alter communication between domains and affect stability leading to disease manifestation. Our analysis provides safe guidelines for mutagenesis which have attractive applications in obtaining stable fragments and domain constructs essential for structural studies by crystallography and NMR

    Investigating the Structural Impacts of I64T and P311S Mutations in APE1-DNA Complex: A Molecular Dynamics Approach

    Elucidating the molecular dynamic behavior of protein-DNA complexes upon mutation is crucial in current genomics. The molecular dynamics approach reveals the changes that occur on incorporation of variants that dictate the structure and function of protein-DNA complexes. Deleterious mutations in the APE1 protein modify the physicochemical properties of amino acids that affect protein stability and dynamic behavior. Further, these mutations disrupt the binding sites and prevent the protein from forming complexes with its interacting DNA. In this study, we developed a rapid and cost-effective method to analyze variants in the APE1 gene that are associated with disease susceptibility and evaluated their impacts on the dynamic behavior of the APE1-DNA complex. Initially, two different in silico approaches were used to identify deleterious variants in the APE1 gene. Variants scored as deleterious by both approaches were considered, and on that basis two nsSNPs with IDs rs61730854 (I64T) and rs1803120 (P311S) were taken further for structural analysis. Different parameters such as RMSD, RMSF, salt bridges, H-bonds and SASA applied in the molecular dynamics study reveal that the predicted deleterious variants I64T and P311S alter the structure and affect the stability of APE1-DNA interacting functions. This study presents new methods for validating functional polymorphisms of human APE1, which is critically involved in causing a deficit in repair capacity, which in turn leads to genetic instability and carcinogenesis
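
Of the parameters listed in the abstract above, RMSD is the simplest to write down: it is the root of the mean squared distance between matching atoms in two structures. The sketch below is a minimal illustration and not the authors' analysis pipeline. It assumes the two coordinate sets are already superimposed and list the same atoms in the same order, and the function name and toy coordinates are hypothetical.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of
    (x, y, z) coordinates. Assumes the structures are already aligned;
    no superposition is performed (illustrative sketch only)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    # Accumulate squared distances between matching atoms.
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(squared / len(coords_a))


# Toy coordinates (hypothetical): the second atom moves 2 units along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(f"RMSD: {rmsd(reference, frame):.3f}")
```

In a molecular dynamics study this quantity is typically computed for every frame of a trajectory against a reference structure, so that wild-type and variant trajectories can be compared over time.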