168 research outputs found

    Global analysis of SNPs, proteins and protein-protein interactions: approaches for the prioritisation of candidate disease genes.

    PhD thesis. Understanding the etiology of complex disease remains a challenge in biology. In recent years there has been an explosion in biological data; this study investigates machine learning and network analysis methods as tools to aid candidate disease gene prioritisation, specifically relating to hypertension and cardiovascular disease. This thesis comprises four sets of analyses. Firstly, non-synonymous single nucleotide polymorphisms (nsSNPs) were analysed in terms of sequence- and structure-based properties using a classifier to provide a model for predicting deleterious nsSNPs. The degree of sequence conservation at the nsSNP position was found to be the single best attribute, but other sequence and structural attributes in combination were also useful. Predictions for nsSNPs within Ensembl have been made publicly available. Secondly, the thesis addressed the prediction of protein function for proteins that lack experimental data or clear similarity to a sequence of known function. Protein domain attributes based on physicochemical and predicted structural characteristics of the sequence were used as input to classifiers for predicting membership of large and diverse protein superfamilies from the SCOP database. An enrichment method was investigated that involved adding domains to the training dataset that are currently absent from SCOP. This analysis resulted in improved classifier accuracy: optimised classifiers achieved 66.3% for single domain proteins and 55.6% when including domains from multi domain proteins. The domains from superfamilies with low sequence similarity share global sequence properties, enabling applications to be developed which complement profile methods for detecting distant sequence relationships. Thirdly, a topological analysis of the human protein interactome was performed. The results were combined with functional annotation and sequence-based properties to build models for predicting hypertension-associated proteins. The study found that predicted hypertension-related proteins are not generally associated with network hubs and do not exhibit high clustering coefficients. Despite this, they tend to be closer and better connected to other hypertension proteins on the interaction network than would be expected by chance. Classifiers that combined PPI network, amino acid sequence and functional properties produced a range of precision and recall scores according to the applied 3 weights. Finally, interactome properties of proteins implicated in cardiovascular disease and cancer were studied. The analysis quantified the influential (central) nature of each protein and defined characteristics of functional modules and pathways in which the disease proteins reside. Such proteins were found to be enriched 2-fold among proteins that are influential (p<0.05) in the interactome. Additionally, they cluster in large, complex, highly connected communities, acting as interfaces between multiple processes more often than expected. An approach to prioritising disease candidates based on this analysis was proposed. Each analysis provides new insights into the effort to identify novel disease-related proteins for cardiovascular disease.
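
The two topological measures named in this abstract, hub status (node degree) and the clustering coefficient, can be illustrated with a minimal sketch. The toy edge list, the node names and the helper functions below are all hypothetical and are not taken from the thesis; the sketch only shows the standard definitions on an undirected graph.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj, node):
    """Number of interaction partners; hubs are nodes with high degree."""
    return len(adj[node])


def clustering_coefficient(adj, node):
    """Fraction of neighbour pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbours; use 0 by convention
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy interaction network (assumption, for illustration only).
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("D", "E")]
adj = build_adjacency(toy_edges)

for node in sorted(adj):
    print(node, degree(adj, node), round(clustering_coefficient(adj, node), 3))
```

In this toy graph, node A has the highest degree, while node B has the highest clustering coefficient because its two neighbours interact with each other. The two measures are therefore not interchangeable.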

    Computational and Experimental Approaches to Reveal the Effects of Single Nucleotide Polymorphisms with Respect to Disease Diagnostics

    DNA mutations are the cause of many human diseases and they are the reason for natural differences among individuals by affecting the structure, function, interactions, and other properties of DNA and expressed proteins. The ability to predict whether a given mutation is disease-causing or harmless is of great importance for the early detection of patients with a high risk of developing a particular disease and would pave the way for personalized medicine and diagnostics. Here we review existing methods and techniques to study and predict the effects of DNA mutations from three different perspectives: in silico, in vitro and in vivo. It is emphasized that the problem is complicated and successful detection of a pathogenic mutation frequently requires a combination of several methods and a knowledge of the biological phenomena associated with the corresponding macromolecules

    Sequence and Structure Signatures of Cancer Mutation Hotspots in Protein Kinases

    Protein kinases are the most common protein domains implicated in cancer, where somatically acquired mutations are known to be functionally linked to a variety of cancers. Resequencing studies of protein kinase coding regions have emphasized the importance of sequence and structure determinants of cancer-causing kinase mutations in understanding the mutation-dependent activation process. We have developed an integrated bioinformatics resource, which consolidated and mapped all currently available information on genetic modifications in protein kinase genes with sequence, structure and functional data. The integration of diverse data types provided a convenient framework for kinome-wide study of sequence-based and structure-based signatures of cancer mutations. The database-driven analysis has revealed a differential enrichment of SNP categories in functional regions of the kinase domain, demonstrating that a significant number of cancer mutations could fall at structurally equivalent positions (mutational hotspots) within the catalytic core. We have also found that structurally conserved mutational hotspots can be shared by multiple kinase genes and are often enriched in cancer driver mutations with high oncogenic activity. Structural modeling and energetic analysis of the mutational hotspots have suggested a common molecular mechanism of kinase activation by cancer mutations, and have allowed the experimental data to be reconciled. According to the proposed mechanism, the structural effect of kinase mutations with a high oncogenic potential may manifest in a significant destabilization of the autoinhibited kinase form, which is likely to drive tumorigenesis at some level. Structure-based functional annotation and prediction of cancer mutation effects in protein kinases can facilitate an understanding of the mutation-dependent activation process and inform experimental studies exploring the molecular pathology of tumorigenesis.

    Evolutionary and in silico analysis of the antiviral TRIM22 gene

    Tripartite motif protein 22 (TRIM22) is an evolutionarily ancient interferon-induced protein that has been shown to potently inhibit human immunodeficiency virus (HIV), hepatitis B virus (HBV), and influenza A virus (IAV) replication. Altered TRIM22 expression levels have also been linked to autoimmune disease, cancer, and cellular proliferation. Despite its important role in a number of biological processes, the factors that influence TRIM22 expression and/or antiviral activity remain largely unknown. To identify key functional sites in TRIM22, we performed extensive evolutionary and in silico analyses on the TRIM22 coding region. These tools allowed us to pinpoint multiple sites in TRIM22 that have evolved under positive selection during mammalian evolution, including one site that coincides with the location of a common non-synonymous SNP (nsSNP) in the human TRIM22 gene (TRIM22 rs1063303:G>C). Remarkably, we found that the frequency of TRIM22 rs1063303:G>C varied considerably among different ethnic populations, and that African (AFR), American (AMR), and European (EUR) populations contained an excess of intermediate frequency TRIM22 rs1063303:G>C alleles when compared to a neutral model of evolution. The latter is typically indicative of balancing selection, a non-neutral selective process that maintains polymorphism in a population. Interestingly, we also found that the TRIM22 nsSNP rs1063303:G>C had an inverse impact on TRIM22 function. TRIM22 rs1063303:G>C increased TRIM22 expression levels, but decreased its anti-HIV activity and altered its subcellular localization pattern. In addition to these studies, we used a variety of in silico methods to prioritize and delineate other functional sites in TRIM22. We showed that the majority of positively selected sites in the C-terminal B30.2 domain of TRIM22 are located in one of four surface-exposed variable loops that are critical for the anti-HIV effects of the closely related TRIM5α protein. Moreover, we used six different in silico nsSNP prediction programs to screen all of the nsSNPs in the TRIM22 gene and identified 14 high-risk nsSNPs that are predicted to be highly deleterious to TRIM22 function. Finally, to examine the TRIM22 nsSNP rs1063303:G>C in a more isolated population, we genotyped this nsSNP in two Inuit populations (Canadian and Greenlandic Inuit). We found that the TRIM22 rs1063303:C allele is inordinately prevalent in the Inuit compared to non-Inuit populations and that these two populations do not contain an excess of intermediate frequency TRIM22 rs1063303:G>C alleles compared to a neutral model of evolution, indicating that site TRIM22 rs1063303:G>C has not evolved under balancing selection in the Inuit. Lastly, we found an interesting association between the TRIM22 rs1063303:C allele and serum levels of triglycerides (TG) and high-density lipoprotein (HDL). Taken together, the results presented here identify a number of pertinent sites in the TRIM22 protein that likely influence its biological and/or antiviral functions.

    Studies on the relationship between single nucleotide polymorphisms and protein interactions

    This thesis presents an analysis of the relationship between single nucleotide polymorphisms (SNPs) and protein–protein interactions. The aim of the thesis is to investigate the distribution of non-synonymous single nucleotide polymorphisms (nsSNPs) in terms of their locations in the protein core, at protein–protein interface sites and in other areas of the protein surface. The analysis used experimentally verified human protein–protein interactions and nsSNPs from the UniProt humsavar database. A further investigation was performed on a larger SNP dataset from the 1000 Genomes Project (1KGP). Both investigations identified a significant preference for disease-causing SNPs to occur at the protein interface compared to other areas on the protein surface. The three-dimensional structures of protein–protein interfaces were examined in order to propose stereo-chemical explanations for the disease-causing effect of nsSNPs in the humsavar dataset. In addition, three methodologies that could help identify pathogenic variants were presented: usage of SNP server, structural analysis and usage of GMAF. Structural analysis was also performed on non-disease-causing SNPs in order to investigate their possible effects on protein–protein interactions. The results showed that some of the SNPs previously classified as non-disease-causing could potentially be disease-causing. The myVARIANT program was developed. The program obtains SNPs from 1KGP, maps them to structures, evaluates their distribution on structures and performs a structural analysis. In conclusion, the thesis demonstrates the important role that protein–protein interactions play in disease pathogenesis.

    DomainRBF: a Bayesian regression approach to the prioritization of candidate domains for complex diseases

    BACKGROUND: Domains are basic units of proteins, and thus exploring associations between protein domains and human inherited diseases will greatly improve our understanding of the pathogenesis of human complex diseases and further benefit the medical prevention, diagnosis and treatment of these diseases. Within a given domain-domain interaction network, we make the assumption that similarities of disease phenotypes can be explained using proximities of domains associated with such diseases. Based on this assumption, we propose a Bayesian regression approach named domainRBF (domain Rank with Bayes Factor) to prioritize candidate domains for human complex diseases. RESULTS: Using a compiled dataset containing 1,614 associations between 671 domains and 1,145 disease phenotypes, we demonstrate the effectiveness of the proposed approach through three large-scale leave-one-out cross-validation experiments (random control, simulated linkage interval, and genome-wide scan), and we do so in terms of three criteria (precision, mean rank ratio, and AUC score). We further show that the proposed approach is robust to the parameters involved and the underlying domain-domain interaction network through a series of permutation tests. Having assessed the validity of this approach, we show the possibility of ab initio inference of domain-disease associations and gene-disease associations, and we illustrate the strong agreement between our inferences and the evidence from genome-wide association studies for four common diseases (type 1 diabetes, type 2 diabetes, Crohn's disease, and breast cancer). Finally, we provide a pre-calculated genome-wide landscape of associations between 5,490 protein domains and 5,080 human diseases and offer free access to this resource. CONCLUSIONS: The proposed approach effectively ranks susceptible domains among the top of the candidates, and it is robust to the parameters involved. The ab initio inference of domain-disease associations shows strong agreement with the evidence provided by genome-wide association studies. The predicted landscape provides a comprehensive understanding of associations between domains and human diseases.
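
One of the evaluation criteria named in this abstract, the mean rank ratio over leave-one-out trials, can be sketched as follows. The abstract does not give the exact definition, so this sketch assumes a common one: in each trial the held-out true domain is ranked among all candidates by score, its rank is divided by the number of candidates, and the ratios are averaged (lower is better). The scores, domain names and function names are hypothetical, and the scoring model itself (domainRBF) is not reproduced here.

```python
def rank_ratio(scores, held_out):
    """Rank of the held-out candidate (1 = best score) divided by the candidate count."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return (ranked.index(held_out) + 1) / len(ranked)


def mean_rank_ratio(trials):
    """Average rank ratio over leave-one-out trials of (scores, held_out) pairs."""
    ratios = [rank_ratio(scores, held_out) for scores, held_out in trials]
    return sum(ratios) / len(ratios)


# Hypothetical trials: each holds out one known domain-disease association
# and scores it against a small candidate set (assumption, for illustration).
trials = [
    ({"d1": 0.90, "d2": 0.40, "d3": 0.10, "d4": 0.05}, "d1"),  # ranked 1st of 4
    ({"d1": 0.20, "d2": 0.70, "d3": 0.60, "d4": 0.10}, "d3"),  # ranked 2nd of 4
]

print(mean_rank_ratio(trials))
```

A value near 1 divided by the candidate count means the held-out domain is almost always ranked first, whereas a value near 0.5 is what random scoring would be expected to give.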

    The Role of Mutations in Protein Structural Dynamics and Function: A Multi-scale Computational Approach

    Proteins are a fundamental unit in biology. Although proteins have been extensively studied, there is still much to investigate. The mechanism by which proteins fold into their native state, how evolution shapes structural dynamics, and the dynamic mechanisms of many diseases are not well understood. In this thesis, protein folding is explored using a multi-scale modeling method including (i) geometric constraint based simulations that efficiently search for native-like topologies and (ii) reservoir replica exchange molecular dynamics, which identifies the low free energy structures and refines these structures toward the native conformation. A test set of eight proteins and three ancestral steroid receptor proteins are folded to 2.7 Å all-atom RMSD from their experimental crystal structures. Protein evolution and disease associated mutations (DAMs) are most commonly studied by in silico multiple sequence alignment methods. Here, however, the structural dynamics are incorporated to give insight into the evolution of three ancestral proteins and the mechanism of several diseases in human ferritin protein. The differences in conformational dynamics of these evolutionarily related, functionally diverged ancestral steroid receptor proteins are investigated by obtaining the most collective motion through essential dynamics. Strikingly, this analysis shows that evolutionarily diverged proteins of the same family do not share the same dynamic subspace. Rather, those sharing the same function are clustered together and are distant from the functionally diverged homologs. This dynamics analysis also identifies 77% of mutations (functional and permissive) necessary to evolve new function. In silico methods for prediction of DAMs rely on differences in evolution rate due to purifying selection, and therefore the accuracy of DAM prediction decreases at fast and slow evolvable sites. Here, we investigate structural dynamics by computing the contribution of each residue to the biologically relevant fluctuations, and from this define a metric: the dynamic stability index (DSI). Using DSI we study the mechanism for three diseases observed in the human ferritin protein. The T30I and R40G DAMs show a loss of dynamic stability at the C-terminus helix and nearby regulatory loop, agreeing with experimental results implicating the same regulatory loop as a cause in cataracts syndrome. Dissertation/Thesis, Ph.D. Physics 201