28 research outputs found

    IMPACT OF DELETERIOUS NON-SYNONYMOUS SINGLE NUCLEOTIDE POLYMORPHISMS OF CYTOKINE GENES ON NON-CLASSICAL HYDROGEN BONDS PREDISPOSING TO CARDIOVASCULAR DISEASE: AN IN SILICO APPROACH

      Objective: Cardiovascular disease (CVD) is a leading cause of death worldwide. Malfunctioning of genes that are responsible for several inflammatory processes is the major cause of its initiation. Cytokine genes are one such group of genes involved in the development of CVD. Hence, the prediction of potential point mutations in these genes is important for diagnostic purposes. Such mutations result in altered protein structure and function when compared to neutral ones. Methods: In this study, interleukin factor 6 (IL6), tumor necrosis factor α (TNF-α), interleukin factor 4 (IL4), and interferon gamma have been analyzed using the Sorting Intolerant From Tolerant (SIFT) and PolyPhen 2.0 tools. Results: Several single nucleotide polymorphisms (SNPs) in IL6, TNF-α, and IL4 are found to be potentially deleterious. In addition, bond analysis has also been performed on these SNPs. It has been predicted that L119P and R196H of IL6 as well as K87T and T181N of TNF-α are potential nsSNPs that may cause structural and functional variations in the corresponding proteins. The hydrogen and cation-pi bond analysis performed in this study provides molecular-based evidence that supports the predicted deleterious potential of such SNPs for these CVD candidate genes, alongside other conventional in silico tools. Conclusion: The study attests to the importance of adopting a computational approach to narrow down potential point mutants for disease prediction.

    Patterns of nucleotide diversity in Meisa1 and G3pdh in wild and cultivated cassava

    The distribution and frequency of single nucleotide polymorphisms (SNPs) are an excellent tool for discerning evolutionary relatedness between cultivated and wild plant genomes. This type of information is scanty for the genus Manihot, which limits systematic approaches to the genetic improvement of cassava. Here, we present a detailed description of the comparative patterns of SNPs in Isoamylase1 (Meisa1) and Glyceraldehyde-3-phosphate dehydrogenase (G3pdh) in 10 accessions of wild (Manihot esculenta subsp. flabellifolia) and 12 accessions of cultivated cassava (M. esculenta). The results show that Meisa1 is more variable in cultivated cassava than in subspecies flabellifolia: the 954 bp sequence region differs at 1 in 111 nucleotides in cultivated and 1 in 250 nucleotides in wild cassava. Frequency analysis shows that a SNP occurs once every 42 bp in cultivated and once every 70 bp in wild cassava. Tajima’s D test statistics showed that Meisa1 has been evolving under different selection pressures, diversifying in cultivated and purifying in wild cassava. G3pdh is under diversifying selection in both populations. This may indicate the importance of isoamylase1 for starch quality traits in cassava, traits that are likely to have been the target of artificial selection by farmers and breeders, in addition to natural selection. This study also suggests that G3pdh may be a good marker for phylogenetic studies, while Meisa1 may be useful for intra- and inter-cultivar diversity studies. The non-synonymous SNPs that changed the amino acid property were identified, and the potential implications of the changes for protein function were analyzed and discussed.

    Predicting disease-associated substitution of a single amino acid by analyzing residue interactions

    Background: The rapid accumulation of data on non-synonymous single nucleotide polymorphisms (nsSNPs, also called SAPs) should allow us to further our understanding of the underlying disease-associated mechanisms. Here, we use complex networks to study the role of an amino acid in both local and global structures and determine the extent to which disease-associated and polymorphic SAPs differ in terms of their interactions with other residues. Results: We found that SAPs can be well characterized by network topological features. Mutations are probably disease-associated when they occur at a site with a high centrality value and/or high degree value in a protein structure network. We also discovered that study of the neighboring residues around a mutation site can help to determine whether the mutation is disease-related or not. We compiled a dataset from the Swiss-Prot variant pages and constructed a model to predict disease-associated SAPs based on the random forest algorithm. The values of total accuracy and MCC were 83.0% and 0.64, respectively, as determined by 5-fold cross-validation. With an independent dataset, our model achieved a total accuracy of 80.8% and an MCC of 0.59. Conclusions: The satisfactory performance suggests that network topological features can be used as quantification measures to determine the importance of a site on a protein, and this approach can complement existing methods for prediction of disease-associated SAPs. Moreover, the use of this method in SAP studies would help to determine the underlying linkage between SAPs and diseases through extensive investigation of mutual interactions between residues.
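
The abstract above reports total accuracy together with the Matthews correlation coefficient (MCC). As a reminder of how these two figures relate to a confusion matrix, here is a minimal sketch. The function names and the toy labels are hypothetical illustrations and are not taken from the paper; the paper's own evaluation code is not shown in this listing.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = disease-associated, 0 = neutral)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(accuracy(*counts), mcc(*counts))  # accuracy 0.75, MCC 0.5
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the disease-associated and neutral classes differ in size.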

    Prediction of Deleterious Non-Synonymous SNPs Based on Protein Interaction Network and Hybrid Properties

    Non-synonymous SNPs (nsSNPs), also known as Single Amino acid Polymorphisms (SAPs), account for the majority of human inherited diseases. It is important to distinguish deleterious SAPs from neutral ones. Most traditional computational methods to classify SAPs are based on sequential or structural features. However, these features cannot fully explain the association between a SAP and the observed pathophysiological phenotype. We believe a better rationale for deleterious SAP prediction is this: if a SAP lies in a protein with important functions and severely changes the protein sequence and structure, it is more likely to be related to disease. We therefore established a method to predict deleterious SAPs based on both the protein interaction network and traditional hybrid properties. Each SAP is represented by 472 features that include sequential features, structural features and network features. The Maximum Relevance Minimum Redundancy (mRMR) method and Incremental Feature Selection (IFS) were applied to obtain the optimal feature set, and the prediction model was the Nearest Neighbor Algorithm (NNA). In jackknife cross-validation, 83.27% of SAPs were correctly predicted when the optimized 263 features were used. The optimized predictor with 263 features was also tested on an independent dataset, where the accuracy was still 80.00%. In contrast, SIFT, a widely used predictor of deleterious SAPs based on sequential features, has a prediction accuracy of 71.05% on the same dataset. In our study, network features were found to be the most important for accurate prediction and can significantly improve prediction performance. Our results suggest that the protein interaction context could provide important clues to help better explain a SAP's functional association. This research will facilitate post-genome-wide association studies.
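
The abstract above pairs a Nearest Neighbor Algorithm with jackknife (leave-one-out) cross-validation. The following is a rough sketch of that evaluation loop under stated assumptions: the Euclidean distance, the two-dimensional feature vectors, and all function names are hypothetical choices for illustration. The abstract does not state the distance measure the authors used, and their 472-feature representation is not reproduced here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_label(query, samples, labels):
    """Return the label of the training sample closest to the query."""
    best = min(range(len(samples)), key=lambda i: euclidean(query, samples[i]))
    return labels[best]


def jackknife_accuracy(samples, labels):
    """Leave-one-out evaluation: hold out each sample in turn and predict it from the rest."""
    correct = 0
    for i in range(len(samples)):
        rest_x = samples[:i] + samples[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(samples[i], rest_x, rest_y) == labels[i]:
            correct += 1
    return correct / len(samples)


# Toy feature vectors, invented for illustration only.
samples = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (1.0, 1.1), (0.9, 1.0), (0.4, 0.5)]
labels = ["neutral", "neutral", "neutral", "deleterious", "deleterious", "deleterious"]
print(jackknife_accuracy(samples, labels))  # 5 of the 6 held-out samples are predicted correctly
```

In this toy set, the last sample sits closer to the neutral cluster than to the other deleterious points, so it is the one the jackknife loop misclassifies.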

    Single Nucleotide Polymorphisms (SNPs) in Exon 6 of Lecithin Cholesterol Acyltransferase (LCAT) Gene in Indonesian Local Sheep

    Lecithin cholesterol acyltransferase (LCAT) is a soluble enzyme that converts cholesterol and lecithin to cholesteryl esters and lysolecithins on the surface of high density lipoprotein and plays an important role in lipoprotein metabolism. This research aimed to explore single nucleotide polymorphisms of the LCAT gene in Indonesian local sheep. A total of 118 genomic DNA samples from Indonesian local sheep were used in this research, consisting of Sumatera Thin Tail (43 heads), Garut (19 heads), Javanese Thin Tail (17 heads), Javanese Fat Tail (6 heads), Rote Island (7 heads), Kissar (7 heads), Sumbawa (10 heads), and Lembah Palu (9 heads). Polymerase chain reaction was used to amplify genomic DNA for exon 6 (250 bp), and direct sequencing was used to identify polymorphic sequences. The sequences were analyzed with BioEdit and MEGA 5.2 software. The BLAST sequence was obtained from GenBank GQ 150556.1. The results showed three novel SNPs, i.e. c.742C>T, c.770T>A and c.882C>T. The cytosine-to-thymine substitution at c.742 is a synonymous mutation; the thymine-to-adenine substitution at c.770 and the cytosine-to-thymine substitution at c.882 are non-synonymous mutations. Polymorphisms of LCAT gene exon 6 were found in Sumatera Thin Tail, Javanese Thin Tail, Javanese Fat Tail, Garut, Lembah Palu, and Rote Island sheep.

    Cataloging Coding Sequence Variations in Human Genome Databases

    BACKGROUND: With the recent growth of information on sequence variations in the human genome, predictions regarding the functional effects and relevance to disease phenotypes of coding sequence variations are becoming increasingly important. The aims of this study were to catalog protein-coding sequence variations (CVs) occurring in genetic variation databases and to use bioinformatic programs to analyze CVs. In addition, we aim to provide insight into the functionality of the reference databases. METHODOLOGY AND FINDINGS: To catalog CVs on a genome-wide scale with regard to protein function and disease, we investigated three representative databases: the Human Gene Mutation Database (HGMD), the Single Nucleotide Polymorphisms database (dbSNP), and the Haplotype Map (HapMap). Using these three databases, we analyzed CVs at the protein function level with bioinformatic programs. We proposed a combinatorial approach using the Support Vector Machine (SVM) to increase the performance of the prediction programs. By cataloging the coding sequence variations using these databases, we found that 4.36% of CVs from HGMD are concurrently registered in dbSNP (8.11% of CVs from dbSNP are concurrent in HGMD). The pattern of substitutions and functional consequences predicted by three bioinformatic programs was significantly different among concurrent CVs, and CVs occurring solely in HGMD or in dbSNP. The experimental results showed that the proposed SVM combination noticeably outperformed the individual prediction programs. CONCLUSIONS: This is the first study to compare human sequence variations in HGMD, dbSNP and HapMap at the genome-wide level. We found that a significant proportion of CVs in HGMD and dbSNP overlap, and we emphasize the need to use caution when interpreting the phenotypic relevance of these concurrent CVs. Combining bioinformatic programs can be helpful in predicting the functional consequences of CVs because it improved the performance of functional predictions.

    SPECIES CLASSIFICATION BASED ON DNA BARCODE SEQUENCES USING RANDOM FERNS (KLASIFIKASI SPESIES BERDASARKAN DNA BARCODE SEQUENCE MENGGUNAKAN RANDOM FERNS)

    Machine learning has been applied in various domains, including bioinformatics. One of the bioinformatics problems that can be solved with a machine learning approach is species classification. This study attempts to classify species into families based on their DNA barcode sequences using a supervised learning approach, namely the Random Ferns algorithm. A computational model consisting of 13 steps was proposed, including data retrieval, a series of data preprocessing steps, model training, prediction, and evaluation. The ribulose-1,5-bisphosphate carboxylase-oxygenase large sub-unit (rbcL) gene, which has been selected as one of the DNA barcode loci for plants, is used to represent species in the Amaryllidaceae and Liliaceae families. Using 1,245 DNA sequences for training and 220 sequences as testing data, the experimental results show that Random Ferns can classify species sequences quickly and accurately into the appropriate families based on their DNA barcode sequences. The trained model achieved consistent accuracy as high as 99.09% within 180 ms of training time while using only 14.5 MB of memory. A comparison against the state-of-the-art Random Forest algorithm showed that Random Ferns was able to achieve the same level of accuracy more efficiently.

    Improving the prediction of disease-related variants using protein three-dimensional structure

    Background: Single Nucleotide Polymorphisms (SNPs) are an important source of human genome variability. Non-synonymous SNPs occurring in coding regions result in single amino acid polymorphisms (SAPs) that may affect protein function and lead to pathology. Several methods attempt to estimate the impact of SAPs using different sources of information. Although sequence-based predictors have shown good performance, the quality of these predictions can be further improved by introducing new features derived from three-dimensional protein structures. Results: In this paper, we present a structure-based machine learning approach for predicting disease-related SAPs. We have trained a Support Vector Machine (SVM) on a set of 3,342 disease-related mutations and 1,644 neutral polymorphisms from 784 protein chains. We use SVM input features derived from the protein's sequence, structure, and function. After dataset balancing, the structure-based method (SVM-3D) reaches an overall accuracy of 85%, a correlation coefficient of 0.70, and an area under the receiver operating characteristic curve (AUC) of 0.92. When compared with a similar sequence-based predictor, SVM-3D results in an increase of the overall accuracy and AUC by 3%, and of the correlation coefficient by 0.06. The robustness of this improvement has been tested on different datasets, and in all cases SVM-3D performs better than previously developed methods, even when compared with PolyPhen2, which explicitly takes protein structure information as input. Conclusion: This work demonstrates that structural information can increase the accuracy of disease-related SAP identification. Our results also quantify the magnitude of improvement on a large dataset. This improvement is in agreement with previously observed results, where structure information enhanced the prediction of protein stability changes upon mutation. Although the structural information contained in the Protein Data Bank limits the application and the performance of our structure-based method, we expect that SVM-3D will result in higher accuracy when more structural data become available. © 2011 Capriotti; licensee BioMed Central Ltd.

    Interaction-based discovery of functionally important genes in cancers

    A major challenge in cancer genomics is uncovering genes with an active role in tumorigenesis from a potentially large pool of mutated genes across patient samples. Here we focus on the interactions that proteins make with nucleic acids, small molecules, ions and peptides, and show that residues within proteins that are involved in these interactions are more frequently affected by mutations observed in large-scale cancer genomic data than are other residues. We leverage this observation to predict genes that play a functionally important role in cancers by introducing a computational pipeline (http://canbind.princeton.edu) for mapping large-scale cancer exome data across patients onto protein structures, and automatically extracting proteins with an enriched number of mutations affecting their nucleic acid, small molecule, ion or peptide binding sites. Using this computational approach, we show that many previously known genes implicated in cancers are enriched in mutations within the binding sites of their encoded proteins. By focusing on functionally relevant portions of proteins—specifically those known to be involved in molecular interactions—our approach is particularly well suited to detect infrequent mutations that may nonetheless be important in cancer, and should aid in expanding our functional understanding of the genomic landscape of cancer