11,808 research outputs found

    A topological approach for protein classification

    Protein function and dynamics are closely related to protein sequence and structure. However, predicting protein function and dynamics from sequence and structure remains a fundamental challenge in molecular biology. Protein classification, which is typically done by measuring the similarity between proteins based on their sequences or physical information, is a crucial step toward understanding protein function and dynamics. Persistent homology is a new branch of algebraic topology that has been applied successfully to topological data analysis in a variety of disciplines, including molecular biology. The present work explores the potential of persistent homology as an independent tool for protein classification. To this end, we propose a molecular topological fingerprint-based support vector machine (MTF-SVM) classifier. Specifically, we construct machine learning feature vectors solely from protein topological fingerprints, which are topological invariants generated during the filtration process. To validate the MTF-SVM approach, we consider four types of problems. First, we study protein-drug binding using the M2 channel protein of influenza A virus and achieve 96% accuracy in discriminating drug-bound from unbound M2 channels. Second, we use MTF-SVM to classify hemoglobin molecules in their relaxed and taut forms and obtain about 80% accuracy. Third, we identify all-alpha, all-beta, and alpha-beta protein domains in a set of 900 proteins, with an 85% success rate. Finally, we apply the technique to 55 classification tasks of protein superfamilies over 1357 samples and attain an average accuracy of 82%. The present study establishes computational topology as an independent and effective alternative for protein classification.
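The abstract does not specify which filtration or which barcode summary the MTF-SVM feature vectors use. As a rough illustration only, the sketch below computes the 0-dimensional persistence barcode (connected components) of a distance-based filtration on atom coordinates with a union-find structure, then summarizes it as a short feature vector that could be passed to an SVM. The function names and the four summary statistics are hypothetical; higher-dimensional features (loops and cavities) would require a dedicated persistent homology library.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Finite death times of H0 bars for a distance-based (Rips-style) filtration.

    Every point is born at filtration value 0. When two connected components
    merge at edge length d, one bar dies at d (Kruskal-style union-find).
    """
    parent = list(range(len(points)))

    def find(i):
        # Path-halving find for the union-find structure.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sorting all pairwise edges by length gives the filtration order.
    edges = sorted(
        (math.dist(p, q), i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    # One component never dies; its infinite bar is omitted.
    return deaths


def h0_features(points):
    """Hypothetical fixed-length fingerprint: min, max, mean and count of H0 deaths."""
    deaths = h0_barcode(points)
    if not deaths:
        return [0.0, 0.0, 0.0, 0]
    return [min(deaths), max(deaths), sum(deaths) / len(deaths), len(deaths)]
```

For three collinear atoms at x = 0, 1 and 3, two components die at distances 1 and 2, giving the features [1.0, 2.0, 1.5, 2]. Fingerprints of this kind, computed once per protein, would form the rows of an SVM training matrix.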

    Identification of disease-causing genes using microarray data mining and gene ontology

    Background: One of the most accurate methods for identifying disease-causing genes is monitoring gene expression values across samples using microarray technology. A shortcoming of microarray data is that the number of samples is small relative to the number of genes. This reduces the classification accuracy of existing methods, so gene selection is essential to improve predictive accuracy and to identify potential marker genes for a disease. Among the numerous existing gene selection methods, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading approaches, but its performance can suffer from the small sample size and noisy data, and the method does not remove redundant genes. Methods: We propose a novel gene selection framework that retains the advantages of conventional methods while addressing their weaknesses. Specifically, we combine the Fisher method and SVMRFE to exploit the advantages of both a filter method and an embedded method. We also add a redundancy reduction stage to address a weakness shared by the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology, a reliable source of information on genes. Gene Ontology can partly compensate for the limitations of microarrays, such as the small number of samples and erroneous measurements. Results: The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method improves classification performance in terms of accuracy, sensitivity and specificity. In addition, a study of the molecular function of the selected genes supports the hypothesis that these genes are involved in cancer growth.
Conclusions: The proposed method addresses the weaknesses of conventional methods by adding a redundancy reduction stage and using Gene Ontology information. It predicts marker genes for colon cancer, DLBCL and prostate cancer with high accuracy. The predictions made in this study can serve as a list of candidates for subsequent wet-lab verification and might help in the search for a cure for cancer.
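A minimal sketch of the filter and redundancy-reduction stages, assuming a two-class problem with labels 1 and 0 and using the Fisher criterion (squared difference of class means over the sum of class variances). The redundancy rule here, which skips a gene whose absolute Pearson correlation with an already selected gene exceeds a threshold, is an assumption, as are the function names and the 0.9 default. The SVMRFE and Gene Ontology stages described in the abstract are not reproduced.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Fisher criterion for one gene: (mu1 - mu0)^2 / (var1 + var0)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    diff = mean(pos) - mean(neg)
    denom = pvariance(pos) + pvariance(neg)
    if denom == 0:
        # Zero within-class variance: perfectly separated unless the means match.
        return 0.0 if diff == 0 else float("inf")
    return diff ** 2 / denom


def pearson(a, b):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb) if sa and sb else 0.0


def select_genes(expr, labels, top_k, max_corr=0.9):
    """expr maps a (hypothetical) gene name to its expression values across samples.

    1. Rank genes by Fisher score (filter step).
    2. Walk the ranking, skipping any gene whose |r| with an already kept
       gene exceeds max_corr (redundancy reduction).
    """
    ranked = sorted(expr, key=lambda g: fisher_score(expr[g], labels), reverse=True)
    kept = []
    for g in ranked:
        if all(abs(pearson(expr[g], expr[k])) <= max_corr for k in kept):
            kept.append(g)
        if len(kept) == top_k:
            break
    return kept
```

With labels [1, 1, 1, 0, 0, 0], a gene with values [5, 6, 5, 1, 2, 1] has a Fisher score of 36. A second gene with values [4, 6, 5, 1, 2, 2] scores lower and correlates with the first at r ≈ 0.96, so it would be skipped as redundant, leaving room for a less discriminative but less correlated gene.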

    Entropy of leukemia on multidimensional morphological and molecular landscapes

    Leukemia epitomizes the class of highly complex diseases that new technologies aim to tackle by using large sets of single-cell-level information. Achieving such a goal depends critically not only on experimental techniques but also on approaches to interpreting the data. A most pressing issue is to identify the salient quantitative features of the disease from the resulting massive amounts of information. Here, I show that the entropies of cell-population distributions on specific multidimensional molecular and morphological landscapes provide a set of measures for the precise characterization of normal and pathological states, such as those of healthy individuals and acute myeloid leukemia (AML) patients. I provide a systematic procedure to identify the specific landscapes and illustrate how, applied to cell samples from peripheral blood and bone marrow aspirates, this characterization accurately diagnoses AML from flow cytometry data alone. The methodology can be applied generally to other types of cell populations and establishes a straightforward link between traditional statistical thermodynamics methodology and biomedical applications.
    Comment: 15 pages, 4 figures, and supplementary information
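The abstract does not give the binning scheme or the landscape coordinates. Assuming each cell is a point in a few measured dimensions (for example, flow cytometry channels) and the landscape is discretized into a regular grid, the Shannon entropy of the resulting cell-population distribution can be sketched as follows; the grid bounds, bin count and function name are illustrative choices, not the author's procedure.

```python
import math
from collections import Counter


def landscape_entropy(cells, bins_per_axis, lows, highs):
    """Shannon entropy (in nats) of cells binned on a regular grid.

    cells: sequence of coordinate tuples, one per cell.
    lows, highs: per-axis bounds of the (hypothetical) landscape.
    """
    counts = Counter()
    for cell in cells:
        idx = []
        for x, lo, hi in zip(cell, lows, highs):
            # Map each coordinate to a bin index, clamping to the grid edges.
            b = int((x - lo) / (hi - lo) * bins_per_axis)
            idx.append(min(max(b, 0), bins_per_axis - 1))
        counts[tuple(idx)] += 1
    n = len(cells)
    # H = -sum_k p_k ln p_k over occupied bins.
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```

Four cells spread over the four bins of a 2 × 2 grid give ln 4 ≈ 1.386 nats, whereas cells concentrated in a single bin give 0. Comparing such values between samples is the kind of contrast the abstract describes between normal and pathological states.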

    Integrating protein structural dynamics and evolutionary analysis with Bio3D

    Background: Popular bioinformatics approaches for studying protein functional dynamics include comparisons of crystallographic structures, molecular dynamics simulations and normal mode analysis. However, determining how observed displacements and predicted motions from these traditionally separate analyses relate to each other, as well as to the evolution of sequence, structure and function within large protein families, remains a considerable challenge. This is in part due to the general lack of tools that integrate information on molecular structure, dynamics and evolution. Results: Here, we describe the integration of new methodologies for evolutionary sequence, structure and simulation analysis into the Bio3D package. This major update includes unique high-throughput normal mode analysis for examining and contrasting the dynamics of related proteins with non-identical sequences and structures, as well as new methods for quantifying dynamical couplings and their residue-wise dissection from correlation network analysis. These new methodologies are integrated with major biomolecular databases as well as established methods for evolutionary sequence and comparative structural analysis. New functionality for directly comparing results derived from normal modes, molecular dynamics and principal component analysis of heterogeneous experimental structure distributions is also included. We demonstrate these integrated capabilities with example applications to the dihydrofolate reductase and heterotrimeric G-protein families, along with a discussion of the mechanistic insight provided in each case. Conclusions: The integration of structural dynamics and evolutionary analysis in Bio3D enables researchers to go beyond the prediction of single-protein dynamics and investigate dynamical features across large protein families.
The Bio3D package is distributed with full source code and extensive documentation as a platform-independent R package under a GPL2 license from http://thegrantlab.org/bio3d/.
http://deepblue.lib.umich.edu/bitstream/2027.42/109747/1/12859_2014_Article_399.pd
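One common way to quantify dynamical couplings between residues is a dynamical cross-correlation matrix, built from the displacements of each atom around its mean position over a trajectory or structure ensemble. The pure-Python sketch below is not Bio3D's implementation (Bio3D itself is an R package). It assumes the frames are already superposed to remove rigid-body motion and that every atom moves at least a little, since a motionless atom would make the normalization divide by zero.

```python
import math


def dccm(frames):
    """Dynamical cross-correlation matrix from a list of superposed frames.

    frames[t][i] is the (x, y, z) position of atom i in frame t.
    C_ij = <dr_i . dr_j> / sqrt(<dr_i . dr_i> <dr_j . dr_j>),
    where dr is the displacement from the atom's mean position.
    """
    n_frames, n_atoms = len(frames), len(frames[0])
    mean_pos = [
        [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        for i in range(n_atoms)
    ]
    # Displacement of every atom from its mean position, per frame.
    disp = [
        [[f[i][k] - mean_pos[i][k] for k in range(3)] for i in range(n_atoms)]
        for f in frames
    ]

    def avg_dot(i, j):
        # Time-averaged dot product of the displacement vectors of atoms i and j.
        return sum(sum(a * b for a, b in zip(d[i], d[j])) for d in disp) / n_frames

    return [
        [avg_dot(i, j) / math.sqrt(avg_dot(i, i) * avg_dot(j, j))
         for j in range(n_atoms)]
        for i in range(n_atoms)
    ]
```

Values near +1 indicate atoms moving together and values near -1 indicate opposed motion. A correlation network could then be derived by, for example, keeping only pairs whose absolute correlation exceeds a chosen cutoff.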

    Binding site matching in rational drug design: Algorithms and applications

    © 2018 The Author(s) 2018. Published by Oxford University Press. All rights reserved. Interactions between proteins and small molecules are critical for biological functions. These interactions often occur in small cavities within protein structures, known as ligand-binding pockets. Understanding the physicochemical properties of binding pockets is essential to improve not only our basic knowledge of biological systems but also drug development procedures. To quantify similarities among pockets in terms of their geometries and chemical properties, either bound ligands can be compared to one another or binding sites can be matched directly. Both perspectives routinely rely on computational methods, including various techniques to represent and compare small molecules as well as local protein structures. In this review, we survey 12 tools widely used to match pockets. These methods are divided into five categories based on the algorithm used to construct binding-site alignments. In addition to a comprehensive analysis of their algorithms, the test sets and performance of each method are described. We also discuss general pharmacological applications of computational pocket matching in drug repurposing, polypharmacology and side effects. Finally, reflecting on the importance of these techniques in drug discovery, we discuss the development of more accurate meta-predictors, the incorporation of protein flexibility and the integration of powerful artificial intelligence technologies such as deep learning.
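For the ligand-based perspective mentioned in the abstract, one widely used similarity measure is the Tanimoto coefficient between molecular fingerprints. The sketch below assumes each bound ligand has already been reduced to a set of on-bit indices by some cheminformatics toolkit (not shown). The pocket IDs and the ranking helper are hypothetical, and this is not the algorithm of any of the 12 reviewed tools, which construct alignments of the binding sites themselves.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits.

    By convention here, two empty fingerprints are treated as identical (1.0).
    """
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def rank_by_ligand_similarity(query_fp, pocket_ligands):
    """Rank pockets by the similarity of their bound ligand to a query ligand.

    pocket_ligands maps a (hypothetical) pocket ID to its ligand's on-bits.
    Returns (pocket ID, similarity) pairs, most similar first.
    """
    return sorted(
        ((pid, tanimoto(query_fp, fp)) for pid, fp in pocket_ligands.items()),
        key=lambda item: item[1],
        reverse=True,
    )
```

A query fingerprint {1, 2, 3, 4} compared against pockets whose ligands carry {1, 2, 3, 4}, {3, 4, 5, 6} and {7, 8} scores 1.0, 1/3 and 0.0, respectively.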

    Raman spectroscopic characterization and analysis of agricultural and biological systems

    Technical progress over the past two decades in instrument design, laser and electronic technology, and computer-based data analysis has made Raman spectroscopy, a noninvasive, nondestructive optical molecular spectroscopic imaging technique, an attractive choice for analytical tasks. Raman spectroscopy provides chemical structural information at the molecular level with minimal sample preparation, in a quick, easy-to-operate and reproducible fashion. In recent years it has been increasingly applied to the analysis and characterization of agricultural products and biological samples. This dissertation documents the innovative research on Raman spectroscopic characterization and analysis of both biomedical and agricultural systems that I carried out throughout my PhD training. The biomedical research focused on glaucoma. Glaucoma is a chronic neurodegenerative disease characterized by apoptosis of retinal ganglion cells and subsequent loss of visual function. Early detection of pathological changes and progression in glaucoma and other neuroretinal diseases, which is critical for preventing permanent structural damage and irreversible vision loss, remains a great challenge. In my research, Raman spectra from canine retinal tissues were subjected to multivariate discriminant analysis with a support vector machine algorithm to differentiate diseased from healthy tissues. The high classification accuracy suggests that Raman spectroscopic screening can be used for in vitro detection of glaucomatous changes in retinal tissue, at early as well as late stages, with high specificity. To broaden the scope of Raman analysis, it was also applied to characterize agricultural and food materials, specifically meat.
Existing objective methods (e.g., mechanical stress/strain analysis, near-infrared spectroscopy) for predicting the sensory attributes of pork generally do not correlate satisfactorily with panel evaluations. In this study, a Raman spectroscopic methodology was investigated to evaluate and predict the tenderness, juiciness and chewiness of fresh, uncooked pork loins from 169 pigs. The method developed in this thesis yielded good predictions of sensory attributes such as tenderness and chewiness, and it has the potential to become a rapid objective assay for the tenderness and chewiness of pork products that may find practical applications in the pork industry. In addition, a Raman spectroscopic screening method combined with discriminant modeling was developed for rapid evaluation of boar taint levels in pork. The research presented in this dissertation shows that Raman spectroscopy has great potential to address analytical needs in new fields and to enable innovative applications.
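The dissertation abstract does not state the preprocessing, kernel or solver used for the support vector machine. As a hedged sketch, the code below applies standard normal variate (SNV) scaling, a common Raman preprocessing step assumed here, and trains a linear SVM with the Pegasos sub-gradient method, swapped in because it fits in a few lines of pure Python. The toy five-point spectra, the regularization value and all function names are illustrative; a real analysis would use full spectra, cross-validation and an established SVM library.

```python
import math


def snv(spectrum):
    """Standard normal variate: centre one spectrum and scale it to unit std.

    Assumes the spectrum is not perfectly flat (nonzero standard deviation).
    """
    m = sum(spectrum) / len(spectrum)
    sd = math.sqrt(sum((x - m) ** 2 for x in spectrum) / len(spectrum))
    return [(x - m) / sd for x in spectrum]


def train_linear_svm(X, y, lam=0.01, iters=1000):
    """Pegasos sub-gradient solver for a linear SVM with hinge loss.

    Labels must be +1 or -1. A constant 1.0 is appended to each sample so the
    bias is learned as an ordinary (regularised) weight, and samples are
    visited in a fixed cycle rather than at random.
    """
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    for t in range(1, iters + 1):
        i = (t - 1) % len(data)
        eta = 1.0 / (lam * t)
        # Check the margin with the current weights before updating them.
        margin = y[i] * sum(wk * xk for wk, xk in zip(w, data[i]))
        w = [(1 - eta * lam) * wk for wk in w]
        if margin < 1:
            w = [wk + eta * y[i] * xk for wk, xk in zip(w, data[i])]
    return w


def predict(w, x):
    """Return +1 or -1 from the sign of the linear decision function."""
    score = sum(wk * xk for wk, xk in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

For example, after training on the SNV-scaled spectra [1, 5, 1, 1, 1] (label 1) and [1, 1, 1, 5, 1] (label -1), the spectrum [1.1, 4.8, 0.9, 1.2, 1.0], which peaks at the same position as the first class, is predicted as label 1.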