A topological approach for protein classification
The function and dynamics of a protein are closely related to its sequence and
structure. However, predicting protein function and dynamics from
sequence and structure remains a fundamental challenge in molecular biology.
Protein classification, which is typically done through measuring the
similarity between proteins based on protein sequence or physical
information, serves as a crucial step toward the understanding of protein
function and dynamics. Persistent homology is a new branch of algebraic
topology that has found success in topological data analysis across a
variety of disciplines, including molecular biology. The present work explores
the potential of using persistent homology as an independent tool for protein
classification. To this end, we propose a molecular topological fingerprint
based support vector machine (MTF-SVM) classifier. Specifically, we construct
machine learning feature vectors solely from protein topological fingerprints,
which are topological invariants generated during the filtration process. To
validate the present MTF-SVM approach, we consider four types of problems.
First, we study protein-drug binding by using the M2 channel protein of
influenza A virus. We achieve 96% accuracy in discriminating drug-bound and
unbound M2 channels. Additionally, we examine the use of MTF-SVM for the
classification of hemoglobin molecules in their relaxed and taut forms and
obtain about 80% accuracy. The identification of all alpha, all beta, and
alpha-beta protein domains is carried out in our next study using 900 proteins.
We achieve an 85% success rate in this identification. Finally, we apply the
present technique to 55 classification tasks of protein superfamilies over 1357
samples. An average accuracy of 82% is attained. The present study establishes
computational topology as an independent and effective alternative for protein
classification.
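The abstract above describes building feature vectors from topological fingerprints but does not list the features. As a minimal sketch, assuming each barcode is given as a list of (birth, death) pairs per homology dimension, the following turns barcodes into a fixed-length vector. The chosen statistics (bar count, total length, mean length, longest bars) and the names `barcode_features` and `fingerprint` are illustrative assumptions, not the features used in the MTF-SVM paper.

```python
from statistics import mean


def barcode_features(bars, top_k=3):
    """Summarize one barcode as a fixed-length list of floats.

    bars: list of (birth, death) pairs for a single homology dimension.
    The features below are hypothetical choices for illustration only.
    """
    # Bar lengths (persistence), longest first.
    lengths = sorted((death - birth for birth, death in bars), reverse=True)
    # Keep the top_k longest bars, zero-padded when there are fewer.
    padded = (lengths + [0.0] * top_k)[:top_k]
    count = float(len(lengths))
    total = float(sum(lengths))
    avg = mean(lengths) if lengths else 0.0
    return [count, total, avg] + padded


def fingerprint(barcodes_by_dim, top_k=3):
    """Concatenate barcode features for homology dimensions 0, 1 and 2."""
    vec = []
    for dim in (0, 1, 2):
        vec.extend(barcode_features(barcodes_by_dim.get(dim, []), top_k))
    return vec
```

A vector produced this way could then be passed to any standard SVM implementation; the barcodes themselves would come from a persistent homology library, which is outside the scope of this sketch.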
Identification of disease-causing genes using microarray data mining and gene ontology
Background: One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small quantity of samples with respect to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes.
Methods: We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results.
Results: The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth.
Conclusions: The proposed method addresses the weaknesses of conventional methods by adding a redundancy reduction stage and utilizing Gene Ontology information. It predicts marker genes for colon, DLBCL and prostate cancer with high accuracy. The predictions made in this study can serve as a list of candidates for subsequent wet-lab verification and might help in the search for a cure for cancer.
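The filtering stage mentioned in the Methods can be illustrated with a Fisher-type score. The sketch below uses one common two-class form, the squared difference of class means divided by the sum of class variances, and ranks genes by it. The exact criterion in the paper may differ, and the SVMRFE, redundancy reduction and Gene Ontology stages are not shown. The names `fisher_score` and `rank_genes` and the 0/1 label encoding are assumptions made for this example.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Two-class Fisher-type score for one gene.

    values: expression values, one per sample.
    labels: 0/1 class labels aligned with values (assumed encoding).
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        # No within-class variation: perfectly separating or uninformative.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def rank_genes(expression, labels):
    """Return gene names sorted from most to least discriminative.

    expression: dict mapping gene name -> list of values per sample.
    """
    return sorted(
        expression,
        key=lambda gene: fisher_score(expression[gene], labels),
        reverse=True,
    )
```

In a pipeline of the kind described, the top-ranked genes from such a filter would be handed to the embedded method for further elimination.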
Entropy of leukemia on multidimensional morphological and molecular landscapes
Leukemia epitomizes the class of highly complex diseases that new
technologies aim to tackle by using large sets of single-cell level
information. Achieving such a goal depends critically not only on experimental
techniques but also on approaches to interpret the data. A most pressing issue
is to identify the salient quantitative features of the disease from the
resulting massive amounts of information. Here, I show that the entropies of
cell-population distributions on specific multidimensional molecular and
morphological landscapes provide a set of measures for the precise
characterization of normal and pathological states, such as those corresponding
to healthy individuals and acute myeloid leukemia (AML) patients. I provide a
systematic procedure to identify the specific landscapes and illustrate how,
applied to cell samples from peripheral blood and bone marrow aspirates, this
characterization accurately diagnoses AML from just flow cytometry data. The
methodology can generally be applied to other types of cell-populations and
establishes a straightforward link between the traditional statistical
thermodynamics methodology and biomedical applications.
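To make the central quantity concrete, here is a minimal sketch of the Shannon entropy of a cell population binned on a multidimensional grid. It assumes each cell is a tuple of measurements and that all axes share one bin width. The function name `landscape_entropy`, the uniform binning and the use of natural logarithms are assumptions for illustration; the abstract does not specify how the landscapes are discretized.

```python
from collections import Counter
from math import log


def landscape_entropy(cells, bin_width=1.0):
    """Shannon entropy (in nats) of cells binned on a regular grid.

    cells: iterable of equal-length tuples, one tuple of measurements per cell.
    bin_width: grid spacing applied to every axis (assumed uniform).
    """
    # Map every cell to the integer index of the grid bin that contains it.
    bins = Counter(
        tuple(int(x // bin_width) for x in cell) for cell in cells
    )
    n = sum(bins.values())
    # H = -sum_i p_i ln p_i over occupied bins only.
    return -sum((c / n) * log(c / n) for c in bins.values())
```

A population concentrated in a single bin has zero entropy, while one spread evenly over two bins has entropy ln 2; comparing such values between samples is the kind of measure the abstract refers to.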
Integrating protein structural dynamics and evolutionary analysis with Bio3D
Background
Popular bioinformatics approaches for studying protein functional dynamics include comparisons of crystallographic structures, molecular dynamics simulations and normal mode analysis. However, determining how observed displacements and predicted motions from these traditionally separate analyses relate to each other, as well as to the evolution of sequence, structure and function within large protein families, remains a considerable challenge. This is in part due to the general lack of tools that integrate information of molecular structure, dynamics and evolution.
Results
Here, we describe the integration of new methodologies for evolutionary sequence, structure and simulation analysis into the Bio3D package. This major update includes unique high-throughput normal mode analysis for examining and contrasting the dynamics of related proteins with non-identical sequences and structures, as well as new methods for quantifying dynamical couplings and their residue-wise dissection from correlation network analysis. These new methodologies are integrated with major biomolecular databases as well as established methods for evolutionary sequence and comparative structural analysis. New functionality for directly comparing results derived from normal modes, molecular dynamics and principal component analysis of heterogeneous experimental structure distributions is also included. We demonstrate these integrated capabilities with example applications to dihydrofolate reductase and heterotrimeric G-protein families along with a discussion of the mechanistic insight provided in each case.
Conclusions
The integration of structural dynamics and evolutionary analysis in Bio3D enables researchers to go beyond a prediction of single protein dynamics to investigate dynamical features across large protein families. The Bio3D package is distributed with full source code and extensive documentation as a platform-independent R package under a GPL2 license from http://thegrantlab.org/bio3d/.
Binding site matching in rational drug design: Algorithms and applications
© 2018 The Author(s). Published by Oxford University Press. All rights reserved. Interactions between proteins and small molecules are critical for biological functions. These interactions often occur in small cavities within protein structures, known as ligand-binding pockets. Understanding the physicochemical properties of binding pockets is essential to improve not only our basic knowledge of biological systems, but also drug development procedures. To quantify similarities among pockets in terms of their geometries and chemical properties, either bound ligands can be compared to one another or binding sites can be matched directly. Both perspectives routinely take advantage of computational methods, including various techniques to represent and compare small molecules as well as local protein structures. In this review, we survey 12 tools widely used to match pockets. These methods are divided into five categories based on the algorithm implemented to construct binding-site alignments. In addition to a comprehensive analysis of their algorithms, the test sets and performance of each method are described. We also discuss general pharmacological applications of computational pocket matching in drug repurposing, polypharmacology and side effects. Reflecting on the importance of these techniques in drug discovery, we close by elaborating on the development of more accurate meta-predictors, the incorporation of protein flexibility and the integration of powerful artificial intelligence technologies such as deep learning.
Raman spectroscopic characterization and analysis of agricultural and biological systems
Technical progress over the past two decades in instrument design, laser and electronic technology, and computer-based data analysis has made Raman spectroscopy, a noninvasive, nondestructive optical molecular spectroscopic imaging technique, an attractive choice for analytical tasks. Raman spectroscopy provides chemical structural information at the molecular level with minimal sample preparation in a quick, easy-to-operate and reproducible fashion. In recent years it has been increasingly applied to the analysis and characterization of agricultural products and biological samples. This dissertation documents the research in Raman spectroscopic characterization and analysis of both biomedical and agricultural systems that I carried out during my PhD training.
The biomedical research focused on glaucoma. Glaucoma is a chronic neurodegenerative disease characterized by apoptosis of retinal ganglion cells and subsequent loss of visual function. Early detection of pathological changes and progression in glaucoma and other neuroretinal diseases, which is critical for the prevention of permanent structural damage and irreversible vision loss, remains a great challenge. In my research, the Raman spectra from canine retinal tissues were subjected to multivariate discriminant analysis with a support vector machine algorithm to differentiate diseased tissues from healthy tissues. The high classification accuracy suggests that Raman spectroscopic screening can be used for in vitro detection of glaucomatous changes in retinal tissue, not only at a late stage but also at an early stage, with high specificity.
To expand the scope of application of Raman analysis, it was also applied to characterize agricultural and food materials, specifically meat. Existing objective methods (e.g., mechanical stress/strain analysis, near infrared spectroscopy) for predicting sensory attributes of pork generally do not correlate satisfactorily with panel evaluations. Raman spectroscopic methodology was investigated in this study to evaluate and predict tenderness, juiciness and chewiness of fresh, uncooked pork loins from 169 pigs. The method developed in this thesis yielded good prediction of sensory attributes such as tenderness and chewiness, and it has the potential to become a rapid objective assay for tenderness and chewiness of pork products that may find practical applications in the pork industry. In addition, a Raman spectroscopic screening method in conjunction with discriminant modeling was developed for rapid evaluation of boar taint level in pork. The research presented in this dissertation shows that Raman spectroscopy has great potential to address analytical needs in new fields and to enable innovative applications.