1,104 research outputs found

    Protein secondary structure prediction: Creating a meta-tool

    Abstract only available. Protein structure prediction is a growing field of interest for many reasons, owing not only to its obvious utility but also to the success that newer mathematical tools have achieved in recent years. Although determining the optimal protein structure directly, by finding the lowest-energy conformation among a huge number of candidates, is intractable, many heuristic methods have emerged that sacrifice some accuracy for reasonable execution speed. Through the use of numerical techniques such as neural networks(1), neural networks bolstered by position-specific scoring matrices generated by PSI-BLAST(2), and k-nearest neighbor algorithms(3), the success rate of protein structure prediction has been increasing over the past decade and a half. Each of these tools has particular strengths and weaknesses. To address this and to improve prediction accuracy, we are constructing a three-part meta-tool that combines k-nearest neighbor methods, neural network methods, and hidden Markov models to predict the secondary structure of proteins based on their position-specific scoring matrices. The results from each of the individual tools will be integrated and filtered to form a final prediction. This tool will be available on the web through a simple interface for those wishing to evaluate or use it. References: 1: Rost and Sander. Prediction of protein secondary structure at better than 70% accuracy; J. Mol. Biol. (1993) 232, 584-599. 2: Jones. Protein secondary structure prediction based on position-specific scoring matrices; J. Mol. Biol. (1999) 292, 195-202. 3: Bondugula, Duzlevski, Xu. Profiles and fuzzy k-nearest neighbor algorithm for protein secondary structure prediction; (unpublished). NSF-REU Program in Biosystems Modeling and Analysi
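The abstract above does not say how the three predictors' outputs are integrated. As a minimal sketch, assuming a simple per-residue majority vote over three-state strings (H = helix, E = strand, C = coil), the combination step might look like the following. The function name `consensus`, the tie-breaking rule, and the sample strings are illustrative assumptions, not the authors' method.

```python
from collections import Counter

def consensus(predictions, fallback="C"):
    """Combine equal-length secondary-structure strings by per-residue majority vote.

    predictions: list of strings over the alphabet H/E/C, one per predictor.
    fallback: state emitted when the top two states tie (an assumed rule).
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        # most_common() sorts states by descending vote count
        (top, top_count), *rest = Counter(column).most_common()
        tied = bool(rest) and rest[0][1] == top_count
        result.append(fallback if tied else top)
    return "".join(result)

# Hypothetical outputs from three predictors for a 4-residue fragment
combined = consensus(["HHEC", "HHCC", "HECE"])
```

A weighted vote using each tool's per-residue confidence would be a natural refinement, but it needs scores that the abstract does not describe.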

    An Efficient Nearest Neighbor Method for Protein Contact Prediction

    A variety of approaches for protein inter-residue contact prediction have been developed in recent years. However, this problem is far from being solved. In this article, we present an efficient nearest neighbor (NN) approach, called PKK-PCP, and its application to protein inter-residue contact prediction. The great strength of this approach is its adaptability to that problem. Furthermore, our method considerably improves efficiency compared with other NN approaches. Our NN-based method combines parallel execution with a k-d tree as the search algorithm. The input data used by our algorithm is based on structural features and physico-chemical properties of amino acids, in addition to evolutionary information. The results show better efficiency, in terms of time and memory consumption, than other similar approaches. Ministerio de Educación y Ciencia TIN2011-28956-C02-0

    Machine learning methods for omics data integration

    High-throughput technologies produce genome-scale transcriptomic and metabolomic (omics) datasets that allow for system-level studies of complex biological processes. The limitation lies in the small number of samples relative to the much larger number of features in these datasets. Machine learning methods can help integrate these large-scale omics datasets and identify key features from each dataset. A novel class-dependent feature selection method integrates the F statistic, maximum relevance binary particle swarm optimization (MRBPSO), and a class-dependent multi-category classification (CDMC) system. A set of highly differentially expressed genes is pre-selected using the F statistic as a filter for each dataset. MRBPSO and CDMC function as a wrapper to select desirable feature subsets for each class and classify the samples using those class-dependent feature subsets. The results indicate that the class-dependent approaches can effectively identify unique biomarkers for each cancer type and improve classification accuracy compared to class-independent feature selection methods. The integration of transcriptomics and metabolomics data is based on a classification framework. Compared to integration approaches based on principal component analysis and non-negative matrix factorization, our proposed method achieves 20-30% higher prediction accuracies on Arabidopsis tissue development data. Metabolite-predictive genes and gene-predictive metabolites are selected from transcriptomic and metabolomic data respectively. The constructed gene-metabolite correlation network can infer the functions of unknown genes and metabolites. Tissue-specific genes and metabolites are identified by the class-dependent feature selection method. Evidence from subcellular locations, gene ontology, and biochemical pathways supports the involvement of these entities in different developmental stages and tissues in Arabidopsis
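The F-statistic filter mentioned above can be illustrated with the standard one-way ANOVA F ratio (between-class mean square over within-class mean square), computed per feature and used to rank features. This is a hedged sketch of the filtering idea only. The function names, the data layout, and the toy expression values are assumptions, and the MRBPSO/CDMC wrapper is not shown.

```python
def f_statistic(groups):
    """One-way ANOVA F ratio for one feature; groups is a list of per-class value lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (mu - grand_mean) ** 2 for g, mu in zip(groups, means))
    ss_within = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    if ss_within == 0:
        # No within-class variance: separation is perfect if the class means differ
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rank_features(samples, labels, top_n):
    """Return indices of the top_n features ranked by descending F statistic."""
    classes = sorted(set(labels))
    scores = []
    for j in range(len(samples[0])):
        groups = [[s[j] for s, y in zip(samples, labels) if y == c] for c in classes]
        scores.append((f_statistic(groups), j))
    return [j for _, j in sorted(scores, key=lambda t: (-t[0], t[1]))[:top_n]]

# Toy data (made up): feature 0 separates the classes, feature 1 does not
samples = [(1.0, 5.0), (2.0, 4.0), (3.0, 6.0), (4.0, 5.0), (5.0, 4.0), (6.0, 6.0)]
labels = ["a", "a", "a", "b", "b", "b"]
selected = rank_features(samples, labels, top_n=1)
```

With far more features than samples, as the abstract notes, a filter of this kind is cheap because each feature is scored independently before any wrapper search runs.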

    Classification and Analysis of Regulatory Pathways Using Graph Property, Biochemical and Physicochemical Property, and Functional Property

    Given a regulatory pathway system consisting of a set of proteins, can we predict which pathway class it belongs to? Such a problem is closely related to the biological function of the pathway in cells and hence is fundamental in systems biology and proteomics. It is also an extremely challenging problem due to its complexity. To address this problem, a novel approach was developed that can be used to predict query pathways among the following six functional categories: (i) “Metabolism”, (ii) “Genetic Information Processing”, (iii) “Environmental Information Processing”, (iv) “Cellular Processes”, (v) “Organismal Systems”, and (vi) “Human Diseases”. The prediction method was established through the following procedures: (i) according to the general form of pseudo amino acid composition (PseAAC), each of the pathways concerned is formulated as a 5570-D (dimensional) vector; (ii) each of the components in the 5570-D vector was derived by a series of feature extractions from the pathway system according to its graphic property, biochemical and physicochemical property, as well as functional property; (iii) the minimum redundancy maximum relevance (mRMR) method was adopted to operate the prediction. A cross-validation by the jackknife test on a benchmark dataset consisting of 146 regulatory pathways indicated that an overall success rate of 78.8% was achieved by our method in identifying query pathways among the above six classes, indicating that the outcome is promising and encouraging. To the best of our knowledge, the current study represents the first effort to identify the type of a pathway system or its biological function. It is anticipated that our report may stimulate a series of follow-up investigations in this new and challenging area

    CANDIDATE PARTICIPANTS FOR ACADEMIC AND NON-ACADEMIC COMPETITIONS USING THE FUZZY K-NEAREST NEIGHBOR IN EVERY CLASS METHOD

    Many schools use data from reports on their students' academic activities to know and understand their students better and to solve particular problems, such as selecting candidates for academic and non-academic competitions. The difficulty lies in deciding which students are suitable to represent the school or meet the criteria for such activities, because the process is still carried out manually, which takes considerable time and is considered inefficient. Making direct use of the students' academic data saves time and effort. The classification technique used in this study is the Fuzzy K-Nearest Neighbor In Every Class (FK-NNC) method, a modification of K-Nearest Neighbor (K-NN) and Fuzzy K-Nearest Neighbor (FK-NN), which are simple, easy, and fast classification methods; modifying these two methods is therefore expected to give better results, and the resulting method is considered suitable. The Kapelo application selects candidates for academic and non-academic competitions by classifying students into two classes: capable and not yet capable. The results show the prediction accuracy of FK-NNC for 21 test records against 63 training records; the accuracy for K=3, K=5 and K=7 was, respectively, 9.52%, 23.81% and 33.33
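For orientation, the classic fuzzy k-nearest neighbor rule (FK-NN) that FK-NNC modifies assigns each class a membership weighted by inverse distance to the k nearest training points. The sketch below implements that base rule with crisp training labels; it is not the FK-NNC variant, whose per-class neighbor selection the abstract does not detail. The function name, the fuzzifier default `m=2.0`, and the toy student records are assumptions for illustration.

```python
import math

def fuzzy_knn(train, query, k=3, m=2.0, eps=1e-9):
    """Class memberships for query under the classic fuzzy k-NN rule.

    train: list of (feature_tuple, label) pairs with crisp labels.
    m: fuzzifier (> 1); each neighbor's weight is 1 / distance**(2 / (m - 1)).
    eps: small constant that avoids division by zero for exact matches.
    """
    neighbors = sorted(
        ((math.dist(x, query), label) for x, label in train),
        key=lambda t: t[0],
    )[:k]
    weights = [(1.0 / (d ** (2.0 / (m - 1.0)) + eps), label) for d, label in neighbors]
    total = sum(w for w, _ in weights)
    memberships = {}
    for w, label in weights:
        memberships[label] = memberships.get(label, 0.0) + w / total
    return memberships

# Toy records (made up): two score features per student
train = [
    ((0.0, 0.0), "not yet capable"),
    ((0.0, 1.0), "not yet capable"),
    ((5.0, 5.0), "capable"),
]
memberships = fuzzy_knn(train, (0.0, 0.5), k=3)
predicted = max(memberships, key=memberships.get)
```

The memberships sum to 1, so they can be read as graded support for each class rather than a hard vote.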

    A novel framework for protein structure prediction

    Thesis (Ph.D.), University of Missouri-Columbia, 2007. Proteins are among the most important molecules in life processes. The structure of a protein is essential to understanding its function at the molecular level. Due to rapid progress in sequencing technologies, the gap between the proteins whose structure is known and the proteins whose structure needs to be characterized is rapidly increasing. To address this problem, we are developing a novel framework to computationally predict many aspects of proteins, such as secondary structure, solvent accessibility, contact map and, finally, the tertiary structure itself. We have applied various computational techniques in the structure predictions, including the fuzzy k-nearest neighbor algorithm, the multi-dimensional scaling method, and least-squares minimization. Our framework uses evolutionary information more effectively than traditional template-based methods, and it has better potential to exploit the information in the PDB than other methods based on evolutionary information. Our methods show better performance in prediction accuracy and computational time than many other tools. Includes bibliographical reference

    Graph-Based Approaches to Protein Structure Comparison - From Local to Global Similarity

    The comparative analysis of protein structure data is a central aspect of structural bioinformatics. Drawing upon structural information allows the inference of function for unknown proteins even in cases where no apparent homology can be found at the sequence level. Regarding the function of an enzyme, the overall fold topology might be less important than the specific structural conformation of the catalytic site or the surface region of a protein, where the interaction with other molecules, such as binding partners, substrates and ligands, occurs. Thus, a comparison of these regions is especially interesting for functional inference, since structural constraints imposed by the demands of the catalyzed biochemical function make them more likely to exhibit structural similarity. Moreover, the comparative analysis of protein binding sites is of special interest in pharmaceutical chemistry, in order to predict cross-reactivities and gain a deeper understanding of the catalysis mechanism. From an algorithmic point of view, the comparison of structured data, or, more generally, complex objects, can be attempted based on different methodological principles. Global methods aim at comparing structures as a whole, while local methods transfer the problem to multiple comparisons of local substructures. In the context of protein structure analysis, it is not clear a priori which strategy is more suitable. In this thesis, several conceptually different algorithmic approaches have been developed, based on local, global and semi-global strategies, for the task of comparing protein structure data, more specifically protein binding pockets. The use of graphs for the modeling of protein structure data has a long-standing tradition in structural bioinformatics. Recently, graphs have been used to model the geometric constraints of protein binding sites.
The algorithms developed in this thesis are based on this modeling concept; hence, from a computer scientist's point of view, they can also be regarded as global, local and semi-global approaches to graph comparison. The algorithms were designed mainly to allow a more approximate comparison of protein binding sites, in order to account for the molecular flexibility of the protein structures. A main motivation was to allow the detection of more remote similarities, which are not apparent when using more rigid methods. Subsequently, the developed approaches were applied to different problems typically encountered in structural bioinformatics in order to assess and compare their performance and suitability for those problems. Each of the approaches developed during this work was capable of improving upon the performance of existing methods in the field. Another major aspect of the experiments was the question of which methodological concept, local, global or a combination of both, offers the most benefits for the specific task of protein binding site comparison, a question that is addressed throughout this thesis

    ART and ARTMAP Neural Networks for Applications: Self-Organizing Learning, Recognition, and Prediction

    ART and ARTMAP neural networks for adaptive recognition and prediction have been applied to a variety of problems. Applications include parts design retrieval at the Boeing Company, automatic mapping from remote sensing satellite measurements, medical database prediction, and robot vision. This chapter features a self-contained introduction to ART and ARTMAP dynamics and a complete algorithm for applications. Computational properties of these networks are illustrated by means of remote sensing and medical database examples. The basic ART and ARTMAP networks feature winner-take-all (WTA) competitive coding, which groups inputs into discrete recognition categories. WTA coding in these networks enables fast learning, which allows the network to encode important rare cases but may lead to inefficient category proliferation with noisy training inputs. This problem is partially solved by ART-EMAP, which uses WTA coding for learning but distributed category representations for test-set prediction. In medical database prediction problems, which often feature inconsistent training input predictions, the ARTMAP-IC network further improves ARTMAP performance with distributed prediction, category instance counting, and a new search algorithm. A recently developed family of ART models (dART and dARTMAP) retains stable coding, recognition, and prediction, but allows arbitrarily distributed category representation during learning as well as performance. National Science Foundation (IRI 94-01659, SBR 93-00633); Office of Naval Research (N00014-95-1-0409, N00014-95-0657

    Machine learning applications for the topology prediction of transmembrane beta-barrel proteins

    The research topic for this PhD thesis focuses on the topology prediction of beta-barrel transmembrane proteins. Transmembrane proteins adopt various conformations that relate to the functions they perform. The two most predominant classes are alpha-helix bundles and beta-barrel transmembrane proteins. Alpha-helix proteins are present in larger numbers than beta-barrel transmembrane proteins in structure databases. Therefore, there is a need to find computational tools that can predict and detect the structure of beta-barrel transmembrane proteins. Transmembrane proteins are used for active transport across the membrane or signal transduction. Given the importance of their roles, it becomes essential to understand the structures of these proteins. Transmembrane proteins are also a significant focus for new drug discovery. Transmembrane beta-barrel proteins play critical roles in the translocation machinery, pore formation, membrane anchoring, and ion exchange. In bioinformatics, many years of research have been spent on the topology prediction of transmembrane alpha-helices. Efforts on TMB (transmembrane beta-barrel) protein topology prediction have been overshadowed, and the prediction accuracy could be improved with further research. Various methodologies have been developed in the past to predict TMB protein topology. Methods available in the literature include turn identification, hydrophobicity profiles, rule-based prediction, HMM (Hidden Markov Model), ANN (Artificial Neural Networks), radial basis function networks, or combinations of methods. The use of a cascading classifier has never been fully explored. This research presents and evaluates approaches such as ANN (Artificial Neural Networks), KNN (K-Nearest Neighbors), SVM (Support Vector Machines), and a novel approach to TMB topology prediction that uses a cascading classifier. Computer simulations have been implemented in MATLAB, and the results have been evaluated.
Data were collected from various datasets and pre-processed for each machine learning technique. A deep neural network was built with an input layer, hidden layers, and an output layer. The cascading classifier was optimised mainly by optimising each machine learning algorithm used and by starting from the parameters that gave the best results for each algorithm. The cascading classifier results show that the proposed methodology predicts transmembrane beta-barrel protein topologies with high accuracy for randomly selected proteins. Using the cascading classifier approach, the best overall accuracy is 76.3%, with a precision of 0.831 and a recall, or probability of detection, of 0.799 for TMB topology prediction. The accuracy of 76.3% is achieved using a two-layer cascading classifier. By constructing and using various machine-learning frameworks, systems were developed to analyse the TMB topologies with significant robustness. We have presented several experimental findings that may be useful for future research. Using the cascading classifier, we used a novel approach for the topology prediction of TMB proteins
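As a reminder of how the reported figures relate, precision and recall follow from confusion-matrix counts, and the two can be summarised by their harmonic mean (F1). The sketch below uses the standard definitions; the function names and example counts are illustrative assumptions, and the F1 value is not reported in the abstract but derived here from its stated precision and recall.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts, only to show the definitions at work
p, r = precision_recall(tp=8, fp=2, fn=4)

# F1 implied by the precision and recall quoted in the abstract
f1_reported = f1_score(0.831, 0.799)
```

Under these definitions, a precision of 0.831 and a recall of 0.799 correspond to an F1 of about 0.815.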