183 research outputs found

    Evolutionary decision rules for predicting protein contact maps

    Protein structure prediction is currently one of the main open challenges in Bioinformatics. The protein contact map is a useful and commonly used representation of protein 3D structure: it records binary proximities (contact or non-contact) between each pair of amino acids of a protein. In this work, we propose a multi-objective evolutionary approach for contact map prediction based on physico-chemical properties of amino acids. The evolutionary algorithm produces a set of decision rules that identify contacts between amino acids. Each rule imposes a set of conditions on amino acid properties in order to predict contacts. We present results obtained by our approach on four different protein data sets. A statistical study was also performed to extract valid conclusions from the set of prediction rules generated by our algorithm. The results obtained confirm the validity of our proposal.
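
    The contact map described in the abstract above can be illustrated with a minimal sketch. It assumes one representative 3D coordinate per residue and an 8 Å distance cutoff; both the cutoff and the coordinates are illustrative assumptions, since the abstract does not state the contact definition used.

```python
import math

def contact_map(coords, threshold=8.0):
    """Return a symmetric binary matrix: 1 where two residues lie within threshold.

    The 8.0 angstrom default is an assumed cutoff, not taken from the abstract.
    The diagonal is left at 0, so a residue is not counted as its own contact.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap

# Hypothetical coordinates (in angstroms) for four residues on a line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(coords)
for row in cmap:
    print(row)
```

    A predictor such as the rule-based one described above would output a matrix of this shape, which can then be compared entry by entry with the map derived from the known structure.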

    Predicting Residue-Residue Contacts and Helix-Helix Interactions in Transmembrane Proteins Using an Integrative Feature-Based Random Forest Approach

    Integral membrane proteins constitute 25–30% of genomes and play crucial roles in many biological processes. However, less than 1% of the structures in the Protein Data Bank are membrane protein structures. In this context, it is important to develop reliable computational methods for predicting the structures of membrane proteins. Here, we present the first application of random forest (RF) to residue-residue contact prediction in transmembrane proteins, which we term TMhhcp. Rigorous cross-validation tests indicate that the RF models outperform two state-of-the-art methods, TMHcon and MEMPACK. Using a strict leave-one-protein-out jackknifing procedure, the models reached top L/5 prediction accuracies of 49.5% and 48.8% for two different residue contact definitions, respectively. The predicted residue contacts were further employed to predict interacting helical pairs, achieving Matthews correlation coefficients of 0.430 and 0.424 for the two residue contact definitions, respectively. For the benefit of the academic community, the TMhhcp server has been made freely accessible at http://protein.cau.edu.cn/tmhhcp
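
    The two metrics quoted in the abstract above can be sketched as follows. This is a generic illustration, not the TMhhcp evaluation code: the function names, the toy scores, and the convention of ranking residue pairs by predicted score and keeping the top L/5 (L being the sequence length) are assumptions.

```python
import math

def top_l_over_k_accuracy(scored_pairs, true_contacts, length, k=5):
    """Fraction of the top (length // k) highest-scoring pairs that are true contacts."""
    n_top = max(1, length // k)
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:n_top]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _score in ranked if pair in true_contacts)
    return hits / len(ranked)

def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical predictions for a 10-residue protein: ((i, j), score).
scored = [((1, 5), 0.9), ((2, 8), 0.8), ((3, 9), 0.4)]
truth = {(1, 5), (3, 9)}
acc = top_l_over_k_accuracy(scored, truth, length=10)  # top 10 // 5 = 2 pairs
mcc = matthews_cc(tp=6, fp=2, tn=10, fn=2)
print(acc, round(mcc, 4))
```

    Here one of the two top-ranked pairs is a true contact, so the top L/5 accuracy is 0.5, and the confusion counts give an MCC of 56/96.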

    New evolutionary approaches to protein structure prediction

    Doctoral Programme in Biotechnology and Chemical Technology. The problem of Protein Structure Prediction (PSP) is one of the principal topics in Bioinformatics, and multiple approaches have been developed to predict the structure of a protein. Determining the three-dimensional structure of proteins is necessary to understand protein function at the molecular level. A useful and commonly used representation of protein 3D structure is the protein contact map, which records binary proximities (contact or non-contact) between each pair of amino acids of a protein. This thesis includes a compilation of soft computing techniques for the protein structure prediction problem (secondary and tertiary structures). A novel evolutionary secondary structure predictor is also described in detail, and the results obtained confirm the validity of the proposal. Furthermore, we propose a multi-objective evolutionary approach for contact map prediction based on physico-chemical properties of amino acids. The evolutionary algorithm produces a set of decision rules that identify contacts between amino acids. Each rule imposes a set of conditions on amino acid properties in order to predict contacts. Results obtained by our approach on four different protein data sets are also presented. Finally, a statistical study was performed to extract valid conclusions from the set of prediction rules generated by our algorithm. Universidad Pablo de Olavide. Centro de Estudios de Postgrad

    Statistical approaches to the study of protein folding and energetics

    The determination of protein structure and the exploration of protein folding landscapes are two of the key problems in computational biology. In order to address these challenges, both a protein model that accurately captures the physics of interest and an efficient sampling algorithm are required. The first part of this thesis documents the continued development of CRANKITE, a coarse-grained protein model, and its energy landscape exploration using nested sampling, a Bayesian sampling algorithm. We extend CRANKITE and optimize its parameters using a maximum likelihood approach. The efficiency of our procedure, using the contrastive divergence approximation, allows a large training set to be used, producing a model which is transferable to proteins not included in the training set. We develop an empirical Bayes model for the prediction of protein β-contacts, which are required inputs for CRANKITE. Our approach couples the constraints and prior knowledge associated with β-contacts to a maximum entropy-based statistic which predicts evolutionarily related contacts. Nested sampling (NS) is a Bayesian algorithm shown to be efficient at sampling systems which exhibit a first-order phase transition. In this work we parallelize the algorithm and, for the first time, apply it to a biophysical system: small globular proteins modelled using CRANKITE. We generate energy landscape charts, which give a large-scale visualization of the protein folding landscape, and we compare the efficiency of NS to an alternative sampling technique, parallel tempering, when calculating the heat capacity of a short peptide. In the final part of the thesis we adapt the NS algorithm for use within a molecular dynamics framework and demonstrate the application of the algorithm by calculating the thermodynamics of all-atom models of a small peptide, comparing results to the standard replica exchange approach. This adaptation will allow NS to be used with more realistic force fields in the future.
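
    To give a feel for the nested sampling algorithm named in the abstract above, here is a minimal serial sketch on a toy problem. It is not the parallel, protein-specific implementation the thesis describes: the one-dimensional Gaussian likelihood, the uniform prior on [0, 1], the rejection-sampling replacement step, and all parameter values are assumptions chosen only so the example is self-contained.

```python
import math
import random

def log_likelihood(x, mu=0.5, sigma=0.1):
    """Unnormalised Gaussian log-likelihood (toy stand-in for an energy model)."""
    return -0.5 * ((x - mu) / sigma) ** 2

def nested_sampling(n_live=200, n_iter=1200, seed=0):
    """Estimate the evidence Z = integral of L(x) over a uniform prior on [0, 1]."""
    rng = random.Random(seed)
    live = [rng.random() for _ in range(n_live)]
    live_logl = [log_likelihood(x) for x in live]
    evidence = 0.0
    x_prev = 1.0  # prior mass still enclosed by the likelihood contour
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=live_logl.__getitem__)
        l_star = live_logl[worst]
        x_curr = math.exp(-i / n_live)  # expected shrinkage of the prior mass
        evidence += math.exp(l_star) * (x_prev - x_curr)
        x_prev = x_curr
        # Replace the worst point with a prior draw inside the contour L > L*.
        # Plain rejection sampling is only practical for toy problems like this.
        while True:
            cand = rng.random()
            cand_logl = log_likelihood(cand)
            if cand_logl > l_star:
                break
        live[worst] = cand
        live_logl[worst] = cand_logl
    # The remaining live points share the prior mass that is left.
    evidence += x_prev * sum(math.exp(l) for l in live_logl) / n_live
    return evidence

z = nested_sampling()
print(z, 0.1 * math.sqrt(2 * math.pi))  # estimate vs. analytic value (about 0.2507)
```

    The estimate is stochastic, so it should land near the analytic value rather than match it exactly; in a thermodynamic setting the same likelihood-ordered samples are what allow quantities such as the heat capacity to be computed afterwards.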

    Protein structure prediction based on nearest neighbours (Predicción de estructuras de proteínas basada en vecinos más cercanos)

    Doctoral Programme in Biotechnology and Chemical Technology. Proteins are the biomolecules with the greatest structural diversity, and they perform many important functions in all living organisms. However, anomalies that arise during protein formation cause or facilitate the development of serious diseases such as cancer and Alzheimer's, which makes it vitally important to design drugs that can prevent their disastrous consequences. Such drug design requires structural models of proteins whose sequence is known but whose structure, in most cases, is still unknown. For this reason, predicting the structure of a protein from its amino acid sequence is key to curing this type of disease. This thesis presents an in-depth analysis of the state of the art in predicting the tertiary and quaternary structure of a protein, covering various aspects and viewpoints of the most current and relevant methods in the literature. It also proposes a new method for predicting distance maps that represent protein structures, using a nearest-neighbour scheme that takes physico-chemical properties of amino acids as input. Exhaustive experiments were carried out, and the results were analysed from several points of view, highlighting various aspects of interest. Finally, the proposed methodology was applied to two groups of proteins of biological interest, viral and mitochondrial proteins, with very promising results in both cases. Universidad Pablo de Olavide. Centro de Estudios de Postgrad

    Proceedings, MSVSCC 2014

    Proceedings of the 8th Annual Modeling, Simulation & Visualization Student Capstone Conference held on April 17, 2014 at VMASC in Suffolk, Virginia

    Deep Model for Improved Operator Function State Assessment

    A deep learning framework is presented for engagement assessment using EEG signals. Deep learning is a recently developed machine learning technique that has been applied in many areas. In this paper, we propose a deep learning strategy for operator function state (OFS) assessment. Fifteen pilots participated in a flight simulation from Seattle to Chicago. During the four-hour simulation, EEG signals were recorded for each pilot. We labeled 20-minute data as engaged or disengaged to fine-tune the deep network and used the remaining large amount of unlabeled data to initialize the network. The trained deep network was then used to assess whether a pilot was engaged during the four-hour simulation.

    A Features Analysis Tool For Assessing And Improving Computational Models In Structural Biology

    The protein-folding problem is to predict, from a protein's amino acid sequence, its folded 3D conformation. State-of-the-art computational models are complex, collaboratively maintained prediction software. Like other complex software, they become brittle without support for testing and refactoring. Features analysis, a language of 'scientific unit testing', is the visual and quantitative comparison of distributions of features (local geometric measures) sampled from ensembles of native and predicted conformations. To support features analysis, I develop a features analysis tool: a modular database framework for extracting and managing sampled feature instances and an exploratory data analysis framework for rapidly comparing feature distributions. In supporting features analysis, the tool supports the creation, tuning, and assessment of computational models, improving protein prediction and design. I demonstrate the features analysis tool through six case studies with the Rosetta molecular modeling suite. The first three demonstrate the mechanics of using the tool by constructing and checking models. The first evaluates bond angle restraint models when used with the Backrub local sampling heuristic. The second identifies and resolves energy function derivative discontinuities that frustrate gradient-based minimization. The third constructs a model for disulfide bonds. The last three demonstrate using the tool to evaluate and improve how models represent molecular structure. I focus on modeling H-bonds because their geometric specificity and environmental dependence lead to complex feature distributions. The fourth case study develops a novel functional form for Sp2 acceptor H-bonds. The fifth fits parameters for a refined H-bond model. The sixth combines the refined model with an electrostatics model and harmonizes them with the rest of the energy function.
    Next, to facilitate assessing model improvements, I develop recovery tests that measure predictive accuracy by asking models to recover native conformations that have been partially randomized. Finally, to demonstrate that the features analysis and recovery test tools support improving protein prediction and design, I evaluate the refined H-bond model and electrostatics model with additional corrections from the Rosetta community. Based on positive results, I recommend a new standard energy function, which has been accepted by the Rosetta community as the largest systematic improvement in nearly a decade. Doctor of Philosoph