5,026 research outputs found

    A simple and fast heuristic for protein structure comparison

    Background: Protein structure comparison is a key problem in bioinformatics. Several methods exist for comparing proteins; one of them is to solve the Maximum Contact Map Overlap problem (MAX-CMO). Although this problem can be solved with exact algorithms, researchers need approximate algorithms that obtain good-quality solutions using fewer computational resources. Results: We propose a variable neighborhood search metaheuristic for solving MAX-CMO. We analyze this strategy in two respects: 1) from an optimization point of view, the strategy is tested on two different datasets, obtaining an error of 3.5% (over 2702 pairs) and 1.7% (over 161 pairs) with respect to optimal values, thus yielding highly accurate solutions in a simpler and less expensive way than exact algorithms; 2) in terms of protein structure classification, we conduct experiments on three datasets and show that it is feasible to detect structural similarities at SCOP's family and CATH's architecture levels using normalized overlap values. Some limitations and the role of normalization are outlined for classification at SCOP's fold level. Conclusion: We designed, implemented and tested a new tool for solving MAX-CMO, based on a well-known metaheuristic technique. The good balance between solution quality and computational effort makes it a valuable tool. Moreover, to the best of our knowledge, this is the first time the MAX-CMO measure has been tested at SCOP's fold and CATH's architecture levels with encouraging results. Software is available for download at http://modo.ugr.es/jrgonzalez/msvns4maxcmo. This work is supported by Projects HeuriCosc TIN2005-08404-C04-01 and HeuriCode TIN2005-08404-C04-03, both from the Spanish Ministry of Education and Science. JRG acknowledges financial support from Project TIC2002-04242-C03-02. The authors thank N. Krasnogor and the ProCKSi project (BB/C511764/1) for their support.
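As a rough illustration of the quantity that MAX-CMO maximizes, the sketch below counts the contacts shared by two contact maps under a given residue alignment. It is not the authors' variable neighborhood search: it only evaluates one candidate alignment. The function names, the representation of a contact map as a set of index pairs, and the particular normalization are assumptions made for this example; the abstract does not say which normalization the paper uses.

```python
def overlap(contacts_a, contacts_b, alignment):
    """Count contacts of protein A that map onto contacts of protein B.

    contacts_a, contacts_b: sets of residue index pairs (i, j) with i < j.
    alignment: dict mapping residue indices of A to residue indices of B
               (hypothetical representation; unaligned residues are absent).
    """
    shared = 0
    for i, j in contacts_a:
        # Both endpoints of the contact must be aligned to count at all.
        if i in alignment and j in alignment:
            u, v = alignment[i], alignment[j]
            if (min(u, v), max(u, v)) in contacts_b:
                shared += 1
    return shared


def normalized_overlap(contacts_a, contacts_b, alignment):
    """One possible normalization (an assumption): 2 * overlap / total contacts."""
    total = len(contacts_a) + len(contacts_b)
    if total == 0:
        return 0.0
    return 2.0 * overlap(contacts_a, contacts_b, alignment) / total


# Toy contact maps and alignment, invented for illustration only.
contacts_a = {(0, 2), (1, 3), (2, 4)}
contacts_b = {(0, 2), (1, 4), (2, 3)}
alignment = {0: 0, 1: 1, 2: 2, 3: 4}

shared = overlap(contacts_a, contacts_b, alignment)
score = normalized_overlap(contacts_a, contacts_b, alignment)
```

A search heuristic such as the one described in the abstract would explore many alignments and keep the one with the largest overlap; this sketch covers only the scoring step.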

    ProCKSI: a decision support system for Protein (Structure) Comparison, Knowledge, Similarity and Information

    Background: We introduce the decision support system for Protein (Structure) Comparison, Knowledge, Similarity and Information (ProCKSI). ProCKSI integrates various protein similarity measures through an easy to use interface that allows the comparison of multiple proteins simultaneously. It employs the Universal Similarity Metric (USM), the Maximum Contact Map Overlap (MaxCMO) of protein structures and other external methods such as the DaliLite and the TM-align methods, the Combinatorial Extension (CE) of the optimal path, and the FAST Align and Search Tool (FAST). Additionally, ProCKSI allows the user to upload a user-defined similarity matrix supplementing the methods mentioned, and computes a similarity consensus in order to provide a rich, integrated, multicriteria view of large datasets of protein structures. Results: We present ProCKSI's architecture and workflow describing its intuitive user interface, and show its potential on three distinct test-cases. In the first case, ProCKSI is used to evaluate the results of a previous CASP competition, assessing the similarity of proposed models for given targets where the structures could have a large deviation from one another. To perform this type of comparison reliably, we introduce a new consensus method. The second study deals with the verification of a classification scheme for protein kinases, originally derived by sequence comparison by Hanks and Hunter, but here we use a consensus similarity measure based on structures. In the third experiment using the Rost and Sander dataset (RS126), we investigate how a combination of different sets of similarity measures influences the quality and performance of ProCKSI's new consensus measure. ProCKSI performs well with all three datasets, showing its potential for complex, simultaneous multi-method assessment of structural similarity in large protein datasets. 
Furthermore, combining different similarity measures is usually more robust than relying on a single measure. Conclusion: Based on a diverse set of similarity measures, ProCKSI computes a consensus similarity profile for the entire protein set. All results can be clustered, visualised, analysed and easily compared with each other through a simple and intuitive interface.

    Protein Tertiary Model Assessment Using Granular Machine Learning Techniques

    The automatic prediction of a protein's three-dimensional structure from its amino acid sequence has become one of the most important and most researched fields in bioinformatics. Because models are predictions rather than experimental structures determined with known accuracy, it is vital to estimate model quality. We attempt to solve this problem using machine learning techniques and information from both the sequence and the structure of the protein. The goal is to build a machine that learns from PDB structures and, when given a new model, predicts whether it belongs to the same class as the PDB structures (correct or incorrect protein models). Different subsets of the PDB (Protein Data Bank) are considered for evaluating the prediction potential of the machine learning methods. Here we show two such machines, one using support vector machines (SVM) and another using fuzzy decision trees (FDT). First, using a preliminary encoding style, the SVM reached around 70% accuracy in protein model quality assessment, and an improved fuzzy decision tree (IFDT) reached above 80% accuracy. To reduce computational overhead, a multiprocessor environment and a basic feature selection method are used in the SVM-based machine learning algorithm. Next, an enhanced scheme is introduced using a new encoding style. In the new style, information such as the amino acid substitution matrix, polarity, secondary structure, and relative distance between alpha carbon atoms is collected through spatial traversal of the 3D structure to form training vectors. This ensures that the properties of alpha carbon atoms that are close together in 3D space, and thus interacting, are used in vector formation. With the fuzzy decision tree, we obtained a training accuracy of around 90%. Prediction accuracy and execution time improve significantly compared with the previous encoding technique. This outcome motivates us to continue exploring effective machine learning algorithms for accurate protein model quality assessment. Finally, these machines are tested using CASP8 and CASP9 templates and compared with other CASP competitors, with promising results. We further discuss the importance of model quality assessment and other information from proteins that could be considered for the same purpose.

    Design and validation of structural health monitoring system based on bio-inspired algorithms

    The need to ensure the proper performance of structures in service has made structural health monitoring (SHM) a priority research area. Researchers around the world have focused on developing new ways to continuously monitor structures and analyze the data collected during inspection, in order to provide information about the current state of a structure and avoid possible catastrophes. Effective data analysis requires methodologies that assess structures at low computational cost and with high reliability. These desirable features can be found in biological systems, and they can be emulated by computational systems. The use of bio-inspired algorithms is a recent approach that has demonstrated its effectiveness in data analysis in different areas. Since these algorithms emulate biological systems that have proven effective over many generations, it is possible to mimic the evolutionary process and its adaptability by using computational algorithms. In pattern recognition especially, several algorithms have shown good performance. Some widely used examples are neural networks, fuzzy systems and genetic algorithms. This thesis is concerned with the development of bio-inspired methodologies for structural damage detection and classification. The document is organized in five chapters. First, an overview of the problem statement, the objectives, general results, a brief theoretical background and a description of the different experimental setups are included in Chapter 1 (Introduction). Chapters 2 to 4 include the journal papers published by the author of this thesis. The discussion of the results, some conclusions and future work can be found in Chapter 5.
Finally, Appendix A includes other contributions such as a book chapter and some conference papers.
    The need to ensure the correct operation of structures in service has made structural integrity monitoring an area of great interest. Researchers all over the world focus their efforts on developing new forms of continuous structural monitoring that make it possible to analyze and interpret the data collected during the inspection process, with the aim of providing information on the current state of the structure and avoiding possible catastrophes. To carry out an effective analysis of the data, methodologies are needed for inspecting the structure with low computational cost and high reliability. These desired characteristics can be found in biological systems and can be emulated with computational tools. The use of bio-inspired algorithms is a recent technique that has demonstrated its effectiveness in data analysis in different areas. Given that these algorithms are based on emulating biological systems that have demonstrated their effectiveness over many generations, it is possible to imitate the process of evolution and its characteristics of adaptability to the environment using computational algorithms. This is especially so in pattern recognition, where many of these algorithms give excellent results. Some widely used examples are neural networks, fuzzy systems and genetic algorithms. This thesis involves the development of bio-inspired methodologies for the detection and classification of structural damage. The document is organized in five chapters. First, Chapter 1 (Introduction) includes a general description of the problem, the objectives of the work, the results obtained, a brief conceptual framework and a description of the different experimental scenarios. Chapters 2 to 4 include the articles published in various indexed journals. The review of the results, conclusions and future work is found in Chapter 5. Finally, Appendix A includes other contributions such as a book chapter and some papers published at conferences.

    'The frozen accident' as an evolutionary adaptation: A rate distortion theory perspective on the dynamics and symmetries of genetic coding mechanisms

    We survey some interpretations and related issues concerning the frozen hypothesis due to F. Crick and how it can be explained in terms of several natural mechanisms involving error correction codes, spin glasses, symmetry breaking and the characteristic robustness of genetic networks. The approach to most of these questions uses elements of Shannon's rate distortion theory, incorporating a semantic system that is meaningful for the relevant alphabets and vocabulary implemented in transmission of the genetic code. We apply the fundamental homology between information source uncertainty and the free energy density of a thermodynamical system with respect to transcriptional regulators and the communication channels of sequence/structure in proteins. This leads to the suggestion that the frozen accident may have been a type of evolutionary adaptation.

    Computation in Complex Networks

    Complex networks are one of the most challenging research topics in disciplines including physics, mathematics, biology, medicine, engineering, and computer science, among others. Interest in complex networks keeps growing because of their ability to model many everyday systems, such as technology networks, the Internet, and communication, chemical, neural, social, political and financial networks. The Special Issue "Computation in Complex Networks" of Entropy offers a multidisciplinary view on how some complex systems behave, providing a collection of original and high-quality papers within the research fields of:
    • Community detection
    • Complex network modelling
    • Complex network analysis
    • Node classification
    • Information spreading and control
    • Network robustness
    • Social networks
    • Network medicine