627 research outputs found

    Flexible protein folding by ant colony optimization

    Get PDF
    Protein structure prediction is one of the most challenging topics in bioinformatics. As the protein structure is found to be closely related to its functions, predicting the folding structure of a protein to judge its functions is meaningful to the humanity. This chapter proposes a flexible ant colony (FAC) algorithm for solving protein folding problems (PFPs) based on the hydrophobic-polar (HP) square lattice model. Different from the previous ant algorithms for PFPs, the pheromones in the proposed algorithm are placed on the arcs connecting adjacent squares in the lattice. Such pheromone placement model is similar to the one used in the traveling salesmen problems (TSPs), where pheromones are released on the arcs connecting the cities. Moreover, the collaboration of effective heuristic and pheromone strategies greatly enhances the performance of the algorithm so that the algorithm can achieve good results without local search methods. By testing some benchmark two-dimensional hydrophobic-polar (2D-HP) protein sequences, the performance shows that the proposed algorithm is quite competitive compared with some other well-known methods for solving the same protein folding problems

    Soft Computing Techiniques for the Protein Folding Problem on High Performance Computing Architectures

    Get PDF
    The protein-folding problem has been extensively studied during the last fifty years. The understanding of the dynamics of global shape of a protein and the influence on its biological function can help us to discover new and more effective drugs to deal with diseases of pharmacological relevance. Different computational approaches have been developed by different researchers in order to foresee the threedimensional arrangement of atoms of proteins from their sequences. However, the computational complexity of this problem makes mandatory the search for new models, novel algorithmic strategies and hardware platforms that provide solutions in a reasonable time frame. We present in this revision work the past and last tendencies regarding protein folding simulations from both perspectives; hardware and software. Of particular interest to us are both the use of inexact solutions to this computationally hard problem as well as which hardware platforms have been used for running this kind of Soft Computing techniques.This work is jointly supported by the FundaciónSéneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants 15290/PI/2010 and 18946/JLI/13, by the Spanish MEC and European Commission FEDER under grant with reference TEC2012-37945-C02-02 and TIN2012-31345, by the Nils Coordinated Mobility under grant 012-ABEL-CM-2014A, in part financed by the European Regional Development Fund (ERDF). We also thank NVIDIA for hardware donation within UCAM GPU educational and research centers.Ingeniería, Industria y Construcció

    On the role of metaheuristic optimization in bioinformatics

    Get PDF
    Metaheuristic algorithms are employed to solve complex and large-scale optimization problems in many different fields, from transportation and smart cities to finance. This paper discusses how metaheuristic algorithms are being applied to solve different optimization problems in the area of bioinformatics. While the text provides references to many optimization problems in the area, it focuses on those that have attracted more interest from the optimization community. Among the problems analyzed, the paper discusses in more detail the molecular docking problem, the protein structure prediction, phylogenetic inference, and different string problems. In addition, references to other relevant optimization problems are also given, including those related to medical imaging or gene selection for classification. From the previous analysis, the paper generates insights on research opportunities for the Operations Research and Computer Science communities in the field of bioinformatics

    Computational Intelligence for Life Sciences

    Get PDF
    Computational Intelligence (CI) is a computer science discipline encompassing the theory, design, development and application of biologically and linguistically derived computational paradigms. Traditionally, the main elements of CI are Evolutionary Computation, Swarm Intelligence, Fuzzy Logic, and Neural Networks. CI aims at proposing new algorithms able to solve complex computational problems by taking inspiration from natural phenomena. In an intriguing turn of events, these nature-inspired methods have been widely adopted to investigate a plethora of problems related to nature itself. In this paper we present a variety of CI methods applied to three problems in life sciences, highlighting their effectiveness: we describe how protein folding can be faced by exploiting Genetic Programming, the inference of haplotypes can be tackled using Genetic Algorithms, and the estimation of biochemical kinetic parameters can be performed by means of Swarm Intelligence. We show that CI methods can generate very high quality solutions, providing a sound methodology to solve complex optimization problems in life sciences

    Cooperative Particle Swarm Optimization for Combinatorial Problems

    Get PDF
    A particularly successful line of research for numerical optimization is the well-known computational paradigm particle swarm optimization (PSO). In the PSO framework, candidate solutions are represented as particles that have a position and a velocity in a multidimensional search space. The direct representation of a candidate solution as a point that flies through hyperspace (i.e., Rn) seems to strongly predispose the PSO toward continuous optimization. However, while some attempts have been made towards developing PSO algorithms for combinatorial problems, these techniques usually encode candidate solutions as permutations instead of points in search space and rely on additional local search algorithms. In this dissertation, I present extensions to PSO that by, incorporating a cooperative strategy, allow the PSO to solve combinatorial problems. The central hypothesis is that by allowing a set of particles, rather than one single particle, to represent a candidate solution, combinatorial problems can be solved by collectively constructing solutions. The cooperative strategy partitions the problem into components where each component is optimized by an individual particle. Particles move in continuous space and communicate through a feedback mechanism. This feedback mechanism guides them in the assessment of their individual contribution to the overall solution. Three new PSO-based algorithms are proposed. Shared-space CCPSO and multispace CCPSO provide two new cooperative strategies to split the combinatorial problem, and both models are tested on proven NP-hard problems. Multimodal CCPSO extends these combinatorial PSO algorithms to efficiently sample the search space in problems with multiple global optima. Shared-space CCPSO was evaluated on an abductive problem-solving task: the construction of parsimonious set of independent hypothesis in diagnostic problems with direct causal links between disorders and manifestations. Multi-space CCPSO was used to solve a protein structure prediction subproblem, sidechain packing. Both models are evaluated against the provable optimal solutions and results show that both proposed PSO algorithms are able to find optimal or near-optimal solutions. The exploratory ability of multimodal CCPSO is assessed by evaluating both the quality and diversity of the solutions obtained in a protein sequence design problem, a highly multimodal problem. These results provide evidence that extended PSO algorithms are capable of dealing with combinatorial problems without having to hybridize the PSO with other local search techniques or sacrifice the concept of particles moving throughout a continuous search space

    Fast learning optimized prediction methodology for protein secondary structure prediction, relative solvent accessibility prediction and phosphorylation prediction

    Get PDF
    Computational methods are rapidly gaining importance in the field of structural biology, mostly due to the explosive progress in genome sequencing projects and the large disparity between the number of sequences and the number of structures. There has been an exponential growth in the number of available protein sequences and a slower growth in the number of structures. There is therefore an urgent need to develop computed structures and identify the functions of these sequences. Developing methods that will satisfy these needs both efficiently and accurately is of paramount importance for advances in many biomedical fields, for a better basic understanding of aberrant states of stress and disease, including drug discovery and discovery of biomarkers. Several aspects of secondary structure predictions and other protein structure-related predictions are investigated using different types of information such as data obtained from knowledge-based potentials derived from amino acids in protein sequences, physicochemical properties of amino acids and propensities of amino acids to appear at the ends of secondary structures. Investigating the performance of these secondary structure predictions by type of amino acid highlights some interesting aspects relating to the influences of the individual amino acid types on formation of secondary structures and points toward ways to make further gains. Other research areas include Relative Solvent Accessibility (RSA) predictions and predictions of phosphorylation sites, which is one of the Post-Translational Modification (PTM) sites in proteins. Protein secondary structures and other features of proteins are predicted efficiently, reliably, less expensively and more accurately. A novel method called Fast Learning Optimized PREDiction (FLOPRED) Methodology is proposed for predicting protein secondary structures and other features, using knowledge-based potentials, a Neural Network based Extreme Learning Machine (ELM) and advanced Particle Swarm Optimization (PSO) techniques that yield better and faster convergence to produce more accurate results. These techniques yield superior classification of secondary structures, with a training accuracy of 93.33% and a testing accuracy of 92.24% with a standard deviation of 0.48% obtained for a small group of 84 proteins. We have a Matthew\u27s correlation-coefficient ranging between 80.58% and 84.30% for these secondary structures. Accuracies for individual amino acids range between 83% and 92% with an average standard deviation between 0.3% and 2.9% for the 20 amino acids. On a larger set of 415 proteins, we obtain a testing accuracy of 86.5% with a standard deviation of 1.38%. These results are significantly higher than those found in the literature. Prediction of protein secondary structure based on amino acid sequence is a common technique used to predict its 3-D structure. Additional information such as the biophysical properties of the amino acids can help improve the results of secondary structure prediction. A database of protein physicochemical properties is used as features to encode protein sequences and this data is used for secondary structure prediction using FLOPRED. Preliminary studies using a Genetic Algorithm (GA) for feature selection, Principal Component Analysis (PCA) for feature reduction and FLOPRED for classification give promising results. Some amino acids appear more often at the ends of secondary structures than others. A preliminary study has indicated that secondary structure accuracy can be improved as much as 6% by including these effects for those residues present at the ends of alpha-helix, beta-strand and coil. A study on RSA prediction using ELM shows large gains in processing speed compared to using support vector machines for classification. This indicates that ELM yields a distinct advantage in terms of processing speed and performance for RSA. Additional gains in accuracies are possible when the more advanced FLOPRED algorithm and PSO optimization are implemented. Phosphorylation is a post-translational modification on proteins often controls and regulates their activities. It is an important mechanism for regulation. Phosphorylated sites are known to be present often in intrinsically disordered regions of proteins lacking unique tertiary structures, and thus less information is available about the structures of phosphorylated sites. It is important to be able to computationally predict phosphorylation sites in protein sequences obtained from mass-scale sequencing of genomes. Phosphorylation sites may aid in the determination of the functions of a protein and to better understanding the mechanisms of protein functions in healthy and diseased states. FLOPRED is used to model and predict experimentally determined phosphorylation sites in protein sequences. Our new PSO optimization included in FLOPRED enable the prediction of phosphorylation sites with higher accuracy and with better generalization. Our preliminary studies on 984 sequences demonstrate that this model can predict phosphorylation sites with a training accuracy of 92.53% , a testing accuracy 91.42% and Matthew\u27s correlation coefficient of 83.9%. In summary, secondary structure prediction, Relative Solvent Accessibility and phosphorylation site prediction have been carried out on multiple sets of data, encoded with a variety of information drawn from proteins and the physicochemical properties of their constituent amino acids. Improved and efficient algorithms called S-ELM and FLOPRED, which are based on Neural Networks and Particle Swarm Optimization are used for classifying and predicting protein sequences. Analysis of the results of these studies provide new and interesting insights into the influence of amino acids on secondary structure prediction. S-ELM and FLOPRED have also proven to be robust and efficient for predicting relative solvent accessibility of proteins and phosphorylation sites. These studies show that our method is robust and resilient and can be applied for a variety of purposes. It can be expected to yield higher classification accuracy and better generalization performance compared to previous methods

    Structural Prediction of Protein–Protein Interactions by Docking: Application to Biomedical Problems

    Get PDF
    A huge amount of genetic information is available thanks to the recent advances in sequencing technologies and the larger computational capabilities, but the interpretation of such genetic data at phenotypic level remains elusive. One of the reasons is that proteins are not acting alone, but are specifically interacting with other proteins and biomolecules, forming intricate interaction networks that are essential for the majority of cell processes and pathological conditions. Thus, characterizing such interaction networks is an important step in understanding how information flows from gene to phenotype. Indeed, structural characterization of protein–protein interactions at atomic resolution has many applications in biomedicine, from diagnosis and vaccine design, to drug discovery. However, despite the advances of experimental structural determination, the number of interactions for which there is available structural data is still very small. In this context, a complementary approach is computational modeling of protein interactions by docking, which is usually composed of two major phases: (i) sampling of the possible binding modes between the interacting molecules and (ii) scoring for the identification of the correct orientations. In addition, prediction of interface and hot-spot residues is very useful in order to guide and interpret mutagenesis experiments, as well as to understand functional and mechanistic aspects of the interaction. Computational docking is already being applied to specific biomedical problems within the context of personalized medicine, for instance, helping to interpret pathological mutations involved in protein–protein interactions, or providing modeled structural data for drug discovery targeting protein–protein interactions.Spanish Ministry of Economy grant number BIO2016-79960-R; D.B.B. is supported by a predoctoral fellowship from CONACyT; M.R. is supported by an FPI fellowship from the Severo Ochoa program. We are grateful to the Joint BSC-CRG-IRB Programme in Computational Biology.Peer ReviewedPostprint (author's final draft

    REinforcement learning based Adaptive samPling: REAPing Rewards by Exploring Protein Conformational Landscapes

    Full text link
    One of the key limitations of Molecular Dynamics simulations is the computational intractability of sampling protein conformational landscapes associated with either large system size or long timescales. To overcome this bottleneck, we present the REinforcement learning based Adaptive samPling (REAP) algorithm that aims to efficiently sample conformational space by learning the relative importance of each reaction coordinate as it samples the landscape. To achieve this, the algorithm uses concepts from the field of reinforcement learning, a subset of machine learning, which rewards sampling along important degrees of freedom and disregards others that do not facilitate exploration or exploitation. We demonstrate the effectiveness of REAP by comparing the sampling to long continuous MD simulations and least-counts adaptive sampling on two model landscapes (L-shaped and circular), and realistic systems such as alanine dipeptide and Src kinase. In all four systems, the REAP algorithm consistently demonstrates its ability to explore conformational space faster than the other two methods when comparing the expected values of the landscape discovered for a given amount of time. The key advantage of REAP is on-the-fly estimation of the importance of collective variables, which makes it particularly useful for systems with limited structural information

    Predicting sequence and structural specificities of RNA binding regions recognized by splicing factor SRSF1

    Get PDF
    Abstract- RNA-binding proteins (RBPs) play diverse roles in eukaryotic RNA processing. Despite their pervasive functions in coding and non-coding RNA biogenesis and regulation, elucidating the specificities that define protein-RNA interactions remains a major challenge. Here, we describe a novel model-based approach — RNAMotifModeler to identify binding consensus of RBPs by integrating sequence features and RNA secondary structures. Using RNA sequences derived from Cross-linking immunoprecipitation (CLIP) followed by high-throughput sequencing for SRSF1 proteins, we identified a purine-rich octamer ‘AGAAGAAG ’ in a highly singlestranded RNA context, which is consistent with previous knowledge. The successful implementation on SRSF1 CLIPseq data demonstrates great potential to improve our understanding on the binding specificity of RNA binding proteins
    • …
    corecore