124 research outputs found

    MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification

    Continuous improvements in next-generation sequencing technologies have led to ever-growing collections of genomic sequences, which biologists have not been able to characterize easily and whose analysis requires huge computational effort. The classification of species has emerged as one of the main applications of DNA analysis and has been addressed with several approaches, e.g., multiple-alignment, phylogenetic-tree, statistical, and character-based methods.

    On the role of metaheuristic optimization in bioinformatics

    Metaheuristic algorithms are employed to solve complex and large-scale optimization problems in many different fields, from transportation and smart cities to finance. This paper discusses how metaheuristic algorithms are being applied to solve different optimization problems in the area of bioinformatics. While the text provides references to many optimization problems in the area, it focuses on those that have attracted more interest from the optimization community. Among the problems analyzed, the paper discusses in more detail the molecular docking problem, the protein structure prediction, phylogenetic inference, and different string problems. In addition, references to other relevant optimization problems are also given, including those related to medical imaging or gene selection for classification. From the previous analysis, the paper generates insights on research opportunities for the Operations Research and Computer Science communities in the field of bioinformatics

    Multiobjective optimization in bioinformatics and computational biology

    Evolutionary Algorithms

    Evolutionary algorithms (EAs) are population-based metaheuristics, originally inspired by aspects of natural evolution. Modern varieties incorporate a broad mixture of search mechanisms and tend to blend inspiration from nature with pragmatic engineering concerns; however, all EAs essentially operate by maintaining a population of potential solutions and in some way artificially 'evolving' that population over time. Particularly well-known categories of EAs include genetic algorithms (GAs), genetic programming (GP), and evolution strategies (ES). EAs have proven very successful in practical applications, particularly those requiring solutions to combinatorial problems. EAs are highly flexible and can be configured to address any optimization task, without the reformulation and/or simplification that other techniques would require. However, this flexibility comes at a cost: tailoring an EA's configuration and parameters so that it performs robustly on a given class of tasks is often a complex and time-consuming process. This tailoring process is one of the many ongoing research areas associated with EAs.
    Comment: To appear in R. Marti, P. Pardalos, and M. Resende, eds., Handbook of Heuristics, Springe
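    The population loop described in this abstract can be sketched roughly as follows. This is only an illustrative toy, not code from the handbook chapter: the bit-string encoding, the binary tournament selection, the one-point crossover, the elitism step, the OneMax fitness function, and all parameter values are assumptions chosen for brevity.

```python
import random


def pick(pop, fitness, rng):
    """Binary tournament: sample two individuals and return the fitter one."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def evolve(fitness, genome_len=20, pop_size=30, generations=50,
           mutation_rate=0.05, seed=0):
    """Minimal generational GA over bit strings (all settings are hypothetical)."""
    rng = random.Random(seed)
    # Start from a random population of candidate solutions.
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: the best individual survives unchanged.
        next_pop = [max(pop, key=fitness)]
        while len(next_pop) < pop_size:
            p1, p2 = pick(pop, fitness, rng), pick(pop, fitness, rng)
            cut = rng.randrange(1, genome_len)       # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


def onemax(bits):
    """Toy fitness: the number of ones in the bit string."""
    return sum(bits)


best = evolve(onemax)
print(best, onemax(best))
```

    Even in this toy, the population size, mutation rate, and selection scheme all have to be chosen by hand, which is the tailoring cost the abstract refers to.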

    Evolutionary Computation and QSAR Research

    The successful high-throughput screening of molecule libraries for a specific biological property is one of the main improvements in drug discovery. Virtual molecular filtering and screening rely greatly on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with a predicted toxic effect and poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning, and artificial intelligence. QSAR modeling relies on three main steps: encoding the molecular structure as molecular descriptors, selecting the variables relevant to the analyzed activity, and searching for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. It explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, the selection methods for high-dimensional data in QSAR, the methods to build QSAR models, the current evolutionary feature selection methods and their applications in QSAR, and the future trend toward joint or multi-task feature selection methods.
    Funding: Instituto de Salud Carlos III, PIO52048; Instituto de Salud Carlos III, RD07/0067/0005; Ministerio de Industria, Comercio y Turismo, TSI-020110-2009-53; Galicia. Consellería de Economía e Industria, 10SIN105004P

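    Metaheurísticas, optimización multiobjetivo y paralelismo para descubrir motifs en secuencias de ADN (Metaheuristics, multiobjective optimization, and parallelism for discovering motifs in DNA sequences)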

    Solving complex problems with evolutionary algorithms is one of the most researched topics in Computer Science. The main goal of this thesis is to develop new algorithms that solve such problems in the least possible computational time while improving on the results achieved by existing methods. To this end, we combine three important concepts: metaheuristics, multiobjective optimization, and parallelism. We first looked for a significant optimization problem that had not yet been solved efficiently and identified the Motif Discovery Problem (MDP). The MDP aims to discover short over-represented patterns (motifs) in a set of DNA sequences that may have some biological significance. To address it, we defined a multiobjective formulation adjusted to real-world biological requirements and implemented a total of ten algorithms of different kinds (population-based, trajectory-based, collective intelligence...), analyzing aspects such as their ability to scale and converge. Finally, we designed several parallel techniques, using parallel and distributed programming environments such as OpenMP and MPI, that try to combine the properties of several metaheuristics in a single application. The results are examined in detail with numerous statistical tests, and the predictions are compared with those discovered by a total of thirteen well-known biological tools. The conclusions show that using multiobjective optimization in metaheuristic techniques favors the discovery of quality solutions, and that parallelism is useful for combining the evolutionary properties of different algorithms.
    Funding: Ministerio de Economía y Competitividad - FEDER (TIN2008-06491-C04-04; TIN2012-30685); Gobierno de Extremadura (GR10025-TIC015
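    As a rough illustration of what a multiobjective formulation implies, the sketch below filters a set of candidate solutions down to its non-dominated (Pareto) set. It is not taken from the thesis: the two objectives, both assumed to be maximized, and the candidate values are hypothetical placeholders, and the abstract does not state which objectives the actual formulation uses.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` in every objective and
    strictly better in at least one (all objectives are maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates scored on two made-up objectives.
candidates = [(10, 0.70), (12, 0.65), (9, 0.60), (12, 0.70), (8, 0.90)]
front = pareto_front(candidates)
print(front)  # → [(12, 0.7), (8, 0.9)]
```

    A multiobjective metaheuristic returns such a set of trade-off solutions rather than a single optimum, leaving the final choice to the user.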

    Protein Superfamily Classification using Computational Intelligence Techniques

    Protein superfamily classification is a challenging research area in bioinformatics with a major application in drug discovery. If a newly discovered protein that is responsible for a new disease is correctly classified into its superfamily, the task of the drug analyst becomes much easier. The analyst can perform molecular docking to find the correct relative orientation of a ligand for the protein. The ligand database can be searched for all possible orientations and conformations of the proteins belonging to that superfamily paired with the ligand. The search space is thus reduced enormously, because the protein-ligand pair is searched only within a particular protein superfamily. Correct classification of proteins is therefore both challenging and important, as it guides analysts toward appropriate drugs. In this thesis, Neural Networks (NN), a Multiobjective Genetic Algorithm (MOGA), and a Support Vector Machine (SVM) are applied to the classification task. The Adaptive Multiobjective Genetic Algorithm (AMOGA), a variation of MOGA, is implemented for the structure optimization of a Radial Basis Function Network (RBFN). MOGA is modified through its two key controlling parameters, the probability of crossover and the probability of mutation. These values are varied adaptively according to the performance of the algorithm, i.e., according to the percentage of the total population present in the best non-domination level. Finding the number of hidden centers remains a critical issue in the design of an RBFN. The best RBF network with good generalization ability can be derived from the Pareto-optimal set: every solution in this set indicates which specific samples to choose as hidden centers, as well as the updated weight matrix connecting the hidden and output layers. Principal Component Analysis (PCA) has been used for dimension reduction and for extracting significant features from the long feature vectors of amino acid sequences. In the two-stage approach to protein superfamily classification, feature extraction is carried out in the first stage and the classifier is designed in the second stage, with the overall objective of maximizing the accuracy of the classifier. In the feature extraction phase, a Genetic Algorithm (GA) based wrapper approach is used to select a few eigenvectors from the PCA space, which are encoded as binary strings in the chromosome. Using PCA-NSGA-II (non-dominated sorting GA), the non-dominated solutions obtained from the Pareto front resolve the trade-off between the number of eigenvectors selected and the accuracy obtained by the classifier. In the second stage, the Recursive Orthogonal Least Square Algorithm (ROLSA) is used for training the RBFN. ROLSA selects the optimal number o