
    On the role of metaheuristic optimization in bioinformatics

    Metaheuristic algorithms are employed to solve complex and large-scale optimization problems in many different fields, from transportation and smart cities to finance. This paper discusses how metaheuristic algorithms are being applied to solve different optimization problems in bioinformatics. While the text provides references to many optimization problems in the area, it focuses on those that have attracted the most interest from the optimization community. Among the problems analyzed, the paper discusses in more detail molecular docking, protein structure prediction, phylogenetic inference, and several string problems. References to other relevant optimization problems are also given, including those related to medical imaging and gene selection for classification. From this analysis, the paper derives insights on research opportunities in bioinformatics for the Operations Research and Computer Science communities.

    Evolutionary Computation and QSAR Research

    Successful high-throughput screening of molecule libraries for a specific biological property is one of the main advances in drug discovery. Virtual molecular filtering and screening rely heavily on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with its molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with predicted toxic effects or poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning, and artificial intelligence. QSAR modeling relies on three main steps: encoding the molecular structure into molecular descriptors, selecting the variables relevant to the analyzed activity, and searching for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. It therefore explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, selection methods for high-dimensional data in QSAR, methods for building QSAR models, current evolutionary feature selection methods and their applications in QSAR, and future trends in joint or multi-task feature selection.
    Funding: Instituto de Salud Carlos III (PIO52048; RD07/0067/0005); Ministerio de Industria, Comercio y Turismo (TSI-020110-2009-53); Galicia, Consellería de Economía e Industria (10SIN105004P).
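The variable-selection step that the review covers is often done with a genetic algorithm acting as a wrapper around a QSAR model. A minimal sketch of that idea, assuming a binary mask chromosome (1 = keep descriptor) and a caller-supplied score: `ga_feature_selection` and `toy_qsar_score` are hypothetical names, the operators (binary tournament, one-point crossover, bit-flip mutation, single-individual elitism) are one common choice rather than any specific method surveyed, and `toy_qsar_score` is a made-up stand-in for a real model-quality measure such as a cross-validated fit of a regression on the selected descriptors.

```python
import random
from typing import Callable, List

Mask = List[int]  # 1 = descriptor kept, 0 = descriptor dropped


def ga_feature_selection(
    n_descriptors: int,
    score: Callable[[Mask], float],
    pop_size: int = 30,
    generations: int = 50,
    p_crossover: float = 0.8,
    p_mutation: float = 0.02,
    seed: int = 0,
) -> Mask:
    """Hypothetical GA wrapper: search binary descriptor masks that maximize `score`."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_descriptors)] for _ in range(pop_size)]

    def tournament() -> Mask:
        # Binary tournament: draw two masks at random and keep the better one.
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    best = max(pop, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < p_crossover:
                cut = rng.randint(1, n_descriptors - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation adds or removes individual descriptors.
            child = [1 - g if rng.random() < p_mutation else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return best


# Made-up stand-in for a QSAR model score: descriptors 0, 3 and 7 are "relevant";
# every other selected descriptor is penalized as noise.
RELEVANT = {0, 3, 7}


def toy_qsar_score(mask: Mask) -> float:
    hits = sum(1 for i, g in enumerate(mask) if g and i in RELEVANT)
    noise = sum(1 for i, g in enumerate(mask) if g and i not in RELEVANT)
    return hits - 0.5 * noise


if __name__ == "__main__":
    print(ga_feature_selection(10, toy_qsar_score, seed=1))
```

Because the best mask is copied into every new generation, the returned score can never be worse than the best mask in the initial random population; in a real QSAR setting the expensive part is the score, which is why caching or parallel evaluation usually matters more than the GA operators.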

    Pathway Bridge Based Multiobjective Optimization Approach for Lurking Pathway Prediction


    Quantitative dynamic modeling of transcriptional networks of embryonic stem cells using integrated framework of Pareto optimality and energy balance

    Thesis (Ph.D.), Harvard-MIT Division of Health Sciences and Technology, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (pp. 252-256).
    Embryonic stem cells (ESCs) are pluripotent and are therefore considered the "cell type of choice". ESCs exhibit several phenotypic traits (e.g., proliferation, differentiation, apoptosis, necrosis), and when differentiated into a particular lineage they can perform an array of functions (e.g., protein secretion, detoxification, energy production). Typically, these cellular objectives compete against each other because of thermodynamic, stoichiometric, and mass balance constraints. Analyzing transcriptional regulatory networks and metabolic networks in ESCs therefore requires both a nonequilibrium thermodynamic and mass balance framework for designing and understanding complex ESC networks, and an optimality approach that can take multiple cellular objectives into account simultaneously. The primary goal of this thesis was to develop an integrated energy and mass balance-based multiobjective framework for a transcriptional regulatory network model of ESCs. The secondary goal was to use the developed framework for large-scale metabolic flux profiling of hepatic and ESC metabolic networks. Towards the first aim, we developed a complete dynamic pluripotent network model for ESCs that integrates several master regulators of pluripotency, such as the transcription factors Oct4, Sox2, Nanog, Klf4, Nac1, Rex1, Dax1, cMyc, and Zfp281, and obtained the dynamic connectivity matrix between various pluripotency-related gene promoters and transcription factors. The developed model fully describes the self-renewal state of embryonic stem cells. Next, we developed a transcriptional network model framework for ESCs that incorporates multiobjective optimality-based energy balance analysis.
    This framework predicts cofactor occupancy, network architecture, and feedback memory of ESCs based on energetic cost. The integrated approach, combining nonequilibrium thermodynamics with multiobjective-optimality network analysis, was further used to explain the significance of transcriptional motifs, defined as small regulatory interaction patterns that regulate biological functions in highly interacting cellular networks. Our results provide evidence that dissipative energetics is the underlying criterion for motif selection during evolution, and that during transcription biological systems tend towards evolutionary selection of subgraphs that produce minimum specific heat dissipation, thereby explaining the frequency of some motifs. Significantly, the proposed energetic hypothesis uncovers a mechanism for environmental selection of motifs, explains the topological generalization of subgraphs into complex networks, and enables identification of new functionalities for rarely occurring motifs. Towards the secondary goal, we developed a multiobjective optimization-based approach that couples the normalized constraint method with both energy and flux balance-based metabolic flux analysis to explain certain features of the metabolic control of hepatocytes, which is relevant to the response of hepatocytes and the liver to various physiological stimuli and disease states. We also used this approach to obtain an optimal regimen for ESC differentiation into hepatocytes. The presented framework may establish multiobjective optimality-based thermodynamic analysis as a backbone for designing and understanding complex network systems, such as transcriptional, metabolic, and protein interaction networks.
    by Marco A. Avila, Ph.D.

    Protein Superfamily Classification using Computational Intelligence Techniques

    Protein superfamily classification is a challenging research problem in bioinformatics, with its major application in drug discovery. If a newly discovered protein that causes a new disease is correctly assigned to its superfamily, the drug analyst's task becomes much easier. The analyst can perform molecular docking to find the correct relative orientation of a ligand with respect to the protein. The ligand database can then be searched for all possible orientations and conformations of proteins belonging to that superfamily paired with the ligand. The search space is thus reduced enormously, because protein-ligand pairs are searched only within a particular protein superfamily. Correct protein classification is therefore both challenging and important, as it guides analysts towards appropriate drugs. In this thesis, Neural Networks (NN), a Multiobjective Genetic Algorithm (MOGA), and Support Vector Machines (SVM) are applied to perform the classification task. An Adaptive MultiObjective Genetic Algorithm (AMOGA), a variation of MOGA, is implemented for the structure optimization of a Radial Basis Function Network (RBFN). MOGA is modified through two key control parameters, the probability of crossover and the probability of mutation. These values are varied adaptively according to the performance of the algorithm, i.e., the percentage of the total population present in the best non-domination level. Finding the number of hidden centers remains a critical issue in RBFN design. The best RBF network with good generalization ability can be derived from the Pareto optimal set: each solution in that set specifies which samples are chosen as hidden centers, as well as the updated weight matrix connecting the hidden and output layers.
    Principal Component Analysis (PCA) is used for dimensionality reduction and for extracting significant features from the long feature vectors of amino acid sequences. In a two-stage approach to protein superfamily classification, feature extraction is carried out in the first stage and the classifier is designed in the second, with the overall objective of maximizing classification accuracy. In the feature extraction phase, a Genetic Algorithm (GA) based wrapper approach selects a few eigenvectors from the PCA space, encoded as binary strings in the chromosome. Using PCA-NSGA-II (non-dominated sorting GA), the non-dominated solutions on the Pareto front resolve the trade-off between the number of eigenvectors selected and the accuracy obtained by the classifier. In the second stage, the Recursive Orthogonal Least Square Algorithm (ROLSA) is used to train the RBFN. ROLSA selects the optimal number o
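The PCA-NSGA-II stage keeps only the candidates that no other candidate beats on both objectives at once. A minimal sketch of that non-dominance filter, assuming each candidate is summarized as a hypothetical `(n_eigenvectors, accuracy)` pair, with the first objective minimized and the second maximized. The same first front is the "best non-domination level" whose population share AMOGA reportedly uses to adapt its crossover and mutation probabilities; the adaptation rule itself is not given in the abstract and is not reproduced here. The pairwise check is quadratic and written for clarity; NSGA-II's fast non-dominated sort also ranks the later fronts.

```python
from typing import List, Tuple

# Hypothetical candidate summary: (number of eigenvectors selected, classifier accuracy).
Candidate = Tuple[int, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` uses no more eigenvectors and is no less accurate than `b`,
    and is strictly better on at least one of the two objectives."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def first_front(pop: List[Candidate]) -> List[Candidate]:
    """Non-dominated candidates (the rank-1 front), in their original order."""
    return [p for p in pop if not any(dominates(q, p) for q in pop)]


def first_front_share(pop: List[Candidate]) -> float:
    """Fraction of the population lying in the best non-domination level."""
    return len(first_front(pop)) / len(pop)


if __name__ == "__main__":
    # Made-up example values, not results from the thesis.
    pop = [(2, 0.80), (3, 0.85), (5, 0.85), (4, 0.90), (6, 0.88), (2, 0.75)]
    print(first_front(pop))        # [(2, 0.8), (3, 0.85), (4, 0.9)]
    print(first_front_share(pop))  # 0.5
```

Each surviving pair is a different compromise: fewer eigenvectors at lower accuracy, or more eigenvectors at higher accuracy, and choosing among them is left to the analyst.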

    Graphics Processing Unit–Enhanced Genetic Algorithms for Solving the Temporal Dynamics of Gene Regulatory Networks

    Understanding the regulation of gene expression is one of the key problems in current biology. A promising method for this purpose is to determine the temporal dynamics between known initial and final network states using simple action rules. The huge number of rule combinations and the inherently nonlinear nature of the problem make genetic algorithms an excellent candidate for finding optimal solutions. Because the problem is computationally intensive and needs long runtimes on conventional architectures for realistic network sizes, accelerating it is essential. In this article, we study how to develop efficient parallel implementations of this method for the fine-grained parallel architecture of graphics processing units (GPUs) using the Compute Unified Device Architecture (CUDA) platform. We carry out an exhaustive and methodical study of various parallel genetic algorithm schemes (master-slave, island, cellular, and hybrid models) and of various individual selection methods (roulette, elitist) for this problem. Several procedures that optimize the use of the GPU's resources are presented. We conclude that the implementation producing the best results, from both the performance and the genetic algorithm fitness perspectives, simulates a few thousand individuals grouped in a few islands using elitist selection. This model combines two powerful factors for discovering the best solutions: finding good individuals in a small number of generations, and introducing genetic diversity through relatively frequent migration of many individuals. As a result, we even found the optimal solution for the analyzed gene regulatory network (GRN). In addition, we compare the performance of the different parallel implementations on GPU against a sequential application on CPU.
    In our tests, our optimized parallel implementation of the method on a mid-range GPU achieved a multifold speedup over an equivalent sequential single-core implementation running on a recent Intel i7 CPU. This work can provide useful guidance to researchers in biology, medicine, or bioinformatics on how to exploit massively parallel devices such as GPUs to apply novel nature-inspired metaheuristic algorithms to real-world applications (such as the method for solving the temporal dynamics of GRNs).
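The best-performing configuration reported above (a few islands, elitist selection, frequent and sizeable migration) can be illustrated with a small sequential sketch. This is a CPU-only toy, not the article's CUDA implementation: the bit-string genome, the `onemax` fitness (count of 1 bits) standing in for the GRN rule-matching fitness, the ring migration topology, mutation-only reproduction, and all parameter values are assumptions chosen for illustration.

```python
import random
from typing import List

Genome = List[int]


def onemax(g: Genome) -> int:
    # Stand-in fitness: number of 1 bits (NOT the article's GRN fitness).
    return sum(g)


def evolve_island(island: List[Genome], rng: random.Random, p_mut: float) -> List[Genome]:
    """One generation with elitist (truncation) selection: the better half survives
    unchanged and the other half is refilled with mutated copies of survivors."""
    ranked = sorted(island, key=onemax, reverse=True)
    survivors = ranked[: len(island) // 2]
    children = []
    while len(survivors) + len(children) < len(island):
        parent = rng.choice(survivors)
        children.append([1 - b if rng.random() < p_mut else b for b in parent])
    return survivors + children


def island_ga(n_islands: int = 4, island_size: int = 20, genome_len: int = 32,
              generations: int = 100, migration_interval: int = 5,
              n_migrants: int = 4, p_mut: float = 0.05, seed: int = 0) -> Genome:
    """Hypothetical island-model GA; assumes 0 < n_migrants < island_size."""
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(genome_len)]
                for _ in range(island_size)] for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        # Islands evolve independently between migrations.
        islands = [evolve_island(isl, rng, p_mut) for isl in islands]
        if gen % migration_interval == 0:
            # Ring migration: the best n_migrants of island i-1 replace the worst of island i.
            emigrants = [sorted(isl, key=onemax, reverse=True)[:n_migrants] for isl in islands]
            for i in range(n_islands):
                ranked = sorted(islands[i], key=onemax, reverse=True)
                incoming = [g[:] for g in emigrants[(i - 1) % n_islands]]
                islands[i] = ranked[:-n_migrants] + incoming
    return max((g for isl in islands for g in isl), key=onemax)


if __name__ == "__main__":
    best = island_ga()
    print(onemax(best), "of", len(best))
```

Because islands interact only at migration steps, each island's generation can be computed independently, which is the kind of structure a GPU implementation can exploit by evolving islands in parallel; `migration_interval` and `n_migrants` are the knobs corresponding to "relatively frequent" and "numerous" migration.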