559 research outputs found

    DNA Computing: Modelling in Formal Languages and Combinatorics on Words, and Complexity Estimation

    Get PDF
    DNA computing, an essential area of unconventional computing research, encodes problems using DNA molecules and solves them using biological processes. This thesis contributes to the theoretical research in DNA computing by modelling biological processes as computations and by studying formal language and combinatorics on words concepts motivated by DNA processes. It also contributes to the experimental research in DNA computing by a scaling comparison between DNA computing and other models of computation. First, for theoretical DNA computing research, we propose a new word operation inspired by a DNA wet lab protocol called cross-pairing polymerase chain reaction (XPCR). We define and study a word operation called word blending that models and generalizes an unexpected outcome of XPCR. The input words are uwx and ywv that share a non-empty overlap w, and the output is the word uwv. Closure properties of the Chomsky families of languages under this operation and its iterated version, the existence of a solution to equations involving this operation, and its state complexity are studied. To follow the XPCR experimental requirement closely, a new word operation called conjugate word blending is defined, where the subwords x and y are required to be identical. Closure properties of the Chomsky families of languages under this operation and the XPCR experiments that motivate and implement it are presented. Second, we generalize the sequence of Fibonacci words inspired by biological concepts on DNA. The sequence of Fibonacci words is an infinite sequence of words obtained from two initial letters f(1) = a and f(2)= b, by the recursive definition f(n+2) = f(n+1)*f(n), for all positive integers n, where * denotes word concatenation. After we propose a unified terminology for different types of Fibonacci words and corresponding results in the extensive literature on the topic, we define and explore involutive Fibonacci words motivated by ideas stemming from theoretical studies of DNA computing. The relationship between different involutive Fibonacci words and their borderedness and primitivity are studied. Third, we analyze the practicability of DNA computing experiments since DNA computing and other unconventional computing methods that solve computationally challenging problems often have the limitation that the space of potential solutions grows exponentially with their sizes. For such problems, DNA computing algorithms may achieve a linear time complexity with an exponential space complexity as a trade-off. Using the subset sum problem as the benchmark problem, we present a scaling comparison of the DNA computing (DNA-C) approach with the network biocomputing (NB-C) and the electronic computing (E-C) approaches, where the volume, computing time, and energy required, relative to the input size, are compared. Our analysis shows that E-C uses a tiny volume compared to that required by DNA-C and NB-C, at the cost of the E-C computing time being outperformed first by DNA-C and then by NB-C. In addition, NB-C appears to be more energy efficient than DNA-C for some input sets, and E-C is always an order of magnitude less energy efficient than DNA-C

    Solutions to decision-making problems in management engineering using molecular computational algorithms and experimentations

    Get PDF
    制度:新 ; 報告番号:甲3368号 ; 学位の種類:博士(工学) ; 授与年月日:2011/5/23 ; 早大学位記番号:新568

    Modelos para sequenciação de padrões em problemas de corte de stock

    Get PDF
    Tese de doutoramento em Engenharia Industrial e de SistemasIn this thesis, we address an optimization problem that appears in cutting stock operations research called the minimization of the maximum number of open stacks (MOSP) and we put forward a new integer programming formulation for the MOSP. By associating the duration of each stack with an interval of time, it is possible to use the rich theory that exists in interval graphs in order to create a model based on the completion of a graph with edges. The structure of this type of graphs admits a linear ordering of the vertices that de nes an ordering of the stacks, and consequently decides a sequence for the cutting patterns. The polytope de ned by this formulation is full-dimensional and the main inequalities in the model are proved to be facets. Additional inequalities are derived based on the properties of chordal graphs and comparability graphs. The maximum number of open stacks is related with the chromatic number of the solution graph; thus the formulation is strengthened by adding the representatives formulation for the vertex coloring problem. The model is applied to the minimization of open stacks, and also to the minimum interval graph completion problem and other pattern sequencing problems such as the minimization of the order spread (MORP) and the minimization of the number of tool switches (MTSP). Computational tests of the model are presented.Nesta tese e abordado um problema de optimização que surge em operações de corte de stock chamado minimização do número máximo de pilhas abertas (MOSP) e e proposta uma nova formulação de programação inteira. Associando a duração de cada pilha a um intervalo de tempo, e possível usar a teoria rica que existe em grafos de intervalos para criar um modelo baseado no completamento de um grafo por arcos. A estrutura deste tipo de grafos admite uma ordenação linear dos vértices que define uma ordenação linear das pilhas e, por sua vez, determina a sequência dos padrões de corte. O politopo definido por esta formulação tem dimensão completa e prova-se que as principais desigualdades do modelo são facetas. São derivadas desigualdades adicionais baseadas nas propriedades de grafos cordais e de grafos de comparabilidades. O número máximo de pilhas abertas está relacionado com o número cromático do grafo solução, pelo que o modelo e reforçado com a formulação por representativos para o problema de coloração de vértices. O modelo e aplicado a minimização de pilhas abertas, e também ao problema de completamento mínimo de um grafo de intervalos e a outros problemas de sequenciação de padrões, tais como a minimização da dispersão de encomendas (MORP) e a minimização do número de trocas de ferramentas (MTSP). São apresentados testes computacionais do modelo.Fundação para a Ciência e a Tecnologia (FCT), programa de financiamento QREN-POPH-Tipologia 4.1-Formação Avançada comparticipado pelo Fundo Social Europeu e por fundos do MCTES (Bolsa individual com a refer^encia SFRH/BD/32151/2006) entre 2006 e 2009, e pela Escola Superior de Estudos Industriais e de Gest~ao do Instituto Polit ecnico do Porto (Bolsa PROTEC com a refer^encia SFRH/BD/49914/2009) entre 2009 e 2010

    International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book

    Get PDF
    The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. The ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions. This book comprises the full conference program. It contains, in particular, the scientific program in survey style as well as with all details, and information on the social program, the venue, special meetings, and more

    Combinatorial optimization for affinity proteomics

    Get PDF
    Biochemical test development can significantly benefit from combinatorial optimization. Multiplex assays do require complex planning decisions during implementation and subsequent validation. Due to the increasing complexity of setups and the limited resources, the need to work efficiently is a key element for the success of biochemical research and test development. The first approached problem was to systemically pool samples in order to create a multi-positive control sample. We could show that pooled samples exhibit a predictable serological profile and by using this prediction a pooled sample with the desired property. For serological assay validation it must be shown that the low, medium, and high levels can be reliably measured. It is shown how to optimally choose a few samples to achieve this requirements. Finally the latter methods were merged to validate multiplexed assays using a set of pooled samples. A novel algorithm combining fast enumeration and a set cover formulation has been introduced. The major part of the thesis deals with optimization and data analysis for Triple X Proteomics - immunoaffinity assays using antibodies binding short linear, terminal epitopes of peptides. It has been shown that the problem of choosing a minimal set of epitopes for TXP setups, which combine mass spectrometry with immunoaffinity enrichment, is equivalent to the well-known set cover problem. TXP Sandwich immunoassays capture and detect peptides by combining the C-terminal and N-terminal binders. A greedy heuristic and a meta-heuristic using local search is presented, which proves to be more efficient than pure ILP formulations. All models were implemented in the novel Java framework SCPSolver, which is applicable to many problems that can be formulated as integer programs. While the main design goal of the software was usability, it also provides a basic modelling language, easy deployment and platform independence. One question arising when analyzing TXP data was: How likely is it to observe multiple peptides sharing the same terminus? The algorithms TXP-TEA and MATERICS were able to identify binding characteristics of TXP antibodies from data obtained in immunoaffinity MS experiments, reducing the cost of such analyses. A multinomial statistical model explains the distributions of short sequences observed in protein databases. This allows deducing the average optimal length of the targeted epitope. Further a closed-from scoring function for epitope enrichment in sequence lists is derived

    Models and Algorithms for Sorting Permutations with Tandem Duplication and Random Loss

    Get PDF
    A central topic of evolutionary biology is the inference of phylogeny, i. e., the evolutionary history of species. A powerful tool for the inference of such phylogenetic relationships is the arrangement of the genes in mitochondrial genomes. The rationale is that these gene arrangements are subject to different types of mutations in the course of evolution. Hence, a high similarity in the gene arrangement between two species indicates a close evolutionary relation. Metazoan mitochondrial gene arrangements are particularly well suited for such phylogenetic studies as they are available for a wide range of species, their gene content is almost invariant, and usually free of duplicates. With these properties gene arrangements of mitochondrial genomes are modeled by permutations in which each element represents a gene, i. e., a specific genetic sequence. The mutations that shape the gene arrangement of genomes are then represented by operations that rearrange elements in permutations, so-called genome rearrangements, and thereby bridge the gap between evolutionary biology and optimization. Many problems of phylogeny inference can be formulated as challenging combinatorial optimization problems which makes this research area especially interesting for computer scientists. The most prominent examples of such optimization problems are the sorting problem and the distance problem. While the sorting problem requires a minimum length sequence of rearrangements that transforms one given permutation into another given permutation, i. e., it aims for a hypothetical scenario of gene order evolution, the distance problem intends to determine only the length of such a sequence. This minimum length is called distance and used as a (dis)similarity measure quantifying the evolutionary relatedness. Most evolutionary changes occurring in gene arrangements of mitochondrial genomes can be explained by the tandem duplication random loss (TDRL) genome rearrangement model. A TDRL consists of a duplication of a consecutive set of genes in tandem followed by a random loss of one copy of each duplicated gene. In spite of the importance of the TDRL genome rearrangement in mitochondrial evolution, its combinatorial properties have rarely been studied. In addition, models of genome rearrangements which include all types of rearrangement that are relevant for mitochondrial genomes, i. e., inversions, transpositions, inverse transpositions, and TDRLs, while admitting computational tractability are rare. Nevertheless, especially for metazoan gene arrangements the TDRL rearrangement should be considered for the reconstruction of phylogeny. Realizing that a better understanding of the TDRL model is indispensable for the study of mitochondrial gene arrangements, the central theme of this thesis is to broaden the horizon of TDRL genome rearrangements with respect to mitochondrial genome evolution. For this purpose, this thesis provides combinatorial properties of the TDRL model and its variants as well as efficient methods for a plausible reconstruction of rearrangement scenarios between gene arrangements. The methods that are proposed consider all types of genome rearrangements that predominately occur during mitochondrial evolution. More precisely, the main points contained in this thesis are as follows: The distance problem and the sorting problem for the TDRL model are further examined in respect to circular permutations, a formal concept that reflects the circular structure of mitochondrial genomes. As a result, a closed formula for the distance is provided. Recently, evidence for a variant of the TDRL rearrangement model in which the duplicated set of genes is additionally inverted have been found. Initiating the algorithmic study of this new rearrangement model on a certain type of permutations, a closed formula solving the distance problem is proposed as well as a quasilinear time algorithm that solves the corresponding sorting problem. The assumption that only one type of genome rearrangement has occurred during the evolution of certain gene arrangements is most likely unrealistic, e. g., at least three types of rearrangements on top of the TDRL rearrangement have to be considered for the evolution metazoan mitochondrial genomes. Therefore, three different biologically motivated constraints are taken into account in this thesis in order to produce plausible evolutionary rearrangement scenarios. The first constraint is extending the considered set of genome rearrangements to the model that covers all four common types of mitochondrial genome rearrangements. For this 4-type model a sharp lower bound and several close additive upper bounds on the distance are developed. As a byproduct, a polynomial-time approximation algorithm for the corresponding sorting problem is provided that guarantees the computation of pairwise rearrangement scenarios that deviate from a minimum length scenario by at most two rearrangement operations. The second biologically motivated constraint is the relative frequency of the different types of rearrangements occurring during the evolution. The frequency is modeled by employing a weighting scheme on the 4-type model in which every rearrangement is weighted with respect to its type. The resulting NP-hard sorting problem is then solved by means of a polynomial size integer linear program. The third biologically motivated constraint that has been taken into account is that certain subsets of genes are often found in close proximity in the gene arrangements of many different species. This observation is reflected by demanding rearrangement scenarios to preserve certain groups of genes which are modeled by common intervals of permutations. In order to solve the sorting problem that considers all three types of biologically motivated constraints, the exact dynamic programming algorithm CREx2 is proposed. CREx2 has a linear runtime for a large class of problem instances. Otherwise, two versions of the CREx2 are provided: The first version provides exact solutions but has an exponential runtime in the worst case and the second version provides approximated solutions efficiently. CREx2 is evaluated by an empirical study for simulated artificial and real biological mitochondrial gene arrangements

    Formal methods applied to the analysis of phylogenies: Phylogenetic model checking

    Get PDF
    Los árboles filogenéticos son abstracciones útiles para modelar y caracterizar la evolución de un conjunto de especies o poblaciones respecto del tiempo. La proposición, verificación y generalización de hipótesis sobre un árbol filogenético inferido juegan un papel importante en el estudio y comprensión de las relaciones evolutivas. Actualmente, uno de los principales objetivos científicos es extraer o descubrir los mensajes biológicos implícitos y las propiedades estructurales subyacentes en la filogenia. Por ejemplo, la integración de información genética en una filogenia ayuda al descubrimiento de genes conservados en todo o parte del árbol, la identificación de posiciones covariantes en el ADN o la estimación de las fechas de divergencia entre especies. Consecuentemente, los árboles ayudan a comprender el mecanismo que gobierna la deriva evolutiva. Hoy en día, el amplio espectro de métodos y herramientas heterogéneas para el análisis de filogenias enturbia y dificulta su utilización, además del fuerte acoplamiento entre la especificación de propiedades y los algoritmos utilizados para su evaluación (principalmente scripts ad hoc). Este problema es el punto de arranque de esta tesis, donde se analiza como solución la posibilidad de introducir un entorno formal de verificación de hipótesis que, de manera automática y modular, estudie la veracidad de dichas propiedades definidas en un lenguaje genérico e independiente (en una lógica formal asociada) sobre uno de los múltiples softwares preparados para ello. La contribución principal de la tesis es la propuesta de un marco formal para la descripción, verificación y manipulación de relaciones causales entre especies de forma independiente del código utilizado para su valoración. Para ello, exploramos las características de las técnicas de model checking, un paradigma en el que una especificación expresada en lógica temporal se verifica con respecto a un modelo del sistema que representa una implementación a un cierto nivel de detalle. Se ha aplicado satisfactoriamente en la industria para el modelado de sistemas y su verificación, emergiendo del ámbito de las ciencias de la computación. Las contribuciones concretas de la tesis han sido: A) La identificación e interpretación de los árboles filogeneticos como modelos de la evolución, adaptados al entorno de las técnicas de model checking. B) La definición de una lógica temporal que captura las propiedades filogenéticas habituales junto con un método de construcción de propiedades. C) La clasificación de propiedades filogenéticas, identificando categorías de propiedades según estén centradas en la estructura del árbol, en las secuencias o sean híbridas. D) La extensión de las lógicas y modelos para contemplar propiedades cuantitativas de tiempo, probabilidad y de distancias. E) El desarrollo de un entorno para la verificación de propiedades booleanas, cuantitativas y paramétricas. F) El establecimiento de los principios para la manipulación simbolica de objetos filogenéticos, p. ej., clados. G) La explotación de las herramientas de model checking existentes, detectando sus problemas y carencias en el campo de filogenia y proponiendo mejoras. H) El desarrollo de técnicas "ad hoc" para obtener ganancia de complejidad alrededor de dos frentes: distribución de los cálculos y datos, y el uso de sistemas de información. Los puntos A-F se centran en las aportaciones conceptuales de nuestra aproximación, mientras que los puntos G-H enfatizan la parte de herramientas e implementación. Los contenidos de la tesis están contrastados por la comunidad científica mediante las siguientes publicaciones en conferencias y revistas internacionales. La introducción de model checking como entorno formal para analizar propiedades biológicas (puntos A-C) ha llevado a la publicación de nuestro primer artículo de congreso [1]. En [2], desarrollamos la verificación de hipótesis filogenéticas sobre un árbol de ejemplo construido a partir de las relaciones impuestas por un conjunto de proteínas codificadas por el ADN mitocondrial humano (ADNmt). En ese ejemplo, usamos una herramienta automática y genérica de model checking (punto G). El artículo de revista [7] resume lo básico de los artículos de congreso previos y extiende la aplicación de lógicas temporales a propiedades filogenéticas no consideradas hasta ahora. Los artículos citados aquí engloban los contenidos presentados en las Parte I--II de la tesis. El enorme tamaño de los árboles y la considerable cantidad de información asociada a los estados (p.ej., la cadena de ADN) obligan a la introducción de adaptaciones especiales en las herramientas de model checking para mantener un rendimiento razonable en la verificación de propiedades y aliviar también el problema de la explosión de estados (puntos G-H). El artículo de congreso [3] presenta las ventajas de rebanar el ADN asociado a los estados, la partición de la filogenia en pequeños subárboles y su distribución entre varias máquinas. Además, la idea original del model checking rebanado se complementa con la inclusión de una base de datos externa para el almacenamiento de secuencias. El artículo de revista [4] reúne las nociones introducidas en [3] junto con la implementación y resultados preliminares presentados [5]. Este tema se corresponde con lo presentado en la Parte III de la tesis. Para terminar, la tesis reaprovecha las extensiones de las lógicas temporales con tiempo explícito y probabilidades a fin de manipular e interrogar al árbol sobre información cuantitativa. El artículo de congreso [6] ejemplifica la necesidad de introducir probabilidades y tiempo discreto para el análisis filogenético de un fenotipo real, en este caso, el ratio de distribución de la intolerancia a la lactosa entre diversas poblaciones arraigadas en las hojas de la filogenia. Esto se corresponde con el Capítulo 13, que queda englobado dentro de las Partes IV--V. Las Partes IV--V completan los conceptos presentados en ese artículo de conferencia hacia otros dominios de aplicación, como la puntuación de árboles, y tiempo continuo (puntos E-F). La introducción de parámetros en las hipótesis filogenéticas se plantea como trabajo futuro. Referencias [1] Roberto Blanco, Gregorio de Miguel Casado, José Ignacio Requeno, and José Manuel Colom. Temporal logics for phylogenetic analysis via model checking. In Proceedings IEEE International Workshop on Mining and Management of Biological and Health Data, pages 152-157. IEEE, 2010. [2] José Ignacio Requeno, Roberto Blanco, Gregorio de Miguel Casado, and José Manuel Colom. Phylogenetic analysis using an SMV tool. In Miguel P. Rocha, Juan M. Corchado Rodríguez, Florentino Fdez-Riverola, and Alfonso Valencia, editors, Proceedings 5th International Conference on Practical Applications of Computational Biology and Bioinformatics, volume 93 of Advances in Intelligent and Soft Computing, pages 167-174. Springer, Berlin, 2011. [3] José Ignacio Requeno, Roberto Blanco, Gregorio de Miguel Casado, and José Manuel Colom. Sliced model checking for phylogenetic analysis. In Miguel P. Rocha, Nicholas Luscombe, Florentino Fdez-Riverola, and Juan M. Corchado Rodríguez, editors, Proocedings 6th International Conference on Practical Applications of Computational Biology and Bioinformatics, volume 154 of Advances in Intelligent and Soft Computing, pages 95-103. Springer, Berlin, 2012. [4] José Ignacio Requeno and José Manuel Colom. Model checking software for phylogenetic trees using distribution and database methods. Journal of Integrative Bioinformatics, 10(3):229-233, 2013. [5] José Ignacio Requeno and José Manuel Colom. Speeding up phylogenetic model checking. In Mohd Saberi Mohamad, Loris Nanni, Miguel P. Rocha, and Florentino Fdez-Riverola, editors, Proceedings 7th International Conference on Practical Applications of Computational Biology and Bioinformatics, volume 222 of Advances in Intelligent Systems and Computing, pages 119-126. Springer, Berlin, 2013. [6] José Ignacio Requeno and José Manuel Colom. Timed and probabilistic model checking over phylogenetic trees. In Miguel P. Rocha et al., editors, Proceedings 8th International Conference on Practical Applications of Computational Biology and Bioinformatics, Advances in Intelligent and Soft Computing. Springer, Berlin, 2014. [7] José Ignacio Requeno, Gregorio de Miguel Casado, Roberto Blanco, and José Manuel Colom. Temporal logics for phylogenetic analysis via model checking. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 10(4):1058-1070, 2013

    Investigating genotype-phenotype relationships in Saccharomyces cerevisiae metabolic network through stoichiometric modeling

    Get PDF

    Graphics Processing Units: Abstract Modelling and Applications in Bioinformatics

    Get PDF
    The Graphical Processing Unit is a specialised piece of hardware that contains many low powered cores, available on both the consumer and industrial market. The original Graphical Processing Units were designed for processing high quality graphical images, for presentation to the screen, and were therefore marketed to the computer games market segment. More recently, frameworks such as CUDA and OpenCL allowed the specialised highly parallel architecture of the Graphical Processing Unit to be used for not just graphical operations, but for general computation. This is known as General Purpose Programming on Graphical Processing Units, and it has attracted interest from the scientific community, looking for ways to exploit this highly parallel environment, which was cheaper and more accessible than the traditional High Performance Computing platforms, such as the supercomputer. This interest in developing algorithms that exploit the parallel architecture of the Graphical Processing Unit has highlighted the need for scientists to be able to analyse proposed algorithms, just as happens for proposed sequential algorithms. In this thesis, we study the abstract modelling of computation on the Graphical Processing Unit, and the application of Graphical Processing Unit-based algorithms in the field of bioinformatics, the field of using computational algorithms to solve biological problems. We show that existing abstract models for analysing parallel algorithms on the Graphical Processing Unit are not able to sufficiently and accurately model all that is required. We propose a new abstract model, called the Abstract Transferring Graphical Processing Unit Model, which is able to provide analysis of Graphical Processing Unit-based algorithms that is more accurate than existing abstract models. It does this by capturing the data transfer between the Central Processing Unit and the Graphical Processing Unit. We demonstrate the accuracy and applicability of our model with several computational problems, showing that our model provides greater accuracy than the existing models, verifying these claims using experiments. We also contribute novel Graphics Processing Unit-base solutions to two bioinformatics problems: DNA sequence alignment, and Protein spectral identification, demonstrating promising levels of improvement against the sequential Central Processing Unit experiments

    Use of wavelet-packet transforms to develop an engineering model for multifractal characterization of mutation dynamics in pathological and nonpathological gene sequences

    Get PDF
    This study uses dynamical analysis to examine in a quantitative fashion the information coding mechanism in DNA sequences. This exceeds the simple dichotomy of either modeling the mechanism by comparing DNA sequence walks as Fractal Brownian Motion (fbm) processes. The 2-D mappings of the DNA sequences for this research are from Iterated Function System (IFS) (Also known as the Chaos Game Representation (CGR)) mappings of the DNA sequences. This technique converts a 1-D sequence into a 2-D representation that preserves subsequence structure and provides a visual representation. The second step of this analysis involves the application of Wavelet Packet Transforms, a recently developed technique from the field of signal processing. A multi-fractal model is built by using wavelet transforms to estimate the Hurst exponent, H. The Hurst exponent is a non-parametric measurement of the dynamism of a system. This procedure is used to evaluate gene-coding events in the DNA sequence of cystic fibrosis mutations. The H exponent is calculated for various mutation sites in this gene. The results of this study indicate the presence of anti-persistent, random walks and persistent sub-periods in the sequence. This indicates the hypothesis of a multi-fractal model of DNA information encoding warrants further consideration.;This work examines the model\u27s behavior in both pathological (mutations) and non-pathological (healthy) base pair sequences of the cystic fibrosis gene. These mutations both natural and synthetic were introduced by computer manipulation of the original base pair text files. The results show that disease severity and system information dynamics correlate. These results have implications for genetic engineering as well as in mathematical biology. They suggest that there is scope for more multi-fractal models to be developed
    corecore