180 research outputs found

    On the role of metaheuristic optimization in bioinformatics

    Metaheuristic algorithms are employed to solve complex and large-scale optimization problems in many different fields, from transportation and smart cities to finance. This paper discusses how metaheuristic algorithms are being applied to different optimization problems in bioinformatics. While the text provides references to many optimization problems in the area, it focuses on those that have attracted the most interest from the optimization community. Among the problems analyzed, the paper discusses in more detail the molecular docking problem, protein structure prediction, phylogenetic inference, and several string problems. References to other relevant optimization problems are also given, including those related to medical imaging and gene selection for classification. Based on this analysis, the paper identifies research opportunities in bioinformatics for the Operations Research and Computer Science communities.

    New constructive heuristics for DNA sequencing by hybridization

    Deoxyribonucleic acid (DNA) is a molecule that consists of two complementary sequences of nucleotides. Reading these sequences, a task called DNA sequencing, is important in biology. However, large DNA molecules cannot be read in one piece. Therefore, existing techniques first break the given DNA molecules up into small fragments that can be read. One of these techniques is called the hybridization experiment. Reconstructing the original DNA molecule from these fragments is a challenging problem from the computational point of view. In recent years the specific problem of DNA sequencing by hybridization has attracted considerable interest in the optimization community. While most researchers have focused on developing metaheuristic approaches, simple constructive heuristics have received hardly any attention. This is despite the fact that well-working constructive heuristics are often an essential component of successful metaheuristics. It is exactly this lack of constructive heuristics that motivated the work presented in this paper. The results of our best constructive heuristic are comparable to those of the best existing metaheuristics, while using less computational time.
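    The abstract does not spell out the heuristics, so what follows is only a minimal sketch of the general idea of a greedy constructive heuristic for sequencing by hybridization, not the authors' method. It assumes the usual idealized setting: a spectrum of equal-length oligonucleotides, a known first oligonucleotide (`start`, assumed to be in the spectrum) and a known target length `n`. At each step it appends the unused oligonucleotide that overlaps most with the last one placed, considering only candidates that keep the sequence within `n`. The names `overlap` and `greedy_sbh` are hypothetical.

```python
def overlap(a, b):
    """Length of the longest proper suffix of `a` that is a prefix of `b`."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_sbh(spectrum, start, n):
    """Greedy reconstruction: always append the best-overlapping feasible oligo.

    Hypothetical sketch: assumes `start` is the first oligonucleotide of the
    target sequence and `n` is the known length of that sequence.
    """
    seq, last = start, start
    unused = set(spectrum) - {start}
    while True:
        # Only candidates that keep the sequence within length n are feasible.
        feasible = [o for o in sorted(unused)
                    if len(seq) + len(o) - overlap(last, o) <= n]
        if not feasible:
            return seq
        # Ties are broken lexicographically because `feasible` is sorted.
        best = max(feasible, key=lambda o: overlap(last, o))
        seq += best[overlap(last, best):]
        last = best
        unused.remove(best)


spectrum = {"ACG", "CGT", "GTT", "TTG"}  # 3-mers of the toy sequence ACGTTG
print(greedy_sbh(spectrum, "ACG", 6))    # ACGTTG
print(greedy_sbh(spectrum, "ACG", 5))    # ACGTT (length bound stops early)
```

    A plain greedy choice like this is myopic; a look-ahead variant, such as the one by Blazewicz and colleagues, would score each candidate by also examining the steps that follow it.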

    New algorithms for DNA sequencing by hybridization

    The reconstruction of DNA sequences from DNA fragments is one of the most challenging problems in computational biology. In recent years the specific problem of DNA sequencing by hybridization has attracted considerable interest in the optimization community. Although well-working constructive heuristics are often the basis for well-working metaheuristics, only two constructive heuristics exist. Both were proposed by Blazewicz and colleagues: the first is a look-ahead greedy technique, and the second is a constructive technique based on building reliable sub-sequences. Our motivation was twofold. First, we wanted to develop better constructive heuristics. Second, on the basis of these heuristics, we wanted to develop new state-of-the-art metaheuristics for DNA sequencing by hybridization. In the first part of the paper we present our constructive heuristics. We show that the results of the best constructive heuristic are comparable to those of existing metaheuristics, while using less computational time. In the second part of the paper we propose an ant colony optimization (ACO) approach and apply it within a so-called multi-level framework. Both the ACO algorithm and the multi-level framework are based on our constructive heuristics. The computational results show that our algorithm is currently a state-of-the-art algorithm for DNA sequencing by hybridization.
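    To illustrate how an ACO algorithm can be built on top of a constructive heuristic, here is a minimal sketch under idealized assumptions: equal-length oligonucleotides, a known `start` oligonucleotide that belongs to the spectrum, and a known length bound `n`. It is a generic Ant System-style scheme with a best-so-far pheromone update, not the authors' ACO, and it omits the multi-level framework. Each ant extends the sequence probabilistically, weighting each feasible next oligonucleotide by the pheromone on the transition (`tau`) and by a greedy overlap heuristic. Solutions are compared by the number of oligonucleotides they use. All names and parameter values (`alpha`, `beta`, `rho`, numbers of ants and iterations) are illustrative assumptions.

```python
import random


def overlap(a, b):
    """Length of the longest proper suffix of `a` that is a prefix of `b`."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def construct(spectrum, start, n, tau, rng, alpha=1.0, beta=2.0):
    """One ant builds a path of oligonucleotides starting from `start`."""
    path, seq = [start], start
    unused = set(spectrum) - {start}
    while True:
        last = path[-1]
        # Only candidates that keep the sequence within length n are feasible.
        feasible = [o for o in sorted(unused)
                    if len(seq) + len(o) - overlap(last, o) <= n]
        if not feasible:
            return path, seq
        # Weight = pheromone^alpha * (1 / characters added)^beta.
        weights = [tau[(last, o)] ** alpha
                   * (1.0 / (len(o) - overlap(last, o))) ** beta
                   for o in feasible]
        nxt = rng.choices(feasible, weights=weights, k=1)[0]
        seq += nxt[overlap(last, nxt):]
        path.append(nxt)
        unused.remove(nxt)


def aco_sbh(spectrum, start, n, ants=10, iters=20, rho=0.1, seed=0):
    """Return the best path found (most oligonucleotides used) and its sequence."""
    rng = random.Random(seed)
    oligos = sorted(spectrum)
    tau = {(a, b): 1.0 for a in oligos for b in oligos if a != b}
    best_path, best_seq = [start], start
    for _ in range(iters):
        for _ in range(ants):
            path, seq = construct(spectrum, start, n, tau, rng)
            if len(path) > len(best_path):
                best_path, best_seq = path, seq
        # Evaporate all trails, then reinforce transitions of the best-so-far path.
        for edge in tau:
            tau[edge] *= 1.0 - rho
        for a, b in zip(best_path, best_path[1:]):
            tau[(a, b)] += rho * len(best_path) / len(oligos)
    return best_path, best_seq


path, seq = aco_sbh({"ACG", "CGT", "GTT", "TTG"}, "ACG", 6)
print(path, seq)
```

    The greedy overlap term plays the role of the constructive heuristic inside each ant, which mirrors the abstract's point that the metaheuristic is built on the constructive heuristics.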

    Visual Search of Neuropil-Enriched RNAs from Brain In Situ Hybridization Data through the Image Analysis Pipeline Hippo-ATESC

    Motivation: RNA molecules specifically enriched in the neuropil of neuronal cells, and in particular in dendritic spines, are of great interest to neurobiology by virtue of their involvement in synaptic structure and plasticity. The systematic recognition of such molecules is therefore an important task. High-resolution images of RNA in situ hybridization experiments contained in the Allen Brain Atlas (ABA) are a very rich resource for identifying them, and have so far been exploited for this task through human-expert analysis. However, software tools that can address the same objective automatically are not well developed. Results: In this study we describe an automatic method for exploring in situ hybridization data and discovering neuropil-enriched RNAs in the mouse hippocampus. We call it Hippo-ATESC (Automatic Texture Extraction from the Hippocampal region using Soft Computing). Bioinformatic validation showed that Hippo-ATESC is very efficient at recognizing RNAs that expert curators manually identified as neuropil-enriched on the same image series. Moreover, we show that our method can also highlight genes revealed by microdissection-based methods but missed by human visual inspection. We experimentally validated our approach by identifying a non-coding transcript enriched in mouse synaptosomes. The code is freely available on the web at http://ibislab.ce.unipr.it/software/hippo/

    Hyper‐Heuristics and Metaheuristics for Selected Bio‐Inspired Combinatorial Optimization Problems

    Many decision and optimization problems arising in the bioinformatics field are time-demanding, and several algorithms have been designed to solve these problems or to improve on the current best solution approach. Modeling and implementing a new heuristic algorithm may be time-consuming, but there are strong motivations for doing so: on the one hand, even a small improvement in solution quality may be worth the long time spent constructing a new method; on the other hand, for some problems good-enough solutions are acceptable, and these can be achieved at a much lower computational cost. In the first case, specially designed heuristics or metaheuristics are needed, while in the latter case hyper-heuristics can be proposed. The paper describes both approaches across problems from different domains.
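    As a rough illustration of the hyper-heuristic idea, the sketch below implements a simple selection hyper-heuristic. It does not search solutions directly; instead it repeatedly chooses which low-level heuristic to apply, favouring those that have recently produced improvements, and accepts any non-worsening move. The toy objective (minimizing the number of ones in a bit string), the three low-level move operators and the scoring rule are hypothetical stand-ins, not problems or heuristics from the paper.

```python
import random


def flip_one(s, rng):
    """Low-level heuristic: flip one random bit."""
    t = s[:]
    t[rng.randrange(len(t))] ^= 1
    return t


def flip_two(s, rng):
    """Low-level heuristic: flip two distinct random bits."""
    t = s[:]
    for i in rng.sample(range(len(t)), 2):
        t[i] ^= 1
    return t


def swap_two(s, rng):
    """Low-level heuristic: swap two random positions."""
    t = s[:]
    i, j = rng.sample(range(len(t)), 2)
    t[i], t[j] = t[j], t[i]
    return t


def selection_hyper_heuristic(cost, initial, heuristics, iters=5000, seed=0):
    """Choose low-level heuristics by score; accept non-worsening moves."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    scores = [1.0] * len(heuristics)
    for _ in range(iters):
        k = rng.choices(range(len(heuristics)), weights=scores, k=1)[0]
        candidate = heuristics[k](current, rng)
        candidate_cost = cost(candidate)
        if candidate_cost < current_cost:
            scores[k] += 1.0  # reward heuristics that produce improvements
        if candidate_cost <= current_cost:
            current, current_cost = candidate, candidate_cost
        scores[k] = max(0.1, 0.99 * scores[k])  # slowly forget old performance
    return current, current_cost


best, c = selection_hyper_heuristic(sum, [1] * 20, [flip_one, flip_two, swap_two])
print(c)
```

    `swap_two` never changes this particular objective, so it is never rewarded and its score decays toward the floor of 0.1; this shows how the selection mechanism learns to use unhelpful heuristics less often.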

    From metaheuristics to learnheuristics: Applications to logistics, finance, and computing

    A large number of decision-making processes in strategic sectors such as transport and production involve NP-hard problems, which are frequently characterized by high levels of uncertainty and dynamism. Metaheuristics have become the predominant method for solving challenging optimization problems in reasonable computing times. However, they frequently assume that inputs, objective functions and constraints are deterministic and known in advance. These strong assumptions lead to working on oversimplified problems, and the resulting solutions may perform poorly when implemented. Simheuristics, in turn, integrate simulation into metaheuristics as a natural way to solve stochastic problems; in a similar fashion, learnheuristics combine statistical learning and metaheuristics to tackle problems in dynamic environments, where inputs may depend on the structure of the solution. The main contributions of this thesis include (i) a design for learnheuristics; (ii) a classification of works that hybridize statistical and machine learning with metaheuristics; and (iii) several applications in the fields of transport, production, finance and computing.
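    A simheuristic can be sketched as a two-stage loop: a fast metaheuristic explores solutions using a deterministic version of the problem, and Monte Carlo simulation then re-evaluates the most promising ones under uncertainty. The sketch below applies this to a hypothetical toy problem: single-machine sequencing with random processing times and total tardiness as the cost. The exponential distribution, the multi-start adjacent-swap local search and all parameter values are assumptions for illustration only, not the method of the thesis.

```python
import random
import statistics


def tardiness(order, times, due):
    """Total tardiness of jobs processed in `order` on a single machine."""
    clock = total = 0.0
    for job in order:
        clock += times[job]
        total += max(0.0, clock - due[job])
    return total


def simulate(order, means, due, runs=500, seed=42):
    """Monte Carlo estimate of expected tardiness (exponential times assumed).

    Reusing the same seed gives every order the same samples (common random numbers).
    """
    rng = random.Random(seed)
    costs = []
    for _ in range(runs):
        times = {j: rng.expovariate(1.0 / means[j]) for j in sorted(means)}
        costs.append(tardiness(order, times, due))
    return statistics.mean(costs)


def local_search(order, means, due):
    """Deterministic stage: adjacent swaps evaluated with mean processing times."""
    order = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            cand = order[:i] + [order[i + 1], order[i]] + order[i + 2:]
            if tardiness(cand, means, due) < tardiness(order, means, due):
                order, improved = cand, True
    return order


def simheuristic(means, due, starts=10, elite_size=3, seed=0):
    """Return (simulated expected cost, order) of the best elite solution."""
    rng = random.Random(seed)
    pool = []
    for _ in range(starts):
        start = sorted(means)
        rng.shuffle(start)
        sol = local_search(start, means, due)
        if sol not in pool:
            pool.append(sol)
    # Keep the most promising solutions according to the deterministic cost...
    pool.sort(key=lambda s: tardiness(s, means, due))
    elite = pool[:elite_size]
    # ...then pick the one with the best simulated expected cost.
    return min(((simulate(s, means, due), s) for s in elite), key=lambda x: x[0])


means = {"a": 2.0, "b": 4.0, "c": 1.0, "d": 3.0}
due = {"a": 3.0, "b": 9.0, "c": 2.0, "d": 6.0}
print(simheuristic(means, due))
```

    Because tardiness is nonlinear in the processing times, ranking solutions by their mean-value cost and by their simulated expected cost can disagree. A simheuristic is meant to close this gap.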

    Exact and non-exact procedures for solving the response time variability problem (RTVP)

    Extraordinary Doctoral Award, academic year 2009-2010, Industrial Engineering.
    When a resource must be shared among competing demands (of products, clients, jobs, etc.) that require regular attention, it is important to schedule access to the resource fairly, so that each product, client or job receives a share of the resource proportional to its demand relative to the total of the competing demands. Such sequencing problems can be generalized as follows. Given n symbols, each with demand d_i (i = 1,...,n), a fair or regular sequence must be built in which each symbol appears d_i times. There is no universal definition of fairness, since several reasonable metrics can be defined depending on the specific problem considered. In the Response Time Variability Problem (RTVP), the unfairness or irregularity of a sequence is measured as the sum, over all symbols, of the variability of the distances between the positions at which consecutive copies of each symbol are sequenced. Thus, the objective of the RTVP is to find the sequence that minimises the total variability; in other words, to minimise the variability of the instants at which products, clients or jobs receive the resource they need. This problem arises in a broad range of real-world areas. Applications include sequencing mixed-model assembly lines under just-in-time (JIT) production, resource allocation in multi-threaded computer systems such as operating systems, network servers and media-based applications, periodic machine maintenance, waste collection, scheduling commercial videotapes for television, and designing salespeople's routes with multiple visits to the same client, among others. In some of these problems regularity is not desirable in itself, but it helps to minimise costs. In fact, when costs are proportional to the square of the distances, minimising costs and solving the RTVP are equivalent. The RTVP is very hard to solve (it has been proven NP-hard). The best exact method in the literature can solve RTVP instances optimally only up to a practical limit of 40 units. On the other hand, the non-exact methods proposed in the literature for larger instances are simple heuristics that obtain solutions quickly, but whose solution quality can be improved. Thus, the existing solution methods are insufficient for the RTVP. The main objective of this thesis is to improve the solution of the RTVP. This objective is split into two sub-objectives: 1) to increase the size of the RTVP instances that can be solved optimally in a practical computing time; and 2) to obtain near-optimal solutions efficiently for larger instances. Moreover, the thesis has two secondary objectives: a) to investigate the use of metaheuristics within the hyper-heuristic framework, and b) to design a systematic, hands-off procedure for setting suitable values of the algorithm parameters. Several procedures have been developed to achieve these objectives. An exact procedure based on branch and bound has been designed, which increases the size of the instances solvable in a practical time to 55 units. For larger instances, heuristic, metaheuristic and hyper-heuristic procedures have been designed, which can obtain optimal or near-optimal solutions quickly. Finally, a systematic, hands-off parameter-tuning method that combines the advantages of two existing ones (the Nelder & Mead algorithm and CALIBRA) has been proposed.
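    The abstract describes the RTVP objective only in words. Assuming the standard formulation from the RTVP literature (not stated explicitly here), let D = d_1 + ... + d_n be the sequence length and t_ik the distance from the k-th copy of symbol i to the next one, where the last copy wraps around cyclically to the first copy in the next cycle. Then RTV = Σ_i Σ_k (t_ik − D/d_i)², where D/d_i is the ideal (average) distance. Because Σ_k t_ik = D for every symbol, expanding the square gives RTV = Σ_i Σ_k t_ik² − Σ_i D²/d_i. The second term is a constant, which explains why minimising squared-distance costs and solving the RTVP are equivalent. The sketch computes RTV and, for tiny instances only, finds an optimal sequence by exhaustive enumeration. Enumeration is a deliberately simple swapped-in technique, not the branch and bound procedure described in the thesis; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def rtv(sequence):
    """Response time variability of a cyclic sequence of symbols."""
    D = len(sequence)
    positions = defaultdict(list)
    for pos, symbol in enumerate(sequence):
        positions[symbol].append(pos)
    total = 0.0
    for pos_list in positions.values():
        d = len(pos_list)
        mean = D / d  # ideal distance between consecutive copies
        for k in range(d):
            # Distance to the next copy; the last copy wraps to the next cycle.
            nxt = pos_list[(k + 1) % d] + (D if k == d - 1 else 0)
            total += (nxt - pos_list[k] - mean) ** 2
    return total


def exhaustive_rtvp(demands):
    """Optimal sequence by full enumeration: feasible only for tiny instances."""
    symbols = [s for s, d in sorted(demands.items()) for _ in range(d)]
    best_value, best_seq = None, None
    for seq in sorted(set(permutations(symbols))):
        value = rtv(seq)
        if best_value is None or value < best_value:
            best_value, best_seq = value, "".join(seq)
    return best_value, best_seq


print(rtv("AABB"))                          # 4.0
print(exhaustive_rtvp({"A": 2, "B": 2}))    # (0.0, 'ABAB')
```

    The number of distinct sequences grows multinomially with the demands, which is why practical exact methods rely on pruning (as in branch and bound) rather than enumeration.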