12 research outputs found

    Variable neighborhood search for solving the DNA fragment assembly problem

    The fragment assembly problem consists of building the DNA sequence from several hundred (or even thousands of) fragments obtained by biologists in the laboratory. This is an important task in any genome project, since the accuracy of the remaining phases depends on the result of this stage. In addition, real instances are very large, so efficiency is also an important issue in the design of fragment assemblers. In this paper, we propose two Variable Neighborhood Search variants for solving the DNA fragment assembly problem. Both algorithms are specifically adapted to the problem and differ in their optimization objective (fitness function). One maximizes Parsons’s fitness function (which considers only the overlap among the fragments), and the other estimates the variation in the number of contigs during a local search move in order to minimize the number of contigs. The results show that there is no direct relation between these functions (in several cases they even yield opposite values). For the tested instances, both variants find similar and very good results, but the second option significantly reduces the time consumed.
    VIII Workshop de Agentes y Sistemas Inteligentes. Red de Universidades con Carreras en Informática (RedUNCI)
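
    The two objectives named in the abstract above can be illustrated with a minimal sketch. It assumes exact suffix-prefix matching as the overlap score and a hypothetical `cutoff` below which adjacent fragments are treated as belonging to different contigs; real assemblers typically use error-tolerant alignment scores and also handle reverse complements. The function names are illustrative and are not taken from the paper, and the sketch counts contigs directly rather than estimating the change per local search move as the authors describe.

```python
from typing import List, Sequence


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Assumption: exact matching only, no sequencing errors.
    """
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def overlap_fitness(order: Sequence[int], frags: List[str]) -> int:
    """Sum of overlaps between adjacent fragments in the layout (to maximize)."""
    return sum(
        overlap(frags[order[i]], frags[order[i + 1]])
        for i in range(len(order) - 1)
    )


def count_contigs(order: Sequence[int], frags: List[str], cutoff: int = 3) -> int:
    """Number of contigs in the layout (to minimize).

    A new contig starts wherever adjacent fragments overlap by fewer
    than `cutoff` bases (hypothetical threshold).
    """
    if not order:
        return 0
    breaks = sum(
        1
        for i in range(len(order) - 1)
        if overlap(frags[order[i]], frags[order[i + 1]]) < cutoff
    )
    return breaks + 1


if __name__ == "__main__":
    frags = ["ACGTAC", "TACGGA", "GGATTT", "CCCCCC"]
    for order in ([0, 1, 2, 3], [2, 1, 0, 3]):
        print(order, overlap_fitness(order, frags), count_contigs(order, frags))
```

    On this toy input the two objectives happen to agree on which layout is better; the abstract reports that on real instances they are not directly related.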

    The DNA fragment assembly problem solved using metaheuristics and parallelism (original title: Problema de ensamblado de fragmentos de ADN resuelto mediante metaheurísticas y paralelismo)

    This thesis addresses the problem of assembling the fragments of an organism's genome using metaheuristic techniques. Obtaining a complete, high-quality genome assembly has direct implications for biology and medicine. The task is particularly complex for large genomes, as is the case for most eukaryotes (animals, plants, and fungi). For this reason, it is essential to have assembly algorithms that produce high-quality genomic sequences in reasonable time, so that the subsequent stages of the genomics project can proceed safely and efficiently.
    Track: Doctoral Theses. Red de Universidades con Carreras en Informática (RedUNCI)

    Solving the DNA fragment assembly problem with a parallel discrete firefly algorithm implemented on GPU

    The Deoxyribonucleic Acid Fragment Assembly Problem (DNA-FAP) consists of reconstructing a DNA chain from a set of randomly taken fragments. This problem represents an important step in a genome project. Several authors have proposed different approaches to solve the DNA-FAP; in particular, nature-inspired algorithms have been used. Although they obtain good results, their associated computational time is high. Bio-inspired algorithms are iterative search processes that can explore and exploit the solution space efficiently. The Firefly Algorithm is a recent evolutionary computing model inspired by the flashing-light behaviour of fireflies. Recently, Graphics Processing Units (GPUs) have emerged as a novel environment for the parallel implementation and execution of bio-inspired algorithms. GPU-based parallel computing can therefore serve as a complementary tool to speed up the search. In this work, we design and implement a Discrete Firefly Algorithm (DFA) on a GPU architecture in order to speed up the search process for solving the DNA Fragment Assembly Problem. Several experiments demonstrate the efficiency of the algorithm and the quality of the results, with the potential to be applied to longer sequences or sequences of unknown length as well.
    Affiliation: Vidal, Pablo Javier. Universidad Nacional de la Patagonia Austral. Unidad Académica Caleta Olivia. Departamento de Ciencias Exactas y Naturales; Argentina. Universidad Nacional de la Patagonia Austral. Centro de Investigaciones y Transferencia Golfo San Jorge. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro de Investigaciones y Transferencia Golfo San Jorge. Universidad Nacional de la Patagonia "San Juan Bosco". Centro de Investigaciones y Transferencia Golfo San Jorge; Argentina
    Affiliation: Olivera, Ana Carolina. Universidad Nacional de la Patagonia Austral. Centro de Investigaciones y Transferencia Golfo San Jorge. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro de Investigaciones y Transferencia Golfo San Jorge. Universidad Nacional de la Patagonia "San Juan Bosco". Centro de Investigaciones y Transferencia Golfo San Jorge; Argentina. Universidad Nacional de la Patagonia Austral. Unidad Académica Caleta Olivia. Departamento de Ciencias Exactas y Naturales; Argentina

    Bioinformatics Support of Genome Sequencing Projects

    The genome of an organism is the book of life. It encodes the complete set of genetic instructions for the development of the organism. The structure of a genome is a linear sequence of nucleotides. Determining the sequence of a genome lays the foundation for understanding biology at the molecular level. With current biotechnology, determining the sequence of a genome is a challenging task. A sequencing machine can read the sequence of a piece of DNA of up to 1000 bp (base pairs). However, genomes are very large. For example, the genome of the bacterium E. coli is about 4 Mb (million base pairs) in size, the genome of the nematode C. elegans is 100 Mb, and the human genome is 3 Gb. Because sequencing machines cannot produce long sequences, long sequences must be reconstructed from short sequence reads. A shotgun sequencing strategy is widely used to determine the sequence of a long segment of DNA. In this strategy, multiple copies of the DNA segment are randomly cut into small pieces. The sequence of each piece is read by an automated sequencing machine. The sequence of the large DNA segment is then reconstructed by a computer program from the short sequence reads. The sequence assembly problem is to assemble short reads into long sequences. What makes the sequence assembly problem non-trivial is that there is no information about how the short sequence reads are ordered with respect to the DNA segment.
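
    The reconstruction step described above can be illustrated with a toy greedy assembler: repeatedly merge the two reads with the largest suffix-prefix overlap. This is a textbook heuristic shown only as a sketch, not the method of the work listed here; it assumes error-free reads on a single strand, and the function names are illustrative.

```python
from typing import List


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str]) -> List[str]:
    """Merge reads greedily by largest exact overlap.

    Returns the remaining sequences (contigs); more than one remains
    when no further overlaps exist.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [
            c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)
        ] + [merged]
    return contigs


if __name__ == "__main__":
    # Three overlapping reads cut from the toy sequence ATGCGTACGTTAGC.
    print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"]))
```

    Greedy merging can misassemble sequences that contain repeats, which is one reason the ordering of reads is hard to recover in practice.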

    Computational Biology Methods and Their Application to the Comparative Genomics of Endocellular Symbiotic Bacteria of Insects

    Comparative genomics has become a truly tantalizing challenge in the postgenomic era, a fact magnified by the plethora of new genomes becoming available on a daily basis. The overwhelming list of new genomes to compare has pushed the field of bioinformatics and computational biology toward the design and development of methods capable of identifying patterns in a sea of data noise. Despite many advances in this endeavor, persistent exceptions to the general patterns continue to pose difficulties in generalizing methods for comparative genomics. In this review, we discuss the different tools devised to undertake the challenge of comparative genomics and some of the exceptions that compromise the generality of such methods. We focus on endosymbiotic bacteria of insects because of the peculiarities of their genomic dynamics compared to free-living organisms.

    Computational studies with ESTs: assembly, SNP detection, and applications in alternative splicing

    EST sequences are important in functional genomics studies. To make better use of available EST resources, clustering and assembly are crucial techniques. No current assembly program handles EST sequences with deep coverage well. We describe a deep assembly program named DA. The program keeps the number of differences in each contig alignment under control by correcting differences that are likely due to sequencing errors. Experimental results on the 115 clusters from the UniGene database show that DA can handle data sets of deep coverage efficiently. A comparison of the DA consensus sequences with the finished human and mouse genomes indicates that the consensus sequences are of acceptable quality.
    EST sequences can be used in SNP discovery. We describe a computational method for finding common SNPs with allele frequencies in single-pass sequences of deep coverage. The method enhances a widely used program named PolyBayes in several aspects. We present results from our method and PolyBayes on eighteen data sets of human expressed sequence tags (ESTs) with deep coverage. The results indicate that our method used almost all single-pass sequences in the computation of the allele frequencies of SNPs.
    EST sequences can also be used to study alternative splicing (AS), which is the most common post-transcriptional event in metazoans. We first developed a pipeline to identify AS forms by comparing alignments between expressed sequences and genomic sequences. We then studied the relationship between AS and gene duplication. We observed that duplicate genes have fewer AS forms than single-copy genes; we also found that the loss of alternative splicing in duplicate genes may occur shortly after the gene duplication. Further analysis of the alternative splicing distribution in human duplicate pairs showed the asymmetric evolution of alternative splicing after gene duplications. We also compared AS among six species. Detailed, categorized studies found significant differences in both AS rates and splice forms per gene among the studied species. The difference in AS rate between rice and Arabidopsis is significant enough to lead to a difference in protein diversity between those two species.

    Anales del XIII Congreso Argentino de Ciencias de la Computación (CACIC)

    Contents: Computer architectures; Embedded systems; Service-oriented architectures (SOA); Communication networks; Heterogeneous networks; Advanced networks; Wireless networks; Mobile networks; Active networks; Network and service management and monitoring; Quality of Service (QoS, SLAs); Information security and authentication, privacy; Infrastructure for digital signatures and digital certificates; Vulnerability analysis and detection; Operating systems; P2P systems; Middleware; Grid infrastructure; Integration services (Web Services or .Net).
    Red de Universidades con Carreras en Informática (RedUNCI)
