
    Benchmarking of long-read assemblers for prokaryote whole genome sequencing.

    Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow most prokaryote genomes to be completely assembled - one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different assembly approaches from those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly. Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of eight long-read assemblers (Canu, Flye, Miniasm/Minipolish, NECAT, NextDenovo/NextPolish, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used. Results: Canu v2.1 produced reliable assemblies and was good with plasmids, but it performed poorly with circularisation and had the longest runtimes of all assemblers tested. Flye v2.8 was also reliable and made the smallest sequence errors, though it used the most RAM. Miniasm/Minipolish v0.3/v0.1.3 was the most likely to produce clean contig circularisation. NECAT v20200803 was reliable and good at circularisation but tended to make larger sequence errors. NextDenovo/NextPolish v2.3.1/v1.3.1 was reliable with chromosome assembly but poor with plasmid assembly. Raven v1.3.0 was reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.7.0 were computationally efficient but more likely to produce incomplete assemblies. Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish, NextDenovo/NextPolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development of long-read assembly algorithms.
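    A minimal sketch of the kind of summary statistics such a benchmark relies on (not the paper's actual evaluation pipeline, which also measures structural accuracy against known references): contig count, total length and N50 computed from an assembly FASTA. File paths and function names here are illustrative.

```python
# Hedged sketch: basic per-assembly metrics (contig count, total length, N50)
# from a FASTA file, the kind of summary used when comparing assemblers.

def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file with named headers."""
    name, seq = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(seq)
                name, seq = line[1:].split()[0], []
            elif line:
                seq.append(line)
    if name is not None:
        yield name, "".join(seq)

def assembly_metrics(path):
    """Return contig count, total length, and N50 for one assembly."""
    lengths = sorted((len(s) for _, s in read_fasta(path)), reverse=True)
    total = sum(lengths)
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if running >= total / 2:
            n50 = length
            break
    return {"contigs": len(lengths), "total_length": total, "n50": n50}

if __name__ == "__main__":
    import sys
    for assembly in sys.argv[1:]:  # e.g. flye.fasta canu.fasta raven.fasta
        print(assembly, assembly_metrics(assembly))
```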

    Formulation and Search of Assembly Sequence Design Spaces for Efficient Use of Assembly Plant Resources for New Products

    Efficient procedures for generating feasible assembly sequences and effectively utilizing available assembly plant resources can greatly reduce the development time and cost of platforms for new product family members. This article presents a method to generate feasible assembly sequences and an approach to select an assembly process that reduces the cost of modifying the existing plant. The assembly sequence design space is combinatorial in nature. Mathematical models that capture the effects of constraints on these spaces and algorithms that efficiently enumerate the feasible spaces are explored in this research. Algorithms that search the feasible space to identify an assembly process that reduces the modification cost of the existing assembly plant can help increase utilization of existing resources. A software application that implements the method and algorithms has been developed. The algorithms use the concept of recursive partitioning of the set of components to generate the assembly sequence space. The assembly processes are then evaluated to determine the process that maximizes resource utilization for new platforms. The application of the proposed approach is demonstrated using an automotive underbody front structure family.
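    A hedged sketch of the recursive-partitioning idea: each feasible assembly sequence corresponds to a binary tree obtained by repeatedly splitting the set of components into two subassemblies. The `can_join` predicate below is a placeholder for the paper's precedence and geometric feasibility constraints.

```python
# Hedged sketch: enumerate assembly sequences as binary trees by recursively
# partitioning the component set. `can_join` is an illustrative placeholder.
from itertools import combinations

def can_join(subset_a, subset_b):
    # Placeholder: accept every split. A real planner would test precedence
    # and geometric feasibility of joining the two subassemblies.
    return True

def assembly_trees(components):
    """Yield nested tuples, each a feasible binary assembly tree."""
    components = tuple(sorted(components))
    if len(components) == 1:
        yield components[0]
        return
    first, rest = components[0], components[1:]
    # Avoid mirrored duplicates by forcing the first component into the left part.
    for k in range(len(rest) + 1):
        for extra in combinations(rest, k):
            left = (first,) + extra
            right = tuple(c for c in rest if c not in extra)
            if not right or not can_join(left, right):
                continue
            for lt in assembly_trees(left):
                for rt in assembly_trees(right):
                    yield (lt, rt)

if __name__ == "__main__":
    for tree in assembly_trees(["A", "B", "C"]):
        print(tree)   # ('A', ('B', 'C')), (('A', 'B'), 'C'), (('A', 'C'), 'B')
```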

    JOINING SEQUENCE ANALYSIS AND OPTIMIZATION FOR IMPROVED GEOMETRICAL QUALITY

    Disturbances in the manufacturing and assembly processes cause geometrical variation from the ideal geometry. This variation eventually results in functional and aesthetic problems in the final product. Being able to control these disturbances is a key concern of the manufacturing industry. Joining sequences considerably affect the final geometrical outcome of an assembly. Optimizing the sequence for an improved geometrical outcome is both experimentally and computationally expensive: in simulation-based approaches, based on the finite element method, a large number of sequences need to be evaluated. In this thesis, simulation-based joining sequence optimization using non-rigid variation simulation is studied. Initially, the limitations of the algorithms applied in the literature are addressed. A rule-based optimization approach based on meta-heuristic algorithms and heuristic search methods is introduced to increase the time-efficiency and accuracy of the previously applied algorithms. Based on the identified rules and heuristics, a reduced formulation of the sequence optimization is introduced by identifying the points that are critical for geometrical quality; a subset of the sequence problem is identified and solved in this formulation. For real-time optimization of the joining sequence problem, time-efficiency needs to be further enhanced through parallel computation. By identifying the sequence-deformation behavior of the assemblies, black-box surrogate models are introduced, enabling parallel evaluations and accurate approximation of the geometrical quality. Based on this finding, a deterministic stepwise search algorithm for rapid identification of the optimal sequence is introduced. Furthermore, a numerical approach is introduced to identify the number, the locations (from a set of alternatives), and the sequence of the joining points critical for geometrical quality. Finally, the cause of the different deformations produced by different joining sequences is identified, and a time-efficient non-rigid variation simulation approach for evaluating geometrical quality with respect to the sequences is proposed. The results of the studies presented indicate that simulation-based real-time optimization of joining sequences is achievable through a parallelized search algorithm and rapid evaluation of the sequences. The joining points critical for geometrical quality are identified while the sequence is optimized. The results help control the assembly process with respect to the joining operation, improve geometrical quality, and save significant computational time.
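    A hedged sketch of a deterministic stepwise search over joining sequences: the sequence is built one joining point at a time, always choosing the point with the best predicted quality so far. The `evaluate_quality` function stands in for the thesis's variation simulation or surrogate model and is purely illustrative.

```python
# Hedged sketch: greedy stepwise construction of a joining sequence against a
# black-box quality evaluator (lower cost = better geometrical quality).

def evaluate_quality(sequence):
    # Placeholder black box. A real evaluator would run a (surrogate of a)
    # non-rigid finite-element variation simulation for this joining order.
    return sum(i * point for i, point in enumerate(sequence, start=1))

def stepwise_search(joining_points):
    """Build a joining sequence one point at a time, greedily picking the
    point that gives the best predicted quality for the partial sequence."""
    remaining = list(joining_points)
    sequence = []
    while remaining:
        best = min(remaining, key=lambda p: evaluate_quality(sequence + [p]))
        sequence.append(best)
        remaining.remove(best)
    return sequence, evaluate_quality(sequence)

if __name__ == "__main__":
    order, cost = stepwise_search([3, 1, 4, 1, 5, 9, 2, 6])
    print("sequence:", order, "cost:", cost)
```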

    SLIQ: Simple Linear Inequalities for Efficient Contig Scaffolding

    Scaffolding is an important subproblem in de novo genome assembly in which mate pair data are used to construct a linear sequence of contigs separated by gaps. Here we present SLIQ, a set of simple linear inequalities derived from the geometry of contigs on the line that can be used to predict the relative positions and orientations of contigs from individual mate pair reads and thus produce a contig digraph. The SLIQ inequalities can also filter out unreliable mate pairs and can be used as a preprocessing step for any scaffolding algorithm. We tested the SLIQ inequalities on five real data sets ranging in complexity from simple bacterial genomes to complex mammalian genomes and compared the results to the majority voting procedure used by many other scaffolding algorithms. SLIQ predicted the relative positions and orientations of the contigs with high accuracy in all cases and gave more accurate position predictions than majority voting for complex genomes, in particular the human genome. Finally, we present a simple scaffolding algorithm that produces linear scaffolds given a contig digraph. We show that our algorithm is very efficient compared to other scaffolding algorithms while maintaining high accuracy in predicting both contig positions and orientations for real data sets. Comment: 16 pages, 6 figures, 7 tables.
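    To illustrate the flavour of such constraints (this is not the paper's exact SLIQ formulation), a mate pair mapped to two contigs restricts which contig can come first, because the implied gap must be non-negative and the implied insert size cannot exceed the library maximum. A hedged sketch:

```python
# Hedged sketch: which relative orders of contigs A and B are consistent with
# one mate pair, given read positions within the contigs and a maximum insert
# size. Illustrative only; not the published SLIQ inequalities.

def possible_orders(len_a, pos_a, len_b, pos_b, max_insert):
    """Positions are 0-based offsets of the reads from their contig starts.
    Returns the orderings compatible with a forward/reverse mate pair."""
    orders = []
    # If A precedes B, the insert spans (len_a - pos_a) + gap + pos_b with
    # gap >= 0, so the mapped parts alone must already fit in max_insert.
    if (len_a - pos_a) + pos_b <= max_insert:
        orders.append("A_before_B")
    # Symmetric condition if B precedes A.
    if (len_b - pos_b) + pos_a <= max_insert:
        orders.append("B_before_A")
    return orders

if __name__ == "__main__":
    # 10 kb contigs, read near the end of A and near the start of B,
    # 3 kb maximum insert -> only "A before B" is feasible.
    print(possible_orders(10000, 9500, 10000, 800, 3000))  # ['A_before_B']
```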

    Synteny Paths for Assembly Graphs Comparison

    Despite recent developments in long-read sequencing technologies, it is still difficult to produce complete assemblies of eukaryotic genomes in an automated fashion. Genome assembly software typically outputs assembled fragments (contigs) along with assembly graphs that encode all possible layouts of these contigs. A graph representation of the assembled genome can be useful for gene discovery, haplotyping, structural variation analysis and other applications. To facilitate the development of new graph-based approaches, it is important to develop algorithms for comparing and evaluating assembly graphs produced by different software. In this work, we introduce synteny paths: maximal paths of homologous sequence between the compared assembly graphs. We describe Asgan, an algorithm for efficient synteny path decomposition, and use it to evaluate assembly graphs of various bacterial assemblies produced by different approaches. We then apply Asgan to discover structural variations between the assemblies of 15 Drosophila genomes and show that synteny paths are robust to contig fragmentation. The Asgan tool is freely available at https://github.com/epolevikov/Asgan
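    As a hedged toy illustration of the underlying idea (Asgan itself operates on full assembly graphs), one can represent each assembly as lists of signed synteny blocks per contig, keep only the block adjacencies shared by both assemblies, and chain those adjacencies into maximal shared paths:

```python
# Hedged toy version of synteny-path decomposition over block adjacencies.
# Assumes each block starts at most one shared adjacency; cycles are ignored.

def adjacencies(contigs):
    """Set of ordered (block, next_block) pairs over all contigs."""
    pairs = set()
    for blocks in contigs:
        pairs.update(zip(blocks, blocks[1:]))
    return pairs

def synteny_paths(asm_a, asm_b):
    shared = adjacencies(asm_a) & adjacencies(asm_b)
    starts = {a for a, _ in shared} - {b for _, b in shared}
    nxt = dict(shared)
    paths = []
    for start in starts:
        path, cur = [start], start
        while cur in nxt:
            cur = nxt[cur]
            path.append(cur)
        paths.append(path)
    return paths

if __name__ == "__main__":
    asm_a = [["+1", "+2", "+3", "+4"]]      # one contig
    asm_b = [["+1", "+2"], ["+3", "+4"]]    # same blocks, fragmented contigs
    print(synteny_paths(asm_a, asm_b))      # paths ['+1','+2'] and ['+3','+4']
```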

    Assembly and Disassembly Planning by using Fuzzy Logic & Genetic Algorithms

    The authors propose a hybrid Fuzzy Logic-Genetic Algorithm (FL-GA) methodology to plan the automatic assembly and disassembly sequence of products. The GA-Fuzzy Logic approach is implemented on two levels. The first level of hybridization consists of the development of a Fuzzy controller for the parameters of an assembly or disassembly planner based on GAs. This controller acts on the mutation probability and crossover rate in order to adapt their values dynamically while the algorithm runs. The second level consists of the identification of the optimal assembly or disassembly sequence by a Fuzzy function, in order to obtain closer control of the technological knowledge of the assembly/disassembly process. Two case studies were analyzed in order to test the efficiency of the Fuzzy-GA methodologies.
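    A hedged sketch of the adaptive idea: a small permutation GA whose mutation probability and crossover rate are adjusted while it runs. The crude stagnation rule below merely stands in for the paper's fuzzy controller, and the fitness function is a toy.

```python
# Hedged sketch: GA with run-time adaptation of mutation/crossover rates.
import random

TARGET = list(range(10))  # toy "assembly sequence" to recover

def fitness(seq):
    # Number of positions matching the toy target order (higher is better).
    return sum(1 for a, b in zip(seq, TARGET) if a == b)

def crossover(p1, p2):
    # Order crossover: keep a slice of p1, fill the rest in p2's order.
    i, j = sorted(random.sample(range(len(p1)), 2))
    middle = p1[i:j]
    rest = [g for g in p2 if g not in middle]
    return rest[:i] + middle + rest[i:]

def mutate(seq, p_mut):
    seq = seq[:]
    if random.random() < p_mut:
        i, j = random.sample(range(len(seq)), 2)
        seq[i], seq[j] = seq[j], seq[i]
    return seq

def run_ga(pop_size=30, generations=200):
    pop = [random.sample(TARGET, len(TARGET)) for _ in range(pop_size)]
    p_mut, p_cross = 0.2, 0.8
    best_prev = max(fitness(s) for s in pop)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        best = fitness(pop[0])
        # Crude adaptive rule standing in for a fuzzy controller:
        # stagnation -> explore more; improvement -> exploit more.
        if best <= best_prev:
            p_mut = min(0.9, p_mut * 1.1)
            p_cross = max(0.5, p_cross * 0.95)
        else:
            p_mut = max(0.05, p_mut * 0.9)
            p_cross = min(0.95, p_cross * 1.05)
        best_prev = best
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = crossover(a, b) if random.random() < p_cross else a[:]
            children.append(mutate(child, p_mut))
        pop = parents + children
    return max(pop, key=fitness)

if __name__ == "__main__":
    print(run_ga())
```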

    A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data

    Deep shotgun sequencing and analysis of genomes, transcriptomes, amplified single-cell genomes, and metagenomes have enabled investigation of a wide range of organisms and ecosystems. However, sampling variation in short-read data sets and the high sequencing error rates of modern sequencers present many new computational challenges in data interpretation. These challenges have led to the development of new classes of mapping tools and de novo assemblers, and these algorithms are in turn challenged by continued improvements in sequencing throughput. We here describe digital normalization, a single-pass computational algorithm that systematizes coverage in shotgun sequencing data sets, thereby decreasing sampling variation, discarding redundant data, and removing the majority of errors. Digital normalization substantially reduces the size of shotgun data sets and decreases the memory and time requirements for de novo sequence assembly, all without significantly impacting the content of the generated contigs. We apply digital normalization to the assembly of microbial genomic data, amplified single-cell genomic data, and transcriptomic data. Our implementation is freely available for use and modification.
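    A hedged sketch of the single-pass normalization loop: a read is kept only if the median count of its k-mers among previously kept reads is below a coverage cutoff. The authors' implementation uses a probabilistic counting structure and handles reverse complements; a plain dictionary is used here for clarity.

```python
# Hedged sketch of digital normalization (strandedness ignored for brevity).
from collections import defaultdict
from statistics import median

def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def digital_normalization(reads, k=20, cutoff=20):
    counts = defaultdict(int)
    kept = []
    for read in reads:
        kms = kmers(read, k)
        if not kms:
            continue
        # Keep the read only if its estimated coverage so far is low.
        if median(counts[km] for km in kms) < cutoff:
            kept.append(read)
            for km in kms:
                counts[km] += 1
    return kept

if __name__ == "__main__":
    reads = ["ACGT" * 10] * 100 + ["TTGCA" * 8]
    # The 100 identical high-coverage reads collapse to one; the distinct
    # low-coverage read is retained.
    print(len(digital_normalization(reads, k=8, cutoff=5)))  # 2
```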