96,927 research outputs found
Benchmarking of long-read assemblers for prokaryote whole genome sequencing.
Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled - one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly. Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of eight long-read assemblers (Canu, Flye, Miniasm/Minipolish, NECAT, NextDenovo/NextPolish, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used. Results: Canu v2.1 produced reliable assemblies and was good with plasmids, but it performed poorly with circularisation and had the longest runtimes of all assemblers tested. Flye v2.8 was also reliable and made the smallest sequence errors, though it used the most RAM. Miniasm/Minipolish v0.3/v0.1.3 was the most likely to produce clean contig circularisation. NECAT v20200803 was reliable and good at circularisation but tended to make larger sequence errors. NextDenovo/NextPolish v2.3.1/v1.3.1 was reliable with chromosome assembly but poor at plasmid assembly. Raven v1.3.0 was reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.7.0 were computationally efficient but more likely to produce incomplete assemblies. Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish, NextDenovo/NextPolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development of long-read assembly algorithms.
Formulation and Search of Assembly Sequence Design Spaces for Efficient Use of Assembly Plant Resources for New Products
Efficient procedures for generation of feasible assembly sequences and effective utilization of available assembly plant resources can greatly reduce the development time and cost of platforms for new product family members. This article presents a method to generate feasible assembly sequences and an approach to select an assembly process that reduces the existing plant modification cost. Assembly sequence design space is combinatorial in nature. Mathematical models to capture the effects of constraints on these spaces and algorithms to efficiently enumerate feasible spaces are explored in this research. Algorithms that search the feasible space to identify an assembly process that can reduce the modification cost of the existing assembly plant can help increase utilization of existing resources. A software application that implements the method and algorithms has been developed. The algorithms use the concept of recursive partitioning of the set of components to generate the assembly sequence space. The assembly processes are then evaluated to determine the process that maximizes resource utilization for new platforms. The application of the proposed approach is demonstrated using an automotive underbody front structure family.
JOINING SEQUENCE ANALYSIS AND OPTIMIZATION FOR IMPROVED GEOMETRICAL QUALITY
Disturbances in the manufacturing and assembly processes cause geometrical variation from the ideal geometry. This variation eventually results in functional and aesthetic problems in the final product. Controlling these disturbances is a key goal for the manufacturing industry. Joining sequences considerably impact the final geometrical outcome of an assembly. Optimizing the sequence for improved geometrical outcome is expensive, both experimentally and computationally. In simulation-based approaches, based on the finite element method, a large number of sequences need to be evaluated. In this thesis, simulation-based joining sequence optimization using non-rigid variation simulation is studied. Initially, the limitations of the algorithms applied in the literature are addressed. A rule-based optimization approach based on meta-heuristic algorithms and heuristic search methods is introduced to increase the previously applied algorithms' time-efficiency and accuracy. Based on the identified rules and heuristics, a reduced formulation of the sequence optimization is introduced by identifying the critical points for geometrical quality. A subset of the sequence problem is identified and solved in this formulation. For real-time optimization of the joining sequence problem, time-efficiency needs to be further enhanced by parallel computations. By identifying the sequence-deformation behavior in the assemblies, black-box surrogate models are introduced, enabling parallel evaluations and accurate approximation of the geometrical quality. Based on this finding, a deterministic stepwise search algorithm for rapid identification of the optimal sequence is introduced. Furthermore, a numerical approach is introduced to identify the number, location (from a set of alternatives), and sequence of the critical joining points for geometrical quality. Finally, the cause of the various deformations achieved by joining sequences is identified.
A time-efficient non-rigid variation simulation approach for evaluating the geometrical quality with respect to the sequences is proposed. The results achieved from the studies presented indicate that simulation-based real-time optimization of the joining sequences is achievable through a parallelized search algorithm and a rapid evaluation of the sequences. The critical joining points for geometrical quality are identified while the sequence is optimized. The results help control the assembly process with respect to the joining operation, improve the geometrical quality, and save significant computational time.
SLIQ: Simple Linear Inequalities for Efficient Contig Scaffolding
Scaffolding is an important subproblem in "de novo" genome assembly in which
mate pair data are used to construct a linear sequence of contigs separated by
gaps. Here we present SLIQ, a set of simple linear inequalities derived from
the geometry of contigs on the line that can be used to predict the relative
positions and orientations of contigs from individual mate pair reads and thus
produce a contig digraph. The SLIQ inequalities can also filter out unreliable
mate pairs and can be used as a preprocessing step for any scaffolding
algorithm. We tested the SLIQ inequalities on five real data sets ranging in
complexity from simple bacterial genomes to complex mammalian genomes and
compared the results to the majority voting procedure used by many other
scaffolding algorithms. SLIQ predicted the relative positions and orientations
of the contigs with high accuracy in all cases and gave more accurate position
predictions than majority voting for complex genomes, in particular the human
genome. Finally, we present a simple scaffolding algorithm that produces linear
scaffolds given a contig digraph. We show that our algorithm is very efficient
compared to other scaffolding algorithms while maintaining high accuracy in
predicting both contig positions and orientations for real data sets.
Comment: 16 pages, 6 figures, 7 tables
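The geometric idea can be sketched in a few lines. This is a simplified illustration of how a single mate pair constrains the relative positions of two contigs, not the actual SLIQ inequalities; the mean insert size `mu`, tolerance `tol`, and coordinate conventions below are assumptions for the sketch:

```python
def mate_pair_supports_order(p1, l1, p2, read_len, mu, tol=300):
    """Toy relative-position test for one mate pair spanning two contigs.

    p1: offset of the forward read in contig 1 (length l1)
    p2: offset of the reverse read's 5' end in contig 2
    mu: mean insert size; tol: slack for insert-size variation

    The fragment covers the tail of contig 1, the gap g, and the head of
    contig 2: (l1 - p1) + g + (p2 + read_len) ~= mu. Solving for g and
    requiring g > -tol gives a linear inequality in the mapping
    coordinates alone; if it holds, the pair supports contig 1
    preceding contig 2.
    """
    gap = mu - (l1 - p1) - (p2 + read_len)
    return gap > -tol, gap
```

Because each test is a single linear comparison per mate pair, millions of pairs can be screened cheaply, and pairs whose implied gap is wildly inconsistent can be filtered out before scaffolding.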
Synteny Paths for Assembly Graphs Comparison
Despite the recent developments of long-read sequencing technologies, it is still difficult to produce complete assemblies of eukaryotic genomes in an automated fashion. Genome assembly software typically output assembled fragments (contigs) along with assembly graphs, that encode all possible layouts of these contigs. Graph representation of the assembled genome can be useful for gene discovery, haplotyping, structural variations analysis and other applications. To facilitate the development of new graph-based approaches, it is important to develop algorithms for comparison and evaluation of assembly graphs produced by different software. In this work, we introduce synteny paths: maximal paths of homologous sequence between the compared assembly graphs. We describe Asgan - an algorithm for efficient synteny paths decomposition, and use it to evaluate assembly graphs of various bacterial assemblies produced by different approaches. We then apply Asgan to discover structural variations between the assemblies of 15 Drosophila genomes, and show that synteny paths are robust to contig fragmentation. The Asgan tool is freely available at: https://github.com/epolevikov/Asgan
Assembly and Disassembly Planning by using Fuzzy Logic & Genetic Algorithms
The authors propose the implementation of hybrid Fuzzy Logic-Genetic
Algorithm (FL-GA) methodology to plan the automatic assembly and disassembly
sequence of products. The GA-Fuzzy Logic approach is implemented onto two
levels. The first level of hybridization consists of the development of a Fuzzy
controller for the parameters of an assembly or disassembly planner based on
GAs. This controller acts on mutation probability and crossover rate in order
to adapt their values dynamically while the algorithm runs. The second level
consists of the identification of the optimal assembly or disassembly sequence
by a Fuzzy function, in order to obtain a closer control of the technological
knowledge of the assembly/disassembly process. Two case studies were analyzed
in order to test the efficiency of the Fuzzy-GA methodologies.
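The first hybridization level can be illustrated with a toy defuzzified controller that maps one feedback signal (population diversity) to the GA's mutation probability and crossover rate. The membership breakpoints, output centroids, and rule base below are illustrative assumptions, not the paper's controller:

```python
def tri(x, a, b, c):
    # Triangular membership function on [a, c], peaking at b.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def adapt_rates(diversity):
    """Map population diversity in [0, 1] to (mutation prob, crossover rate).

    Illustrative rule base: low diversity -> raise mutation to escape
    premature convergence; high diversity -> lower mutation and favour
    crossover to exploit good building blocks.
    """
    low = tri(diversity, -0.5, 0.0, 0.5)
    med = tri(diversity, 0.0, 0.5, 1.0)
    high = tri(diversity, 0.5, 1.0, 1.5)
    w = low + med + high
    # Weighted-centroid defuzzification with assumed output centroids.
    pm = (0.20 * low + 0.05 * med + 0.01 * high) / w
    pc = (0.60 * low + 0.80 * med + 0.95 * high) / w
    return pm, pc
```

Called once per generation, such a controller adapts the GA parameters dynamically while the planner runs, which is the role the abstract assigns to the fuzzy controller.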
A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data
Deep shotgun sequencing and analysis of genomes, transcriptomes, amplified
single-cell genomes, and metagenomes has enabled investigation of a wide range
of organisms and ecosystems. However, sampling variation in short-read data
sets and high sequencing error rates of modern sequencers present many new
computational challenges in data interpretation. These challenges have led to
the development of new classes of mapping tools and de novo assemblers.
These algorithms are challenged by the continued improvement in sequencing
throughput. We here describe digital normalization, a single-pass computational
algorithm that systematizes coverage in shotgun sequencing data sets, thereby
decreasing sampling variation, discarding redundant data, and removing the
majority of errors. Digital normalization substantially reduces the size of
shotgun data sets and decreases the memory and time requirements for de
novo sequence assembly, all without significantly impacting content of the
generated contigs. We apply digital normalization to the assembly of microbial
genomic data, amplified single-cell genomic data, and transcriptomic data. Our
implementation is freely available for use and modification.
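The single-pass rule can be sketched compactly: keep a read only if the median abundance of its k-mers, counted over the reads kept so far, is still below a coverage cutoff. This toy uses an exact dictionary counter for clarity, whereas the published implementation streams reads through a probabilistic counting structure; the k-mer size and cutoff are illustrative defaults:

```python
from collections import Counter
from statistics import median

def digital_normalize(reads, k=20, cutoff=20):
    """Single-pass digital normalization sketch.

    A read from an already well-covered region has high-abundance k-mers
    and is discarded; reads from under-sampled regions (or containing
    novel sequence) are kept, flattening coverage toward the cutoff.
    """
    counts = Counter()
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:
            continue  # read shorter than k
        if median(counts[km] for km in kmers) < cutoff:
            kept.append(read)
            counts.update(kmers)  # only kept reads contribute coverage
    return kept
```

Because discarded reads never update the counts, memory scales with the distinct k-mers of the retained (normalized) data rather than the full data set, which is the source of the memory savings described above.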
- …