    Comparing assembly strategies for third-generation sequencing technologies across different genomes

    The recent advent of long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore technology (ONT), has led to substantial accuracy and computational cost improvements. However, de novo whole-genome assembly still presents significant challenges related to the computational cost and the quality of the results. Accordingly, sequencing accuracy and throughput continue to improve, and many tools are constantly emerging. Therefore, selecting the correct sequencing platform, the proper sequencing depth and the assembly tools is necessary to perform high-quality assembly. This paper evaluates the primary assembly reconstruction from recent hybrid and non-hybrid pipelines on different genomes. We find that using PacBio high-fidelity long reads (HiFi) plays an essential role in haplotype construction with respect to ONT reads. However, we observe a substantial improvement in the correctness of the assembly from high-fidelity ONT datasets and from combining them with HiFi or short reads. This work has been partially supported by the Spanish MINECO PID2019-105396RB-I00, Junta de Andalucia JA2018 P18-FR-3433, and UMA18-FEDERJA-197 projects. Funding for open access charge: Universidad de Málaga/CBUA. Peer Reviewed. Postprint (published version)

    LaRA 2: parallel and vectorized program for sequence–structure alignment of RNA sequences

    Background: The function of non-coding RNA sequences is largely determined by their spatial conformation, namely the secondary structure of the molecule, formed by Watson–Crick interactions between nucleotides. Hence, modern RNA alignment algorithms routinely take structural information into account. In order to discover yet unknown RNA families and infer their possible functions, the structural alignment of RNAs is an essential task. This task demands a lot of computational resources, especially for aligning many long sequences, and it therefore requires efficient algorithms that utilize modern hardware when available. A subset of the secondary structures contains overlapping interactions (called pseudoknots), which add complexity to the problem and are often ignored in available software. Results: We present the SeqAn-based software LaRA 2, which is significantly faster than comparable software for accurate pairwise and multiple alignments of structured RNA sequences. In contrast to other programs, our approach can handle arbitrary pseudoknots. As an improved re-implementation of the LaRA tool for structural alignments, LaRA 2 uses multi-threading and vectorization for parallel execution and a new heuristic for computing a lower bound of the solution. Our algorithmic improvements yield a program that is up to 130 times faster than the previous version. Conclusions: With LaRA 2 we provide a tool to analyse large sets of RNA secondary structures in relatively short time, based on structural alignment. The produced alignments can be used to derive structural motifs for the search in genomic databases.
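
To make the notion of overlapping interactions concrete, here is a minimal sketch that checks whether a set of base pairs contains a pseudoknot, using the common definition that two pairs (i, j) and (k, l) cross when i < k < j < l. The function name `has_pseudoknot` and the input format (a list of index pairs) are assumptions chosen for illustration; this is not LaRA 2 code and says nothing about how LaRA 2 represents structures.

```python
from itertools import combinations


def has_pseudoknot(pairs):
    """Return True if any two base pairs cross (i < k < j < l).

    `pairs` is a hypothetical input format: an iterable of (i, j)
    sequence positions that are paired with each other.
    """
    # Normalise each pair so the smaller index comes first, then sort
    # so that for any two pairs the first one opens no later.
    norm = sorted((min(a, b), max(a, b)) for a, b in pairs)
    for (i, j), (k, l) in combinations(norm, 2):
        # The second pair opens inside the first but closes outside it.
        if i < k < j < l:
            return True
    return False


if __name__ == "__main__":
    nested = [(0, 9), (1, 8), (2, 7)]   # a plain stem, no crossing
    crossing = [(0, 5), (3, 8)]         # second pair crosses the first
    print(has_pseudoknot(nested), has_pseudoknot(crossing))
```

The pairwise check is quadratic in the number of base pairs, which is fine for illustration but not tuned for large structures.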

    Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading

    Motivation: Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable for a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting SIMD (Single Instruction Multiple Data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we a) distribute many independent alignments on multiple threads and b) inherently parallelize a single alignment computation using a work stealing approach producing a dynamic wavefront progressing along the minor diagonal. Results: We evaluated our alignment vectorization and parallelization on different processors, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors, and use cases. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than executing them with our previous sequential alignment module. Availability: The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4 under the BSD license. We support SSE4, AVX2, AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, to extend our module for further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms.
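
The wavefront idea mentioned in the abstract can be sketched with a plain global-alignment score computation that fills the dynamic programming matrix one minor diagonal at a time. Each cell depends only on cells from the two previous diagonals, so all cells on one diagonal are independent of each other, which is what makes that direction a candidate for parallel or vectorized execution. This is a sequential, illustrative sketch in Python, not the SeqAn module: the function name `nw_score_wavefront` and the linear scoring parameters (`match`, `mismatch`, `gap`) are assumptions, and no work stealing or SIMD is shown.

```python
def nw_score_wavefront(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score, computed diagonal by diagonal.

    Scoring values are hypothetical defaults with a linear gap cost.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    # First column and first row: prefixes aligned against gaps only.
    for i in range(1, n + 1):
        H[i][0] = i * gap
    for j in range(1, m + 1):
        H[0][j] = j * gap
    # d = i + j indexes the minor diagonals of the inner matrix.
    for d in range(2, n + m + 1):
        # Cells of one diagonal do not depend on each other, so this
        # inner loop is the part that could be run in parallel.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,   # (mis)match
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
    return H[n][m]


if __name__ == "__main__":
    print(nw_score_wavefront("GATTACA", "GATTACA"))
```

The result is the same as with the usual row-by-row traversal; only the order in which cells are visited changes.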