5 research outputs found

Comparing assembly strategies for third-generation sequencing technologies across different genomes

Author: Bautista Moreno Rocío
Espinosa García Elena
Fernández Vega Ivan
Larrosa Jiménez Rafael
Lopez Zapata Emilio
Plata González Oscar Guillermo
Publication venue: Elsevier
Publication date: 01/09/2023

The recent advent of long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), has led to substantial accuracy and computational cost improvements. However, de novo whole-genome assembly still presents significant challenges related to the computational cost and the quality of the results. Meanwhile, sequencing accuracy and throughput continue to improve, and many tools are constantly emerging. Therefore, selecting the correct sequencing platform, the proper sequencing depth, and the right assembly tools is necessary to perform high-quality assembly. This paper evaluates the primary assembly reconstruction from recent hybrid and non-hybrid pipelines on different genomes. We find that PacBio high-fidelity (HiFi) long reads play an essential role in haplotype construction compared with ONT reads. However, we observe a substantial improvement in the correctness of the assembly from high-fidelity ONT datasets and from combining them with HiFi or short reads. This work has been partially supported by the Spanish MINECO PID2019-105396RB-I00, Junta de Andalucia JA2018 P18-FR-3433, and UMA18-FEDERJA-197 projects. Funding for open access charge: Universidad de Málaga/CBUA. Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Comparing assembly strategies for third-generation sequencing technologies across different genomes

Author: Bautista-Moreno Rocío
Espinosa García Elena María
Fernández Ivan
Larrosa-Jiménez Rafael
López-Zapata Emilio
Plata-González Óscar Guillermo
Publication venue: Elsevier
Publication date: 01/09/2023

The recent advent of long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), has led to substantial accuracy and computational cost improvements. However, de novo whole-genome assembly still presents significant challenges related to the computational cost and the quality of the results. Meanwhile, sequencing accuracy and throughput continue to improve, and many tools are constantly emerging. Therefore, selecting the correct sequencing platform, the proper sequencing depth, and the right assembly tools is necessary to perform high-quality assembly. This paper evaluates the primary assembly reconstruction from recent hybrid and non-hybrid pipelines on different genomes. We find that PacBio high-fidelity (HiFi) long reads play an essential role in haplotype construction compared with ONT reads. However, we observe a substantial improvement in the correctness of the assembly from high-fidelity ONT datasets and from combining them with HiFi or short reads. Funding for open access charge: Universidad de Málaga / CBU

Repositorio Institucional Universidad de Málaga

LaRA 2: parallel and vectorized program for sequence–structure alignment of RNA sequences

Author: Ficarra Elisa
Reinert Knut
Urgese Gianvito
Winkler Jörg
Publication date: 01/01/2022

Background: The function of non-coding RNA sequences is largely determined by their spatial conformation, namely the secondary structure of the molecule, formed by Watson–Crick interactions between nucleotides. Hence, modern RNA alignment algorithms routinely take structural information into account. In order to discover yet unknown RNA families and infer their possible functions, the structural alignment of RNAs is an essential task. This task demands a lot of computational resources, especially for aligning many long sequences, and it therefore requires efficient algorithms that utilize modern hardware when available. A subset of the secondary structures contains overlapping interactions (called pseudoknots), which add further complexity to the problem and are often ignored in available software. Results: We present the SeqAn-based software LaRA 2, which is significantly faster than comparable software for accurate pairwise and multiple alignments of structured RNA sequences. In contrast to other programs, our approach can handle arbitrary pseudoknots. As an improved re-implementation of the LaRA tool for structural alignments, LaRA 2 uses multi-threading and vectorization for parallel execution and a new heuristic for computing a lower boundary of the solution. Our algorithmic improvements yield a program that is up to 130 times faster than the previous version. Conclusions: With LaRA 2 we provide a tool to analyse large sets of RNA secondary structures in relatively short time, based on structural alignment. The produced alignments can be used to derive structural motifs for the search in genomic databases
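
To make the notion of overlapping interactions concrete, here is a minimal sketch of a pseudoknot check. It is not taken from LaRA 2. It assumes a secondary structure is given as a list of base pairs (i, j) of 0-based sequence positions, and it uses the common definition that two pairs (i, j) and (k, l) cross when i < k < j < l. The function name `has_pseudoknot` is hypothetical.

```python
def has_pseudoknot(pairs):
    """Return True if any two base pairs cross, i.e. i < k < j < l.

    `pairs` is an iterable of (i, j) position tuples (an assumed input
    format for this illustration, not LaRA 2's data model).
    """
    # Normalize each pair so that the smaller position comes first,
    # then sort by opening position so that k >= i in the inner loop.
    normalized = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for a, (i, j) in enumerate(normalized):
        for k, l in normalized[a + 1:]:
            # (k, l) opens inside (i, j) but closes outside it: a crossing.
            if i < k < j < l:
                return True
    return False


nested = [(0, 9), (1, 8), (2, 7)]   # a plain stem: pairs are nested
crossing = [(0, 5), (3, 8)]         # second pair opens inside, closes outside
print(has_pseudoknot(nested), has_pseudoknot(crossing))
```

The quadratic scan is chosen for readability; it is adequate for illustrating the definition but not tuned for long sequences.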

Institutional Repository of the Freie Universität Berlin

LaRA 2: parallel and vectorized program for sequence–structure alignment of RNA sequences

Author: Ficarra Elisa
Reinert Knut
Urgese Gianvito
Winkler Jörg
Publication date: 01/01/2022

Institutional Repository of the Freie Universität Berlin

Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading

Author: Budach Stefan
Costanza Pascal
Ehrhardt Marcel
Hancox Jonny
Rahn René
Reinert Knut
Publication venue: Oxford University Press (OUP)
Publication date: 03/05/2018

Motivation: Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable for a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting SIMD (Single Instruction Multiple Data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we a) distribute many independent alignments on multiple threads and b) inherently parallelize a single alignment computation using a work stealing approach producing a dynamic wavefront progressing along the minor diagonal. Results: We evaluated our alignment vectorization and parallelization on different use cases and processors, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than executing them with our previous sequential alignment module. Availability: The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4 under the BSD license. We support SSE4, AVX2, and AVX512 instructions and include UME::SIMD, a SIMD-instruction wrapper library, to extend our module for further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms
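
The inter-sequence layout described in this abstract can be illustrated with a small sketch. This is not SeqAn code and makes several simplifying assumptions: it computes only global alignment scores with a linear gap cost, all first sequences share one length and all second sequences share another, and a plain Python list stands in for a SIMD register, with one lane per alignment. The function name `batch_global_scores` and the scoring parameters are hypothetical.

```python
def batch_global_scores(pairs, match=1, mismatch=-1, gap=-1):
    """Score several global alignments in lockstep, one lane per pair.

    Every DP cell holds a list of scores (one per sequence pair), which
    mimics an inter-sequence SIMD layout. Assumes all first sequences
    have equal length and all second sequences have equal length.
    """
    lanes = len(pairs)
    n, m = len(pairs[0][0]), len(pairs[0][1])
    assert all(len(a) == n and len(b) == m for a, b in pairs)

    # Row 0 of the DP matrix: aligning a prefix of b against nothing.
    prev = [[j * gap] * lanes for j in range(m + 1)]
    for i in range(1, n + 1):
        # Column 0: aligning a prefix of a against nothing.
        curr = [[i * gap] * lanes]
        for j in range(1, m + 1):
            cell = []
            # This loop over lanes is what a SIMD unit would execute
            # as a handful of vector instructions.
            for lane, (a, b) in enumerate(pairs):
                sub = match if a[i - 1] == b[j - 1] else mismatch
                cell.append(max(prev[j - 1][lane] + sub,   # diagonal
                                prev[j][lane] + gap,       # gap in b
                                curr[j - 1][lane] + gap))  # gap in a
            curr.append(cell)
        prev = curr
    return prev[m]


scores = batch_global_scores([("ACGT", "ACGT"), ("ACGT", "AGGT"), ("AAAA", "TTTT")])
print(scores)
```

Because every lane follows the same control flow through the matrix, the per-cell work maps onto vector instructions without branching per alignment. In this sketch that is why the sequences in a batch must share their lengths.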

Crossref

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)