
    Reevaluating Assembly Evaluations with Feature Response Curves: GAGE and Assemblathons

    In just the last decade, a multitude of biotechnologies and software pipelines have emerged to revolutionize genomics, aiming to accelerate and improve the quality of de novo whole-genome assembly from short DNA reads. However, the performance of each of these tools is contingent on the length and quality of the sequencing data, the structure and complexity of the genome sequence, and the resolution and quality of long-range information. Furthermore, in the absence of any metric that captures the most fundamental "features" of a high-quality assembly, there is no obvious recipe for users to select the most desirable assembler or assembly. International competitions such as the Assemblathons and GAGE have tried to identify the best assembler(s) and their features. Somewhat circuitously, the only available approach to gauge de novo assemblies and assemblers relies solely on the availability of a high-quality, fully assembled reference genome sequence. Still worse, reference-guided evaluations are often difficult to analyze and lead to conclusions that are difficult to interpret. In this paper, we circumvent many of these issues by relying upon a tool, dubbed FRCbam, which is capable of evaluating de novo assemblies from the read layouts even when no reference exists. We extend the FRCurve approach to cases where layout information may have been obscured, as is true in many de Bruijn-graph-based algorithms. As a by-product, FRCurve now expands its applicability to a much wider class of assemblers -- thus identifying higher-quality members of this group, their inter-relations, as well as their sensitivity to carefully selected features, with or without the support of a reference sequence or a layout for the reads. The paper concludes by reevaluating several recently conducted assembly competitions and the datasets that have resulted from them.
    Comment: Submitted to PLoS One. Supplementary material available at http://www.nada.kth.se/~vezzi/publications/supplementary.pdf and http://cs.nyu.edu/mishra/PUBLICATIONS/12.supplementaryFRC.pd
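    The core idea behind a Feature Response Curve can be illustrated with a minimal Python sketch: contigs are ranked by size and, for a growing "feature" budget, one records how much of the genome can be covered before the cumulative feature count exceeds that budget. The function name, contig data, and genome-size estimate below are assumptions for the example, not part of FRCbam itself.

```python
# Minimal sketch of a Feature Response Curve, assuming each contig has already
# been annotated with a count of suspicious "features" (e.g. misassembly
# signals derived from read alignments). Contig data and the genome-size
# estimate below are purely illustrative.

def frc_points(contigs, genome_size):
    """contigs: list of (length, n_features); returns (feature threshold, coverage) pairs."""
    ordered = sorted(contigs, key=lambda c: c[0], reverse=True)   # largest contigs first
    points, cum_features, cum_length = [], 0, 0
    for length, n_features in ordered:
        cum_features += n_features
        cum_length += length
        # At a feature budget of cum_features, this much of the genome is covered.
        points.append((cum_features, cum_length / genome_size))
    return points

# Toy example: three contigs of an assembly for a 10 kb genome.
for threshold, coverage in frc_points([(6000, 2), (3000, 5), (1000, 1)], genome_size=10_000):
    print(f"features <= {threshold}: approx. genome coverage {coverage:.0%}")
```

    Plotting the resulting (threshold, coverage) pairs yields the curve: assemblies whose curve rises faster reach more of the genome at a lower feature cost.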

    GAM-NGS: genomic assemblies merger for next generation sequencing

    Background: In recent years more than 20 assemblers have been proposed to tackle the hard task of assembling NGS data. A common heuristic when assembling a genome is to use several assemblers and then select the best assembly according to some criteria. However, recent results clearly show that some assemblers lead to better statistics than others on specific regions but are outperformed on other regions or on different evaluation measures. To limit these problems we developed GAM-NGS (Genomic Assemblies Merger for Next Generation Sequencing), whose primary goal is to merge two or more assemblies in order to enhance the contiguity and correctness of both. GAM-NGS does not rely on global alignment: regions of the two assemblies representing the same genomic locus (called blocks) are identified through read alignments and stored in a weighted graph. The merging phase is carried out with the help of this weighted graph, which allows an optimal resolution of locally problematic regions. Results: GAM-NGS has been tested on six different datasets and compared to other assembly reconciliation tools. The availability of a reference sequence for three of them allowed us to show that GAM-NGS is able to output an improved, reliable set of sequences. GAM-NGS is also very efficient, merging assemblies with substantially less computational resources than comparable tools. To achieve these goals, GAM-NGS avoids global alignment between contigs, making its strategy unique among assembly reconciliation tools. Conclusions: The difficulty of obtaining correct and reliable assemblies using a single assembler is forcing the introduction of new algorithms able to enhance de novo assemblies. GAM-NGS is a tool able to merge two or more assemblies in order to improve contiguity and correctness. It can be used on all NGS-based assembly projects and shows its full potential with multi-library Illumina-based projects. With more than 20 available assemblers it is hard to select the best tool. In this context we propose a tool that improves assemblies (and, as a by-product, perhaps even assemblers) by merging them and selecting the generated sequence that is most likely to be correct.
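    The block-and-graph idea in the abstract can be illustrated with a toy Python example. The read placements, contig names, and the reduction of a block to a simple contig pair are all simplifications assumed for the illustration; GAM-NGS itself works from BAM alignments and resolves conflicts on the resulting weighted graph in a far more involved way.

```python
# Illustrative sketch of the "block" idea described above, assuming each read
# has already been placed on assembly A and on assembly B as (contig, position)
# pairs. The data and names are hypothetical.
from collections import defaultdict

# read id -> ((contig on A, position), (contig on B, position))
placements = {
    "r1": (("A_ctg1", 100), ("B_ctg7", 2100)),
    "r2": (("A_ctg1", 180), ("B_ctg7", 2180)),
    "r3": (("A_ctg2",  50), ("B_ctg3",  900)),
}

# Edges of the weighted graph: contig pairs sharing reads, weighted by support.
block_weight = defaultdict(int)
for (ctg_a, _), (ctg_b, _) in placements.values():
    block_weight[(ctg_a, ctg_b)] += 1

# The merging phase would traverse this weighted graph to decide, locally,
# which assembly's version of a region to keep.
for (ctg_a, ctg_b), weight in sorted(block_weight.items()):
    print(f"{ctg_a} <-> {ctg_b}: {weight} supporting reads")
```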

    Feature-by-Feature – Evaluating De Novo Sequence Assembly

    The whole-genome sequence assembly (WGSA) problem is one of the most studied problems in computational biology. Despite the availability of a plethora of tools (i.e., assemblers), all claiming to have solved the WGSA problem, little has been done to systematically compare their accuracy and power. Traditional methods rely on standard metrics and read simulation: on the one hand, metrics like N50 and the number of contigs focus only on size without proportionately emphasizing information about the correctness of the assembly; on the other hand, comparisons performed on simulated datasets can be highly biased by the non-realistic assumptions in the underlying read generator. Recently the Feature Response Curve (FRC) method was proposed to assess the overall assembly quality and correctness: FRC transparently captures the trade-off between contig quality and contig size. Nevertheless, the relationships among the different features and their relative importance remain unknown. In particular, FRC cannot account for the correlation among the different features. We analyzed the correlation among different features in order to better describe their relationships and their importance in gauging assembly quality and correctness. In particular, using multivariate techniques like principal and independent component analysis we were able to estimate the “excess dimensionality” of the feature space. Moreover, principal component analysis allowed us to show how poorly the acclaimed N50 metric describes assembly quality. Applying independent component analysis, we identified a subset of features that better describe the assemblers' performances. We demonstrated that by focusing on a reduced set of highly informative features we can use the FRC curve to better describe and compare the performances of different assemblers. Moreover, as a by-product of our analysis, we discovered how often evaluation based on simulated data, obtained with state-of-the-art simulators, leads to not-so-realistic results.
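    The kind of dimensionality analysis described above can be sketched with scikit-learn. The feature matrix below is random placeholder data rather than real assembly features, and the 95% variance cut-off is an assumption chosen only for the example.

```python
# Minimal sketch of a PCA/ICA analysis of an (assemblies x features) matrix.
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 12))            # 30 assemblies, 12 assembly features (placeholder)

# PCA: how many components are needed to explain most of the variance?
pca = PCA().fit(X)
explained = np.cumsum(pca.explained_variance_ratio_)
n_effective = int(np.searchsorted(explained, 0.95)) + 1
print(f"{n_effective} components explain 95% of the variance "
      f"(a rough estimate of the excess dimensionality of the feature space)")

# ICA: extract a small number of statistically independent feature combinations.
ica = FastICA(n_components=n_effective, random_state=0)
S = ica.fit_transform(X)                 # independent components per assembly
print("independent-component matrix shape:", S.shape)
```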

    Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species

    Background: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
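    As a concrete illustration of the size-based measures that typically sit alongside correctness-oriented ones in such evaluations, here is a minimal Python sketch of N50 and NG50. The contig lengths and the genome-size estimate are toy values, and these are only two of the many metrics the competition considered.

```python
# Minimal sketch of two classic assembly size metrics: N50 and NG50.

def n50(lengths, target=None):
    """N50 when target is None, otherwise NG50 against an estimated genome size."""
    total = target if target is not None else sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total / 2:
            return length
    return 0  # the assembly spans less than half of the target size

contigs = [8000, 5000, 3000, 1000, 500]
print("N50 :", n50(contigs))                 # half of the total assembly span
print("NG50:", n50(contigs, target=30_000))  # half of the estimated genome size
```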

    Next generation sequencing revolution challenges: search, assemble, and validate genomes

    The possibility to routinely sequence a genome has become a reality in recent years. In this thesis the most pressing problems of today's bioinformatics are discussed, and new solutions are proposed, in particular: a new short-string aligner designed to align the myriads of sequences generated by state-of-the-art sequencers against a reference genome, a new reference-guided assembly pipeline, and a new assembly validation method.

    Explosive Motion Acquisition via Deep Reinforcement Learning for a Bio-Inspired Quadruped

    The aim of this thesis is to use deep reinforcement learning to train a bio-inspired quadruped to acquire explosive motion skills that could be used for navigation in challenging natural environments. The jumping task is considered, both jumping in place and jumping forward. The proposed training method combines evolutionary strategies and deep reinforcement learning algorithms in two steps. In the first, ARS (Augmented Random Search) is used in combination with a sparse reward and a deterministic policy. In the second, a simplified imitation-learning approach is used to train a more complex neural network with PPO (Proximal Policy Optimization), which is subsequently retrained with a task-related reward function. In the end, the agent learns how to jump and land softly, starting and ending in a default position, exploiting the compliant elements (joint-level parallel springs) for better performance.
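    To make the first training step concrete, here is a minimal Python sketch of Augmented Random Search with a deterministic linear policy. The toy environment, policy shape, reward, and hyperparameters are assumptions for the illustration only and are unrelated to the actual quadruped simulator or the thesis code (which uses a sparse jump reward).

```python
# Minimal ARS sketch: random parameter perturbations, paired rollouts, and a
# parameter update weighted by the return differences.
import numpy as np

class ToyEnv:
    """Stand-in environment: drive a 3-D state toward the target value 1."""
    def reset(self):
        self.x = -np.ones(3)
        return self.x
    def step(self, action):
        self.x = self.x + 0.1 * np.tanh(action)
        reward = float(-np.abs(self.x - 1.0).sum())   # dense toy reward
        return self.x, reward, False

def rollout(env, theta, horizon=100):
    obs, total = env.reset(), 0.0
    for _ in range(horizon):
        obs, reward, done = env.step(theta @ obs)     # deterministic linear policy
        total += reward
        if done:
            break
    return total

def ars_step(env, theta, n_dirs=8, noise=0.05, lr=0.02):
    deltas = [np.random.randn(*theta.shape) for _ in range(n_dirs)]
    r_plus = np.array([rollout(env, theta + noise * d) for d in deltas])
    r_minus = np.array([rollout(env, theta - noise * d) for d in deltas])
    sigma = np.concatenate([r_plus, r_minus]).std() + 1e-8
    # Move parameters along the perturbation directions that improved the return.
    step = sum((rp - rm) * d for rp, rm, d in zip(r_plus, r_minus, deltas))
    return theta + lr / (n_dirs * sigma) * step

env, theta = ToyEnv(), np.zeros((3, 3))
print("return before training:", rollout(env, theta))
for _ in range(100):
    theta = ars_step(env, theta)
print("return after ARS      :", rollout(env, theta))
```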