Search CORE

27,742 research outputs found

Are we there yet? : reliably estimating the completeness of plant genome sequences

Author: Ruttink Tom
Vandepoele Klaas
Veeckman Elisabeth
Publication venue: 'American Society of Plant Biologists (ASPB)'
Publication date: 01/01/2016
Field of study

Genome sequencing is becoming cheaper and faster thanks to the introduction of next-generation sequencing techniques. Dozens of new plant genome sequences have been released in recent years, ranging from small to gigantic repeat-rich or polyploid genomes. Most genome projects have a dual purpose: delivering a contiguous, complete genome assembly and creating a full catalog of correctly predicted genes. Frequently, the completeness of a species' gene catalog is measured using a set of marker genes that are expected to be present. This expectation can be defined along an evolutionary gradient, ranging from highly conserved genes to species-specific genes. Large-scale population resequencing studies have revealed that gene space is fairly variable even between closely related individuals, which limits the definition of the expected gene space, and, consequently, the accuracy of estimates used to assess genome and gene space completeness. We argue that, based on the desired applications of a genome sequencing project, different completeness scores for the genome assembly and/or gene space should be determined. Using examples from several dicot and monocot genomes, we outline some pitfalls and recommendations regarding methods to estimate completeness during different steps of genome assembly and annotation

Ghent University Academic Bibliography

PubMed Central

REAPR: a universal tool for genome assembly evaluation.

Author: Berriman Matthew
Hunt Martin
Kikuchi Taisei
Newbold Chris
Otto Thomas D
Sanders Mandy
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013
Field of study

Methods to reliably assess the accuracy of genome sequence data are lacking. Currently completeness is only described qualitatively and mis-assemblies are overlooked. Here we present REAPR, a tool that precisely identifies errors in genome assemblies without the need for a reference sequence. We have validated REAPR on complete genomes or de novo assemblies from bacteria, malaria and Caenorhabditis elegans, and demonstrate that 86% and 82% of the human and mouse reference genomes are error-free, respectively. When applied to an ongoing genome project, REAPR provides corrected assembly statistics allowing the quantitative comparison of multiple assemblies. REAPR is available at http://www.sanger.ac.uk/resources/software/reapr/

Crossref

Springer - Publisher Connector

PubMed Central

Enlighten

Linear-Time Superbubble Identification Algorithm for Genome Assembly

Author: Brankovic Ljiljana
Iliopoulos Costas S.
Kundu Ritu
Mohamed Manal
Pissis Solon P.
Vayani Fatima
Publication venue
Publication date: 17/09/2015
Field of study

DNA sequencing is the process of determining the exact order of the nucleotide bases of an individual's genome in order to catalogue sequence variation and understand its biological implications. Whole-genome sequencing techniques produce masses of data in the form of short sequences known as reads. Assembling these reads into a whole genome constitutes a major algorithmic challenge. Most assembly algorithms utilize de Bruijn graphs constructed from reads for this purpose. A critical step of these algorithms is to detect typical motif structures in the graph caused by sequencing errors and genome repeats, and filter them out; one such complex subgraph class is a so-called superbubble. In this paper, we propose an O(n+m)-time algorithm to detect all superbubbles in a directed acyclic graph with n nodes and m (directed) edges, improving the best-known O(m log m)-time algorithm by Sung et al

arXiv.org e-Print Archive

University of Newcastle's Digital Repository

Elsevier - Publisher Connector

Crossref

King's Research Portal

Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

Author: A Bairoch
A Christoffels
A Gurevich
A Kozomara
A McKenna
A Mitchell
A Morgulis
A Morgulis
A Pradhan
A Reiner
A Rodriguez-Mari
A Stamatakis
A Yates
AI Makunin
AJ Enright
AL Price
AL Price
Alan Christoffels
Aleksey Komissarov
Alexey Tupikin
Amy Hin Yan Tong
Andrey A. Yurchenko
AR Quinlan
B Langmead
B Star
C Berthelot
C Camacho
C Holt
C Wang
Chen-Shan Chin
CS Chin
D Brawand
D Ellinghaus
DA Benson
Darrell Green
DC Hardie
Dean R. Jerry
DH Alexander
Doreen Lau
DR Kelley
DRS-K C. Jerry
E Casacuberta
E. TG Staristina
EW Myers
F Abascal
F Chen
F Yang
FC Jones
FJ Krsticevic
Fritz J. Sedlazeck
G Abrusan
G Benson
G Lin
G Marcais
G Parra
G Parra
G Tamazian
GH Yue
GH Yue
Gopikrishna Gopalapillai
Gregory W. Vurture
GS Slater
GT Valente
H Li
H Saiga
Heiner Kuhl
HH Kazazian Jr.
I Braasch
Inna S. Kuznetsova
IS Kuznetsova
J Castresana
J Eid
J Huerta-Cepas
J Jurka
J Lin
James P. Drake
JG Ruby
JN Volff
JN Volff
Jolly M. Saju
Jonas Korlach
JS Chew
Junhui Jiang
K Howe
K Katoh
K Prufer
Kathiresan Purushothaman
KD Pruitt
KJ Hoff
KP Koepfli
KW Tzung
Lawrence S. Hon
László Orbán
M Blanchette
M Kanehisa
M Kasahara
M Kolmogorov
M Krzywinski
M Martin
M Schartl
M Tarailoâ-Graovac
M Tine
MA Larkin
Mario Jonas
Marsel Kabilov
Matthew Boitano
MB Stocks
MG Grabherr
Michael C. Schatz
MJ Chaisson
MR Friedlander
N Siegel
Natascha M. Thevasagayam
NM Thevasagayam
O Jaillon
O Otero
P Cingolani
P Ravi
P Schattner
P Shannon
P Xu
Paul M. Richardson
PE Warburton
Peter Van Heusden
R Kajitani
R Lorenz
R Luo
R Moore
R Pethiyagoda
R Poulter
R She
R Sreenivasan
Ramkumar Lachumanan
RD Ward
RD Ward
Richard Hall
RJ Roberts
S Chen
S Guindon
S Hoegg
S Hoegg
S Koren
S Vij
S Zhou
Sai Rama Sridatta Prakki
Sarah Mwangi
SF Altschul
Shubha Vij
Si Lok
Si Yan Ngoh
Siddharth Singh
Simon Moxon
SM Kielbasa
Sridhar Sivasubbu
Stanley Kimbung Mbandi
Stephen J. O'Brien
Stephen W. Turner
T Anantharaman
Tamás Dalmay
Tansyn H. Noble
TD Wu
TF DeLuca
TH O'Hare
TLO Davis
TS Anantharaman
Tyler Garvin
U Consortium
U Grimholt
V Douard
V Ravi
Vinaya Kumar Katneni
Vinod Scaria
Vladimir Trifonov
W Xue
WC Liew
Woei Chang Liew
WS Davidson
X Huang
X Zheng
XG Wang
XG Wang
Xueyan Shen
Y Guiguen
Y Han
Y Hashiguchi
Y Moriya
Y Sato
Y Sato
Y Sato
Z Lai
Ø Hammer
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016
Field of study

We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics

Public Library of Science (PLOS)

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

ResearchOnline at James Cook University

PubMed Central

Research Repository

Repository of the Academy's Library

University of East Anglia digital repository

NSU Works

MPG.PuRe