
16 research outputs found

Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants

Author: AE Urban
D Pinkel
DA Wheeler
DR Bentley
DR Zerbino
F Sanger
GH Perry
J Butler
J Rozowsky
JC Dohm
JC Venter
Jiang Du
JO Korbel
JY Hehir-Kwa
M Margulies
M Pop
Mark B. Gerstein
Michael Snyder
MJ Chaisson
PA Pevzner
R Lippert
R Redon
R Schmid
RL Warren
Robert D. Bjornson
RR Selzer
S Batzoglou
S Levy
SMD Goldberg
V Bansal
William Stafford Noble
Yong Kong
Zhengdong D. Zhang
Publication venue: Public Library of Science
Publication date: 01/07/2009

The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mapability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

The Natural Product Domain Seeker NaPDoS: A Phylogeny Based Bioinformatic Tool to Classify Secondary Metabolite Gene Diversity

Author: A Dereeper
A Ginolhac
A Hornung
A Starcevic
AS Eustaquio
AS Eustáquio
B Shen
BO Bachmann
BS Moore
C Hertweck
C Rausch
CN Shulse
CP Ridley
D Tillett
DD Baker
DJ Edwards
DJ Newman
DW Udwary
E Cundliffe
E Gontang
EA Gontang
Eric Allen
G Yadav
GL Challis
H Ikeda
H Jenke-Kodama
J Davies
J Piel
JA Eisen
JAV Blodgett
JB McAlpine
JD McPherson
JD Thompson
JM Winter
Jonathan H. Badger
JW Li
K Penn
KC Freel
Kevin Penn
KJ Weissman
KU Foerstner
L Du
M Margulies
M Metsa-Ketela
M Nett
MA Fischbach
MC Moffitt
MH Medema
MZ Ansari
N Roongsawang
Nadine Ziemert
Paul R. Jensen
PR Jensen
R Finking
RC Edgar
RD Finn
S Guindon
S Lautru
SA Sieber
SC Wenzel
SD Bentley
SF Altschul
SG Tringe
Sheila Podell
SJ Moss
SMD Goldberg
T Junier
T Nguyen
Valerie de Crécy-Lagard
WP Maddison
Z Chang
Publication venue: Public Library of Science
Publication date: 01/01/2012

New bioinformatic tools are needed to analyze the growing volume of DNA sequence data. This is especially true in the case of secondary metabolite biosynthesis, where the highly repetitive nature of the associated genes creates major challenges for accurate sequence assembly and analysis. Here we introduce the web tool Natural Product Domain Seeker (NaPDoS), which provides an automated method to assess the secondary metabolite biosynthetic gene diversity and novelty of strains or environments. NaPDoS analyses are based on the phylogenetic relationships of sequence tags derived from polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS) genes, respectively. The sequence tags correspond to PKS-derived ketosynthase domains and NRPS-derived condensation domains and are compared to an internal database of experimentally characterized biosynthetic genes. NaPDoS provides a rapid mechanism to extract and classify ketosynthase and condensation domains from PCR products, genomes, and metagenomic datasets. Close database matches provide a mechanism to infer the generalized structures of secondary metabolites while new phylogenetic lineages provide targets for the discovery of new enzyme architectures or mechanisms of secondary metabolite assembly. Here we outline the main features of NaPDoS and test it on four draft genome sequences and two metagenomic datasets. The results provide a rapid method to assess secondary metabolite biosynthetic gene diversity and richness in organisms or environments and a mechanism to identify genes that may be associated with uncharacterized biochemistry.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

The Francis Crick Institute

Genome Sequence and Transcriptome Analysis of the Radioresistant Bacterium Deinococcus gobiensis: Insights into the Extreme Environmental Adaptations

Author: A Anderson
A de Groot
A Fernández de Henestrosa
A Henne
A Iguchi
A Krisko
BA Patel
CA Norais
CE Bagwell
Chao Teng
CL Stallings
CM Sharma
D Huson
D Slade
ERM Tillier
H Li
H Schmidt
Haiying Yu
J Courcelle
J Pan
JD Thompson
Jin Wang
JK Fredrickson
JM Buis
John R. Battista
JR Battista
K Makino
K Warren Rhodes
KS Makarova
LS Waters
M Tanaka
M Yuan
MA Allen
Menglong Yuan
Min Lin
Ming Chen
Mingkun Yang
MJ Daly
MJ Filiatrault
MM Cox
MS Dillingham
MS Lipton
MS Osburne
N Ivanova
NP Khairnar
O White
O Zhaxybayeva
OJ Marshall
P Castiglioni
P Mackiewicz
Peng Zhao
QF Jiang
R Gupta
R Pukall
Ran Tang
RP Sinha
S Kurtz
S Rangarajan
S Sugiman-Marangos
SF Altschul
Shuzhen Ping
SMD Goldberg
T Carver
T Ferenci
Wei Lu
Wei Zhang
Xinna Li
XY Qiu
Yanhua Hao
Yingdian Wang
Yongliang Yan
YQ Liu
Yuhua Zhan
Z Zhou
Zhengfu Zhou
ZT Sun
Publication venue: Public Library of Science
Publication date: 01/01/2011

The desert is an excellent model for studying evolution under extreme environments. We present here the complete genome and ultraviolet (UV) radiation-induced transcriptome of Deinococcus gobiensis I-0, which was isolated from the cold Gobi desert and shows higher tolerance to gamma radiation and UV light than all other known microorganisms. Nearly half of the genes in the genome encode proteins of unknown function, suggesting that the extreme resistance phenotype may be attributed to unknown genes and pathways. D. gobiensis also contains a surprisingly large number of horizontally acquired genes and predicted mobile elements of different classes, which is indicative of adaptation to extreme environments through genomic plasticity. High-resolution RNA-Seq transcriptome analyses indicated that 30 regulatory proteins, including several well-known regulators and uncharacterized protein kinases, and 13 noncoding RNAs were induced immediately after UV irradiation. Particularly interesting is the UV irradiation induction of the phrB and recB genes involved in photoreactivation and recombinational repair, respectively. These proteins likely include key players in the immediate global transcriptional response to UV irradiation. Our results help to explain the exceptional ability of D. gobiensis to withstand environmental extremes of the Gobi desert, and highlight the metabolic features of this organism that have biotechnological potential.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

The Francis Crick Institute

Proceedings of the 13th International Newborn Brain Conference: Neuro-imaging studies

Author: Abramsky R
Acosta Izquierdo L
Acosta R
Albeshri B
Almouqdad M
Asfour S
Asfour Y
Austin T
Bach A
Barkovich J
Beare R
Ben Fadel N
Benner E
Berger A
Blanco B
Boomsma M
Bora S
Boswinkel V
Chao A
Chin T
Collins-Jones L
Cooper R
Dagur G
Davila J
de Vries L
Dovjak G
Dubois L
Edwards A
El-Dib M
Elshibiny H
Eshel D
Eshel R
Ferriero D
Gano D
Girvan O
Glass H
Goeral K
Golan A
Goldberg RN
Gregory S
Gurvitz M
Inder T
Jain V
Jamjoom D
Kadom N
Kasprian G
Khalil T
Klebermass-Schrehof K
Kleinmahon J
Krüse-Ruijter M
Lambing H
Lee S
Leemans A
Leijser L
Lemyre B
Li Y
Maltais-Bilodeau C
Marks K
Matak P
McCulloch C
Milla S
Miller E
Mishra A
Mitsakakis N
Mohammad K
Munster C
Nijboer J
Nijboer-Oosterveld J
Nijholt I
Novoa R
Ortinau C
Pegram K
Porter E
Prayer D
Reddy D
Redpath S
Rogers E
Schmidbauer V
Scott J
Sewell E
Shany E
Shelef I
Shesrao L
Singh E
Slump C
Steele T
Szakmar E
Tax C
Thiim K
Thompson JW
Tollenaer SMD
Uchitel J
van Osch J
van Wezel-Meijler G
Verschuur A
Wu-Smit MN
Yang E
Younge N
Zein H
Publication venue: IOS Press
Publication date: 01/01/2022

UCL Discovery

BAC-pool sequencing and analysis of large segments of A12 and D12 homoeologous chromosomes in upland cotton.

Author: A Blenda
AA Salamov
AH Paterson
AJ Reinisch
AJ Robinson
AL Delcher
B Hendrix
B Roe
BA Roe
Bruce A. Roe
CT Brown
G Blanc
G Wiley
Govind C. Sharma
Graham B. Wiley
H Matsumura
J Rong
JA Udall
JC Venter
JM Lacape
John Z. Yu
JP Tomkins
K Wang
KA Frazer
M Febrer
M Krzywinski
M Margulies
M Zhang
O Kohany
P Green
R Buyyarapu
R Gregory
Ramesh Buyyarapu
Ramesh V. Kantety
Richard G. Percy
Russell J. Kohel
S Götz
S Nautiyal
S Oh
SF Altschul
Simone Macmil
SMD Goldberg
T Wicker
T Wilkins
TBT Bureau
WS Zachary
Z Han
Z Xu
Zhanyou Xu
Zhi Wei
ZJ Chen
ZW Shappley
Publication venue: Public Library of Science (PLoS)
Publication date: 08/10/2013

Acknowledgments: “Dedicated to Dr. Ramesh Kantety, a mentor, colleague and friend”. We would like to acknowledge the support offered by Padmini Sripathi during data analysis and submissions.

Author Contributions: Conceived and designed the experiments: RVK JZY. Performed the experiments: RB ZX SM GBW. Analyzed the data: RB. Contributed reagents/materials/analysis tools: RVK RB JZY RJK BAR. Wrote the manuscript: RB. Revised the manuscript: RB RVK JZY RGP BAR GCS. Advised the research: RVK JZY RGP BAR GCS.

Although new and emerging next-generation sequencing (NGS) technologies have reduced sequencing costs significantly, much work remains to implement them for de novo sequencing of complex and highly repetitive genomes such as the tetraploid genome of Upland cotton (Gossypium hirsutum L.). Herein we report the results from implementing a novel, hybrid Sanger/454-based BAC-pool sequencing strategy using minimum tiling path (MTP) BACs from Ctg-3301 and Ctg-465, two large genomic segments in A12 and D12 homoeologous chromosomes (Ctg). To enable generation of longer contig sequences in assembly, we implemented a hybrid assembly method to process ~35x data from 454 technology and 2.8-3x data from the Sanger method. Hybrid assemblies offered higher sequence coverage and better sequence assemblies. Homology studies revealed the presence of retrotransposon regions such as Copia and Gypsy elements in these contigs and also helped in identifying new genomic SSRs. Unigenes were anchored to the sequences in Ctg-3301 and Ctg-465 to support the physical map. Gene density, gene structure and protein sequence information derived from protein prediction programs were used to obtain the functional annotation of these genes. Comparative analysis of both contigs with the Arabidopsis genome exhibited synteny and microcollinearity with a conserved gene order in both genomes. This study provides insight into the use of the MTP-based BAC-pool sequencing approach for sequencing complex polyploid genomes with limited constraints in generating better sequence assemblies to build reference scaffold sequences. Combining the utilities of MTP-based BAC-pool sequencing with current longer and short read NGS technologies in multiplexed format would provide a new direction to cost-effectively and precisely sequence complex plant genomes.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

SHAREOK Repository

The Francis Crick Institute