    Computational and Biological Analogies for Understanding Fine-Tuned Parameters in Physics

    In this philosophical paper, we explore computational and biological analogies to address the fine-tuning problem in cosmology. We first clarify what it means for physical constants or initial conditions to be fine-tuned. We review important distinctions, such as that between dimensionless and dimensional physical constants, and the classification of constants proposed by Lévy-Leblond. We then explore how two analogies, computational and biological, can give new insight into the problem; the paper includes a preliminary study of both. Importantly, analogies are useful and fundamental cognitive tools, but they can also be misused or misinterpreted. We analyse the idea that our universe might be modelled as a computational entity, and we discuss the distinction between physical laws and initial conditions using algorithmic information theory. Smolin introduced the theory of "Cosmological Natural Selection" with a biological analogy in mind; we examine an extension of this analogy involving intelligent life and discuss whether and how this extension can be justified. Keywords: origin of the universe, fine-tuning, physical constants, initial conditions, computational universe, biological universe, role of intelligent life, cosmological natural selection, cosmological artificial selection, artificial cosmogenesis. (25 pages; Foundations of Science, in press.)

    Simple tools for assembling and searching high-density picolitre pyrophosphate sequence data

    Background: The advent of pyrophosphate sequencing makes large volumes of sequencing data available at a lower cost than previously possible. However, the short read lengths are difficult to assemble and the large datasets are difficult to handle. During the sequencing of a virus from the tsetse fly, Glossina pallidipes, we found the need for tools to quickly search a set of reads for near-exact text matches. Methods: A set of tools is provided to search a large dataset of pyrophosphate sequence reads under a "live" CD version of Linux on a standard PC, usable by anyone without prior knowledge of Linux and without having to install Linux on the computer. The tools permit short stretches of de novo assembly, checking of existing assembled sequences, selection and display of reads from the dataset, and counting of sequence occurrences in the reads. Results: We demonstrate the use of the tools to check an assembly against the fragment dataset; to investigate homopolymer lengths, repeat regions and polymorphisms; and to resolve inserted bases caused by incomplete chain extension. Conclusion: The additional information contained in a pyrophosphate sequencing dataset, beyond a basic assembly, is difficult to access due to a lack of tools. The simple tools presented here allow anyone with basic computer skills and a standard PC to access this information.
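
    The core operation such tools need is a scan of every read for windows within a small Hamming distance of a query. The sketch below illustrates that idea only; it is not the published tool set, and the file name reads.fasta and the query string are placeholders.

```python
# Minimal sketch of a near-exact read search over a plain FASTA file.
# Illustrative only: file name, query and mismatch budget are assumptions.
from itertools import islice

def load_reads(fasta_path):
    """Yield (header, sequence) pairs from a simple FASTA file."""
    header, seq = None, []
    with open(fasta_path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line[1:], []
            else:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)

def near_exact_hits(reads, query, max_mismatches=1):
    """Report reads containing `query` with at most `max_mismatches` substitutions."""
    q = len(query)
    for header, seq in reads:
        for i in range(len(seq) - q + 1):
            mismatches = sum(a != b for a, b in zip(seq[i:i + q], query))
            if mismatches <= max_mismatches:
                yield header, i, mismatches
                break  # one hit per read is enough for counting

if __name__ == "__main__":
    hits = near_exact_hits(load_reads("reads.fasta"), "ACGTACGTAA")
    for header, pos, mm in islice(hits, 10):
        print(f"{header}\tpos={pos}\tmismatches={mm}")
```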

    Efficient error correction for next-generation sequencing of viral amplicons

    Background: Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone, so the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained by 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, the position in the sequence and the length of the amplicon. All of these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. Results: In this paper, we present two new efficient error-correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH) to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes; however, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Conclusions: Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. Implementations of the algorithms and the datasets used for testing are available at http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm
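
    To illustrate the k-mer counting idea underlying threshold-based correction, the sketch below flags read positions covered only by rare k-mers. This is a minimal illustration, not the published KEC or ET implementation; the k-mer size, threshold and toy reads are arbitrary.

```python
# Flag likely sequencing errors as positions covered only by rare k-mers.
# Not the published KEC/ET code; k and threshold are illustrative.
from collections import Counter

def kmer_counts(reads, k=7):
    """Count every k-mer across all reads."""
    counts = Counter()
    for seq in reads:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def suspect_positions(read, counts, k=7, threshold=3):
    """Return positions covered only by rare k-mers (likely errors)."""
    suspect, solid = set(), set()
    for i in range(len(read) - k + 1):
        cover = range(i, i + k)
        if counts[read[i:i + k]] < threshold:
            suspect.update(cover)
        else:
            solid.update(cover)
    return sorted(suspect - solid)

# 50 identical error-free reads plus one read with a single miscalled base.
reads = ["ACGTACGTACGTACGTACGT"] * 50 + ["ACGTACGTACTTACGTACGT"]
counts = kmer_counts(reads)
print(suspect_positions(reads[-1], counts))  # -> [10], the miscalled base
```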

    Resuscitation Endpoints in Trauma

    Fluid and blood resuscitation is the mainstay of therapy for hemorrhagic shock, whether due to trauma or another etiology. Cessation of hemorrhage with rapid hemostatic techniques is the first priority in the treatment of traumatic hemorrhagic shock, with concomitant fluid resuscitation with blood and crystalloids to maintain perfusion and organ function. "Hypotensive" or "low-volume" resuscitation has become increasingly accepted in the prehospital phase of trauma care, prior to definitive hemorrhage control, since aggressive fluid resuscitation may increase bleeding. Resuscitation after hemorrhage control focuses on restoration of tissue oxygenation. Efforts to optimize resuscitation have used "resuscitation endpoints" as markers of the adequacy of resuscitation. The endpoints that have been evaluated include both global measures (restoration of blood pressure, heart rate and urine output; lactate; base deficit; mixed venous oxygen saturation; ventricular end-diastolic volume) and regional measures (gastric tonometry; near-infrared spectroscopy for measurement of muscle tissue oxygen saturation). This review critically evaluates the evidence regarding the use of resuscitation endpoints in trauma.

    The Long March: A Sample Preparation Technique that Enhances Contig Length and Coverage by High-Throughput Short-Read Sequencing

    High-throughput short-read technologies have revolutionized DNA sequencing by drastically reducing the cost per base of sequencing information. Despite producing gigabases of sequence per run, these technologies still present obstacles in resequencing and de novo assembly applications due to biased or insufficient target sequence coverage. We present here a simple sample preparation method, termed the "long march", that increases both contig lengths and target sequence coverage using high-throughput short-read technologies. By incorporating a Type IIS restriction enzyme recognition motif into the sequencing primer adapter, successive rounds of restriction enzyme cleavage and adapter ligation produce a set of nested sub-libraries from the initial amplicon library. Sequence reads from these sub-libraries are offset from each other with enough overlap to aid assembly and contig extension. We demonstrate the utility of the long march in resequencing of the Plasmodium falciparum transcriptome, where the number of genomic bases covered was increased by 39%, as well as in metagenomic analysis of a serum sample from a patient with hepatitis B virus (HBV)-related acute liver failure, where the number of HBV bases covered was increased by 42%. We also offer a theoretical optimization of the long march for de novo sequence assembly.
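
    The read geometry the method produces can be pictured with a toy simulation: each round of cleavage and re-ligation moves the sequencing start point a fixed step deeper into the amplicon, yielding nested, overlapping reads. The read length, step size and stand-in amplicon below are illustrative only, not the paper's experimental values.

```python
# Toy model of the nested sub-libraries produced by the "long march":
# every round shifts the read start further into the amplicon, so reads
# overlap and tile the target. All sizes here are made-up placeholders.
def long_march_reads(amplicon, read_length=36, step=18, rounds=6):
    """Return (offset, read) pairs for successive marching rounds."""
    reads = []
    for r in range(rounds):
        start = r * step
        if start + read_length > len(amplicon):
            break
        reads.append((start, amplicon[start:start + read_length]))
    return reads

amplicon = "ACGTTGCA" * 15  # 120 bp stand-in for a real amplicon
for start, read in long_march_reads(amplicon):
    print(f"offset {start:3d}: {' ' * start}{read}")
```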

    Evaluation of Methods for De Novo Genome Assembly from High-Throughput Sequencing Reads Reveals Dependencies That Affect the Quality of the Results

    Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of ever larger numbers of usable sequences per instrument run continue to make whole-genome assembly an appealing application. In this paper we evaluate the feasibility of de novo genome assembly from short reads (≤100 nucleotides) through a detailed study involving genomic sequences of various lengths and origins, in conjunction with several currently popular assembly programs. Our extensive analysis demonstrates that, in addition to sequencing coverage, attributes such as the architecture of the target genome, the choice of assembly program, the average read length and the observed sequencing error rate are powerful variables that affect the best achievable assembly of the target sequence in terms of size and correctness.
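
    Assembly size comparisons of this kind usually rest on summary statistics such as N50: the contig length at which the cumulative length of contigs, sorted longest first, reaches half the assembly total. A minimal sketch with made-up contig lengths:

```python
# Compute N50 of an assembly; contig lengths below are illustrative.
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0

print(n50([100, 80, 60, 40, 20]))  # -> 80, since 100 + 80 >= 150
```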

    Identification of polymorphic inversions from genotypes

    Background: Polymorphic inversions are a source of genetic variability with a direct impact on recombination frequencies. Given the difficulty of studying them experimentally, computational methods have been developed to infer their existence in large numbers of individuals using genome-wide data on nucleotide variation. Methods based on haplotype tagging of known inversions attempt to classify individuals as carrying a normal or an inverted allele. Other methods, which measure differences in linkage disequilibrium, attempt to identify regions harboring inversions but are unable to classify subjects accurately, an essential requirement for association studies. Results: We present a novel method to both identify polymorphic inversions from genome-wide genotype data and classify individuals as carrying a normal or an inverted allele. Our method, a generalization of a published method for haplotype data [1], utilizes linkage between groups of SNPs to partition a set of individuals into normal and inverted subpopulations. We employ a sliding-window scan to identify regions likely to harbor an inversion, and evidence accumulated from neighboring SNPs is used to determine the inversion status of each subject accurately. Furthermore, our approach detects inversions directly from genotype data, increasing its usability for current genome-wide association studies (GWAS). Conclusions: We demonstrate the accuracy of our method in detecting inversions and classifying individuals on simulated genotypes, produced in a principled way by evolving an inversion event within a coalescent model [2]. We applied our method to real genotype data from HapMap Phase III to characterize the inversion status of two known inversions within the regions 17q21 and 8p23 across 1184 individuals. Finally, we scanned the full genomes of the European-origin (CEU) and Yoruba (YRI) HapMap samples. We find population-based evidence for 9 of 15 well-established autosomal inversions, and for 52 regions previously predicted by independent experimental methods in ten (9+1) individuals [3,4]. We provide efficient implementations of both the genotype and haplotype methods in a unified R package, inveRsion.
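
    As a crude illustration of a sliding-window genotype scan (not the inveRsion algorithm itself), the sketch below flags windows whose flanking SNPs show unusually high composite LD, a pattern inversion polymorphisms can generate by suppressing recombination in heterokaryotypes. The window size, threshold and simulated genotypes are arbitrary assumptions.

```python
# Crude sliding-window LD scan over genotype data; not the published method.
import numpy as np

def r_squared(a, b):
    """Squared correlation between two 0/1/2 genotype vectors (composite LD)."""
    return np.corrcoef(a, b)[0, 1] ** 2

def scan_windows(genotypes, window=10, threshold=0.8):
    """genotypes: (n_snps, n_individuals) array of 0/1/2 calls.
    Yield start indices of windows whose flanking SNPs are in high LD."""
    for start in range(genotypes.shape[0] - window):
        if r_squared(genotypes[start], genotypes[start + window]) >= threshold:
            yield start

rng = np.random.default_rng(0)
g = rng.integers(0, 3, size=(100, 60))  # random background genotypes
g[40:60] = g[40].copy()                 # artificial block of SNPs in perfect LD
print(list(scan_windows(g)))            # flags window starts inside the block
```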

    Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

    We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping, and shared synteny from closely related fish species to derive a chromosome-level assembly, with a contig N50 size over 1 Mb and a scaffold N50 size over 25 Mb, spanning ~90% of the genome. The population structure of the L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades, with signs of admixture apparent in the South-East Asian population. The quality of this Asian seabass assembly far exceeds that of other fish genome assemblies, and it will serve as a new standard for fish genomics.
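
    The reported stratification is the kind of signal commonly quantified per SNP with F_ST. As a minimal, self-contained illustration, the sketch below uses Hudson's estimator (following Bhatia et al. 2013); the estimator choice, allele frequencies and sample sizes are mine, not the study's.

```python
# Hudson's F_ST estimator for a single biallelic SNP; the numbers below
# are invented for illustration, not taken from the seabass study.
def hudson_fst(p1, n1, p2, n2):
    """p1, p2: alternate-allele frequencies in two populations;
    n1, n2: numbers of sampled chromosomes in each."""
    num = ((p1 - p2) ** 2
           - p1 * (1 - p1) / (n1 - 1)
           - p2 * (1 - p2) / (n2 - 1))
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num / den

print(hudson_fst(0.10, 122, 0.45, 122))  # two hypothetical populations
```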

    Assessing the effects of multiple infections and long latency in the dynamics of tuberculosis

    To achieve a better understanding of the roles of multiple infections and long latency in the dynamics of Mycobacterium tuberculosis infection, we analyze a simple model. Since backward bifurcation is well documented in the literature for the model we consider, our aim is to characterize this behavior in terms of the ranges over which the model's parameters vary. We show that backward bifurcation disappears (and forward bifurcation occurs) if: (a) the latent period is shortened below a critical value; or (b) the rates of super-infection and re-infection are decreased. This result shows that among immunosuppressed individuals, super-infection and/or changes in the latent period could act to facilitate the onset of tuberculosis. When we decrease the incubation period below the critical value, we obtain the tuberculosis incidence curve of a forward bifurcation; this curve, however, envelops the one obtained from the backward bifurcation diagram.
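
    The bistability behind backward bifurcation can be seen numerically in a generic S-L-I model with exogenous reinfection of latents. This is a sketch in the spirit of the paper, not its exact system; the equations and all parameter values are invented for illustration. Below the classical threshold, the infection dies out from a low starting prevalence yet persists from a high one.

```python
# Numerical sketch of backward bifurcation driven by super-infection of
# latents. Generic illustrative model, not the paper's exact system.
import numpy as np
from scipy.integrate import solve_ivp

Lam, mu, k, d, p = 1.0, 0.02, 0.005, 0.1, 30.0  # p: super-infection factor

def rhs(t, y, beta):
    S, L, I = y
    infection = beta * S * I        # new infections of susceptibles
    reinfection = p * beta * L * I  # super-infection of latent individuals
    return [Lam - infection - mu * S,
            infection - reinfection - (mu + k) * L,
            k * L + reinfection - (mu + d) * I]

def steady_prevalence(beta, I0):
    """Integrate long enough to approximate the attracting equilibrium."""
    sol = solve_ivp(rhs, (0, 2e4), [Lam / mu, 0.0, I0],
                    args=(beta,), method="LSODA", rtol=1e-8, atol=1e-10)
    return sol.y[2, -1]

# For these parameters R0 = 1 at beta = 0.012; a positive "high start"
# column alongside a ~0 "low start" column below that threshold is the
# bistability signature of backward bifurcation (it vanishes for small p).
for beta in np.linspace(0.002, 0.010, 5):
    low = steady_prevalence(beta, 1e-4)
    high = steady_prevalence(beta, 10.0)
    print(f"beta={beta:.3f}  low start -> {low:.4f}   high start -> {high:.4f}")
```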