Search CORE

A novel NGS library preparation method to characterize native termini of fragmented DNA.

Author: Green Richard E
Harkins Kelly M
Kapp Joshua
Naughton Colin
Rao Varsha
Schaefer Nathan K
Shapiro Beth
Troll Christopher J
Publication venue: eScholarship, University of California
Publication date: 01/05/2020

Biological and chemical DNA fragmentation generates DNA molecules with a variety of termini, including blunt ends and single-stranded overhangs. We have developed a Next Generation Sequencing (NGS) assay, XACTLY, to interrogate the termini of fragmented DNA, information that is traditionally lost in standard NGS library preparation methods. Here we describe the XACTLY method, showcase its sensitivity and specificity, and demonstrate its utility in in vitro experiments. The XACTLY assay reports the relative abundances of all lengths and types (5' and 3') of single-stranded overhangs, if present, on each DNA fragment, with an overall accuracy between 80% and 90%. In addition, XACTLY retains the sequence of each native DNA molecule after fragmentation and can capture the genomic landscape of cleavage events at single-nucleotide resolution. The XACTLY assay can be applied as a novel research and discovery tool for fragmentation analyses and in cell-free DNA
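To make the terminology of blunt ends and 5'/3' overhangs concrete, the sketch below classifies both ends of a double-stranded fragment from the coordinates of its two strands. It is a conceptual illustration only, not the XACTLY method: it assumes (hypothetically) that both strands are given as half-open intervals on the top strand's left-to-right axis, so the top strand's left end is its 5' end and the bottom strand's left end is its 3' end.

```python
def classify_termini(top_start, top_end, bottom_start, bottom_end):
    """Classify both ends of a duplex as (type, overhang_length).

    Assumed (hypothetical) input: each strand is a half-open interval
    [start, end) on the top strand's left-to-right coordinate axis.
    """
    # Left end: top strand's left end is 5', bottom strand's left end is 3'.
    diff = bottom_start - top_start
    if diff == 0:
        left = ("blunt", 0)
    elif diff > 0:
        left = ("5'", diff)    # top strand protrudes: 5' overhang
    else:
        left = ("3'", -diff)   # bottom strand protrudes: 3' overhang

    # Right end: top strand's right end is 3', bottom strand's right end is 5'.
    diff = top_end - bottom_end
    if diff == 0:
        right = ("blunt", 0)
    elif diff > 0:
        right = ("3'", diff)   # top strand protrudes: 3' overhang
    else:
        right = ("5'", -diff)  # bottom strand protrudes: 5' overhang
    return left, right


# A fragment whose top strand extends 4 nt past the bottom strand on the left:
print(classify_termini(0, 10, 4, 10))  # (("5'", 4), ('blunt', 0))
```

Tallying such classifications across many fragments would give relative abundances of overhang types and lengths, which is the kind of summary the abstract describes.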

eScholarship - University of California

The complete mitochondrial genome of the foodborne parasitic pathogen Cyclospora cayetanensis

Author: B Langmead
C Olivier
CR Sterling
DA Relman
David Caramelli
DG Higgins
DJ Conway
FA Lopez
Gopal Gopinath
Hediye Nese Cinar
Helen R. Murphy
HM McBride
JC Abbott
JL Boore
K Hikosaka
K Tamura
Karen Jarvis
MD Preston
ME Ogedengbe
MJ Arrowood
ML Eberhard
MW Gray
N Segata
NJ Pieniazek
NM Fast
RQ Lin
S Anderson
TJ Carver
YR Ortega
Publication venue: Public Library of Science (PLoS)
Publication date: 04/06/2015

Cyclospora cayetanensis is a human-specific coccidian parasite responsible for several food- and water-related outbreaks around the world, including the most recent outbreaks in the USA in 2013 and 2014, which involved over 900 persons. Multicopy organellar DNA, such as mitochondrial genomes, has been particularly informative for detection and genetic traceback analysis in other parasites. We sequenced C. cayetanensis genomic DNA obtained from stool samples of patients infected with Cyclospora in Nepal using the Illumina MiSeq platform. By bioinformatically filtering out metagenomic reads of non-coccidian origin and concentrating the reads by targeted alignment, we were able to obtain contigs containing Eimeria-like mitochondrial, apicoplastic and some chromosomal genomic fragments. A mitochondrial genomic sequence was assembled and confirmed by cloning and sequencing targeted PCR products amplified from Cyclospora DNA using primers based on our draft assembly sequence. The results show that the C. cayetanensis mitochondrial genome is 6274 bp in length, with 33% GC content, and likely exists in concatemeric arrays, as in Eimeria mitochondrial genomes. Phylogenetic analysis of the C. cayetanensis mitochondrial genome places this organism in a tight cluster with Eimeria species. The mitochondrial genome of C. cayetanensis contains three protein-coding genes, cytochrome b (cytb), cytochrome c oxidase subunit 1 (cox1) and cytochrome c oxidase subunit 3 (cox3), in addition to 14 large subunit (LSU) and nine small subunit (SSU) fragmented rRNA genes
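GC content, quoted above as 33% for the 6274 bp genome, is the fraction of bases that are G or C. A minimal sketch of the computation follows; excluding ambiguous symbols such as N from the denominator is an assumption of this example, not necessarily how the authors computed the figure.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (case-insensitive).

    Ambiguous symbols (e.g. N) are ignored; this is an illustrative choice.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / acgt


print(round(gc_content("ATGCAT"), 3))  # 0.333
```

For a 6274 bp sequence at 33% GC, this corresponds to roughly 2,070 G or C bases.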

Crossref

Directory of Open Access Journals

PubMed Central

University of East Anglia digital repository

FigShare

The Parallelism Motifs of Genomic Data Analysis

Author: Awan Muaaz
Azad Ariful
Brock Benjamin
Buluc Aydin
Egan Rob
Ekanayake Saliya
Ellis Marquita
Georganas Evangelos
Guidi Giulia
Hofmeyr Steven
Oliker Leonid
Selvitopi Oguz
Teodoropol Cristina
Yelick Katherine
Publication venue: The Royal Society
Publication date: 20/01/2020

Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some genomic data analysis problems require large-scale computational platforms to meet both their memory and computational requirements. These applications differ from the scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns, or motifs, that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing
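The hashing and sorting motifs can be illustrated with k-mer counting, a common step in assembly and profiling. The single-process sketch below counts k-mers once via a hash table and once by sorting so that identical k-mers become adjacent; the `owner` function hints at how a distributed version might route each k-mer to the rank owning its hash bucket. All function names here are hypothetical and the code is not taken from the paper.

```python
import zlib
from collections import Counter
from itertools import groupby


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def count_by_hashing(seq, k):
    # Hashing motif: each k-mer updates one hash-table entry.
    return dict(Counter(kmers(seq, k)))


def count_by_sorting(seq, k):
    # Sorting motif: sort all k-mers so duplicates are adjacent, then count runs.
    return {kmer: sum(1 for _ in run) for kmer, run in groupby(sorted(kmers(seq, k)))}


def owner(kmer, num_ranks):
    # In a distributed setting, a k-mer's count would be updated on the rank
    # that owns its hash bucket; crc32 keeps the mapping deterministic.
    return zlib.crc32(kmer.encode()) % num_ranks


print(count_by_hashing("ACGTACGT", 3))  # {'ACG': 2, 'CGT': 2, 'GTA': 1, 'TAC': 1}
```

Because k-mers arriving at any rank can come from any read, the updates in a distributed hash table are irregular and asynchronous, which is the communication pattern the abstract contrasts with conventional simulations.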

arXiv.org e-Print Archive

eScholarship - University of California

Reevaluating Assembly Evaluations with Feature Response Curves: GAGE and Assemblathons

Author: Mishra Bud
Narzisi Giuseppe
Vezzi Francesco
Publication venue: Public Library of Science (PLoS)
Publication date: 03/10/2012

In just the last decade, a multitude of bio-technologies and software pipelines have emerged to revolutionize genomics. To further their central goal, they aim to accelerate and improve the quality of de novo whole-genome assembly starting from short DNA reads. However, the performance of each of these tools is contingent on the length and quality of the sequencing data, the structure and complexity of the genome sequence, and the resolution and quality of long-range information. Furthermore, in the absence of any metric that captures the most fundamental "features" of a high-quality assembly, there is no obvious recipe for users to select the most desirable assembler/assembly. International competitions such as Assemblathons or GAGE tried to identify the best assembler(s) and their features. Somewhat circuitously, the only available approach to gauge de novo assemblies and assemblers relies solely on the availability of a high-quality, fully assembled reference genome sequence. Still worse, reference-guided evaluations are often difficult to analyze and lead to conclusions that are hard to interpret. In this paper, we circumvent many of these issues by relying upon a tool, dubbed FRCbam, which is capable of evaluating de novo assemblies from the read layouts even when no reference exists. We extend the FRCurve approach to cases where layout information may have been obscured, as is true in many de Bruijn-graph-based algorithms. As a by-product, FRCurve now applies to a much wider class of assemblers, making it possible to identify the higher-quality members of this group, their inter-relations, and their sensitivity to carefully selected features, with or without the support of a reference sequence or layout for the reads. The paper concludes by reevaluating several recently conducted assembly competitions and the datasets that have resulted from them. Comment: Submitted to PLoS One.
Supplementary material available at http://www.nada.kth.se/~vezzi/publications/supplementary.pdf and http://cs.nyu.edu/mishra/PUBLICATIONS/12.supplementaryFRC.pd
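A feature response curve plots how much of the genome an assembly covers as a function of the number of suspicious "features" (e.g. regions of anomalous read coverage or mate-pair placement) one is willing to tolerate. The sketch below assumes a simplified definition: contigs are sorted by decreasing length and, for each feature threshold tau, the longest contigs are accumulated until their summed feature count would exceed tau. Contig lengths and feature counts are supplied directly here; FRCbam's actual feature detection from read layouts is not modeled.

```python
def feature_response_curve(contigs, genome_size):
    """Return [(tau, coverage), ...] for tau = 0 .. total features.

    contigs: list of (length, n_features) pairs (hypothetical input format).
    coverage: summed length of the accepted contigs divided by genome_size.
    """
    ordered = sorted(contigs, key=lambda c: c[0], reverse=True)
    total_features = sum(f for _, f in ordered)
    curve = []
    for tau in range(total_features + 1):
        covered = used = 0
        # Take contigs from longest to shortest while the feature budget allows.
        for length, feats in ordered:
            if used + feats > tau:
                break
            used += feats
            covered += length
        curve.append((tau, covered / genome_size))
    return curve


print(feature_response_curve([(500, 2), (300, 0), (200, 1)], 1000))
# [(0, 0.0), (1, 0.0), (2, 0.8), (3, 1.0)]
```

Under this definition, an assembly whose curve rises faster (more coverage for fewer tolerated features) would be judged better, without any reference genome.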

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

FigShare

SOAP3-dp: Fast, Accurate and Sensitive GPU-based Short Read Aligner

Author: Chang Yu
Chi-Man Liu
David W Cheung
Edward Wu
Haoxiang Lin
Hing-Fung Ting
Jianqiao Zhu
Lap-Kei Lee
Ruibang Luo
Ruiqiang Li
Shaoliang Peng
Siu-Ming Yiu
Tak-Wah Lam
Thomas Wong
Wenjuan Zhu
Xiaoqian Zhu
Yingrui Li
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. By leveraging the computational power of both CPU and GPU with optimized algorithms, SOAP3-dp delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners, including BWA, Bowtie2, SeqAlto and GEM, and with the GPU-based aligners BarraCUDA and CUSHAW, SOAP3-dp is two to tens of times faster while maintaining the highest sensitivity and the lowest false discovery rate (FDR) on Illumina reads of different lengths. Going beyond its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60 percent. Real-data evaluation on the human genome demonstrates that SOAP3-dp enables the discovery of more authentic variants and longer indels. Fosmid sequencing shows a 9.1 percent FDR on newly discovered deletions. SOAP3-dp natively supports the BAM file format and provides the same scoring scheme as BWA, which allows it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A. Comment: 21 pages, 6 figures, submitted to PLoS ONE, additional files available at "https://www.dropbox.com/sh/bhclhxpoiubh371/O5CO_CkXQE". Comments most welcom
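Gapped alignment of the kind SOAP3-dp adds over SOAP3 is typically computed by dynamic programming. The sketch below is a plain CPU Smith-Waterman local alignment with a linear gap penalty, shown only to illustrate the technique; it is not SOAP3-dp's GPU implementation, and the match, mismatch and gap scores are hypothetical values, not the BWA-compatible scheme the abstract mentions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a vs. b (linear gap penalty).

    Scoring parameters are illustrative defaults, not any aligner's scheme.
    Keeps only two DP rows, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b (deletion from a)
            left = cur[j - 1] + gap   # gap in a (insertion relative to a)
            cur[j] = max(0, diag, up, left)  # 0 lets a local alignment restart
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman("ACGT", "ACGT"))  # 8
print(smith_waterman("ACGT", "ACT"))   # 4
```

The quadratic cost of filling this table for every candidate read placement is what makes GPU acceleration attractive for gapped short-read alignment.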

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

HKU Scholars Hub

FigShare

From Pine Cones to Read Clouds: Rescaffolding the Megagenome of Sugar Pine (Pinus lambertiana).

Author: Crepeau Marc W
Langley Charles H
Stevens Kristian A
Publication venue: eScholarship, University of California
Publication date: 01/05/2017

We investigate the utility and scalability of new read cloud technologies for improving the draft genome assemblies of the colossal, and largely repetitive, genomes of conifers. Synthetic long-read technologies have existed in various forms as a means of reducing complexity and resolving repeats since the outset of genome assembly. Recently, technologies that combine subhaploid pools of high-molecular-weight DNA with barcoding on a massive scale have brought new efficiencies to sample preparation and data generation. When combined with inexpensive light shotgun sequencing, the resulting data can be used to scaffold large genomes. The protocol is efficient enough to be considered routinely, even for the largest genomes. Conifers represent the largest reference genome projects executed to date. The largest of these is that of the conifer Pinus lambertiana (sugar pine), with a genome size of 31 billion bp. In this paper, we report on the molecular and computational protocols for scaffolding the P. lambertiana genome using the library technology from 10× Genomics. At 247,000 bp, the NG50 of the existing reference sequence is the highest scaffold contiguity among the currently published conifer assemblies; this new assembly's NG50 is 1.94 million bp, an eightfold increase
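NG50 is the length L such that scaffolds of length L or longer together cover at least half of the estimated genome size; unlike N50, it is measured against the genome size rather than the total assembly length. A minimal sketch of the computation, with hypothetical toy inputs:

```python
def ng50(scaffold_lengths, genome_size):
    """Smallest L such that scaffolds of length >= L sum to >= genome_size / 2.

    Returns None if the whole assembly covers less than half the genome.
    """
    covered = 0
    for length in sorted(scaffold_lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:  # integer test avoids float rounding
            return length
    return None


print(ng50([100, 80, 50, 20], 400))  # 50
```

For the figures quoted above, 1,940,000 / 247,000 is about 7.85, which rounds to the reported eightfold increase.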

Directory of Open Access Journals

eScholarship - University of California