BigWig and BigBed: enabling browsing of large distributed datasets

Author: A. S. Hinrichs
A. S. Zweig
Alekseyenko
D. Karolchik
G. Barber
Guttman
Kent
Li
Rhead
W. J. Kent
Publication venue: Oxford University Press

Summary: BigWig and BigBed files are compressed binary indexed files containing data at several resolutions that allow the high-performance display of next-generation sequencing experiment results in the UCSC Genome Browser. The visualization is implemented using a multi-layered software approach that takes advantage of specific capabilities of web-based protocols and Linux and UNIX operating systems files, R trees and various indexing and compression tricks. As a result, only the data needed to support the current browser view is transmitted rather than the entire file, enabling fast remote access to large distributed data sets.

Crossref

PubMed Central

A resource-frugal probabilistic dictionary and applications in (meta)genomics

Author: Bittner Lucie
Limasset Antoine
Marchet Camille
Peterlongo Pierre
Publication date: 26/05/2016

Genomic and metagenomic fields, generating huge sets of short genomic sequences, brought their own share of high performance problems. To extract relevant pieces of information from the huge data sets generated by current sequencing techniques, one must rely on extremely scalable methods and solutions. Indexing billions of objects is a task considered too expensive while being a fundamental need in this field. In this paper we propose a straightforward indexing structure that scales to billions of elements and we propose two direct applications in genomics and metagenomics. We show that our proposal solves problem instances for which no other known solution scales up. We believe that many tools and applications could benefit from either the fundamental data structure we provide or from the applications developed from this structure. Comment: Submitted to PSC 201

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

De Novo Assembly of Nucleotide Sequences in a Compressed Feature Space

Author: Robertson David L.
Tapinos Avraam
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/08/2017

Sequencing technologies allow for an in-depth analysis of biological species but the size of the generated datasets introduces a number of analytical challenges. Recently, we demonstrated the application of numerical sequence representations and data transformations for the alignment of short reads to a reference genome. Here, we expand our approach for de novo assembly of short reads. Our results demonstrate that highly compressed data can encapsulate the signal sufficiently to accurately assemble reads to big contigs or complete genomes.

Crossref

Enlighten

novoBreak: local assembly for breakpoint detection in cancer genomes.

Author: Boutros Paul
Chen Junjie
Chen Ken
Chen Tenghui
Chong Zechen
Ding Li
Fan Xian
Gao Min
Lee Anna Y
Ruan Jue
Zhou Wanding
Publication venue: eScholarship, University of California
Publication date: 01/01/2017

We present novoBreak, a genome-wide local assembly algorithm that discovers somatic and germline structural variation breakpoints in whole-genome sequencing data. novoBreak consistently outperformed existing algorithms on real cancer genome data and on synthetic tumors in the ICGC-TCGA DREAM 8.5 Somatic Mutation Calling Challenge primarily because it more effectively utilized reads spanning breakpoints. novoBreak also demonstrated great sensitivity in identifying short insertions and deletions.

eScholarship - University of California

A comprehensive evaluation of alignment algorithms in the context of RNA-seq.

Author: Friedel Caroline C.
Lindner Robert
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/01/2012

Transcriptome sequencing (RNA-Seq) overcomes limitations of previously used RNA quantification methods and provides one experimental framework for both high-throughput characterization and quantification of transcripts at the nucleotide level. The first step and a major challenge in the analysis of such experiments is the mapping of sequencing reads to a transcriptomic origin including the identification of splicing events. In recent years, a large number of such mapping algorithms have been developed, all of which have in common that they require algorithms for aligning a vast number of reads to genomic or transcriptomic sequences. Although the FM-index based aligner Bowtie has become a de facto standard within mapping pipelines, a much larger number of possible alignment algorithms have been developed, including other variants of FM-index based aligners. Accordingly, developers and users of RNA-seq mapping pipelines have the choice among a large number of available alignment algorithms. To provide guidance in the choice of alignment algorithms for these purposes, we evaluated the performance of 14 widely used alignment programs from three different algorithmic classes: algorithms using either hashing of the reference transcriptome, hashing of reads, or a compressed FM-index representation of the genome. Here, special emphasis was placed on both precision and recall and the performance for different read lengths and numbers of mismatches and indels in a read. Our results clearly showed the significant reduction in memory footprint and runtime provided by FM-index based aligners at a precision and recall comparable to the best hash table based aligners. Furthermore, the recently developed Bowtie 2 alignment algorithm shows a remarkable tolerance to both sequencing errors and indels, thus essentially making hash-based aligners obsolete.

CiteSeerX

Public Library of Science (PLOS)

Directory of Open Access Journals

Open Access LMU

PubMed Central

SOAP3-dp: Fast, Accurate and Sensitive GPU-based Short Read Aligner

Author: Chang Yu
Chi-Man Liu
David W Cheung
Edward Wu
Haoxiang Lin
Hing-Fung Ting
Jianqiao Zhu
Lap-Kei Lee
Ruibang Luo
Ruiqiang Li
Shaoliang Peng
Siu-Ming Yiu
Tak-Wah Lam
Thomas Wong
Wenjuan Zhu
Xiaoqian Zhu
Yingrui Li
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most of the existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. SOAP3-dp, through leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto, GEM and GPU-based aligners including BarraCUDA and CUSHAW, SOAP3-dp is two to tens of times faster, while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads with different lengths. Transcending its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60 percent. Real data evaluation using human genome demonstrates SOAP3-dp's power to enable more authentic variants and longer indels to be discovered. Fosmid sequencing shows a 9.1 percent FDR on newly discovered deletions. SOAP3-dp natively supports BAM file format and provides the same scoring scheme as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A. Comment: 21 pages, 6 figures, submitted to PLoS ONE, additional files available at "https://www.dropbox.com/sh/bhclhxpoiubh371/O5CO_CkXQE". Comments most welcome.

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

HKU Scholars Hub

FigShare

A novel NGS library preparation method to characterize native termini of fragmented DNA.

Author: Green Richard E
Harkins Kelly M
Kapp Joshua
Naughton Colin
Rao Varsha
Schaefer Nathan K
Shapiro Beth
Troll Christopher J
Publication venue: eScholarship, University of California
Publication date: 01/05/2020

Biological and chemical DNA fragmentation generates DNA molecules with a variety of termini, including blunt ends and single-stranded overhangs. We have developed a Next Generation Sequencing (NGS) assay, XACTLY, to interrogate the termini of fragmented DNA, information traditionally lost in standard NGS library preparation methods. Here we describe the XACTLY method, showcase its sensitivity and specificity, and demonstrate its utility in in vitro experiments. The XACTLY assay is able to report relative abundances of all lengths and types (5' and 3') of single-stranded overhangs, if present, on each DNA fragment with an overall accuracy between 80% and 90%. In addition, XACTLY retains the sequence of each native DNA molecule after fragmentation and can capture the genomic landscape of cleavage events at single nucleotide resolution. The XACTLY assay can be applied as a novel research and discovery tool for fragmentation analyses and in cell-free DNA.

eScholarship - University of California