49 research outputs found

Circular DNA elements of chromosomal origin are common in healthy human somatic tissue

Author: Halling Jens Frey
Hansen Anders Johannes
Lam Hugo Y. K.
Maretty Lasse
Mohiyuddin Marghoob
Møller Henrik Devitt
Pilegaard Henriette
Plomgaard Peter
Prada Luengo Inigo
Regenberg Birgitte
Sailani M. Reza
Snyder Michael P.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Somatic cells can accumulate structural variations such as deletions. Here, Møller et al. show that normal human cells generate large extrachromosomal circular DNAs (eccDNAs), most likely the products of excised DNA, that can be transcriptionally active and, thus, may have phenotypic consequences.

Directory of Open Access Journals

Copenhagen University Research Information System

An ensemble approach to accurately detect somatic mutations using SomaticSeq

Author: Afshar Pegah Tootoonchi
Asadi Narges Bani
Barr Sharon
Chhibber Aparna
Fan Yu
Fang Li Tai
Gerstein Mark B
Gibeling Greg
Koboldt Daniel C
Lam Hugo YK
Mohiyuddin Marghoob
Mu John C
Wang Wenyi
Wong Wing H
Publication venue: Digital Commons@Becker
Publication date: 01/01/2015

SomaticSeq is an accurate somatic mutation detection pipeline implementing a stochastic boosting algorithm to produce highly accurate somatic mutation calls for both single nucleotide variants and small insertions and deletions. The workflow currently incorporates five state-of-the-art somatic mutation callers, and extracts over 70 individual genomic and sequencing features for each candidate site. A training set is provided to an adaptively boosted decision tree learner to create a classifier for predicting mutation statuses. We validate our results with both synthetic and real data. We report that SomaticSeq is able to achieve better overall accuracy than any individual tool incorporated. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-015-0758-2) contains supplementary material, which is available to authorized users

Crossref

Digital Commons@Becker

PubMed Central

svclassify: a method to establish benchmark structural variant calls

Author: A Abyzov
A Abyzov
A Kong
AC English
AR Quinlan
B Schölkopf
C Alkan
C Lee
Desu Chen
DMJ Tax
Gabor Bartha
GR Abecasis
H Li
H Li
Hariharan Iyer
Hemang Parikh
Hugo Y. K. Lam
HYK Lam
HYK Lam
JB Burbidge
JH Ward Jr
JM Zook
JT Robinson
Justin M. Zook
K Chen
K Wong
K Ye
M Mohiyuddin
M Yousef
Marc Salit
Marghoob Mohiyuddin
Mark Pratt
MM Deza
N Cristianini
N Spies
Noah Spies
RE Mills
RM Layer
SS Khan
TF Cox
Wolfgang Losert
Publication venue: Springer Nature
Publication date: 16/01/2016

The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned bam files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives. We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we use our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz. 
We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and an orthogonal consensus method MetaSV (99.7 % concordant), and candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies. https://doi.org/10.1186/s12864-016-2366-

Crossref

PubMed Central

Digital Repository at the University of Maryland

Assessing Reproducibility of Inherited Variants Detected With Short-Read Whole Genome Sequencing

Background: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. Results: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×. Conclusions: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.

Aquila Digital Community

Assessing reproducibility of inherited variants detected with short-read whole genome sequencing

Background: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. Results: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30x. Conclusions: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS. Peer reviewed

Aquila Digital Community

PubMed Central

Helsingin yliopiston digitaalinen arkisto

svclassify: a method to establish benchmark structural variant calls

Author: A Abyzov
A Abyzov
A Kong
AC English
AR Quinlan
B Schölkopf
C Alkan
C Lee
Desu Chen
DMJ Tax
Gabor Bartha
GR Abecasis
H Li
H Li
Hariharan Iyer
Hemang Parikh
Hugo Y. K. Lam
HYK Lam
HYK Lam
JB Burbidge
JH Ward Jr
JM Zook
JT Robinson
Justin M. Zook
K Chen
K Wong
K Ye
M Mohiyuddin
M Yousef
Marc Salit
Marghoob Mohiyuddin
Mark Pratt
MM Deza
N Cristianini
N Spies
Noah Spies
RE Mills
RM Layer
SS Khan
TF Cox
Wolfgang Losert
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Tuning Hardware and Software for Multiprocessors

Author: Mohiyuddin Marghoob
Publication venue: eScholarship, University of California
Publication date: 01/01/2012

Technology scaling trends have enabled the exponential growth of computing power. However, the performance of communication subsystems scales less aggressively. This means that an application constrained by memory/interconnect performance will not be able to use the available computing power efficiently; in fact, technology scaling will make this efficiency even worse. This problem can be alleviated if algorithms minimize communication. To this end, we describe communication-avoiding algorithms and highly optimized implementations of a sparse linear algebra kernel called "matrix powers". Results show up to 2.3x improvement in performance over the naive algorithms on modern architectures. Our multi-core implementation of matrix powers enables us to develop a communication-avoiding iterative solver for sparse linear systems which is up to 2.1x faster than a conventional Generalized Minimal Residual method (GMRES) implementation. Another problem plaguing the supercomputer industry is the power bottleneck: power has, in fact, become the pre-eminent design constraint for future high-performance computing systems, which is why computational efficiency is being emphasized over simply peak performance. Static benchmark codes have traditionally been used to find architectures optimal with respect to specific metrics. Unfortunately, because compilers generate sub-optimal code, benchmark performance can be a poor indicator of the performance potential of architecture design points. Therefore, we present hardware/software co-tuning as a novel approach for system design. In co-tuning, traditional architecture space exploration is tightly coupled with software auto-tuning for delivering substantial improvements in area and power efficiency.
We demonstrate co-tuning by exploring the parameter space of a Tensilica Xtensa-based multi-processor running three of the most heavily used kernels in scientific computing, each with widely varying micro-architectural requirements: sparse matrix vector multiplication, stencil-based computations, and general matrix-matrix multiplication. Results demonstrate that co-tuning improves hardware area and power efficiency by up to 3x and 2.4x, respectively.

eScholarship - University of California

Parallel Bi-dimensional Pattern Matching with Scaling

Author: Marghoob Mohiyuddin
Phalguni Gupta
Vidit Jain

This paper deals with the problem of bi-dimensional pattern matching with scaling. The problem is to find all occurrences of the m × m pattern in the N × N text, scaled to all natural multiples. We have proposed an efficient parallel algorithm for this problem on CREW-PRAM with p processors. It takes O( 2 ) time

CiteSeerX