10 research outputs found
A Survey of Genomic Traces Reveals a Common Sequencing Error, RNA Editing, and DNA Editing
While it is widely held that an organism's genomic information should remain constant, several protein families are known to modify it. Members of the AID/APOBEC protein family can deaminate DNA, and members of the ADAR family can deaminate RNA. Characterizing the scope of these events is challenging. Here we use large genomic data sets, such as the two billion sequences in the NCBI Trace Archive, to look for clusters of mismatches of the same type, which are a hallmark of editing events caused by APOBEC3 and ADAR. We align 603,249,815 traces from the NCBI Trace Archive to their reference genomes. In clusters of mismatches of increasing size, at least one systematic sequencing error (G-to-A) dominates the results. It is still present in mismatches at 99% accuracy and vanishes only at 99.99% accuracy or higher. The error appears to have entered about 1% of the HapMap, possibly affecting other users who rely on this resource. Further investigation, using stringent quality thresholds, uncovers thousands of mismatch clusters with no apparent defects in their chromatograms. These traces provide the first reported candidates of endogenous DNA editing in human, further elucidate RNA editing in human and mouse, and also reveal, for the first time, extensive RNA editing in Xenopus tropicalis. We show that the NCBI Trace Archive provides a valuable resource for the investigation of DNA and RNA editing, and sets the stage for a comprehensive mapping of editing events in large-scale genomic datasets.
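The screening idea above, scanning trace-to-reference alignments for windows dominated by a single mismatch type, can be illustrated with a minimal Python sketch. Everything here (function names, the 100 bp window, the five-site threshold) is an assumption for illustration, not the authors' actual pipeline, which also applies base-quality and chromatogram checks.

```python
from collections import Counter

def mismatches(trace, reference):
    """Yield (position, ref_base, trace_base) for two aligned, equal-length strings."""
    for i, (r, t) in enumerate(zip(reference, trace)):
        if r != t and r in "ACGT" and t in "ACGT":
            yield i, r, t

def same_type_clusters(trace, reference, window=100, min_sites=5):
    """Report windows dominated by a single mismatch type, e.g. G-to-A.

    Overlapping windows may report the same cluster more than once;
    that is acceptable for a sketch.
    """
    sites = list(mismatches(trace, reference))  # sorted by position
    clusters = []
    for j, (pos, _, _) in enumerate(sites):
        group = [s for s in sites[j:] if s[0] - pos < window]
        kinds = Counter((r, t) for _, r, t in group)
        (ref_base, trace_base), count = kinds.most_common(1)[0]
        if count >= min_sites:
            clusters.append((pos, f"{ref_base}-to-{trace_base}", count))
    return clusters
```

On real traces this would run over base-caller output with per-base quality scores; the abstract's finding is precisely that such clusters are dominated by a systematic G-to-A error until very stringent accuracy thresholds are applied.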
Accurate Whole-Genome Sequencing and Haplotyping from 10 to 20 Human Cells
Recent advances in whole-genome sequencing have brought the vision of personal genomics and genomic medicine closer to reality. However, current methods lack clinical accuracy and cannot cost-effectively describe the context (haplotypes) in which genome variants co-occur. Here we describe a low-cost DNA sequencing and haplotyping process, Long Fragment Read (LFR) technology, which is similar to sequencing long single DNA molecules without cloning or separation of metaphase chromosomes. In this study, ten LFR libraries were made using only ~100 pg of human DNA per sample. Up to 97% of the heterozygous single-nucleotide variants (SNVs) were assembled into long haplotype contigs. Removal of false-positive SNVs not phased by multiple LFR haplotypes resulted in a final genome error rate of 1 in 10 Mb. Cost-effective and accurate genome sequencing and haplotyping from 10-20 human cells, as demonstrated here, will enable comprehensive genetic studies and diverse clinical applications.
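The false-positive filter mentioned above, discarding SNVs not phased by multiple LFR haplotypes, reduces to a simple support count. The sketch below is a hedged illustration under assumed data structures, not Complete Genomics' implementation.

```python
from collections import defaultdict

def multiply_phased_snvs(phased_calls, min_libraries=2):
    """Keep only SNVs independently phased by at least `min_libraries` libraries.

    phased_calls: iterable of (snv, library_id) pairs, one pair per LFR
    library that placed the SNV on a haplotype contig. The pair-based
    representation is an assumption for illustration.
    """
    support = defaultdict(set)
    for snv, lib in phased_calls:
        support[snv].add(lib)
    return {snv for snv, libs in support.items() if len(libs) >= min_libraries}
```

For scale, the reported rate of 1 error in 10 Mb corresponds to roughly 320 residual errors across a ~3.2 Gb human genome.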
Identification of Widespread Ultra-Edited Human RNAs
Adenosine-to-inosine modification of RNA molecules (A-to-I RNA editing) is an important mechanism that increases transcriptome diversity. It occurs when a genomically encoded adenosine (A) is converted to an inosine (I) by ADAR proteins. Sequencing reactions read inosine as guanosine (G); therefore, current methods to detect A-to-I editing sites align RNA sequences to their corresponding DNA regions and identify A-to-G mismatches. However, such methods perform poorly on RNAs that underwent extensive editing (“ultra”-editing), as the large number of mismatches obscures the genomic origin of these RNAs. Therefore, only a few anecdotal ultra-edited RNAs have been discovered so far. Here we introduce and apply a novel computational method to identify ultra-edited RNAs. We detected 760 ESTs containing 15,646 editing sites (more than 20 sites per EST, on average), of which 13,668 are novel. Ultra-edited RNAs exhibit the known sequence motif of ADARs and tend to localize in sense-strand Alu elements. Compared to sites of mild editing, ultra-editing occurs primarily in Alu-rich regions, where potential base pairing with neighboring, inverted Alus creates particularly long double-stranded RNA structures. Ultra-editing sites are underrepresented in old Alu subfamilies, tend to be non-conserved, and avoid exons, suggesting that ultra-editing is usually deleterious. A possible biological function of ultra-editing could be mediated by non-canonical splicing and cleavage of the RNA near the editing sites.
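The detection problem described above, that heavy A-to-G mismatching obscures the genomic origin of an edited read, suggests a simple trick: collapse A and G into one letter before matching, then count candidate sites afterwards. The sketch below illustrates that idea in Python; the exact alignment strategy, the function names, and the 20-site threshold are assumptions based only on the abstract.

```python
def collapse_ag(seq):
    """Map A->G so A-to-G differences no longer break the match.
    Sequences are assumed uppercase."""
    return seq.replace("A", "G")

def editing_sites(est, genome):
    """Positions where the genome has A but the EST reads G (aligned strings)."""
    return [i for i, (g, e) in enumerate(zip(genome, est)) if g == "A" and e == "G"]

def is_ultra_edited(est, genome, min_sites=20):
    # If the EST truly derives from this locus, the collapsed sequences
    # should match; for simplicity this sketch requires exact identity
    # rather than running a real aligner on the collapsed alphabet.
    if collapse_ag(est) != collapse_ag(genome):
        return False
    return len(editing_sites(est, genome)) > min_sites
```

A real pipeline would align collapsed reads genome-wide and then verify the strand and the ADAR sequence motif at each candidate site; the sketch only captures the alphabet-collapsing step that makes ultra-edited reads mappable at all.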
Harvard Personal Genome Project: Lessons from Participatory Public Research
Background: Since its initiation in 2005, the Harvard Personal Genome Project has enrolled thousands of volunteers interested in publicly sharing their genome, health, and trait data. Because these data are highly identifiable, we use an 'open consent' framework that purposefully excludes promises about privacy and requires participants to demonstrate comprehension prior to enrollment. Discussion: Our model of non-anonymous, public genomes has led us to a highly participatory model of researcher-participant communication and interaction. The participants, who are highly committed volunteers, pursue and donate research-relevant datasets on their own initiative and are actively engaged in conversations with both our staff and other Personal Genome Project participants. We have quantitatively assessed these communications and donations, and we report our experiences with returning research-grade whole-genome data to participants. We also describe some of the community growth and discussion that has occurred around the project. Summary: We find that public, non-anonymous data are valuable and lead to a participatory research model, which we encourage others to consider. The implementation of this model is greatly facilitated by web-based tools and methods and by participant education. The project's results include long-term, proactive participant involvement and the growth of a community that benefits both researchers and participants. © 2014 Ball et al.; licensee BioMed Central Ltd.
The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes
Background: Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced, a stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are therefore missing an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information. Findings: As part of the Personal Genome Project, blood samples from 184 participants were collected and processed using Complete Genomics’ Long Fragment Read (LFR) technology. Here, we present the experimental whole-genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest-quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics’ standard non-barcoded library process. In addition, we report 2.6 million high-quality, rare variants not previously identified in the Single Nucleotide Polymorphism Database (dbSNP) or the 1000 Genomes Project Phase 3 data. Conclusions: These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function. Electronic supplementary material: The online version of this article (doi:10.1186/s13742-016-0148-z) contains supplementary material, which is available to authorized users.
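The novelty claim above boils down to a set-difference over variant keys: a call counts as novel if it appears in neither dbSNP nor the 1000 Genomes Phase 3 call set. A minimal sketch, assuming variants are already normalized to (chrom, pos, ref, alt) tuples; loading and normalizing real VCFs is elided.

```python
def novel_variants(calls, dbsnp_keys, kg3_keys):
    """Return calls absent from both reference databases.

    calls: iterable of (chrom, pos, ref, alt) tuples from the sequenced genomes.
    dbsnp_keys, kg3_keys: sets of the same tuples built from dbSNP and
    1000 Genomes Phase 3 (the set representation is an assumption).
    """
    known = dbsnp_keys | kg3_keys
    return [v for v in calls if v not in known]

# Example:
# dbsnp = {("chr1", 12345, "A", "G")}
# kg3 = set()
# novel_variants([("chr1", 12345, "A", "G"), ("chr2", 99, "C", "T")], dbsnp, kg3)
# -> [("chr2", 99, "C", "T")]
```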
A public resource facilitating clinical use of genomes
Rapid advances in DNA sequencing promise to enable new diagnostics and individualized therapies. Achieving personalized medicine, however, will require extensive research on highly reidentifiable, integrated datasets of genomic and health information. To assist with this, participants in the Personal Genome Project choose to forgo privacy via our institutional review board-approved "open consent" process. The contribution of public data and samples facilitates both scientific discovery and standardization of methods. We present our findings after enrollment of more than 1,800 participants, including whole-genome sequencing of 10 pilot participant genomes (the PGP-10). We introduce the Genome-Environment-Trait Evidence (GET-Evidence) system. This tool automatically processes genomes and prioritizes both published and novel variants for interpretation. In the process of reviewing the presumed-healthy PGP-10 genomes, we find numerous literature references implying serious disease. Although it is sometimes impossible to rule out a late-onset effect, stringent evidence requirements can address the high rate of incidental findings. To that end, we develop a peer production system for recording and organizing variant evaluations according to standard evidence guidelines, creating a public forum for reaching consensus on the interpretation of clinically relevant variants. Genome analysis becomes a two-step process: using a prioritized list to record variant evaluations, then automatically sorting reviewed variants using these annotations. Genome data, health and trait information, participant samples, and variant interpretations are all shared in the public domain; we invite others to review our results using our participant samples and contribute to our interpretations. We offer our public resource and methods to further personalized medical research.
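The two-step process described in the abstract, prioritize-and-review followed by automatic sorting of reviewed variants, can be caricatured in a few lines. The schema and ordering heuristics below are assumptions for illustration, not the actual GET-Evidence data model.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    name: str
    in_literature: bool = False   # a published disease association exists (assumed field)
    severity: int = 0             # severity annotation recorded at review, 0-5 (assumed scale)
    reviewed: bool = False

def prioritize(variants):
    """Step 1: surface unreviewed variants, literature-flagged ones first."""
    pending = [v for v in variants if not v.reviewed]
    return sorted(pending, key=lambda v: v.in_literature, reverse=True)

def report(variants):
    """Step 2: automatically sort reviewed variants by their annotations."""
    return sorted((v for v in variants if v.reviewed),
                  key=lambda v: v.severity, reverse=True)
```

The design point the abstract makes is the separation of concerns: humans record evidence-graded evaluations once, and downstream genome reports are then assembled mechanically from those shared annotations.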