88 research outputs found

Cloud-Assisted Read Alignment and Privacy

Author: A O’Driscoll
DR Nyholt
E Vayena
ES Dove
IS Chan
J Kaye
LD Stein
M Akgün
M Gymrek
M Naveed
N Homer
SF Altschul
Y Erlich
Publication date: 01/01/2017

Thanks to rapid advances in sequencing technologies, genomic data is now being produced at an unprecedented rate. To adapt to this growth, several algorithms and paradigm shifts have been proposed to increase the throughput of the classical DNA workflow, e.g. by relying on the cloud to perform CPU-intensive operations. However, the scientific community has raised an alarm over the privacy-related attacks that can be executed on genomic data. In this paper we review the state of the art in cloud-based alignment algorithms that have been developed for performance. We then present several privacy-preserving mechanisms that have been, or could be, used to align reads at an incremental performance cost. Finally, we argue for the use of risk analysis throughout the DNA workflow, to strike a balance between performance and protection of data.

Crossref

Open Repository and Bibliography - Luxembourg

Routes for breaching and protecting genetic privacy

Author: A Acquisti
A Cavoukian
A Kong
A Machanavajjhala
A Narayanan
AD Johnson
AJ Pakstis
AK Manning
AL McGuire
Arvind Narayanan
B Fons
B Malin
BA Malin
BM Henn
C Dwork
C Shannon
CD Huff
D Clayton
D He
D Zubakov
DJ Solve
DR Nyholt
DW Craig
EA Zerhouni
EE Schadt
EM Ramos
F Liu
G Church
H Lango Allen
H Li
HK Im
HS Venter
J Burn
J Gitschier
J Kaiser
J Kaye
J Lee
J Marchini
JE Lunshof
JH Park
JM Oliver
JP Roberts
K Benitez
K El Emam
K Silventoinen
KA Tryka
KB Jacobs
KS Kendler
L Kamm
L Sweeney
LA Sweeney
LAP Kohn
LL Rodriguez
M Canim
M Gymrek
M Kantarcioglu
M Kayser
MD Mailman
N Chatterjee
N Homer
NN Taleb
P Bohannon
P Kwok
P Ohm
P Paillier
PM Visscher
R Braun
R Drmanac
R Khan
R Noumeir
RL Bennett
S Byers
S McClure
S Sankararaman
S Walsh
SE Brenner
SF Terry
SH Friend
T Lumley
TE King
V Bafna
W Fu
W Hartzog
WG Hill
WW Lowrance
XL Ou
Yaniv Erlich
Z Lin
Publication date: 01/12/2013

We are entering the era of ubiquitous genetic information for research, clinical care, and personal curiosity. Sharing these datasets is vital for rapid progress in understanding the genetic basis of human diseases. However, one growing concern is the ability to protect the genetic privacy of the data originators. Here, we technically map threats to genetic privacy and discuss potential mitigation strategies for privacy-preserving dissemination of genetic data. Comment: Draft for comment

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref

PubMed Central

Addressing challenges in the production and analysis of Illumina sequencing data

Author: A McKenna
A Meyerhans
AW Briggs
B Langmead
C Trapnell
CJ Creighton
D Reich
DJ Lahr
DR Bentley
DR Zerbino
EH Turner
ER Mardis
GJ Porreca
H Li
HA Burbano
J Krause
J Rougemont
Janet Kelso
KD Hansen
L Mamanova
M Fedurco
M Kircher
M Meyer
MA Quail
Martin Kircher
MJ Chaisson
ML Metzker
MM DeAngelis
N Whiteford
Patricia Heyn
PC Dolan
R Li
RE Green
RM Durbin
S Hoffmann
S Paabo
SC Schuster
SJ Odelberg
T Lassmann
WC Kao
WJ Ansorge
Y Erlich
Publication venue: BioMed Central
Publication date: 01/01/2011

Advances in DNA sequencing technologies have made it possible to generate large amounts of sequence data very rapidly and at substantially lower cost than capillary sequencing. These new technologies have specific characteristics and limitations that must either be considered during project design or be addressed during data analysis. Specialist skills, both at the laboratory and the computational stages of project design and analysis, are crucial to the generation of high-quality data from these new platforms. The Illumina sequencers (including the Genome Analyzers I/II/IIe/IIx and the new HiScan and HiSeq) represent a widely used platform providing parallel readout of several hundred million immobilized sequences using fluorescent-dye reversible-terminator chemistry. Sequencing library quality, sample handling, instrument settings and sequencing chemistry have a strong impact on sequencing run quality. The presence of adapter chimeras and adapter sequences at the end of short-insert molecules, as well as increased error rates and short read lengths, complicates many computational analyses. We discuss here some of the factors that influence the frequency and severity of these problems and provide solutions for circumventing them. Further, we present a set of general principles for good analysis practice that enable problems with sequencing runs to be identified and dealt with.

Crossref

Directory of Open Access Journals

PubMed Central

MPG.PuRe

Deep Sequencing of the Nicastrin Gene in Pooled DNA, the Identification of Genetic Variants That Affect Risk of Alzheimer's Disease

Author: A Confaloni
A Orlacchio
AA Out
AW Butler
B Dermaut
B Wang
Belinda M. Martin
Bruno Vellas
D Harold
DC Koboldt
Denise Harold
DR Dries
DW Craig
E Cousin
E Levy-Lahad
E Sidransky
EI Rogaev
G McKhann
G Yu
Gillian Hamilton
H Li
Hilkka Soininen
IJ Deary
Iwona Kloszewska
J Mitsui
JC Lambert
John F. Powell
Kathryn Lord
L Zhong
Magda Tsolaki
Makrina Danillidou
MD Abramoff
Megan Pritchard
MF Folstein
Michelle K. Lupton
P Proitsi
Patrizia Mecocci
Paul Hollingworth
Petroula Proitsi
R Cronn
Richard Wroe
Roland Roberts
S Helisalmi
S Lovestone
S Nejentsev
S Prabhu
S Shah
S Sunyaev
Simon Lovestone
T Wang
TE Druley
V Bansal
Y Erlich
Z Ma
Publication venue: Public Library of Science
Publication date: 01/01/2011

Nicastrin is an obligatory component of γ-secretase, the enzyme complex that leads to the production of Aβ fragments central to the pathogenesis of Alzheimer's disease (AD). Analyses of the effects of common variation in this gene on risk for late-onset AD have been inconclusive. We investigated the effect of rare variation in the coding regions of the Nicastrin gene in a cohort of AD patients and matched controls using an innovative pooling approach and next-generation sequencing. Five SNPs were identified and validated by individual genotyping from 311 cases and 360 controls. Association analysis identified a non-synonymous rare SNP (N417Y) with a statistically higher frequency in cases compared to controls in the Greek population (OR 3.994, CI 1.105–14.439, p = 0.035). This finding warrants further investigation in a larger cohort and adds weight to the hypothesis that rare variation explains some of the genetic heritability still to be identified in Alzheimer's disease.
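The odds ratio, interval and p-value quoted in the abstract above can be checked against each other with a few lines of arithmetic. The sketch below assumes (the abstract does not say so) that the interval is a 95% Wald interval computed on the log-odds scale and that the p-value is two-sided; the function name `wald_p_from_ci` is purely illustrative and is not taken from the paper.

```python
import math

def wald_p_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the standard error, z statistic and two-sided p-value
    implied by an odds ratio and its confidence interval.

    Assumption: the interval is a symmetric Wald interval on the
    log-odds scale, i.e. log(OR) +/- z_crit * SE.
    """
    log_or = math.log(odds_ratio)
    # Width of the interval on the log scale is 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p

# Values as reported in the abstract (OR 3.994, CI 1.105-14.439).
se, z, p = wald_p_from_ci(3.994, 1.105, 14.439)
print(f"SE(log OR) = {se:.3f}, z = {z:.3f}, two-sided p = {p:.4f}")
```

Under these assumptions the implied p-value comes out close to the reported 0.035, which suggests the three reported numbers are mutually consistent. A different test (for example Fisher's exact test) would not necessarily reproduce this agreement.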

Public Library of Science (PLOS)

Crossref

Online Research @ Cardiff

Directory of Open Access Journals

PubMed Central

UCL Discovery

Edinburgh Research Explorer

King's Research Portal

ResearchOnline@GCU

Medulloblastoma Exome Sequencing Uncovers Subtype-Specific Somatic Mutations

Author: A Garbelli
Aaron McKenna
Alex H. Ramos
Amanda G. Kautzman
Andrey Sivachenko
BG Wilson
Brian Sogoloff
Carsten Russ
Daniel A. Pomeranz Krummel
Daniel Auclair
David T. W. Jones
DR Bentley
DW Parsons
EF Pettersen
Erica Shefler
Furong Yu
G Getz
Gad Getz
Gerald R. Crabtree
H Li
Heidi Greulich
J Oberoi
J Ren
J Zhang
James Bochicchio
James Meldrim
JE Ming
Jessica Pierre Francois
Jill P. Mesirov
JT Robinson
Kristian Cibulskis
L Wang
M Högbom
M Kool
M Remke
Matthew Meyerson
Mauricio O. Carneiro
MD Taylor
Michael D. Taylor
Michael G. Ross
Michael S. Lawrence
N Stransky
Natalia Teider
Natalie Jäger
Niall J. Lennon
NR Smoll
PA Futreal
Pablo Tamayo
Paul A. Northcott
Petar Stojanov
Peter Lichter
R Satow
Rachel L. Erlich
Scott L. Carter
Scott L. Pomeroy
SH Baek
Shyamal Dilhan Weeraratne
Soma Sengupta
Stacey B. Gabriel
Stefan M. Pfister
T Sengoku
Tenley C. Archer
Thomas M. Roberts
TR Peterson
Trevor J. Pugh
V Grossmann
Vladimir Amani
Y-J Cho
Yoon-Jae Cho
Publication venue: Springer Science and Business Media LLC
Publication date: 01/08/2012

Medulloblastomas are the most common malignant brain tumors in children [1]. Identifying and understanding the genetic events that drive these tumors is critical for the development of more effective diagnostic, prognostic and therapeutic strategies. Recently, our group and others described distinct molecular subtypes of medulloblastoma based on transcriptional and copy number profiles [2–5]. Here, we utilized whole exome hybrid capture and deep sequencing to identify somatic mutations across the coding regions of 92 primary medulloblastoma/normal pairs. Overall, medulloblastomas exhibit low mutation rates consistent with other pediatric tumors, with a median of 0.35 non-silent mutations per megabase. We identified twelve genes mutated at statistically significant frequencies, including previously known mutated genes in medulloblastoma such as CTNNB1, PTCH1, MLL2, SMARCA4 and TP53. Recurrent somatic mutations were identified in an RNA helicase gene, DDX3X, often concurrent with CTNNB1 mutations, and in the nuclear co-repressor (N-CoR) complex genes GPS2, BCOR, and LDB1, novel findings in medulloblastoma. We show that mutant DDX3X potentiates transactivation of a TCF promoter and enhances cell viability in combination with mutant but not wild type beta-catenin. Together, our study reveals the alteration of Wnt, Hedgehog, histone methyltransferase and now N-CoR pathways across medulloblastomas and within specific subtypes of this disease, and nominates the RNA helicase DDX3X as a component of pathogenic beta-catenin signaling in medulloblastoma.

Crossref

Harvard University - DASH

PubMed Central

eScholarship - University of California

Global assessment of genomic variation in cattle by genome resequencing and high-throughput genotyping

Author: A Ritz
AG Clark
AJ Iafrate
AR Quinlan
AV Zimin
AW Pang
Bo Thomsen
Bujie Zhan
C Alkan
C Spillane
C Xie
CA Albers
CA Heid
CG Elsik
Christian Bendixen
D Pushkarev
DA Wheeler
DF Conrad
DG Lemay
DJ de Koning
DM Larkin
DR Bentley
E Seroussi
EM Ibeagha-Awemu
ER Mardis
F Zhang
Frank Panitz
G Dennis Jr
G Lunter
GE Liu
GM Church
GP Consortium
GP Harhay
GT McVean
H Li
H Park
HB Fraser
J Eid
J Fadista
J Sebat
J Wang
Jakob Hedegaard
JC Dohm
JI Kim
João Fadista
JR Lupski
JS Bae
JW Drake
K Chen
K Wang
K Wong
K Ye
KJ McKernan
KU Mir
LA Hindorff
LK Matukumalli
LW Hillier
M Kirin
M Perez-Enciso
MA Taub
ME Goddard
ML Metzker
MW Nachman
O Harismendy
P Medvedev
P Stankiewicz
P Tong
PC Ng
R Kawahara-Miki
R Nielsen
R Redon
RA Cartwright
RA Gibbs
RE Mills
RL Tellam
S Levy
S Yoon
SC Schuster
SH Eck
SM Ahn
T Meuwissen
TH Meuwissen
V Ramensky
V Whan
V Yuzbasiyan-Gurkan
Y Erlich
Y Hou
Y Li
YS Ju
ZL Hu
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Integration of genomic variation with phenotypic information is an effective approach for uncovering genotype-phenotype associations. This requires an accurate identification of the different types of variation in individual genomes. Results: We report the integration of the whole genome sequence of a single Holstein Friesian bull with data from single nucleotide polymorphism (SNP) and comparative genomic hybridization (CGH) array technologies to determine a comprehensive spectrum of genomic variation. The performance of resequencing SNP detection was assessed by combining SNPs that were identified to be either in identity by descent (IBD) or in copy number variation (CNV) with results from SNP array genotyping. Coding insertions and deletions (indels) were found to be enriched for size in multiples of 3 and were located near the N- and C-termini of proteins. For larger indels, a combination of split-read and read-pair approaches proved to be complementary in finding different signatures. CNVs were identified on the basis of the depth of sequenced reads, and by using SNP and CGH arrays. Conclusions: Our results provide high resolution mapping of diverse classes of genomic variation in an individual bovine genome and demonstrate that structural variation surpasses sequence variation as the main component of genomic variability. Better accuracy of SNP detection was achieved with little loss of sensitivity when algorithms that implemented mapping quality were used. IBD regions were found to be instrumental for calculating resequencing SNP accuracy, while SNP detection within CNVs tended to be less reliable. CNV discovery was affected dramatically by platform resolution and coverage biases. The combined data for this study showed that at a moderate level of sequencing coverage, an ensemble of platforms and tools can be applied together to maximize the accurate detection of sequence and structural variants.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

We thank the many people who were generous with contributing their samples to the project: the African Caribbean in Barbados; Bengali in Bangladesh; British in England and Scotland; Chinese Dai in Xishuangbanna, China; Colombians in Medellin, Colombia; Esan in Nigeria; Finnish in Finland; Gambian in Western Division – Mandinka; Gujarati Indians in Houston, Texas, USA; Han Chinese in Beijing, China; Iberian populations in Spain; Indian Telugu in the UK; Japanese in Tokyo, Japan; Kinh in Ho Chi Minh City, Vietnam; Luhya in Webuye, Kenya; Mende in Sierra Leone; people with African ancestry in the southwest USA; people with Mexican ancestry in Los Angeles, California, USA; Peruvians in Lima, Peru; Puerto Ricans in Puerto Rico; Punjabi in Lahore, Pakistan; southern Han Chinese; Sri Lankan Tamil in the UK; Toscani in Italia; Utah residents (CEPH) with northern and western European ancestry; and Yoruba in Ibadan, Nigeria. Many thanks to the people who contributed to this project: P. Maul, T. Maul, and C. Foster; Z. Chong, X. Fan, W. Zhou, and T. Chen; N. Sengamalay, S. Ott, L. Sadzewicz, J. Liu, and L. Tallon; L. Merson; O. Folarin, D. Asogun, O. Ikpwonmosa, E. Philomena, G. Akpede, S. Okhobgenin, and O. Omoniwa; the staff of the Institute of Lassa Fever Research and Control (ILFRC), Irrua Specialist Teaching Hospital, Irrua, Edo State, Nigeria; A. Schlattl and T. Zichner; S. Lewis, E. Appelbaum, and L. Fulton; A. Yurovsky and I. Padioleau; N. Kaelin and F. Laplace; E. Drury and H. Arbery; A. Naranjo, M. Victoria Parra, and C. Duque; S. Däkel, B. Lenz, and S. Schrinner; S. Bumpstead; and C. Fletcher-Hoppe. Funding for this work was from the Wellcome Trust Core Award 090532/Z/09/Z and Senior Investigator Award 095552/Z/11/Z (P.D.), and grants WT098051 (R.D.), WT095908 and WT109497 (P.F.), WT086084/Z/08/Z and WT100956/Z/13/Z (G.M.), WT097307 (W.K.), WT0855322/Z/08/Z (R.L.), WT090770/Z/09/Z (D.K.), the Wellcome Trust Major Overseas program in Vietnam grant 089276/Z.09/Z (S.D.), the Medical Research Council UK grant G0801823 (J.L.M.), the UK Biotechnology and Biological Sciences Research Council grants BB/I02593X/1 (G.M.) and BB/I021213/1 (A.R.L.), the British Heart Foundation (C.A.A.), the Monument Trust (J.H.), the European Molecular Biology Laboratory (P.F.), the European Research Council grant 617306 (J.L.M.), the Chinese 863 Program 2012AA02A201, the National Basic Research program of China 973 program no. 2011CB809201, 2011CB809202 and 2011CB809203, Natural Science Foundation of China 31161130357, the Shenzhen Municipal Government of China grant ZYC201105170397A (J.W.), the Canadian Institutes of Health Research Operating grant 136855 and Canada Research Chair (S.G.), Banting Postdoctoral Fellowship from the Canadian Institutes of Health Research (M.K.D.), a Le Fonds de Recherche du Québec-Santé (FRQS) research fellowship (A.H.), Genome Quebec (P.A.), the Ontario Ministry of Research and Innovation – Ontario Institute for Cancer Research Investigator Award (P.A., J.S.), the Quebec Ministry of Economic Development, Innovation, and Exports grant PSR-SIIRI-195 (P.A.), the German Federal Ministry of Education and Research (BMBF) grants 0315428A and 01GS08201 (R.H.), the Max Planck Society (H.L., G.M., R.S.), BMBF-EPITREAT grant 0316190A (R.H., M.L.), the German Research Foundation (Deutsche Forschungsgemeinschaft) Emmy Noether Grant KO4037/1-1 (J.O.K.), the Beatriu de Pinos Program grants 2006 BP-A 10144 and 2009 BP-B 00274 (M.V.), the Spanish National Institute for Health Research grant PRB2 IPT13/0001-ISCIII-SGEFI/FEDER (A.O.), Ewha Womans University (C.L.), the Japan Society for the Promotion of Science Fellowship number PE13075 (N.P.), the Louis Jeantet Foundation (E.T.D.), the Marie Curie Actions Career Integration grant 303772 (C.A.), the Swiss National Science Foundation 31003A_130342 and NCCR “Frontiers in Genetics” (E.T.D.), the University of Geneva (E.T.D., T.L., G.M.), the US National Institutes of Health National Center for Biotechnology Information (S.S.) and grants U54HG3067 (E.S.L.), U54HG3273 and U01HG5211 (R.A.G.), U54HG3079 (R.K.W., E.R.M.), R01HG2898 (S.E.D.), R01HG2385 (E.E.E.), RC2HG5552 and U01HG6513 (G.T.M., G.R.A.), U01HG5214 (A.C.), U01HG5715 (C.D.B.), U01HG5718 (M.G.), U01HG5728 (Y.X.F.), U41HG7635 (R.K.W., E.E.E., P.H.S.), U41HG7497 (C.L., M.A.B., K.C., L.D., E.E.E., M.G., J.O.K., G.T.M., S.A.M., R.E.M., J.L.S., K.Y.), R01HG4960 and R01HG5701 (B.L.B.), R01HG5214 (G.A.), R01HG6855 (S.M.), R01HG7068 (R.E.M.), R01HG7644 (R.D.H.), DP2OD6514 (P.S.), DP5OD9154 (J.K.), R01CA166661 (S.E.D.), R01CA172652 (K.C.), P01GM99568 (S.R.B.), R01GM59290 (L.B.J., M.A.B.), R01GM104390 (L.B.J., M.Y.Y.), T32GM7790 (C.D.B., A.R.M.), P01GM99568 (S.R.B.), R01HL87699 and R01HL104608 (K.C.B.), T32HL94284 (J.L.R.F.), and contracts HHSN268201100040C (A.M.R.) and HHSN272201000025C (P.S.), Harvard Medical School Eleanor and Miles Shore Fellowship (K.L.), Lundbeck Foundation Grant R170-2014-1039 (K.L.), NIJ Grant 2014-DN-BX-K089 (Y.E.), the Mary Beryl Patch Turnbull Scholar Program (K.C.B.), NSF Graduate Research Fellowship DGE-1147470 (G.D.P.), the Simons Foundation SFARI award SF51 (M.W.), and a Sloan Foundation Fellowship (R.D.H.). E.E.E. is an investigator of the Howard Hughes Medical Institute.

Cold Spring Harbor Laboratory Institutional Repository

Bilkent University Institutional Repository

Serveur académique lausannois

Louisiana State University

Carolina Digital Repository

Spiral - Imperial College Digital Repository

Online Research Database In Technology

MPG.PuRe

Brunel University Research Archive

HKU Scholars Hub

University of Queensland eSpace

Crossref