arXiv.org e-Print Archive

RCSI Repository

Genetic Classification of Populations using Supervised Learning

Author: A Motsinger-Reif
A Seretti
Aiden Corvin
B North
C Bailer-Jones
C Chang
Carlos Pinto
Colm O'Dushlaine
D Curtis
D Reich
Daniel J. Kliebenstein
Derek Morris
E Jaynes
Elizabeth A. Heron
J Baik
M Leshno
M Nelis
Michael Bridges
Michael Gill
N Patterson
O Lao
Ricardo Segurado
S Gull
S Penco
S Purcell
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 16/12/2010

There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large-scale genome-wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations, are termed unsupervised. Supervised methods, on the other hand, are able to utilise this prior knowledge when it is available. In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results, that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large-scale genome-wide association studies. Comment: Accepted PLOS On

Aberdeen University Research

CiteSeerX

Online Research @ Cardiff

Research Repository UCD

Irish Universities

UCL Discovery

University of Melbourne Institutional Repository

Iron Age and Anglo-Saxon genomes from East England reveal British migration history

Author: A Sajantila
AL Topf
AW Briggs
B Winney
C Capelli
CT O'Dushlaine
D Petts
G Jun
GB Busby
H Eckardt
H Härke
H Jònsson
H Li
HX Zheng
I Lazaridis
J Hines
J Montgomery
J Novembre
JK Pickrell
M Meyer
M Schubert
ME Weale
MG Thomas
N Patterson
N Rohland
P Balaresque
P Brotherton
P Budd
P Ralph
P Skoglund
PR Staab
S Besenbacher
S Leslie
S Schiffels
T Sundell
W Haak
Wellcome Trust
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

British population history has been shaped by a series of immigrations, including the early Anglo-Saxon migrations after 400 CE. It remains an open question how these events affected the genetic composition of the current British population. Here, we present whole-genome sequences from 10 individuals excavated close to Cambridge in the East of England, ranging from the late Iron Age to the middle Anglo-Saxon period. By analysing shared rare variants with hundreds of modern samples from Britain and Europe, we estimate that on average the contemporary East English population derives 38% of its ancestry from Anglo-Saxon migrations. We gain further insight with a new method, rarecoal, which infers population history and identifies fine-scale genetic ancestry from rare variants. Using rarecoal we find that the Anglo-Saxon samples are closely related to modern Dutch and Danish populations, while the Iron Age samples share ancestors with multiple Northern European populations including Britain

CLoK

Adelaide Research & Scholarship

MPG.PuRe

Multiplex Target Enrichment Using DNA Indexing for Ultra-High Throughput SNP Detection

Author: A. P. Corvin
A. S. Gates
Albert
C. Pinto
C. T. O'Dushlaine
Craig
D. W. Morris
Dahl
E. M. Kenny
Gnirke
Hodges
Landegren
Lovett
M. Gill
Mamanova
Ng
P. Cormican
Porreca
Rodriguez
Tewhey
Turner
W. P. Gilks
Publication venue: Oxford University Press
Publication date: 01/01/2011

Screening large numbers of target regions in multiple DNA samples for sequence variation is an important application of next-generation sequencing but an efficient method to enrich the samples in parallel has yet to be reported. We describe an advanced method that combines DNA samples using indexes or barcodes prior to target enrichment to facilitate this type of experiment. Sequencing libraries for multiple individual DNA samples, each incorporating a unique 6-bp index, are combined in equal quantities, enriched using a single in-solution target enrichment assay and sequenced in a single reaction. Sequence reads are parsed based on the index, allowing sequence analysis of individual samples. We show that the use of indexed samples does not impact on the efficiency of the enrichment reaction. For three- and nine-indexed HapMap DNA samples, the method was found to be highly accurate for SNP identification. Even with sequence coverage as low as 8x, 99% of sequence SNP calls were concordant with known genotypes. Within a single experiment, this method can sequence the exonic regions of hundreds of genes in tens of samples for sequence and structural variation using as little as 1 μg of input DNA per sample
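
The index-based parsing step described in the abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the 6-bp index sits at the start of each read and must match exactly; the index sequences, sample names and reads are hypothetical and are not taken from the paper, and real pipelines often read the index separately and tolerate mismatches.

```python
from collections import defaultdict

INDEX_LENGTH = 6  # the abstract states a unique 6-bp index per sample


def demultiplex(reads, index_to_sample):
    """Group reads by their leading index; returns {sample: [reads with index removed]}."""
    bins = defaultdict(list)
    for read in reads:
        index, insert = read[:INDEX_LENGTH], read[INDEX_LENGTH:]
        # Reads whose index is not in the table go to an "unassigned" bin.
        sample = index_to_sample.get(index, "unassigned")
        bins[sample].append(insert)
    return dict(bins)


# Hypothetical index sequences and reads, for illustration only.
index_to_sample = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
reads = ["ACGTACGGGTTTAA", "TGCATGCCCAAATT", "ACGTACTTTT", "NNNNNNACGT"]
binned = demultiplex(reads, index_to_sample)
```

Each per-sample bin can then be aligned and genotyped independently, which is what allows a single pooled enrichment and sequencing reaction to yield sample-level SNP calls.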

CiteSeerX

arXiv.org e-Print Archive

Sussex Research Online

The geography of recent genetic ancestry across Europe

Author: A Albrechtsen
A Auton
A Gillett
A Gusev
A Keller
A Zeileis
AE Hoerl
AL Price
AM Stuart
B Winney
BL Browning
BM Henn
C Tyler-Smith
CD Huff
Chris Tyler-Smith
CL Epstein
CT O'Dushlaine
DJ Lawson
DLT Rohde
E Jakkula
F Rousset
G McVean
Graham Coop
H Li
J Chang
J Novembre
JA Tennessen
JE Pool
JE Powell
JFC Kingman
JK Gusev Lowe
JN Fenner
K Harris
KA Frazer
KP Donnelly
M Slatkin
MD Brown
MR Nelson
N Patterson
N Takahata
NH Chapman
O Lao
P Menozzi
P Moorjani
P Skoglund
P Soares
Peter Ralph
PF Palamara
R Hudson
RA Fisher
RL Cann
S Carmi
S Giglio
S Gravel
S Purcell
Y Petrov
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 07/05/2013

The recent genealogical history of human populations is a complex mosaic formed by individual migration, large-scale population movements, and other demographic events. Population genomics datasets can provide a window into this recent history, as rare traces of recent shared genetic ancestry are detectable due to long segments of shared genomic material. We make use of genomic data for 2,257 Europeans (the POPRES dataset) to conduct one of the first surveys of recent genealogical ancestry over the past three thousand years at a continental scale. We detected 1.9 million shared genomic segments, and used the lengths of these to infer the distribution of shared ancestors across time and geography. We find that a pair of modern Europeans living in neighboring populations share around 10-50 genetic common ancestors from the last 1500 years, and upwards of 500 genetic ancestors from the previous 1000 years. These numbers drop off exponentially with geographic distance, but since genetic ancestry is rare, individuals from opposite ends of Europe are still expected to share millions of common genealogical ancestors over the last 1000 years. There is substantial regional variation in the number of shared genetic ancestors: especially high numbers of common ancestors between many eastern populations likely date to the Slavic and/or Hunnic expansions, while much lower levels of common ancestry in the Italian and Iberian peninsulas may indicate weaker demographic effects of Germanic expansions into these areas and/or more stably structured populations. Recent shared ancestry in modern Europeans is ubiquitous, and clearly shows the impact of both small-scale migration and large historical events. 
Population genomic datasets have considerable power to uncover recent demographic history, and will allow a much fuller picture of the close genealogical kinship of individuals across the world. Comment: Full size figures available from http://www.eve.ucdavis.edu/~plralph/research.html; or html version at http://ralphlab.usc.edu/ibd/ibd-paper/ibd-writeup.xhtm

Queen's University Belfast Research Portal

FigShare

Evidence that duplications of 22q11.2 protect against schizophrenia.

Author: A Corvin
A L Richards
A Sanders
AE Pulver
AS Bassett
B Riley
C Hultman
C O'Dushlaine
C Pato
C Wentzel
D Levinson
D Malhotra
D Morris
E K Green
E Rees
EB Kaminsky
F A O'Neill
G Davies
G Genovese
G Kirov
H H H Göring
I Jones
J Duan
J L Moran
J Shi
J Szatkiewicz
J T R Walters
JA Rosenfeld
K D Chambert
K Devriendt
K S Kendler
K Wang
KC Murphy
M C O'Donovan
M Gill
M J Owen
M Karayiorgou
M Pato
MF Portnoï
MJ Owen
ML Hamshere
N Craddock
N Hiroi
NJM van Beveren
P Cormican
P F Sullivan
P Sklar
P V Gejman
RJ Shprintzen
S A McCarroll
S E Legge
S Van Campenhout
TM Yobb
W Moy
Z Ou
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/11/2013

A number of large, rare copy number variants (CNVs) are deleterious for neurodevelopmental disorders, but large, rare, protective CNVs have not been reported for such phenotypes. Here we show in a CNV analysis of 47 005 individuals, the largest CNV analysis of schizophrenia to date, that large duplications (1.5-3.0 Mb) at 22q11.2 (the reciprocal of the well-known, risk-inducing deletion of this locus) are substantially less common in schizophrenia cases than in the general population (0.014% vs 0.085%, OR=0.17, P=0.00086). 22q11.2 duplications represent the first putative protective mutation for schizophrenia
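
As a rough check on the quoted effect size, the sketch below recomputes an odds ratio from the two rounded carrier frequencies given in the abstract. This is only an approximation under the assumption that a crude, unstratified odds ratio is meaningful here: the abstract does not give the underlying carrier counts or the statistical test used, so the result (roughly 0.16) is close to, but not identical with, the reported OR of 0.17.

```python
def odds(p):
    """Convert a probability (here, a carrier frequency) into odds."""
    return p / (1.0 - p)


# Carrier frequencies quoted in the abstract, converted from percent.
freq_cases = 0.014 / 100       # 22q11.2 duplications in schizophrenia cases
freq_population = 0.085 / 100  # 22q11.2 duplications in the general population

# Crude odds ratio from rounded frequencies; the published figure
# presumably rests on exact counts, which are not given in the abstract.
odds_ratio = odds(freq_cases) / odds(freq_population)
print(f"approximate OR = {odds_ratio:.3f}")
```

An odds ratio well below 1 is what motivates describing the duplication as putatively protective.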

Online Research @ Cardiff

Plymouth Electronic Archive and Research Library

Carolina Digital Repository

Multi-locus genome-wide association analysis supports the role of glutamatergic synaptic transmission in the etiology of major depressive disorder

Author: A Demirkan
A Segrè
A Terracciano
AE Autry
AM Linden
BL Browning
BP Brennan
C Li
C O'Dushlaine
CM Lewis
CX Li
D Zelena
DD Schoepp
DS Hasin
ESCL Lips
F Lohoff
FJ Bosker
G Breen
G Salvadore
G Sanacora
J Shi
JB Veyrieras
JT Glessner
K Hashimoto
K Mitsukawa
K Wang
M Kohli
M Pergadia
M Rietschel
MD Li
MJ Robbins
NR Wray
P Holmans
P Muglia
PF Sullivan
PI de Bakker
R Bernard
R Machado-Vieira
RM Duvoisin
S Maeng
S Purcell
S Raychaudhuri
SA Hamilton
SE Medland
SI Shyn
SJ Mathew
T Goltser-Dubner
V Duric
V Krishnan
Y Li
Y Miyamoto
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Major depressive disorder (MDD) is a common psychiatric illness characterized by low mood and loss of interest in pleasurable activities. Despite years of effort, recent genome-wide association studies (GWAS) have identified few susceptibility variants or genes that are robustly associated with MDD. Standard single-SNP (single nucleotide polymorphism)-based GWAS analysis typically has limited power to deal with the extensive heterogeneity and substantial polygenic contribution of individually weak genetic effects underlying the pathogenesis of MDD. Here, we report an alternative, gene-set-based association analysis of MDD in an effort to identify groups of biologically related genetic variants that are involved in the same molecular function or cellular processes and exhibit a significant level of aggregated association with MDD. In particular, we used a text-mining-based data analysis to prioritize candidate gene sets implicated in MDD and conducted a multi-locus association analysis to look for enriched signals of nominally associated MDD susceptibility loci within each of the gene sets. Our primary analysis is based on the meta-analysis of three large MDD GWAS data sets (total N = 4346 cases and 4430 controls). After correction for multiple testing, we found that genes involved in glutamatergic synaptic neurotransmission were significantly associated with MDD (set-based association P = 6.9 × 10⁻⁴). This result is consistent with previous studies that support a role of the glutamatergic system in synaptic plasticity and MDD and supports the potential utility of targeting glutamatergic neurotransmission in the treatment of MDD

Harvard University - DASH

VU Research Portal

University of Queensland eSpace

Proceedings of the 2010 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference

Springer - Publisher Connector

Public Library of Science (PLOS)

Cryptic Distant Relatives Are Common in Both Isolated and Cosmopolitan Genetic Samples

Author: A Albrechtsen
A Auton
A Gusev
A Kitchen
A Kong
A Price
B Derrida
B McEvoy
BL Browning
BM Henn
Brenna M. Henn
C O'Dushlaine
CD Huff
CR Gignoux
D Behar
D Rohde
FS Alkuraya
G Atzmon
G Leibon
G Malecot
Henry Harpending
I Moltke
Itsik Pe'er
J Li
J Novembre
J. Michael Macpherson
JL Mountain
JM Macpherson
Joanna L. Mountain
L Scott
L Weiss
Lawrence Hon
M Epstein
M Kirin
M Nalls
M Slatkin
M Zlojutro
N Rosenberg
Nick Eriksson
R McQuillan
RR Hudson
S Browning
S Ramachandran
S Tishkoff
S Wang
Serge Saxonov
SR Browning
W Bodmer
Publication venue: Public Library of Science
Publication date: 01/01/2012

Although a few hundred single nucleotide polymorphisms (SNPs) suffice to infer close familial relationships, high density genome-wide SNP data make possible the inference of more distant relationships such as 2nd to 9th cousinships. In order to characterize the relationship between genetic similarity and degree of kinship given a timeframe of 100–300 years, we analyzed the sharing of DNA inferred to be identical by descent (IBD) in a subset of individuals from the 23andMe customer database (n = 22,757) and from the Human Genome Diversity Panel (HGDP-CEPH, n = 952). With data from 121 populations, we show that the average amount of DNA shared IBD in most ethnolinguistically-defined populations, for example Native American groups, Finns and Ashkenazi Jews, differs from continentally-defined populations by several orders of magnitude. Via extensive pedigree-based simulations, we determined bounds for predicted degrees of relationship given the amount of genomic IBD sharing in both endogamous and ‘unrelated’ population samples. Using these bounds as a guide, we detected tens of thousands of 2nd to 9th degree cousin pairs within a heterogeneous set of 5,000 Europeans. The ubiquity of distant relatives, detected via IBD segments, in both ethnolinguistic populations and in large ‘unrelated’ population samples has important implications for genetic genealogy, forensics and genotype/phenotype mapping studies
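
To give a sense of scale for the 2nd to 9th cousin range mentioned above, the sketch below evaluates the textbook expected coefficient of relatedness for full n-th cousins in an outbred pedigree, r = (1/2)^(2n + 1). This is a standard pedigree expectation and not the simulation-derived bounds the abstract refers to; the function name is illustrative, and realised sharing varies widely around the expectation, with many distant cousin pairs sharing no detectable IBD segment at all.

```python
def expected_relatedness(cousin_degree):
    """Expected coefficient of relatedness for full n-th cousins: (1/2) ** (2n + 1)."""
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or greater")
    # Two shared ancestors, each reached through 2(n + 1) meioses:
    # 2 * (1/2) ** (2n + 2) == (1/2) ** (2n + 1).
    return 0.5 ** (2 * cousin_degree + 1)


# Expected relatedness across the cousin range discussed in the abstract.
for degree in range(2, 10):
    print(f"cousin degree {degree}: {expected_relatedness(degree):.6%}")
```

The expectation falls by a factor of four with each additional cousin degree, which is why dense genome-wide data are needed before such distant relationships become detectable.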

CiteSeerX