134 research outputs found

Algorithms for genomics and genetics : compression-accelerated search and admixture analysis

Author: Loh Po-Ru
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2013

Thesis (Ph.D.), Massachusetts Institute of Technology, Department of Mathematics, 2013. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 133-139).

Rapid advances in next-generation sequencing technologies are revolutionizing genomics, with data sets at the scale of thousands of human genomes fast becoming the norm. These technological leaps promise to enable corresponding advances in biology and medicine, but the deluge of raw data poses substantial mathematical, computational and statistical challenges that must first be overcome. This thesis consists of two research thrusts along these lines. First, we propose an algorithmic framework, "compressive genomics," that accelerates bioinformatic computations through analysis-aware compression. We demonstrate this methodology with proof-of-concept implementations of compression-accelerated search (CaBLAST and CaBLAT). Second, we develop new computational tools for investigating population admixture, a phenomenon of importance in understanding demographic histories of human populations and facilitating association mapping of disease genes. Our recently released ALDER and MixMapper software packages provide fast, sensitive, and robust methods for detecting and analyzing signatures of admixture created by genetic drift and recombination on genome-wide, large-sample scales.

DSpace@MIT

Ancient west Eurasian ancestry in southern and eastern Africa

Author: Berger Bonnie
Lipson Mark
Loh Po-Ru
Pakendorf Brigitte
Patterson Nick
Pickrell Joseph K.
Reich David
Stoneking Mark
Publication venue: Proceedings of the National Academy of Sciences
Publication date: 01/07/2013

The history of southern Africa involved interactions between indigenous hunter-gatherers and a range of populations that moved into the region. Here we use genome-wide genetic data to show that there are at least two admixture events in the history of Khoisan populations (southern African hunter-gatherers and pastoralists who speak non-Bantu languages with click consonants). One involved populations related to Niger-Congo-speaking African populations, and the other introduced ancestry most closely related to west Eurasian (European or Middle Eastern) populations. We date this latter admixture event to approximately 900-1,800 years ago, and show that it had the largest demographic impact in Khoisan populations that speak Khoe-Kwadi languages. A similar signal of west Eurasian ancestry is present throughout eastern Africa. In particular, we also find evidence for two admixture events in the history of Kenyan, Tanzanian, and Ethiopian populations, the earlier of which involved populations related to west Eurasians and which we date to approximately 2,700-3,300 years ago. We reconstruct the allele frequencies of the putative west Eurasian population in eastern Africa, and show that this population is a good proxy for the west Eurasian ancestry in southern Africa. The most parsimonious explanation for these findings is that west Eurasian ancestry entered southern Africa indirectly through eastern Africa.

arXiv.org e-Print Archive

Efficient Moment-Based Inference of Admixture Parameters and Sources of Gene Flow

Author: Berger Bonnie
Levin Alex
Lipson Mark
Loh Po-Ru
Patterson Nick
Reich David
Publication venue: Oxford University Press (OUP)
Publication date: 07/04/2013

The recent explosion in available genetic data has led to significant advances in understanding the demographic histories of and relationships among human populations. It is still a challenge, however, to infer reliable parameter values for complicated models involving many populations. Here, we present MixMapper, an efficient, interactive method for constructing phylogenetic trees including admixture events using single nucleotide polymorphism (SNP) genotype data. MixMapper implements a novel two-phase approach to admixture inference using moment statistics, first building an unadmixed scaffold tree and then adding admixed populations by solving systems of equations that express allele frequency divergences in terms of mixture parameters. Importantly, all features of the model, including topology, sources of gene flow, branch lengths, and mixture proportions, are optimized automatically from the data and include estimates of statistical uncertainty. MixMapper also uses a new method to express branch lengths in easily interpretable drift units. We apply MixMapper to recently published data for Human Genome Diversity Cell Line Panel individuals genotyped on a SNP array designed especially for use in population genetics studies, obtaining confident results for 30 populations, 20 of them admixed. Notably, we confirm a signal of ancient admixture in European populations, including previously undetected admixture in Sardinians and Basques, involving a proportion of 20–40% ancient northern Eurasian ancestry.
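
As a rough illustration of what "moment statistics" on allele frequencies look like, the sketch below computes the standard f2 and f3 statistics on simulated frequencies and recovers a mixture proportion from them. It is not MixMapper's implementation: the function names, the simulated data, and the simplifying assumptions (exact population frequencies, no genetic drift after admixture, no finite-sample bias correction, no scaffold tree) are all hypothetical choices made for this example.

```python
import numpy as np


def f2(p_x, p_y):
    """Mean squared allele-frequency difference between two populations."""
    return float(np.mean((p_x - p_y) ** 2))


def f3(p_c, p_a, p_b):
    """Mean of (c - a)(c - b) across SNPs; a negative value suggests C is admixed."""
    return float(np.mean((p_c - p_a) * (p_c - p_b)))


# Simulated allele frequencies for two source populations (assumption: independent SNPs).
rng = np.random.default_rng(0)
n_snps = 10_000
p_a = rng.uniform(0.05, 0.95, n_snps)
p_b = rng.uniform(0.05, 0.95, n_snps)

# Population C is an exact mixture: a fraction alpha from A and 1 - alpha from B.
alpha = 0.3
p_c = alpha * p_a + (1 - alpha) * p_b

f2_ab = f2(p_a, p_b)
f3_c = f3(p_c, p_a, p_b)

# Without drift, c - b = alpha * (a - b), so f2(C, B) = alpha**2 * f2(A, B).
alpha_hat = float(np.sqrt(f2(p_c, p_b) / f2_ab))

print("f2(A, B):", f2_ab)
print("f3(C; A, B):", f3_c)
print("estimated mixture proportion:", alpha_hat)
```

In this idealized setting f3(C; A, B) equals -alpha(1 - alpha) f2(A, B), and the estimated proportion matches the simulated one up to floating-point error. With real genotype data, drift and sampling noise break these exact identities, which is why a method needs systems of equations and uncertainty estimates rather than a single ratio.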

arXiv.org e-Print Archive

DSpace@MIT

Crossref

Harvard University - DASH

PubMed Central

A model and test for coordinated polygenic epistasis in complex traits

Author: Dahl Andy
Loh Po-Ru
Rappoport Nadav
Sanders Stephan J.
Sheppard Brooke
Zaitlen Noah
Publication date: 09/11/2023

Epistasis, the interaction between genetic variants, is pervasive in model systems and can profoundly impact evolutionary adaptation, population disease dynamics, genetic mapping, and precision medicine efforts. In this work, we develop a model for structured polygenic epistasis, called coordinated epistasis (CE), and prove that several recent theories of genetic architecture fall under the formal umbrella of CE. Unlike standard epistasis models that assume epistasis and main effects are independent, CE captures systematic correlations between epistasis and main effects that result from pathway-level epistasis, on balance skewing the penetrance of genetic effects. To test for the existence of CE, we propose the even-odd (EO) test and prove it is calibrated in a range of realistic biological models. Applying the EO test in the UK Biobank, we find evidence of CE in 18 of 26 traits spanning disease, anthropometric, and blood categories. Finally, we extend the EO test to tissue-specific enrichment and identify several plausible tissue–trait pairs. Overall, CE is a dimension of genetic architecture that can capture structured, systemic forms of epistasis in complex human traits.

Knowledge UChicago

Fast and accurate long-range phasing in a UK Biobank cohort

Author: Loh Po-Ru
Palamara Pier Francesco
Price Alkes L
Publication venue: Springer Science and Business Media LLC
Publication date: 03/01/2017

Recent work has leveraged the extensive genotyping of the Icelandic population to perform long-range phasing (LRP), enabling accurate imputation and association analysis of rare variants in target samples typed on genotyping arrays. Here, we develop a fast and accurate LRP method, Eagle, that extends this paradigm to populations with much smaller proportions of genotyped samples by harnessing long (>4 cM) identical-by-descent (IBD) tracts shared among distantly related individuals. We applied Eagle to N≈150,000 samples (0.2% of the British population) from the UK Biobank, and we determined that it is 1–2 orders of magnitude faster than existing methods while achieving similar or better phasing accuracy (switch error rate ≈0.3%, corresponding to perfect phase in a majority of 10 Mb segments). We also observed that when used within an imputation pipeline, Eagle pre-phasing improved downstream imputation accuracy compared to pre-phasing in batches using existing methods (as necessary to achieve comparable computational cost).

Harvard University - DASH

Making polygons by simple folds and one straight cut

Author: Demaine Erik D.
Demaine Martin L.
Hawksley Andrea
Ito Hiro
Loh Po-Ru
Manber Shelly
Stephens Omari S.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

Computational Geometry, Graphs and Applications: 9th International Conference, CGGA 2010, Dalian, China, November 3-6, 2010, Revised Selected Papers.

We give an efficient algorithmic characterization of simple polygons whose edges can be aligned onto a common line, with nothing else on that line, by a sequence of all-layers simple folds. In particular, such alignments enable the cutting out of the polygon and its complement with one complete straight cut. We also show that these makeable polygons include all convex polygons possessing a line of symmetry.

CiteSeerX

DSpace@MIT

Crossref

Reconstructing Austronesian population history in Island Southeast Asia

Author: Berger Bonnie
Ko Ying-Chin
Lipson Mark
Loh Po-Ru
Moorjani Priya
Patterson Nick
Reich David
Stoneking Mark
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Austronesian languages are spread across half the globe, from Easter Island to Madagascar. Evidence from linguistics and archaeology indicates that the ‘Austronesian expansion,’ which began 4,000–5,000 years ago, likely had roots in Taiwan, but the ancestry of present-day Austronesian-speaking populations remains controversial. Here, we analyse genome-wide data from 56 populations using new methods for tracing ancestral gene flow, focusing primarily on Island Southeast Asia. We show that all sampled Austronesian groups harbour ancestry that is more closely related to aboriginal Taiwanese than to any present-day mainland population. Surprisingly, western Island Southeast Asian populations have also inherited ancestry from a source nested within the variation of present-day populations speaking Austro-Asiatic languages, which have historically been nearly exclusive to the mainland. Thus, either there was once a substantial Austro-Asiatic presence in Island Southeast Asia, or Austronesian speakers migrated to and through the mainland, admixing there before continuing to western Indonesia.

DSpace@MIT

Crossref

Harvard University - DASH

PubMed Central

eScholarship - University of California

MPG.PuRe

Calibrating the Human Mutation Rate via Ancestral Recombination Density in Diploid Genomes

Author: Berger Bonnie
Lipson Mark
Loh Po-Ru
Patterson Nick
Reich David
Sankararaman Sriram
Publication venue: Public Library of Science (PLoS)
Publication date: 01/02/2015

The human mutation rate is an essential parameter for studying the evolution of our species, interpreting present-day genetic variation, and understanding the incidence of genetic disease. Nevertheless, our current estimates of the rate are uncertain. Most notably, recent approaches based on counting de novo mutations in family pedigrees have yielded significantly smaller values than classical methods based on sequence divergence. Here, we propose a new method that uses the fine-scale human recombination map to calibrate the rate of accumulation of mutations. By comparing local heterozygosity levels in diploid genomes to the genetic distance scale over which these levels change, we are able to estimate a long-term mutation rate averaged over hundreds or thousands of generations. We infer a rate of (1.61 ± 0.13) × 10⁻⁸ mutations per base per generation, which falls in between phylogenetic and pedigree-based estimates, and we suggest possible mechanisms to reconcile our estimate with previous studies. Our results support intermediate-age divergences among human populations and between humans and other great apes.

DSpace@MIT

Crossref

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

The Francis Crick Institute

Inferring Admixture Histories of Human Populations Using Linkage Disequilibrium

Author: Berger Bonnie
Lipson Mark
Loh Po-Ru
Moorjani Priya
Patterson Nick
Pickrell Joseph K.
Reich David
Publication venue: Genetics Society of America
Publication date: 01/10/2012

Author manuscript date: February 9, 2013.

Long-range migrations and the resulting admixtures between populations have been important forces shaping human genetic diversity. Most existing methods for detecting and reconstructing historical admixture events are based on allele frequency divergences or patterns of ancestry segments in chromosomes of admixed individuals. An emerging new approach harnesses the exponential decay of admixture-induced linkage disequilibrium (LD) as a function of genetic distance. Here, we comprehensively develop LD-based inference into a versatile tool for investigating admixture. We present a new weighted LD statistic that can be used to infer mixture proportions as well as dates with fewer constraints on reference populations than previous methods. We define an LD-based three-population test for admixture and identify scenarios in which it can detect admixture events that previous formal tests cannot. We further show that we can uncover phylogenetic relationships among populations by comparing weighted LD curves obtained using a suite of references. Finally, we describe several improvements to the computation and fitting of weighted LD curves that greatly increase the robustness and speed of the calculations. We implement all of these advances in a software package, ALDER, which we validate in simulations and apply to test for admixture among all populations from the Human Genome Diversity Project (HGDP), highlighting insights into the admixture history of Central African Pygmies, Sardinians, and Japanese.

Funding: National Science Foundation (U.S.), Graduate Research Fellowship Program; National Institutes of Health (U.S.) (Training Grant 5T32HG004947-04); Simons Foundatio
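
The "exponential decay of admixture-induced LD as a function of genetic distance" mentioned in the abstract can be sketched with a toy curve fit. Under the usual single-pulse model, the LD signal at genetic distance d (in Morgans) is proportional to exp(-n d), where n is the number of generations since admixture. The example below is only a minimal sketch under that assumption: the function name, the noise-free synthetic curve, and the 29-year generation time are illustrative choices, and it is not ALDER's weighted LD statistic or its fitting procedure.

```python
import numpy as np


def fit_decay_rate(d_morgans, ld):
    """Log-linear least-squares fit of ld = A * exp(-n * d); returns (n, A)."""
    slope, intercept = np.polyfit(d_morgans, np.log(ld), 1)
    return float(-slope), float(np.exp(intercept))


# Synthetic, noise-free LD curve sampled from 0.5 cM to 20 cM (values are made up).
d = np.linspace(0.005, 0.2, 40)   # genetic distance in Morgans
true_n, true_amp = 30.0, 0.01     # generations since admixture, curve amplitude
ld = true_amp * np.exp(-true_n * d)

n_hat, amp_hat = fit_decay_rate(d, ld)

# Converting generations to years needs a generation time; 29 years is an assumption.
years_per_generation = 29
print("estimated generations since admixture:", n_hat)
print("estimated years since admixture:", n_hat * years_per_generation)
```

Real LD curves are noisy and can carry a constant offset from background LD, so a log-linear fit like this one would not be adequate in practice. A nonlinear least-squares fit with an affine term and resampling-based standard errors is the more realistic design.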

arXiv.org e-Print Archive

DSpace@MIT

Crossref

Harvard University - DASH

PubMed Central

eScholarship - University of California