985 research outputs found
Joint analysis of functional genomic data and genome-wide association studies of 18 human traits
Annotations of gene structures and regulatory elements can inform genome-wide
association studies (GWAS). However, choosing the relevant annotations for
interpreting an association study of a given trait remains challenging. We
describe a statistical model that uses association statistics computed across
the genome to identify classes of genomic element that are enriched or depleted
for loci that influence a trait. The model naturally incorporates multiple
types of annotations. We applied the model to GWAS of 18 human traits,
including red blood cell traits, platelet traits, glucose levels, lipid levels,
height, BMI, and Crohn's disease. For each trait, we evaluated the relevance of
450 different genomic annotations, including protein-coding genes, enhancers,
and DNase-I hypersensitive sites in over a hundred tissues and cell lines. We
show that the fraction of phenotype-associated SNPs that influence protein
sequence ranges from around 2% (for platelet volume) up to around 20% (for LDL
cholesterol); that repressed chromatin is significantly depleted for SNPs
associated with several traits; and that cell type-specific DNase-I
hypersensitive sites are enriched for SNPs associated with several traits (for
example, the spleen in platelet volume). Finally, by re-weighting each GWAS
using information from functional genomics, we increase the number of loci with
high-confidence associations by around 5%.
Comment: Fixed typos, included minor clarification
Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data
Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In our model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data, we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to domestication and that East Asian toy breeds (the Shih Tzu and the Pekingese) result from admixture between modern toy breeds and “ancient” Asian breeds. Software implementing the model described here, called TreeMix, is available at http://treemix.googlecode.com.
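As a rough illustration of the Gaussian approximation to genetic drift mentioned in the abstract above, the sketch below simulates Wright-Fisher drift and compares the variance of the resulting allele frequencies with the approximation p(1-p)·t/(2N). This is not TreeMix code: the function names, the population size, the number of generations and the starting frequency are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_drift(p0, n_diploid, generations, n_loci, rng):
    """Wright-Fisher drift: binomially resample 2N allele copies each generation."""
    freqs = np.full(n_loci, p0, dtype=float)
    for _ in range(generations):
        freqs = rng.binomial(2 * n_diploid, freqs) / (2 * n_diploid)
    return freqs

def gaussian_drift_variance(p0, n_diploid, generations):
    """Variance of the drifted frequency under the Gaussian approximation.

    Reasonable only when the number of generations is small relative to 2N.
    """
    return p0 * (1 - p0) * generations / (2 * n_diploid)

# Hypothetical parameters, not taken from the paper.
rng = np.random.default_rng(0)
p0, n_diploid, generations = 0.3, 1000, 20
simulated = simulate_drift(p0, n_diploid, generations, n_loci=20000, rng=rng)
empirical_var = simulated.var()
approx_var = gaussian_drift_variance(p0, n_diploid, generations)
print(f"empirical variance {empirical_var:.5f}, Gaussian approximation {approx_var:.5f}")
```

With many independent loci the two variances should agree closely, which is the property that makes a Gaussian model of drift convenient for fitting a population graph.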
Ancient west Eurasian ancestry in southern and eastern Africa
The history of southern Africa involved interactions between indigenous
hunter-gatherers and a range of populations that moved into the region. Here we
use genome-wide genetic data to show that there are at least two admixture
events in the history of Khoisan populations (southern African hunter-gatherers
and pastoralists who speak non-Bantu languages with click consonants). One
involved populations related to Niger-Congo-speaking African populations, and
the other introduced ancestry most closely related to west Eurasian (European
or Middle Eastern) populations. We date this latter admixture event to
approximately 900-1,800 years ago, and show that it had the largest demographic
impact in Khoisan populations that speak Khoe-Kwadi languages. A similar signal
of west Eurasian ancestry is present throughout eastern Africa. In particular,
we also find evidence for two admixture events in the history of Kenyan,
Tanzanian, and Ethiopian populations, the earlier of which involved populations
related to west Eurasians and which we date to approximately 2,700-3,300
years ago. We reconstruct the allele frequencies of the putative west Eurasian
population in eastern Africa, and show that this population is a good proxy for
the west Eurasian ancestry in southern Africa. The most parsimonious
explanation for these findings is that west Eurasian ancestry entered southern
Africa indirectly through eastern Africa.
Comment: Added additional simulations, some additional discussion
Noisy Splicing Drives mRNA Isoform Diversity in Human Cells
While the majority of multiexonic human genes show some evidence of alternative splicing, it is unclear what fraction of observed splice forms is functionally relevant. In this study, we examine the extent of alternative splicing in human cells using deep RNA sequencing and de novo identification of splice junctions. We demonstrate the existence of a large class of low abundance isoforms, encompassing approximately 150,000 previously unannotated splice junctions in our data. Newly-identified splice sites show little evidence of evolutionary conservation, suggesting that the majority are due to erroneous splice site choice. We show that sequence motifs involved in the recognition of exons are enriched in the vicinity of unconserved splice sites. We estimate that the average intron has a splicing error rate of approximately 0.7% and show that introns in highly expressed genes are spliced more accurately, likely due to their shorter length. These results implicate noisy splicing as an important property of genome evolution.
Ewens measures on compact groups and hypergeometric kernels
On unitary compact groups the decomposition of a generic element into a product
of reflections induces a decomposition of the characteristic polynomial into a
product of factors. When the group is equipped with the Haar probability
measure, these factors become independent random variables with explicit
distributions. Beyond the known results on the orthogonal and unitary groups
(O(n) and U(n)), we treat the symplectic case. In U(n), this induces a family
of probability changes analogous to the biassing in the Ewens sampling formula
known for the symmetric group. Then we study the spectral properties of these
measures, connected to the pure Fisher-Hartwig symbol on the unit circle. The
associated orthogonal polynomials give rise, as n tends to infinity, to a
limit kernel at the singularity.
Comment: New version of the previous paper "Hua-Pickrell measures on general
compact groups". The article has been completely re-written (the presentation
has changed and some proofs have been simplified). New references added.
Identifying genetic variants that affect viability in large cohorts
A number of open questions in human evolutionary genetics would become tractable if we were able to directly measure evolutionary fitness. As a step towards this goal, we developed a method to examine whether individual genetic variants, or sets of genetic variants, currently influence viability. The approach consists in testing whether the frequency of an allele varies across ages, accounting for variation in ancestry. We applied it to the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort and to the parents of participants in the UK Biobank. Across the genome, we found only a few common variants with large effects on age-specific mortality: tagging the APOE ε4 allele and near CHRNA3. These results suggest that variants with large effects, even late-onset ones, are kept at low frequency by purifying selection. Testing viability effects of sets of genetic variants that jointly influence one of 42 traits, we detected a number of strong signals. In participants of the UK Biobank of British ancestry, we found that variants that delay puberty timing are associated with a longer parental life span (P ~ 6.2 × 10⁻⁶ for fathers and P ~ 2.0 × 10⁻³ for mothers), consistent with epidemiological studies. Similarly, variants associated with later age at first birth are associated with a longer maternal life span (P ~ 1.4 × 10⁻³). Signals are also observed for variants influencing cholesterol levels, risk of coronary artery disease (CAD), body mass index, as well as risk of asthma. These signals exhibit consistent effects in the GERA cohort and among participants of the UK Biobank of non-British ancestry. We also found marked differences between males and females, most notably at the CHRNA3 locus, and variants associated with risk of CAD and cholesterol levels.
Beyond our findings, the analysis serves as a proof of principle for how upcoming biomedical data sets can be used to learn about selection effects in contemporary humans.
Medical Research Council (Unit Programme number MC_UU_12015/2). This grant supported FRD and JRBP. National Institutes of Health (NIH) (grant number R01GM121372). This grant is to MP and JKP. National Institutes of Health (NIH) (grant number R01MH106842). This grant is to JKP. Columbia University. This research was funded in part by a Research Initiative in Science and Engineering grant to MP and JKP. National Institutes of Health (NIH) (grant number R01GM115889). This grant, to Guy Sella, provided partial support for HM.
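The core idea in the abstract above, testing whether the frequency of an allele varies across ages, can be sketched as a weighted linear trend test on allele frequencies per age bin. This is a deliberately simplified illustration, not the authors' method: it omits the ancestry adjustment the abstract describes, and the function name, age bins and allele counts are hypothetical.

```python
import numpy as np

def age_trend_test(ages, allele_counts, chrom_counts):
    """Weighted linear trend of allele frequency on age.

    Returns (slope_per_year, z_score). Under the null of no age effect, with
    pooled frequency p, Var(slope) = p * (1 - p) / sum(n_i * (x_i - xbar)^2).
    No ancestry covariates are included (a simplifying assumption).
    """
    x = np.asarray(ages, dtype=float)
    k = np.asarray(allele_counts, dtype=float)
    n = np.asarray(chrom_counts, dtype=float)
    freq = k / n
    pooled = k.sum() / n.sum()
    x_centered = x - np.average(x, weights=n)
    sxx = np.sum(n * x_centered ** 2)
    slope = np.sum(n * x_centered * (freq - pooled)) / sxx
    z = slope / np.sqrt(pooled * (1 - pooled) / sxx)
    return slope, z

# Hypothetical counts: the allele becomes rarer in older age bins.
ages = [45, 55, 65, 75, 85]
chroms = [20000] * 5
slope, z = age_trend_test(ages, [3000, 2960, 2900, 2820, 2700], chroms)
# Control: identical frequency in every bin should give no trend.
flat_slope, flat_z = age_trend_test(ages, [3000] * 5, chroms)
print(f"slope per year {slope:.6f}, z = {z:.2f}; flat control z = {flat_z:.2f}")
```

A strongly negative z-score would suggest that carriers are under-represented at older ages, which is the kind of signal a viability scan looks for before correcting for confounders such as ancestry.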
Unitary Representations of Unitary Groups
In this paper we review and streamline some results of Kirillov, Olshanski
and Pickrell on unitary representations of the unitary group U(H) of a
real, complex or quaternionic separable Hilbert space and the subgroup
U_∞(H), consisting of those unitary operators g for which g - 1
is compact. The Kirillov-Olshanski theorem on the continuous unitary
representations of the identity component U_∞(H)_0 asserts that they
are direct sums of irreducible ones which can be realized in finite tensor
products of a suitable complex Hilbert space. This is proved and generalized to
inseparable spaces. These results are carried over to the full unitary group by
Pickrell's Theorem, asserting that the separable unitary representations of
U(H), for a separable Hilbert space H, are uniquely determined by
their restriction to U_∞(H)_0. For the classical infinite rank
symmetric pairs of non-unitary type, such as (GL(H), U(H)), we
also show that all separable unitary representations are trivial.
Comment: 42 pages
Schwinger Terms and Cohomology of Pseudodifferential Operators
We study the cohomology of the Schwinger term arising in second quantization
of the class of observables belonging to the restricted general linear algebra.
We prove that, for all pseudodifferential operators in 3+1 dimensions of this
type, the Schwinger term is equivalent to the "twisted" Radul cocycle, a
modified version of the Radul cocycle arising in non-commutative differential
geometry. In the process we also show how the ordinary Radul cocycle for any
pair of pseudodifferential operators in any dimension can be written as the
phase space integral of the star commutator of their symbols projected to the
appropriate asymptotic component.
Comment: 19 pages, plain TeX
Inferring Admixture Histories of Human Populations Using Linkage Disequilibrium
Author Manuscript date February 9, 2013
Long-range migrations and the resulting admixtures between populations have been important forces shaping human genetic diversity. Most existing methods for detecting and reconstructing historical admixture events are based on allele frequency divergences or patterns of ancestry segments in chromosomes of admixed individuals. An emerging new approach harnesses the exponential decay of admixture-induced linkage disequilibrium (LD) as a function of genetic distance. Here, we comprehensively develop LD-based inference into a versatile tool for investigating admixture. We present a new weighted LD statistic that can be used to infer mixture proportions as well as dates with fewer constraints on reference populations than previous methods. We define an LD-based three-population test for admixture and identify scenarios in which it can detect admixture events that previous formal tests cannot. We further show that we can uncover phylogenetic relationships among populations by comparing weighted LD curves obtained using a suite of references. Finally, we describe several improvements to the computation and fitting of weighted LD curves that greatly increase the robustness and speed of the calculations. We implement all of these advances in a software package, ALDER, which we validate in simulations and apply to test for admixture among all populations from the Human Genome Diversity Project (HGDP), highlighting insights into the admixture history of Central African Pygmies, Sardinians, and Japanese.
National Science Foundation (U.S.). Graduate Research Fellowship Program
National Institutes of Health (U.S.). (Training Grant 5T32HG004947-04)
Simons Foundation
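The exponential decay of admixture-induced LD with genetic distance, described in the abstract above, suggests a simple way to estimate an admixture date: fit a curve of the form A·exp(-n·d), where d is genetic distance in Morgans and n is the number of generations since admixture. The sketch below is not ALDER and does not use its interface. It assumes a single pulse of admixture, strictly positive LD values and no affine offset term, and it fits a synthetic curve whose parameters are invented for illustration.

```python
import numpy as np

def fit_admixture_date(dist_morgans, weighted_ld):
    """Fit weighted_ld ~ A * exp(-n * d) by least squares on log(weighted_ld).

    Returns (n_generations, amplitude). Assumes positive LD values and no
    affine offset, which is a simplification of real LD-curve fitting.
    """
    slope, intercept = np.polyfit(dist_morgans, np.log(weighted_ld), 1)
    return -slope, float(np.exp(intercept))

# Synthetic curve: 40 generations since admixture, distances from 0.5 to 20 cM,
# with small multiplicative noise. All values here are hypothetical.
rng = np.random.default_rng(1)
d = np.linspace(0.005, 0.20, 40)
true_n, true_amp = 40.0, 0.002
curve = true_amp * np.exp(-true_n * d) * np.exp(rng.normal(0.0, 0.02, d.size))

est_n, est_amp = fit_admixture_date(d, curve)
print(f"estimated generations since admixture: {est_n:.1f}, amplitude: {est_amp:.5f}")
```

The log-linear fit is chosen only for brevity. With real data, where the curve can cross zero and carries an offset, a nonlinear least-squares fit with an explicit affine term would be the more appropriate choice.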