24 research outputs found

Functionally informed fine-mapping and polygenic localization of complex trait heritability

Author: Benner Christian
Cui Ran
Finucane Hilary K.
Gazal Steven
Hormozdiari Farhad
Marquez-Luna Carla
O'Connor Luke
Pirinen Matti
Price Alkes L.
Reshef Yakir
Schoech Armin P.
Ulirsch Jacob
van de Geijn Bryce
Weissbrod Omer
Publication date: 01/12/2020

Fine-mapping aims to identify causal variants impacting complex traits. We propose PolyFun, a computationally scalable framework to improve fine-mapping accuracy by leveraging functional annotations across the entire genome, not just genome-wide-significant loci, to specify prior probabilities for fine-mapping methods such as SuSiE or FINEMAP. In simulations, PolyFun + SuSiE and PolyFun + FINEMAP were well calibrated and identified >20% more variants with a posterior causal probability >0.95 than their nonfunctionally informed counterparts. In analyses of 49 UK Biobank traits (average n = 318,000), PolyFun + SuSiE identified 3,025 fine-mapped variant-trait pairs with posterior causal probability >0.95, a >32% improvement versus SuSiE. We used posterior mean per-SNP heritabilities from PolyFun + SuSiE to perform polygenic localization, constructing minimal sets of common SNPs causally explaining 50% of common SNP heritability; these sets ranged in size from 28 (hair color) to 3,400 (height) to 2 million (number of children). In conclusion, PolyFun prioritizes variants for functional follow-up and provides insights into complex trait architectures. PolyFun is a computationally scalable framework for functionally informed fine-mapping that makes full use of genome-wide data. It prioritizes more variants than previous methods when applied to 49 complex traits from UK Biobank. Peer reviewe
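The polygenic localization step described in this abstract (a minimal set of SNPs explaining 50% of heritability) can be illustrated with a small sketch. This is not the PolyFun implementation: it assumes the minimal set is obtained greedily by ranking SNPs on their posterior mean per-SNP heritability, and the function name `minimal_explaining_set` and the toy values are hypothetical.

```python
def minimal_explaining_set(per_snp_h2, fraction=0.5):
    """Return indices of the smallest set of SNPs whose summed heritability
    reaches `fraction` of the total (greedy, largest contributions first).

    Assumption: per_snp_h2 holds non-negative per-SNP heritability estimates.
    """
    total = sum(per_snp_h2)
    # Rank SNP indices from largest to smallest heritability contribution.
    order = sorted(range(len(per_snp_h2)),
                   key=lambda i: per_snp_h2[i], reverse=True)
    chosen, running = [], 0.0
    for i in order:
        if running >= fraction * total:
            break  # target share already reached
        chosen.append(i)
        running += per_snp_h2[i]
    return chosen


# Toy example with six hypothetical SNPs (values are made up).
toy_h2 = [0.30, 0.05, 0.25, 0.10, 0.20, 0.10]
top = minimal_explaining_set(toy_h2)
print(top, len(top))
```

With non-negative inputs, taking the largest contributions first yields the smallest possible set for a given target share, which is why a highly polygenic trait needs far more SNPs than a trait dominated by a few loci.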

Crossref

Helsingin yliopiston digitaalinen arkisto

School closure response to an influenza epidemic in AISD

Author: van de Geijn Bryce
Publication date: 01/01/2010

Influenza epidemics impose costs on society in a number of ways. Work hours are lost directly when infected adults stay home from work and indirectly when infected kids miss school, forcing their parents to miss work. Infections also lead to medical costs as well as costs in the form of deaths. Closing schools is often used to reduce the spread of epidemics: it reduces contacts between kids and therefore the number of infections. In this way the cost of the epidemic can be greatly reduced. However, closing schools is also very costly. When schools are closed, many adults are forced to miss work for child care. An optimal response to an outbreak of influenza minimizes the cost of influenza plus the cost of school closure. Araz et al. develop a method of determining optimal school closure based on data for the entire state of Texas. However, the model they use divides the population into just two classes, adults and kids. Transmission is based on one giant pool for the whole state, and closing schools means closing every school in the state. In practice there may be a spatial element to the epidemic, with certain areas having higher proportions infected. By associating the population with schools, I allow the closing decision to be made on a school-by-school basis. This will allow for a more efficient selection of school closure policy. Biological Sciences, School o

Texas ScholarWorks

Genomic Tools for Robust Quantitative Trait Locus Discovery and Interpretation

Author: van de Geijn Bryce Myers
Publication venue: University of Chicago
Publication date: 27/10/2016

Knowledge UChicago

Data from: Evolutionary history inferred from the de novo assembly of a non-model organism, the blue-eyed black lemur

Author: Kermany Amir R.
Meyer Wynn K.
Przeworski Molly
van de Geijn Bryce
Venkat Aarti
Zhang Sidi
Publication date: 22/07/2015

Lemurs, the living primates most distantly related to humans, demonstrate incredible diversity in behaviour, life history patterns and adaptive traits. Although many lemur species are endangered within their native Madagascar, there is no high-quality genome assembly from this taxon, limiting population and conservation genetic studies. One critically endangered lemur is the blue-eyed black lemur Eulemur flavifrons. This species is fixed for blue irises, a convergent trait that evolved at least four times in primates and was subject to positive selection in humans, where 5′ regulatory variation of OCA2 explains most of the brown/blue eye colour differences. We built a de novo genome assembly for E. flavifrons, providing the most complete lemur genome to date, and a high confidence consensus sequence for close sister species E. macaco, the (brown-eyed) black lemur. From diversity and divergence patterns across the genomes, we estimated a recent split time of the two species (160 Kya) and temporal fluctuations in effective population sizes that accord with known environmental changes. By looking for regions of unusually low diversity, we identified potential signals of directional selection in E. flavifrons at MITF, a melanocyte development gene that regulates OCA2 and has previously been associated with variation in human iris colour, as well as at several other genes involved in melanin biosynthesis in mammals. Our study thus illustrates how whole-genome sequencing of a few individuals can illuminate the demographic and selection history of nonmodel species

ZENODO

Dryad Digital Repository (Duke University)

Electronic Archiving System

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Eulemur flavifrons fastq file for PSMC

Author: Aarti Venkat (136637)
Amir R. Kermany (3281805)
Bryce van de Geijn (631102)
Molly Przeworski (2483)
Sidi Zhang (3281802)
Wynn K. Meyer (136633)
Publication date: 22/07/2015

This fastq file was generated from the all-sites vcf file for Harlow (the E. flavifrons individual used for the de novo assembly), and was used to infer historic Ne using PSMC (Li and Durbin 2011). The vcf was generated by running GATK (McKenna et al. 2010; DePristo et al. 2011) on the filtered bam file (mapping quality 20 or above, PE reads aligned to the same scaffold with correctly oriented read pairs mapping within three standard deviations of the mean, duplicates removed), with the EMIT_ALL_SITES option and a minimum base quality of 20. The fastq was generated from this all-sites vcf using the vcf2fq function within the vcfutils.pl script, using a minimum depth of 26 and a maximum depth of 104 (0.5 and 2x the mean depth, respectively), and a minimum quality (QUAL*GQ) of 20
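The depth cutoffs quoted above follow from simple arithmetic on the mean depth. The sketch below reproduces them under the assumption that the mean depth was 52, which is inferred from the stated bounds (26 and 104 at 0.5x and 2x) rather than given in the record; the helper name `depth_bounds` is hypothetical.

```python
def depth_bounds(mean_depth, low_factor=0.5, high_factor=2.0):
    """Return (min_depth, max_depth) as multiples of the mean depth.

    Values are truncated to integers; the record does not say how
    fractional bounds were rounded, so truncation is an assumption.
    """
    return int(low_factor * mean_depth), int(high_factor * mean_depth)


# Assumed mean depth of 52, inferred from the bounds quoted in the record.
min_depth, max_depth = depth_bounds(52)
print(min_depth, max_depth)
```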

Dryad Digital Repository (Duke University)

FigShare

Eulemur macaco fastq files for PSMC

Author: Aarti Venkat (136637)
Amir R. Kermany (3281805)
Bryce van de Geijn (631102)
Molly Przeworski (2483)
Sidi Zhang (3281802)
Wynn K. Meyer (136633)
Publication date: 22/07/2015

These two files should be combined into one fastq representing the whole genome. The combined fastq file was generated from the all-sites vcf file for Harmonia (the E. macaco individual used for the de novo assembly), and was used to infer historic Ne using PSMC (Li and Durbin 2011). The vcf was generated by running GATK (McKenna et al. 2010; DePristo et al. 2011) on the filtered bam file (mapping quality 10 or above, duplicates removed), with the EMIT_ALL_SITES option and a minimum base quality of 20. The fastq was generated from this all-sites vcf using a modification of the vcf2fq function within the vcfutils.pl script (vcf2fqnonref, in https://github.com/sorrywm/genome_analysis/vcfutils_mod.pl), using a minimum depth of 10 and a maximum depth of 41 (0.5 and 2x the mean depth, respectively), and a minimum quality (QUAL*GQ) of 20

Dryad Digital Repository (Duke University)

FigShare

Dataset S1: Annotated transcripts in the 1% FST tail

Author: Aarti Venkat (136637)
Amir R. Kermany (3281805)
Bryce van de Geijn (631102)
Molly Przeworski (2483)
Sidi Zhang (3281802)
Wynn K. Meyer (136633)
Publication date: 22/07/2015

This list contains all transcripts mapped to regions in the 1% tail of FST from the full dataset in either species. Columns represent Ensembl transcript ID, region of high FST where the transcript was annotated, start position of the transcript within the region, end position of the transcript within the region, proportion of the transcript mapped to the region, mean percent identity for parts of the transcript that mapped, Ensembl gene ID, and gene symbol. We annotated genes in the 1% tail of all 20 kb non-overlapping windows by aligning human protein sequences to the blue-eyed black lemur genome. We obtained protein sequences for human genome build hg18 and used TBLASTN version 2.2.22+ (Altschul et al. 1990, 1997), with an e-value threshold of 5 × 10⁻⁵ to identify orthologs within the regions of the blue-eyed black lemur reference genome corresponding to the 1% FST tail. We then took the list of all human proteins with hits within candidate regions and performed TBLASTN for these proteins against the entire lemur genome. We retained proteins whose best genome-wide match (containing the lowest e-value or maximum mean percent identity) for any subset of the protein sequence overlapped the candidate region. In cases in which multiple proteins mapped to the same location (>50% protein length overlapping, presumably representing multiple transcripts of the same gene or multiple genes in the same family), we retained the protein with the largest total length spanned by initial TBLASTN hits or the largest mean percent identity.
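Selecting the windows in the 1% FST tail can be sketched as a simple ranking step. The record does not state how the cutoff was computed, so the rank-based rule below is an assumption, and the function name `top_tail_windows`, the window identifiers, and the FST values are all hypothetical.

```python
def top_tail_windows(fst_by_window, tail=0.01):
    """Return window IDs whose FST falls in the top `tail` fraction.

    Assumption: the tail is defined by rank among all windows, keeping
    at least one window.
    """
    ranked = sorted(fst_by_window.items(), key=lambda kv: kv[1], reverse=True)
    n_keep = max(1, int(len(ranked) * tail))
    return [window for window, _ in ranked[:n_keep]]


# Toy input: 200 hypothetical 20 kb windows with made-up FST values.
toy_fst = {f"w{i}": i / 200 for i in range(200)}
candidates = top_tail_windows(toy_fst)
print(candidates)
```

Transcripts would then be annotated only within the returned windows, as the dataset description outlines.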

Dryad Digital Repository (Duke University)

FigShare

Variable sites VCF files

Author: Aarti Venkat (136637)
Amir R. Kermany (3281805)
Bryce van de Geijn (631102)
Molly Przeworski (2483)
Sidi Zhang (3281802)
Wynn K. Meyer (136633)
Publication date: 22/07/2015

These files contain sites with a high posterior probability (> 0.8) of being polymorphic in the 8-sample dataset (4 samples from each species), along with their genotype likelihoods in these samples as estimated by ANGSD (http://popgen.dk/wiki/index.php/ANGSD). Genotype likelihoods were estimated using the GATK model (-GL 2). The probability of a site being variable was estimated using ngsStat (https://github.com/mfumagalli/ngsTools), based on posterior probabilities calculated using the genotype likelihoods and the site frequency spectrum estimated from high quality sites (minimum mapping quality 1, minimum quality 20, minimum 3 samples with data, minimum depth 9) within each species

Dryad Digital Repository (Duke University)

FigShare

EfEmSangerSequencingForDryad

Author: Aarti Venkat (136637)
Amir R. Kermany (3281805)
Bryce van de Geijn (631102)
Molly Przeworski (2483)
Sidi Zhang (3281802)
Wynn K. Meyer (136633)
Publication date: 22/07/2015

Sanger sequencing results for up to 16 individuals (9 blue-eyed black lemurs and 7 black lemurs from the Duke Lemur Center) sequenced at 17 amplicons throughout scaffold2503 (the scaffold containing the orthologs of HERC2/OCAs) and aligned using EBio

Dryad Digital Repository (Duke University)

FigShare