20 research outputs found

    Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations

    Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. 
    These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.

    African-specific alleles modify risk for asthma at the 17q12-q21 locus in African Americans

    BACKGROUND: Asthma is the most common chronic disease in children, occurring at higher frequencies and with more severe disease in children with African ancestry. METHODS: We tested for association with haplotypes at the most replicated and significant childhood-onset asthma locus at 17q12-q21 and asthma in European American and African American children. Following this, we used whole-genome sequencing data from 1060 African American and 100 European American individuals to identify novel variants on a high-risk African American-specific haplotype. We characterized these variants in silico using gene expression and ATAC-seq data from airway epithelial cells, functional annotations from ENCODE, and promoter capture (pc)Hi-C maps in airway epithelial cells. Candidate causal variants were then assessed for correlation with asthma-associated phenotypes in African American children and adults. RESULTS: Our studies revealed nine novel African-specific common variants, enriched on a high-risk asthma haplotype, which regulated the expression of GSDMA in airway epithelial cells and were associated with features of severe asthma. Using ENCODE annotations, ATAC-seq, and pcHi-C, we narrowed the associations to two candidate causal variants that are associated with features of T2-low severe asthma. CONCLUSIONS: Previously unknown genetic variation at the 17q12-q21 childhood-onset asthma locus contributes to asthma severity in individuals with African ancestries. We suggest that many other population-specific variants that have not been discovered in GWAS contribute to the genetic risk for asthma and other common diseases.

    The first horse herders and the impact of early Bronze Age steppe expansions into Asia

    The Yamnaya expansions from the western steppe into Europe and Asia during the Early Bronze Age (~3000 BCE) are believed to have brought with them Indo-European languages and possibly horse husbandry. We analyzed 74 ancient whole-genome sequences from across Inner Asia and Anatolia and show that the Botai people associated with the earliest horse husbandry derived from a hunter-gatherer population deeply diverged from the Yamnaya. Our results also suggest distinct migrations bringing West Eurasian ancestry into South Asia before and after, but not at the time of, Yamnaya culture. We find no evidence of steppe ancestry in Bronze Age Anatolia from when Indo-European languages are attested there. Thus, in contrast to Europe, Early Bronze Age Yamnaya-related migrations had limited direct genetic impact in Asia.

    Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits

    Genome-wide genealogies compactly represent the evolutionary history of a set of genomes and inferring them from genetic data has the potential to facilitate a wide range of analyses. We introduce a method, ARG-Needle, for accurately inferring biobank-scale genealogies from sequencing or genotyping array data, as well as strategies to utilize genealogies to perform association and other complex trait analyses. We use these methods to build genome-wide genealogies using genotyping data for 337,464 UK Biobank individuals and test for association across seven complex traits. Genealogy-based association detects more rare and ultra-rare signals (N = 134, frequency range 0.0007−0.1%) than genotype imputation using ~65,000 sequenced haplotypes (N = 64). In a subset of 138,039 exome sequencing samples, these associations strongly tag (average r = 0.72) underlying sequencing variants enriched (4.8×) for loss-of-function variation. These results demonstrate that inferred genome-wide genealogies may be leveraged in the analysis of complex traits, complementing approaches that require the availability of large, population-specific sequencing panels.

    The Genetics of Bene Israel from India Reveals Both Substantial Jewish and Indian Ancestry

    The Bene Israel Jewish community from West India is a unique population whose history before the 18th century remains largely unknown. Bene Israel members consider themselves descendants of Jews, yet the identity of their Jewish ancestors and their arrival time in India are unknown, with speculations on arrival time varying between the 8th century BCE and the 6th century CE. Here, we characterize the genetic history of Bene Israel by collecting and genotyping 18 Bene Israel individuals. Combining these with 486 individuals from 41 other Jewish, Indian and Pakistani populations, and additional individuals from worldwide populations, we conducted comprehensive genome-wide analyses based on F_ST, principal component analysis, ADMIXTURE, identity-by-descent sharing, admixture linkage disequilibrium decay, haplotype sharing and allele sharing autocorrelation decay, and also contrasted patterns between the X chromosome and the autosomes. The genetics of Bene Israel individuals resemble local Indian populations, while at the same time constituting a clearly separated and unique population in India. They are unique among the Indian and Pakistani populations we analyzed in sharing considerable genetic ancestry with other Jewish populations. Taken together, the results from all analyses point to Bene Israel being an admixed population with both Jewish and Indian ancestry, with the genetic contribution of each of these ancestral populations being substantial. The admixture took place in the last millennium, about 19–33 generations ago. It involved Middle-Eastern Jews and was sex-biased, with more male Jewish and local female contribution. It was followed by a population bottleneck and high endogamy, which can lead to increased prevalence of recessive diseases in this population. This study provides an example of how genetic analysis advances our knowledge of human history in cases where other disciplines lack the relevant data to do so.

    Founder events in Bene Israel population.

    (A) Total lengths of runs of homozygosity (ROH) in Jewish, Indian and HapMap populations. The larger variance in ROH values in some Indian populations is due to smaller sample size. (B-C) Autocorrelation in Bene Israel pairs, as a function of the genetic distance, after subtracting the autocorrelation between Bene Israel and other (B) Jewish and (C) Indian populations. Blue and red lines correspond to the fitted curve based on a single and two founder events, respectively.