777 research outputs found

    1,135 Genomes Reveal the Global Pattern of Polymorphism in Arabidopsis thaliana

    Arabidopsis thaliana serves as a model organism for the study of fundamental physiological, cellular, and molecular processes. It has also greatly advanced our understanding of intraspecific genome variation. We present a detailed map of variation in 1,135 high-quality re-sequenced natural inbred lines representing the native Eurasian and North African range and recently colonized North America. We identify relict populations that continue to inhabit ancestral habitats, primarily in the Iberian Peninsula. They have mixed with a lineage that has spread to northern latitudes from an unknown glacial refugium and is now found in a much broader spectrum of habitats. Insights into the history of the species and the fine-scale distribution of genetic diversity provide the basis for full exploitation of A. thaliana natural variation through integration of genomes and epigenomes with molecular and non-molecular phenotypes

    An integrated map of genetic variation from 1,092 human genomes

    By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations

    A map of human genome variation from population-scale sequencing

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother–father–child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research

    A global reference for human genetic variation

    The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies

    In silico karyotyping of chromosomally polymorphic malaria mosquitoes in the Anopheles gambiae complex

    Chromosomal inversion polymorphisms play an important role in adaptation to environmental heterogeneities. For mosquito species in the Anopheles gambiae complex that are significant vectors of human malaria, paracentric inversion polymorphisms are abundant and are associated with ecologically and epidemiologically important phenotypes. Improved understanding of these traits relies on determining mosquito karyotype, which currently depends upon laborious cytogenetic methods whose application is limited both by the requirement for specialized expertise and for properly preserved adult females at specific gonotrophic stages. To overcome this limitation, we developed sets of tag single nucleotide polymorphisms (SNPs) inside inversions whose biallelic genotype is strongly correlated with inversion genotype. We leveraged 1,347 fully sequenced An. gambiae and Anopheles coluzzii genomes in the Ag1000G database of natural variation. Beginning with principal components analysis (PCA) of population samples, applied to windows of the genome containing individual chromosomal rearrangements, we classified samples into three inversion genotypes, distinguishing homozygous inverted and homozygous uninverted groups by inclusion of the small subset of specimens in Ag1000G that are associated with cytogenetic metadata. We then assessed the correlation between candidate tag SNP genotypes and PCA-based inversion genotypes in our training sets, selecting those candidates with >80% agreement. Our initial tests both in held-back validation samples from Ag1000G and in data independent of Ag1000G suggest that when used for in silico inversion genotyping of sequenced mosquitoes, these tags perform better than traditional cytogenetics, even for specimens where only a small subset of the tag SNPs can be successfully ascertained
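
    The tag SNP selection step described in this abstract can be illustrated with a minimal sketch. It assumes genotypes are coded 0/1/2 per sample, that missing calls are None, and that "agreement" means exact equality between the SNP genotype and the PCA-based inversion genotype. The function names, the toy data, and the coding scheme are hypothetical and are not taken from the paper; only the >80% cutoff comes from the abstract.

```python
# Illustrative sketch only: names, genotype coding, and data are assumptions.

def genotype_agreement(snp_genotypes, inversion_genotypes):
    """Fraction of called samples whose SNP genotype equals the inversion genotype.

    Genotypes are coded 0, 1, or 2; None marks a missing SNP call and is skipped.
    """
    pairs = [(s, k) for s, k in zip(snp_genotypes, inversion_genotypes) if s is not None]
    if not pairs:
        return 0.0
    return sum(s == k for s, k in pairs) / len(pairs)


def select_tag_snps(candidates, inversion_genotypes, threshold=0.8):
    """Return IDs of candidate SNPs whose agreement strictly exceeds the threshold."""
    return [
        snp_id
        for snp_id, genotypes in candidates.items()
        if genotype_agreement(genotypes, inversion_genotypes) > threshold
    ]


# Toy training set: inversion genotypes for ten samples (hypothetical values).
inversion = [0, 0, 1, 1, 2, 2, 0, 1, 2, 2]

# Candidate SNP genotypes for the same ten samples (hypothetical values).
candidates = {
    "snp_a": [0, 0, 1, 1, 2, 2, 0, 1, 2, 2],     # agrees in 10 of 10 samples
    "snp_b": [0, 0, 1, 1, 2, 2, 0, 1, 2, 1],     # agrees in 9 of 10 samples
    "snp_c": [0, 1, 1, 0, 2, 2, 0, 1, 1, 2],     # agrees in 7 of 10 samples
    "snp_d": [0, 0, 1, 1, 2, None, 0, 1, 2, 0],  # agrees in 8 of 9 called samples
}

selected = select_tag_snps(candidates, inversion)
print(selected)
```

    A real pipeline would also need to handle SNPs whose alternate allele tags the uninverted arrangement; this sketch ignores that case for brevity.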

    HapZipper: sharing HapMap populations just got easier

    The rapidly growing amount of genomic sequence data being generated and made publicly available necessitates the development of new data storage and archiving methods. The vast amount of data being shared and manipulated also creates new challenges for network resources. Thus, developing advanced data compression techniques is becoming an integral part of data production and analysis. The HapMap project is one of the largest public resources of human single-nucleotide polymorphisms (SNPs), characterizing over 3 million SNPs genotyped in over 1000 individuals. The standard format and biological properties of HapMap data suggest that a dedicated genetic compression method can outperform generic compression tools. We propose a compression methodology for genetic data by introducing HapZipper, a lossless compression tool tailored to compress HapMap data beyond benchmarks defined by generic tools such as gzip, bzip2 and lzma. We demonstrate the usefulness of HapZipper by compressing HapMap 3 populations to <5% of their original sizes. HapZipper is freely downloadable from https://bitbucket.org/pchanda/hapzipper/downloads/HapZipper.tar.bz

    A bi-objective feature selection algorithm for large omics datasets

    Special Issue: Fourth special issue on knowledge discovery and business intelligence. Feature selection is one of the most important concepts in data mining when dimensionality reduction is needed. The performance measures of feature selection encompass predictive accuracy and result comprehensibility. Consistency based methods are a significant category of feature selection research that substantially improves the comprehensibility of the result using the parsimony principle. In this work, the bi-objective version of the algorithm Logical Analysis of Inconsistent Data is applied to large volumes of data. In order to deal with hundreds of thousands of attributes, heuristic decomposition uses parallel processing to solve a set covering problem and a cross-validation technique. The bi-objective solutions contain the number of reduced features and the accuracy. The algorithm is applied to omics datasets with genome-like characteristics of patients with rare diseases. The authors would like to thank the FCT support UID/Multi/04046/2013. This work used the EGI, European Grid Infrastructure, with the support of the IBERGRID, Iberian Grid Infrastructure, and INCD (Portugal).

    Genome-wide association study of depression phenotypes in UK Biobank identifies variants in excitatory synaptic pathways

    Depression is a polygenic trait that causes extensive periods of disability. Previous genetic studies have identified common risk variants which have progressively increased in number with increasing sample sizes of the respective studies. Here, we conduct a genome-wide association study in 322,580 UK Biobank participants for three depression-related phenotypes: broad depression, probable major depressive disorder (MDD), and International Classification of Diseases (ICD, version 9 or 10)-coded MDD. We identify 17 independent loci that are significantly associated (P < 5 × 10⁻⁸) across the three phenotypes. The direction of effect of these loci is consistently replicated in an independent sample, with 14 loci likely representing novel findings. Gene sets are enriched in excitatory neurotransmission, mechanosensory behaviour, post synapse, neuron spine and dendrite functions. Our findings suggest that broad depression is the most tractable UK Biobank phenotype for discovering genes and gene sets that further our understanding of the biological pathways underlying depression

    Sequence data of six unusual alleles at SE33 and D1S1656 STR Loci

    When profiling a reference dataset of 500 DNA samples for the population of Saudi Arabia, using the GlobalFiler® PCR amplification kit, six unusual alleles were detected. At the SE33 locus, four novel alleles were found: 2, 14.3, 20.3, and 38. At the D1S1656 locus, two alleles (7 and 8) had been previously reported, but no published sequence data were available. The D1S1656 alleles were sequenced using ForenSeq™ DNA Signature Prep with the MiSeq FGx System (Illumina, USA). As SE33 is not reported by available Massively Parallel Sequencing (MPS) systems, samples that exhibited the unreported alleles were sequenced using the BigDye™ Terminator v3.1 Cycle Sequencing Kit. Here we present the sequence and structure of the previously uncharacterized alleles