661 research outputs found

    1,135 Genomes Reveal the Global Pattern of Polymorphism in Arabidopsis thaliana

    Get PDF
    Arabidopsis thaliana serves as a model organism for the study of fundamental physiological, cellular, and molecular processes. It has also greatly advanced our understanding of intraspecific genome variation. We present a detailed map of variation in 1,135 highquality re-sequenced natural inbred lines representing the native Eurasian and North African range and recently colonized North America. We identify relict populations that continue to inhabit ancestral habitats, primarily in the Iberian Peninsula. They have mixed with a lineage that has spread to northern latitudes from an unknown glacial refugium and is now found in a much broader spectrum of habitats. Insights into the history of the species and the finescale distribution of genetic diversity provide the basis for full exploitation of A. thaliana natural variation through integration of genomes and epigenomes with molecular and non-molecular phenotypes

    A map of human genome variation from population-scale sequencing

    Get PDF
    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother–father–child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10−8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research

    An integrated map of genetic variation from 1,092 human genomes

    Get PDF
    By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations

    A global reference for human genetic variation

    Get PDF
    The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies

    In silico karyotyping of chromosomally polymorphic malaria mosquitoes in the Anopheles gambiae complex

    Get PDF
    Chromosomal inversion polymorphisms play an important role in adaptation to environmental heterogeneities. For mosquito species in the Anopheles gambiae complex that are significant vectors of human malaria, paracentric inversion polymorphisms are abundant and are associated with ecologically and epidemiologically important phenotypes. Improved understanding of these traits relies on determining mosquito karyotype, which currently depends upon laborious cytogenetic methods whose application is limited both by the requirement for specialized expertise and for properly preserved adult females at specific gonotrophic stages. To overcome this limitation, we developed sets of tag single nucleotide polymorphisms (SNPs) inside inversions whose biallelic genotype is strongly correlated with inversion genotype. We leveraged 1,347 fully sequenced An. gambiae and Anopheles coluzzii genomes in the Ag1000G database of natural variation. Beginning with principal components analysis (PCA) of population samples, applied to windows of the genome containing individual chromosomal rearrangements, we classified samples into three inversion genotypes, distinguishing homozygous inverted and homozygous uninverted groups by inclusion of the small subset of specimens in Ag1000G that are associated with cytogenetic metadata. We then assessed the correlation between candidate tag SNP genotypes and PCA-based inversion genotypes in our training sets, selecting those candidates with >80% agreement. Our initial tests both in held-back validation samples from Ag1000G and in data independent of Ag1000G suggest that when used for in silico inversion genotyping of sequenced mosquitoes, these tags perform better than traditional cytogenetics, even for specimens where only a small subset of the tag SNPs can be successfully ascertained

    A bi-objective feature selection algorithm for large omics datasets

    Get PDF
    Special Issue: Fourth special issue on knowledge discovery and business intelligence.Feature selection is one of the most important concepts in data mining when dimensionality reduction is needed. The performance measures of feature selection encompass predictive accuracy and result comprehensibility. Consistency based methods are a significant category of feature selection research that substantially improves the comprehensibility of the result using the parsimony principle. In this work, the bi-objective version of the algorithm Logical Analysis of Inconsistent Data is applied to large volumes of data. In order to deal with hundreds of thousands of attributes, heuristic decomposition uses parallel processing to solve a set covering problem and a cross-validation technique. The bi-objective solutions contain the number of reduced features and the accuracy. The algorithm is applied to omics datasets with genome-like characteristics of patients with rare diseases.The authors would like to thank the FCT support UID/Multi/04046/2013. This work used the EGI, European Grid Infrastructure, with the support of the IBERGRID, Iberian Grid Infrastructure, and INCD (Portugal).info:eu-repo/semantics/publishedVersio

    Genome-wide association study of depression phenotypes in UK Biobank identifies variants in excitatory synaptic pathways

    Get PDF
    Depression is a polygenic trait that causes extensive periods of disability. Previous genetic studies have identified common risk variants which have progressively increased in number with increasing sample sizes of the respective studies. Here, we conduct a genome-wide association study in 322,580 UK Biobank participants for three depression-related phenotypes: broad depression, probable major depressive disorder (MDD), and International Classification of Diseases (ICD, version 9 or 10)-coded MDD. We identify 17 independent loci that are significantly associated (P < 5 × 10−8) across the three phenotypes. The direction of effect of these loci is consistently replicated in an independent sample, with 14 loci likely representing novel findings. Gene sets are enriched in excitatory neurotransmission, mechanosensory behaviour, post synapse, neuron spine and dendrite functions. Our findings suggest that broad depression is the most tractable UK Biobank phenotype for discovering genes and gene sets that further our understanding of the biological pathways underlying depression

    Sequence data of six unusual alleles at SE33 and D1S1656 STR Loci

    Get PDF
    When profiling a reference dataset of 500 DNA samples for the population of Saudi Arabia, using the GlobalFiler® PCR amplification kit, six unusual alleles were detected. At the SE33 locus, four novel alleles were found: 2, 14.3, 20.3, and 38; two alleles, at the D1S1656 locus: 7 and 8, had been previously reported, but no published sequence data was available. The D1S1656 alleles were sequenced using ForenSeq™ DNA Signature Prep with the MiSeq FGx System (Illumina, USA). As the SE33 is not reported by available Massively Parallel Sequencing (MPS) systems, samples that exhibited the unreported alleles were sequenced using BigDye™ Terminator v3.1 Cycle Sequencing Kit. Here we present the sequence and structure of the previously uncharacterized alleles

    A genomic data viewer for iPad

    Get PDF
    The Integrative Genomics Viewer (IGV) for iPad, based on the popular IGV application for desktop and laptop computers, supports researchers who wish to take advantage of the mobility of today’s tablet computers to view genomic data and present findings to colleagues

    A Data-driven Investigation of Relationships Between Bipolar Psychotic Symptoms and Schizophrenia Genome-wide Significant Genetic Loci

    Get PDF
    The etiologies of bipolar disorder (BD) and schizophrenia include a large number of common risk alleles, many of which are shared across the disorders. BD is clinically heterogeneous and it has been postulated that the pattern of symptoms is in part determined by the particular risk alleles carried, and in particular, that risk alleles also confer liability to schizophrenia influence psychotic symptoms in those with BD. To investigate links between psychotic symptoms in BD and schizophrenia risk alleles we employed a data-driven approach in a genotyped and deeply phenotyped sample of subjects with BD. We used sparse canonical correlation analysis (sCCA) (Witten, Tibshirani, & Hastie, ) to analyze 30 psychotic symptoms, assessed with the OPerational CRITeria checklist, and 82 independent genome-wide significant single nucleotide polymorphisms (SNPs) identified by the Schizophrenia Working group of the Psychiatric Genomics Consortium for which we had data in our BD sample (3,903 subjects). As a secondary analysis, we applied sCCA to larger groups of SNPs, and also to groups of symptoms defined according to a published factor analyses of schizophrenia. sCCA analysis based on individual psychotic symptoms revealed a significant association (p = .033), with the largest weights attributed to a variant on chromosome 3 (rs11411529), chr3:180594593, build 37) and delusions of influence, bizarre behavior and grandiose delusions. sCCA analysis using the same set of SNPs supported association with the same SNP and the group of symptoms defined "factor 3" (p = .012). A significant association was also observed to the "factor 3" phenotype group when we included a greater number of SNPs that were less stringently associated with schizophrenia; although other SNPs contributed to the significant multivariate association result, the greatest weight remained assigned to rs11411529. Our results suggest that the canonical correlation is a useful tool to explore phenotype-genotype relationships. To the best of our knowledge, this is the first study to apply this approach to complex, polygenic psychiatric traits. The sparse canonical correlation approach offers the potential to include a larger number of fine-grained systematic descriptors, and to include genetic markers associated with other disorders that are genetically correlated with BD
    • …
    corecore