1,893 research outputs found

    The landscape of human STR variation

    Short tandem repeats are among the most polymorphic loci in the human genome. These loci play a role in the etiology of a range of genetic diseases and have been frequently utilized in forensics, population genetics, and genetic genealogy. Despite this plethora of applications, little is known about the variation of most STRs in the human population. Here, we report the largest-scale analysis of human STR variation to date. We collected information for nearly 700,000 STR loci across more than 1000 individuals in Phase 1 of the 1000 Genomes Project. Extensive quality controls show that reliable allelic spectra can be obtained for close to 90% of the STR loci in the genome. We utilize this call set to analyze determinants of STR variation, assess the human reference genome’s representation of STR alleles, find STR loci with common loss-of-function alleles, and obtain initial estimates of the linkage disequilibrium between STRs and common SNPs. Overall, these analyses further elucidate the scale of genetic variation beyond classical point mutations. American Society for Engineering Education. National Defense Science and Engineering Graduate Fellowship

    Variant calling on the GRCh38 assembly with the data from phase three of the 1000 Genomes Project

    We present biallelic SNVs called from 2,548 samples across 26 populations from the 1000 Genomes Project, called directly on GRCh38. We believe this will be a useful reference resource for those using GRCh38, representing an improvement over the “lift-overs” of the 1000 Genomes Project data that have been available to date and providing a resource necessary for the full adoption of GRCh38 by the community. Here, we describe how the call set was created and provide benchmarking data describing how our call set compares to that produced by the final phase of the 1000 Genomes Project on GRCh37.

    HapZipper: sharing HapMap populations just got easier

    The rapidly growing amount of genomic sequence data being generated and made publicly available necessitates the development of new data storage and archiving methods. The vast amount of data being shared and manipulated also creates new challenges for network resources. Thus, developing advanced data compression techniques is becoming an integral part of data production and analysis. The HapMap project is one of the largest public resources of human single-nucleotide polymorphisms (SNPs), characterizing over 3 million SNPs genotyped in over 1000 individuals. The standard format and biological properties of HapMap data suggest that a dedicated genetic compression method can outperform generic compression tools. We propose a compression methodology for genetic data by introducing HapZipper, a lossless compression tool tailored to compress HapMap data beyond benchmarks defined by generic tools such as gzip, bzip2 and lzma. We demonstrate the usefulness of HapZipper by compressing HapMap 3 populations to <5% of their original sizes. HapZipper is freely downloadable from https://bitbucket.org/pchanda/hapzipper/downloads/HapZipper.tar.bz
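The abstract above benchmarks against generic tools (gzip, bzip2 and lzma). As a rough illustration of what such a generic baseline might look like, the sketch below measures compressed-to-original size ratios with Python's standard library. It does not reproduce HapZipper's own method, which the abstract does not describe; the function name `generic_compression_ratios` and the toy input are assumptions made only for this example.

```python
import bz2
import gzip
import lzma


def generic_compression_ratios(data: bytes) -> dict:
    """Return compressed size / original size for three generic compressors."""
    if not data:
        raise ValueError("data must be non-empty")
    compressors = {
        "gzip": gzip.compress,
        "bzip2": bz2.compress,
        "lzma": lzma.compress,
    }
    return {name: len(fn(data)) / len(data) for name, fn in compressors.items()}


# Toy, made-up genotype-like text (hypothetical; not real HapMap data).
sample = b"rs0000001 A G AA AG GG\n" * 5000
ratios = generic_compression_ratios(sample)
for name, ratio in sorted(ratios.items()):
    print(f"{name}: {ratio:.4f}")
```

Ratios on such highly repetitive toy input say nothing about real HapMap files; the point is only the shape of the comparison a dedicated tool would be measured against.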

    In vitro characterization of mitochondrial function and structure in rat and human cells with a deficiency of the NADH:ubiquinone oxidoreductase Ndufc2 subunit

    Ndufc2, a subunit of the NADH:ubiquinone oxidoreductase, plays a key role in the assembly and activity of complex I within the mitochondrial OXPHOS chain. Its deficiency has been shown to be involved in diabetes, cancer and stroke. To improve our knowledge of the mechanisms underlying the increased disease risk due to Ndufc2 reduction, we performed the present in vitro study aimed at the fine characterization of the derangements in mitochondrial structure and function consequent to Ndufc2 deficiency. We found that fibroblasts obtained from the skin of a heterozygous Ndufc2 knock-out rat model showed marked mitochondrial dysfunction, and that PBMC obtained from subjects homozygous for the TT genotype of the rs11237379/NDUFC2 variant, previously shown to associate with reduced gene expression, demonstrated increased generation of reactive oxygen species and mitochondrial damage. The latter was associated with increased oxidative stress and significant ultrastructural impairment of mitochondrial morphology with a loss of internal cristae. In both models the exposure to stress stimuli, such as high-NaCl concentration or LPS, exacerbated the mitochondrial damage and dysfunction. Resveratrol significantly counteracted the ROS generation. These findings provide additional insights into the role of an altered pattern of mitochondrial structure-function as a cause of human diseases. In particular, they help to underscore a potential genetic risk factor for cardiovascular diseases, including stroke.

    Sequence data of six unusual alleles at SE33 and D1S1656 STR Loci

    When profiling a reference dataset of 500 DNA samples for the population of Saudi Arabia, using the GlobalFiler® PCR amplification kit, six unusual alleles were detected. At the SE33 locus, four novel alleles were found: 2, 14.3, 20.3, and 38. Two alleles at the D1S1656 locus, 7 and 8, had been previously reported, but no published sequence data was available. The D1S1656 alleles were sequenced using ForenSeq™ DNA Signature Prep with the MiSeq FGx System (Illumina, USA). As SE33 is not reported by available Massively Parallel Sequencing (MPS) systems, samples that exhibited the unreported alleles were sequenced using BigDye™ Terminator v3.1 Cycle Sequencing Kit. Here we present the sequence and structure of the previously uncharacterized alleles.

    Mining data from 1000 genomes to identify the causal variant in regions under positive selection

    The human genome contains hundreds of regions in which the patterns of genetic variation indicate recent positive natural selection, yet for most of these the underlying gene and the advantageous mutation remain unknown. We recently reported the development of a method, Composite of Multiple Signals (CMS), that combines tests for multiple signals of natural selection and increases resolution by up to 100-fold.

    The effects of species ortholog and SNP variation on receptors for free fatty acids

    Although it is widely assumed that species orthologs of hormone responsive G protein-coupled receptors will be activated by the same endogenously produced ligand(s), variation in potency, particularly in cases where more than one receptor responds to the same hormone, can result in challenges in defining the contribution of individual receptors in different species. This can create considerably greater issues when using synthetic chemical ligands and, in some cases, may result in a complete lack of efficacy of such a ligand when used in animal models of pathophysiology. In man, the concept that distinct responses of individuals to medicines may reflect differences in the ability of such drugs to bind to or activate single nucleotide polymorphism variants of receptors is more established but, in many cases, clear links between such variants that are associated with disease phenotypes and substantial differences in receptor ligand pharmacology have been more difficult to obtain. Herein, we consider each of these issues for the group of receptors, FFA1-FFA4, defined to be activated by free fatty acids of varying chain length which, based on their production by one tissue or location and action in distinct locations, have been suggested to possess characteristics of ‘hormones’.

    A Geometric Framework for Evaluating Rare Variant Tests of Association

    The wave of next‐generation sequencing data has arrived. However, many questions still remain about how to best analyze sequence data, particularly the contribution of rare genetic variants to human disease. Numerous statistical methods have been proposed to aggregate association signals across multiple rare variant sites in an effort to increase statistical power; however, the precise relation between the tests is often not well understood. We present a geometric representation for rare variant data in which rare allele counts in case and control samples are treated as vectors in Euclidean space. The geometric framework facilitates a rigorous classification of existing rare variant tests into two broad categories: tests for a difference in the lengths of the case and control vectors, and joint tests for a difference in either the lengths or angles of the two vectors. We demonstrate that genetic architecture of a trait, including the number and frequency of risk alleles, directly relates to the behavior of the length and joint tests. Hence, the geometric framework allows prediction of which tests will perform best under different disease models. Furthermore, the structure of the geometric framework immediately suggests additional classes and types of rare variant tests. We consider two general classes of tests which show robustness to noncausal and protective variants. The geometric framework introduces a novel and unique method to assess current rare variant methodology and provides guidelines for both applied and theoretical researchers. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/97460/1/gepi21722.pd
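The geometric picture in the abstract above (rare allele counts in cases and controls treated as vectors, compared by length and by angle) can be sketched in a few lines. This is only an illustration of the two geometric quantities, not any of the paper's test statistics; the function names and the per-site counts are hypothetical.

```python
import math


def vector_length(counts):
    """Euclidean length of a vector of per-site rare allele counts."""
    return math.sqrt(sum(c * c for c in counts))


def angle_between(u, v):
    """Angle in radians between two count vectors of equal dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of sites")
    len_u, len_v = vector_length(u), vector_length(v)
    if len_u == 0 or len_v == 0:
        raise ValueError("angle is undefined for a zero vector")
    dot = sum(a * b for a, b in zip(u, v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (len_u * len_v)))
    return math.acos(cosine)


# Hypothetical rare allele counts at four variant sites (made-up numbers).
case_counts = [3, 0, 2, 1]
control_counts = [1, 1, 0, 1]

length_difference = vector_length(case_counts) - vector_length(control_counts)
angle = angle_between(case_counts, control_counts)
print(f"length difference: {length_difference:.3f}, angle: {angle:.3f} rad")
```

In the abstract's terms, a length test would look only at a quantity like `length_difference`, while a joint test would also be sensitive to `angle`, that is, to cases and controls carrying rare alleles at different sites.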

    An integrated map of genetic variation from 1,092 human genomes

    By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.