Population genomics of domestic and wild yeasts
The natural genetics of an organism is determined by the distribution of sequences of its genome. Here we present one- to four-fold, with some deeper, coverage of the genome sequences of over seventy isolates of the domesticated baker's yeast, _Saccharomyces cerevisiae_, and its closest relative, the wild _S. paradoxus_, which has never been associated with human activity. These were collected from numerous geographic locations and sources (including wild, clinical, baking, wine, laboratory and food spoilage). These sequences provide an unprecedented view of the population structure, natural (and artificial) selection and genome evolution in these species. Variation in gene content, SNPs, indels, copy numbers and transposable elements provides insights into the evolution of different lineages. Phenotypic variation broadly correlates with global genome-wide phylogenetic relationships; however, there is no correlation with source. _S. paradoxus_ populations are well delineated along geographic boundaries, while the variation among worldwide _S. cerevisiae_ isolates shows less differentiation and is comparable to a single _S. paradoxus_ population. Rather than one or two domestication events leading to the extant baker's yeasts, the population structure of _S. cerevisiae_ shows a few well defined geographically isolated lineages and many different mosaics of these lineages, supporting the notion that human influence provided the opportunity for outbreeding and production of new combinations of pre-existing variation.
Fine mapping the KLK3 locus on chromosome 19q13.33 associated with prostate cancer susceptibility and PSA levels
Measurements of serum prostate-specific antigen (PSA) protein levels form the basis for a widely used test to screen men for prostate cancer. Germline variants in the gene that encodes the PSA protein (KLK3) have been shown to be associated with both serum PSA levels and prostate cancer. Based on a resequencing analysis of a 56 kb region on chromosome 19q13.33, centered on the KLK3 gene, we fine mapped this locus by genotyping tag SNPs in 3,522 prostate cancer cases and 3,338 controls from five case–control studies. We did not observe a strong association with the KLK3 variant reported in previous studies to confer risk for prostate cancer (rs2735839; P = 0.20), but did observe three highly correlated SNPs (rs17632542, rs62113212 and rs62113214) associated with prostate cancer [P = 3.41 × 10⁻⁴, per-allele trend odds ratio (OR) = 0.77, 95% CI = 0.67–0.89]. The signal was apparent only for nonaggressive prostate cancer cases with Gleason score <7 and disease stage <III (P = 4.72 × 10⁻⁵, per-allele trend OR = 0.68, 95% CI = 0.57–0.82) and not for advanced cases with Gleason score >8 or stage ≥III (P = 0.31, per-allele trend OR = 1.12, 95% CI = 0.90–1.40). One of the three highly correlated SNPs, rs17632542, introduces a non-synonymous amino acid change in the KLK3 protein with a predicted benign or neutral functional impact. Baseline PSA levels were 43.7% higher in control subjects with no minor alleles (1.61 ng/ml, 95% CI = 1.49–1.72) than in those with one or more minor alleles at any one of the three SNPs (1.12 ng/ml, 95% CI = 0.96–1.28) (P = 9.70 × 10⁻⁵). Together our results suggest that germline KLK3 variants could influence the diagnosis of nonaggressive prostate cancer by influencing the likelihood of biopsy.
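The relative difference in baseline PSA quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `percent_higher` is hypothetical, and the inputs are the rounded group means printed in the abstract, so the result can differ slightly from a figure computed on unrounded data.

```python
def percent_higher(reference: float, comparison: float) -> float:
    """Return how much higher `reference` is than `comparison`, in percent."""
    return (reference - comparison) / comparison * 100.0


# Rounded mean PSA levels (ng/ml) as printed in the abstract.
psa_no_minor_alleles = 1.61    # controls carrying no minor alleles
psa_with_minor_alleles = 1.12  # controls carrying one or more minor alleles

difference = percent_higher(psa_no_minor_alleles, psa_with_minor_alleles)
print(f"PSA is {difference:.2f}% higher in controls with no minor alleles")
```

With these rounded inputs the ratio comes out just under 44%, in line with the 43.7% reported.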
Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel
Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants.
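The haplotype counts in the abstract above follow from the sample sizes if each diploid genome contributes two haplotypes. The sketch below makes that bookkeeping explicit. The constant and function names are hypothetical, and the implied 1000 Genomes sample count and the combined panel size are derived here rather than stated in the abstract.

```python
HAPLOTYPES_PER_DIPLOID_GENOME = 2  # assumption: every sample is diploid


def haplotypes_from_genomes(n_genomes: int) -> int:
    """Number of phased haplotypes contributed by n diploid genomes."""
    return n_genomes * HAPLOTYPES_PER_DIPLOID_GENOME


uk10k_genomes = 3781               # whole genomes reported for UK10K
uk10k_haplotypes = haplotypes_from_genomes(uk10k_genomes)

kgp_haplotypes = 2184              # 1000 Genomes haplotypes, as reported
kgp_genomes = kgp_haplotypes // HAPLOTYPES_PER_DIPLOID_GENOME  # derived

combined_haplotypes = uk10k_haplotypes + kgp_haplotypes        # derived

print(f"UK10K haplotypes: {uk10k_haplotypes}")
print(f"Implied 1000 Genomes samples: {kgp_genomes}")
print(f"Combined panel haplotypes: {combined_haplotypes}")
```

The first figure reproduces the 7,562 UK10K haplotypes given in the abstract.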
SNAPSHOT USA 2019: a coordinated national camera trap survey of the United States
With the accelerating pace of global change, it is imperative that we obtain rapid inventories of the status and distribution of wildlife for ecological inferences and conservation planning. To address this challenge, we launched the SNAPSHOT USA project, a collaborative survey of terrestrial wildlife populations using camera traps across the United States. For our first annual survey, we compiled data across all 50 states during a 14-week period (17 August - 24 November of 2019). We sampled wildlife at 1509 camera trap sites from 110 camera trap arrays covering 12 different ecoregions across four development zones. This effort resulted in 166,036 unique detections of 83 species of mammals and 17 species of birds. All images were processed through the Smithsonian's eMammal camera trap data repository and included an expert review phase to ensure taxonomic accuracy of data, resulting in each picture being reviewed at least twice. The results represent a timely and standardized camera trap survey of the USA. All of the 2019 survey data are made available herein. We are currently repeating surveys in fall 2020, opening up the opportunity to other institutions and cooperators to expand coverage of all the urban-wild gradients and ecophysiographic regions of the country. Future data will be available as the database is updated at eMammal.si.edu/snapshot-usa, as well as future data paper submissions. These data will be useful for local and macroecological research including the examination of community assembly, effects of environmental and anthropogenic landscape variables, effects of fragmentation and extinction debt dynamics, as well as species-specific population dynamics and conservation action plans. There are no copyright restrictions; please cite this paper when using the data for publication.
New genetic loci link adipose and insulin biology to body fat distribution.
Body fat distribution is a heritable trait and a well-established predictor of adverse metabolic outcomes, independent of overall adiposity. To increase our understanding of the genetic basis of body fat distribution and its molecular links to cardiometabolic traits, here we conduct genome-wide association meta-analyses of traits related to waist and hip circumferences in up to 224,459 individuals. We identify 49 loci (33 new) associated with waist-to-hip ratio adjusted for body mass index (BMI), and an additional 19 loci newly associated with related waist and hip circumference measures (P < 5 × 10⁻⁸). In total, 20 of the 49 waist-to-hip ratio adjusted for BMI loci show significant sexual dimorphism, 19 of which display a stronger effect in women. The identified loci were enriched for genes expressed in adipose tissue and for putative regulatory elements in adipocytes. Pathway analyses implicated adipogenesis, angiogenesis, transcriptional regulation and insulin resistance as processes affecting fat distribution, providing insight into potential pathophysiological mechanisms.
Finishing the euchromatic sequence of the human genome
The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
Genome-wide association identifies nine common variants associated with fasting proinsulin levels and provides new insights into the pathophysiology of type 2 diabetes.
OBJECTIVE: Proinsulin is a precursor of mature insulin and C-peptide. Higher circulating proinsulin levels are associated with impaired β-cell function, raised glucose levels, insulin resistance, and type 2 diabetes (T2D). Studies of the insulin processing pathway could provide new insights about T2D pathophysiology. RESEARCH DESIGN AND METHODS: We have conducted a meta-analysis of genome-wide association tests of ∼2.5 million genotyped or imputed single nucleotide polymorphisms (SNPs) and fasting proinsulin levels in 10,701 nondiabetic adults of European ancestry, with follow-up of 23 loci in up to 16,378 individuals, using additive genetic models adjusted for age, sex, fasting insulin, and study-specific covariates. RESULTS: Nine SNPs at eight loci were associated with proinsulin levels (P < 5 × 10⁻⁸). Two loci (LARP6 and SGSM2) have not been previously related to metabolic traits, one (MADD) has been associated with fasting glucose, one (PCSK1) has been implicated in obesity, and four (TCF7L2, SLC30A8, VPS13C/C2CD4A/B, and ARAP1, formerly CENTD2) increase T2D risk. The proinsulin-raising allele of ARAP1 was associated with a lower fasting glucose (P = 1.7 × 10⁻⁴), improved β-cell function (P = 1.1 × 10⁻⁵), and lower risk of T2D (odds ratio 0.88; P = 7.8 × 10⁻⁶). Notably, PCSK1 encodes the protein prohormone convertase 1/3, the first enzyme in the insulin processing pathway. A genotype score composed of the nine proinsulin-raising alleles was not associated with coronary disease in two large case-control datasets. CONCLUSIONS: We have identified nine genetic variants associated with fasting proinsulin. Our findings illuminate the biology underlying glucose homeostasis and T2D development in humans and argue against a direct role of proinsulin in coronary artery disease pathogenesis.
Whole-genome sequence-based analysis of thyroid function
Tiina Paunio is a member of the UK10K Consortium working group. Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF ≥ 1%) associated with TSH and FT4 (N = 16,335). For TSH, we identify a novel variant in SYN2 (MAF = 23.5%, P = 6.15 × 10⁻⁹) and a new independent variant in PDE8B (MAF = 10.4%, P = 5.94 × 10⁻¹⁴). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF = 3.2%, P = 1.27 × 10⁻⁹) tagging a rare TTR variant (MAF = 0.4%, P = 2.14 × 10⁻¹¹). All common variants explain ≥ 20% of the variance in TSH and FT4. Analysis of rare variants (MAF…
Structural analysis and unique molecular recognition properties of a Bauhinia forficata lectin that inhibits cancer cell growth
Lectins have been used at length for basic research and clinical applications. New insights into the molecular recognition properties enhance our basic understanding of carbohydrate-protein interactions and aid in the design/development of new lectins. In this study, we used a combination of cell-based assays, glycan microarrays, and X-ray crystallography to evaluate the structure and function of the recombinant Bauhinia forficata lectin (BfL). The lectin was shown to be cytostatic for several cancer cell lines included in the NCI-60 panel; in particular, it inhibited growth of melanoma cancer cells (LOX IMVI) by over 95%. BfL is dimeric in solution and highly specific for binding of oligosaccharides and glycopeptides with terminal N-acetylgalactosamine (GalNAc). BfL was found to have especially strong binding (apparent Kd = 0.5-1.0 nM) to the tumor-associated Tn antigen. High-resolution crystal structures were determined for the ligand-free lectin, as well as for its complexes with three Tn glycopeptides, globotetraose, and the blood group A antigen. Extensive analysis of the eight crystal structures and comparison to structures of related lectins revealed several unique features of GalNAc recognition. Of special note, the carboxylate group of Glu126, lining the glycan-binding pocket, forms H-bonds with both the N-acetyl of GalNAc and the peptide amido group of Tn antigens. Stabilization provided by Glu126 is described here for the first time for any GalNAc-specific lectin. Taken together, the results provide new insights into the molecular recognition of carbohydrates and provide a structural understanding that will enable rational engineering of BfL for a variety of applications.
Database: Structural data are available in the PDB under the accession numbers 5T50, 5T52, 5T55, 5T54, 5T5L, 5T5J, 5T5P, and 5T5O.
Funding: National Institutes of Health (NIH), National Cancer Institute, Center for Cancer Research; Fundacao de Amparo a Pesquisa do Estado de Sao Paulo (FAPESP): 2009/53766-5, 2012/06366-4, 2014/22649-1; FAPESP (PD-BEPE): 2014/22649-1; U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences: W-31-109-Eng-38.
Affiliations: NCI, Macromol Crystallog Lab, Ctr Canc Res, Frederick, MD 21702 USA; NCI, Biol Chem Lab, Ctr Canc Res, Frederick, MD 21701 USA; Univ Fed Sao Paulo, Escola Paulista Med, Sao Paulo, SP, Brazil.