7 research outputs found

    A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data

    Get PDF
    Background: Rare variants have gathered increasing attention as a possible alternative source of missing heritability. Since next generation sequencing technology is not yet cost-effective for large-scale genomic studies, a widely used alternative approach is imputation. However, the imputation approach may be limited by the low accuracy of the imputed rare variants. To improve imputation accuracy of rare variants, various approaches have been suggested, including increasing the sample size of the reference panel, using sequencing data from study-specific samples (i.e., specific populations), and using local reference panels by genotyping or sequencing a subset of study samples. While these approaches mainly utilize reference panels, imputation accuracy of rare variants can also be increased by using exome chips containing rare variants. The exome chip contains 250 K rare variants selected from the discovered variants of about 12,000 sequenced samples. If exome chip data are available for previously genotyped samples, the combined approach using a genotype panel of merged data, including exome chips and SNP chips, should increase the imputation accuracy of rare variants. Results: In this study, we describe a combined imputation which uses both exome chip and SNP chip data simultaneously as a genotype panel. The effectiveness and performance of the combined approach was demonstrated using a reference panel of 848 samples constructed using exome sequencing data from the T2D-GENES consortium and 5,349 sample genotype panels consisting of an exome chip and SNP chip. As a result, the combined approach increased imputation quality up to 11 %, and genomic coverage for rare variants up to 117.7 % (MAF < 1 %), compared to imputation using the SNP chip alone. Also, we investigated the systematic effect of reference panels on imputation quality using five reference panels and three genotype panels. The best performing approach was the combination of the study specific reference panel and the genotype panel of combined data. Conclusions: Our study demonstrates that combined datasets, including SNP chips and exome chips, enhances both the imputation quality and genomic coverage of rare variants

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Get PDF
    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.Peer reviewe

    TIGER: the translational human pancreatic islet genotype tissue-expression resource

    No full text
    Background and aims: The scarcity of human islets preparations from organ donors available and their scattering across research labs, limits the understanding of the genomic and regulatory landscape of human islets and type 2 diabetes (T2D). The Horizon 2020 T2DSystems Consortium set out to gather genomic, transcriptomic and epigenomic datasets from a large number of human pancreatic islet samples from several laboratories and make the data publicly available.Materials and methods: We collected RNA-seq and genotyping data from 495 human islet samples and performed harmonization, quality control, genotype phasing and imputation. We integrated a) T2D association from genome-wide association studies (GWAS) identified in large meta-analyses or included in the GWAS Catalog, b) variant annotation and characterization through Variant Effect Predictor and Gnomad, c) epigenomic marks from islet DNA-methylation sites, chromatin accessibility and CHiP-seq profiles, d) annotation from Gene Ontology, lncRNAs and islet regulome, e) gene expression from normalised islet RNA-seq counts, microarrays and the Genotype-Tissue Expression database, and f) computed expression quantitative loci (eQTL) and allelic specific expression (ASE) and created the largest regulatory variation database from human pancreatic islets.Results: We developed TIGER, a publicly accessible database (http://tiger.bsc.es) provided with a genome browser to ensure the comprehensive data integration. The platform encloses tools for visualizing, querying, and downloading human islet data. TIGER facilitates follow-up by providing genetic and molecular findings related to T2D pathophysiology with a gene or a variant summary, eQTL and ASE results, associations with T2D and other related traits or diseases, genomic context information such as the islet chromatin landscape and direct access to other genomic databases.Conclusion: The comprehensive collation in TIGER of genomic, transcriptomic and epigenetic human islet datasets, and the integration with T2D GWAS and regulatory variation, represents a formidable resource to interrogate the molecular etiology of beta-cell failure in T2D.info:eu-repo/semantics/publishe

    The Role of Genetic Polymorphism in the Formation of Arterial Hypertension, Type 2 Diabetes and their Comorbidity

    No full text

    Novel loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: A multi-ethnic meta-analysis of 45,891 individuals

    Get PDF
    Circulating levels of adiponectin, a hormone produced predominantly by adipocytes, are highly heritable and are inversely associated with type 2 diabetes mellitus (T2D) and other metabolic traits. We conducted a meta-analysis of genome-wide association studies in 39,883 individuals of European ancestry to identify genes associated with metabolic disease. We identified 8 novel loci associated with adiponectin levels and confirmed 2 previously reported loci (P = 4.5×10−8- 1.2 ×10−43). Using a novel method to combine data across ethnicities (N = 4,232 African Americans, N = 1,776 Asians, and N = 29,347 Europeans), we identified two additional novel loci. Expression analyses of 436 human adipocyte samples revealed that mRNA levels of 18 genes at candidate regions were associated with adiponectin concentrations after accounting for multiple testing (p<3×10−4). We next developed a multi-SNP genotypic risk score to test the association of adiponectin decreasing risk alleles on metabolic traits and diseases using consortia-level meta-analytic data. This risk score was associated with increased risk of T2D (p = 4.3×10−3, n = 22,044), increased triglycerides (p = 2.6×10−14, n = 93,440), increased waist-to-hip ratio (p = 1.8×10−5, n = 77,167), increased glucose two hours post oral glucose tolerance testing (p = 4.4×10−3, n = 15,234), increased fasting insulin (p = 0.015, n = 48,238), but with lower in HDL- cholesterol concentrations (p = 4.5×10−13, n = 96,748) and decreased BMI (p = 1.4×10−4, n = 121,335). These findings identify novel genetic determinants of adiponectin levels, which, taken together, influence risk of T2D and markers of insulin resistance

    Genome-wide association identifies nine common variants associated with fasting proinsulin levels and provides new insights into the pathophysiology of type 2 diabetes

    Get PDF
    OBJECTIVE - Proinsulin is a precursor of mature insulin and C-peptide. Higher circulating proinsulin levels are associated with impaired b-cell function, raised glucose levels, insulin resistance, and type 2 diabetes (T2D). Studies of the insulin processing pathway could provide new insights about T2D pathophysiology. RESEARCH DESIGN AND METHODS - We have conducted a meta-analysis of genome-wide association tests of ;2.5 million genotyped or imputed single nucleotide polymorphisms (SNPs) and fasting proinsulin levels in 10,701 nondiabetic adults of European ancestry, with follow-up of 23 loci in up to 16,378 individuals, using additive genetic models adjusted for age, sex, fasting insulin, and study-specific covariates. RESULTS - Nine SNPs at eight loci were associated with proinsulin levels (P < 5 × 10-8). Two loci (LARP6 and SGSM2) have not been previously related to metabolic traits, one (MADD) has been associated with fasting glucose, one (PCSK1) has been implicated in obesity, and four (TCF7L2, SLC30A8, VPS13C/ C2CD4A/B, and ARAP1, formerly CENTD2) increase T2D risk. The proinsulin-raising allele of ARAP1 was associated with a lower fasting glucose (P = 1.7 3 10-4), improved b-cell function (P = 1.1 × 10-5), and lower risk of T2D (odds ratio 0.88; P = 7.8 × 10-6). Notably, PCSK1 encodes the protein prohormone convertase 1/3, the first enzyme in the insulin processing pathway. A genotype score composed of the nine proinsulin-raising alleles was not associated with coronary disease in two large case-control datasets. CONCLUSIONS - We have identified nine genetic variants associated with fasting proinsulin. Our findings illuminate the biology underlying glucose homeostasis and T2D development in humans and argue against a direct role of proinsulin in coronary artery disease pathogenesis

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Get PDF
    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical science
    corecore