85 research outputs found

    Identifying genetic mechanisms of cardiometabolic traits and diseases using quantitative sequence data

    Cardiometabolic diseases are a worldwide health concern. Genetic studies have identified hundreds of genetic loci associated with these diseases and other cardiometabolic risk factors, but gaps remain in the understanding of the biological mechanisms responsible for these associations. Sequence data from quantitative experiments, such as DNase-seq and ChIP-seq, that identify genomic regions regulating gene transcription are helping to fill these gaps. Allelic imbalance at heterozygous sites, or enrichment of one allele, in these data can indicate allelic differences in transcriptional regulation, but reference mapping biases present in sequence alignments prevent accurate allelic imbalance detection. We describe a pipeline, AA-ALIGNER, that removes mapping biases at heterozygous sites and increases allelic imbalance detection accuracy in samples with any amount of genotype data available. When complete genotype information is not available, AA-ALIGNER more accurately detects allelic imbalance at imputed heterozygous sites than heterozygous sites predicted using the sequence data. At predicted heterozygous sites, imbalance detection is more accurate at common variants than other variants. Additionally, imbalance detection with AA-ALIGNER is robust to a variety of experimental and analytical parameters. Using AA-ALIGNER, we detected evidence of allelic imbalance at 22,414 heterozygous sites in data from samples with relevance to cardiometabolic disease and risk factors. We have identified protein binding motifs for one of the imbalanced proteins at a majority of these sites, and evidence that imbalance in data for this protein is associated with imbalance in data for other proteins. Additionally, a subset of sites of allelic imbalance are located at expression quantitative trait loci and/or genome-wide association loci for cardiometabolic traits and diseases. These sites are strong candidates to be studied experimentally, and we report experimental evidence of allelic differences in protein binding, enhancer activity and/or the regulation of specific genes for a handful of these sites. Using allelic imbalance detection, we have detected differences in protein binding across the genome, providing valuable insight into mechanisms of transcriptional regulation. Focusing on cardiometabolic diseases and risk factors, this work demonstrates the utility of allelic imbalance detection in studying genetic effects on the regulation of gene transcription at complex disease- and trait-associated loci. Doctor of Philosophy

    HLAProfiler utilizes k-mer profiles to improve HLA calling accuracy for rare and common alleles in RNA-seq data

    BACKGROUND: The human leukocyte antigen (HLA) system is a genomic region involved in regulating the human immune system by encoding cell membrane major histocompatibility complex (MHC) proteins that are responsible for self-recognition. Understanding the variation in this region provides important insights into autoimmune disorders, disease susceptibility, oncological immunotherapy, regenerative medicine, transplant rejection, and toxicogenomics. Traditional approaches to HLA typing are low throughput, target only a few genes, are labor intensive and costly, or require specialized protocols. RNA sequencing promises a relatively inexpensive, high-throughput solution for HLA calling across all genes, with the bonus of complete transcriptome information and widespread availability of historical data. Existing tools have been limited in their ability to accurately and comprehensively call HLA genes from RNA-seq data. RESULTS: We created HLAProfiler ( https://github.com/ExpressionAnalysis/HLAProfiler ), a k-mer profile-based method for HLA calling in RNA-seq data which can identify rare and common HLA alleles with > 99% accuracy at two-field precision in both biological and simulated data. For 68% of novel alleles not present in the reference database, HLAProfiler can correctly identify the two-field precision or exact coding sequence, a significant advance over existing algorithms. CONCLUSIONS: HLAProfiler allows for accurate HLA calls in RNA-seq data, reliably expanding the utility of these data in HLA-related research and enabling advances across a broad range of disciplines. Additionally, by using the observed data to identify potential novel alleles and update partial alleles, HLAProfiler will facilitate further improvements to the existing database of reference HLA alleles. HLAProfiler is available at https://expressionanalysis.github.io/HLAProfiler/
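    As a minimal illustration of what a k-mer profile is, the sketch below counts every overlapping substring of length k in a read. It is not HLAProfiler's algorithm, which the abstract does not describe in detail; the function name `kmer_profile` and the choice of k are assumptions made only for this example.

```python
from collections import Counter


def kmer_profile(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a sequence (hypothetical helper).

    Illustrates the general idea of a k-mer profile only; it is not
    taken from the HLAProfiler code base.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    profile = kmer_profile("ACGTACGT", 4)
    for kmer, count in sorted(profile.items()):
        print(kmer, count)
```

    A profile-based caller would compare counts like these between sequencing reads and reference allele sequences; how HLAProfiler scores that comparison is not stated in the abstract.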

    New loci for body fat percentage reveal link between adiposity and cardiometabolic disease risk

    To increase our understanding of the genetic basis of adiposity and its links to cardiometabolic disease risk, we conducted a genome-wide association meta-analysis of body fat percentage (BF%) in up to 100,716 individuals. Twelve loci reached genome-wide significance (P < 5 × 10⁻⁸), of which eight were previously associated with increased overall adiposity (BMI, BF%) and four (in or near COBLL1/GRB14, IGF2BP1, PLA2G6, CRTC1) were novel associations with BF%. Seven loci showed a larger effect on BF% than on BMI, suggestive of a primary association with adiposity, while five loci showed larger effects on BMI than on BF%, suggesting association with both fat and lean mass. In particular, the loci more strongly associated with BF% showed distinct cross-phenotype association signatures with a range of cardiometabolic traits revealing new insights in the link between adiposity and disease risk.

    Removing reference mapping biases using limited or no genotype data identifies allelic differences in protein binding at disease-associated loci

    BACKGROUND: Genetic variation can alter transcriptional regulatory activity, contributing to variation in complex traits and risk of disease, but identifying individual variants that affect regulatory activity has been challenging. Quantitative sequence-based experiments such as ChIP-seq and DNase-seq can detect sites of allelic imbalance where alleles contribute disproportionately to the overall signal, suggesting allelic differences in regulatory activity. METHODS: We created an allelic imbalance detection pipeline, AA-ALIGNER, to remove reference mapping biases influencing allelic imbalance detection and evaluate accuracy of allelic imbalance predictions in the absence of complete genotype data. Using the sequence aligner, GSNAP, and varying amounts of genotype information to remove mapping biases, we investigated the accuracy of allelic imbalance detection (binomial test) in CREB1 ChIP-seq reads from the GM12878 cell line. Additionally, we thoroughly evaluated the influence of experimental and analytical parameters on imbalance detection. RESULTS: Compared to imbalances identified using complete genotypes, using imputed partial sample genotypes, AA-ALIGNER detected >95% of imbalances with >90% accuracy. AA-ALIGNER performed nearly as well using common variants when genotypes were unknown. In contrast, predicting additional heterozygous sites and imbalances using the sequence data led to >50% false positive rates. We evaluated effects of experimental data characteristics and key analytical parameter settings on imbalance detection. Overall, total base coverage and signal dispersion across the genome most affected our ability to detect imbalances, while parameters such as imbalance significance, imputation quality thresholds, and alignment mismatches had little effect. To assess the biological relevance of imbalance predictions, we used electrophoretic mobility shift assays to functionally test for predicted allelic differences in CREB1 binding in the GM12878 lymphoblast cell line. Six of nine tested variants exhibited allelic differences in binding. Two of these variants, rs2382818 and rs713875, are located within inflammatory bowel disease-associated loci. CONCLUSIONS: AA-ALIGNER accurately detects allelic imbalance in quantitative sequence data using partial genotypes or common variants, filling a critical methodological gap in these analyses, as full genotypes are rarely available. Importantly, we demonstrate how experimental and analytical features impact imbalance detection, providing guidance for similar future studies.
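    The abstract names a binomial test as the imbalance statistic without giving its details. The sketch below shows one way such a test could look at a single heterozygous site, assuming a null hypothesis that each allele is expected in half of the reads. The function name `allelic_imbalance_pvalue` and the read counts are hypothetical, and this is not AA-ALIGNER's code.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Two-sided exact binomial p-value for one heterozygous site.

    Assumes a null allele fraction of 0.5 (an assumption for this sketch;
    the abstract only states that a binomial test was used).
    """
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, so no evidence of imbalance
    # Under p = 0.5 every outcome i has probability comb(n, i) / 2**n, so
    # outcomes "at least as extreme" are those with a binomial coefficient
    # no larger than the observed one.
    observed = comb(n, ref_reads)
    extreme = sum(c for c in (comb(n, i) for i in range(n + 1)) if c <= observed)
    return extreme / 2 ** n


if __name__ == "__main__":
    # Hypothetical counts: 8 reference-allele reads, 2 alternate-allele reads.
    print(allelic_imbalance_pvalue(8, 2))
```

    In a genome-wide analysis the per-site p-values would also need a multiple-testing correction; the threshold used in the study is not given in the abstract.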

    Secretory products from PC-3 and MCF-7 tumor cell lines upregulate osteopontin in MC3T3-E1 cells

    Tumor cells frequently have pronounced effects on the skeleton including bone destruction, bone pain, hypercalcemia, and depletion of bone marrow cells. Despite the serious sequelae associated with skeletal metastasis, the mechanisms by which tumor cells alter bone homeostasis remain largely unknown. In this study, we tested the hypothesis that the disruption of bone homeostasis by tumor cells is due in part to the ability of tumor cells to upregulate osteopontin (OPN) mRNA in osteoblasts. Conditioned media were collected from tumor cells that elicit either osteolytic (MCF-7, PC-3) or osteoblastic responses (LNCaP) in animal models and their effects on OPN gene expression were compared using an osteoblast precursor cell line, MC3T3-E1 cells. Secretory products from osteolytic but not osteoblastic tumor cell lines were demonstrated to upregulate OPN in osteoblasts while inhibiting osteoblast proliferation and differentiation. Signal transduction studies revealed that regulation of OPN was dependent on both protein kinase C (PKC) and the mitogen-activated protein (MAP) kinase cascade. These results suggest that the upregulation of OPN may play a key role in the development of osteolytic lesions. Furthermore, these results suggest that drugs that prevent activation of the MAP kinase pathway may be efficacious in the treatment of osteolytic metastases. J. Cell. Biochem. 78:607–616, 2000. © 2000 Wiley-Liss, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/34901/1/10_ftp.pd

    Multiple Hepatic Regulatory Variants at the GALNT2 GWAS Locus Associated with High-Density Lipoprotein Cholesterol

    Genome-wide association studies (GWASs) have identified more than 150 loci associated with blood lipid and cholesterol levels; however, the functional and molecular mechanisms for many associations are unknown. We examined the functional regulatory effects of candidate variants at the GALNT2 locus associated with high-density lipoprotein cholesterol (HDL-C). Fine-mapping and conditional analyses in the METSIM study identified a single locus harboring 25 noncoding variants (r² > 0.7 with the lead GWAS variants) strongly associated with total cholesterol in medium-sized HDL (e.g., rs17315646, p = 3.5 × 10⁻¹²). We used luciferase reporter assays in HepG2 cells to test all 25 variants for allelic differences in regulatory enhancer activity. rs2281721 showed allelic differences in transcriptional activity (75-fold [T] versus 27-fold [C] more than the empty-vector control), as did a separate 780-bp segment containing rs4846913, rs2144300, and rs6143660 (49-fold [AT– haplotype] versus 16-fold [CC+ haplotype] more). Using electrophoretic mobility shift assays, we observed differential CEBPB binding to rs4846913, and we confirmed this binding in a native chromatin context by performing chromatin-immunoprecipitation (ChIP) assays in HepG2 and Huh-7 cell lines of differing genotypes. Additionally, sequence reads in HepG2 DNase-I-hypersensitivity and CEBPB ChIP-seq signals spanning rs4846913 showed significant allelic imbalance. Allelic-expression-imbalance assays performed with RNA from primary human hepatocyte samples and expression-quantitative-trait-locus (eQTL) data in human subcutaneous adipose tissue samples confirmed that alleles associated with increased HDL-C are associated with a modest increase in GALNT2 expression. Together, these data suggest that at least rs4846913 and rs2281721 play key roles in influencing GALNT2 expression at this HDL-C locus.

    Exome array analysis identifies new loci and low-frequency variants influencing insulin processing and secretion

    Insulin secretion plays a critical role in glucose homeostasis, and failure to secrete sufficient insulin is a hallmark of type 2 diabetes. Genome-wide association studies (GWAS) have identified loci contributing to insulin processing and secretion [1,2]; however, a substantial fraction of the genetic contribution remains undefined. To examine low-frequency (minor allele frequency (MAF) 0.5% to 5%) and rare (MAF < 0.5%) nonsynonymous variants, we analyzed exome array data in 8,229 non-diabetic Finnish males. We identified low-frequency coding variants associated with fasting proinsulin levels at the SGSM2 and MADD GWAS loci and three novel genes with low-frequency variants associated with fasting proinsulin or insulinogenic index: TBC1D30, KANK1, and PAM. We also demonstrate that the interpretation of single-variant and gene-based tests needs to consider the effects of noncoding SNPs nearby and megabases (Mb) away. This study demonstrates that exome array genotyping is a valuable approach to identify low-frequency variants that contribute to complex traits.
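    The frequency classes quoted in this abstract can be written as a small lookup, sketched below. The thresholds (rare: MAF < 0.5%; low-frequency: 0.5% to 5%) come from the abstract; treating exactly 5% as low-frequency, labelling everything above 5% as "common", and the function name `classify_variant` are assumptions for illustration.

```python
def classify_variant(maf: float) -> str:
    """Assign a frequency class from a minor allele frequency (as a fraction).

    Thresholds follow the abstract; the inclusive upper bound at 5% and the
    "common" label above it are assumptions made for this sketch.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie between 0 and 0.5")
    if maf < 0.005:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"


if __name__ == "__main__":
    # Hypothetical frequencies, not values from the study.
    for maf in (0.001, 0.02, 0.30):
        print(maf, classify_variant(maf))
```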