    Explorative visual analytics on interval-based genomic data and their metadata

    Background: With the spread of public repositories of NGS processed data, the availability of user-friendly and effective tools for data exploration, analysis and visualization is becoming increasingly relevant. These tools enable interactive analytics, an exploratory approach for the seamless "sense-making" of data through on-the-fly integration of analysis and visualization phases, suggested not only for evaluating processing results, but also for designing and adapting NGS data analysis pipelines. Results: This paper presents abstractions for supporting the early analysis of NGS processed data and their implementation in an associated tool, named GenoMetric Space Explorer (GeMSE). This tool serves the needs of the GenoMetric Query Language, an innovative cloud-based system for computing complex queries over heterogeneous processed data. It can also be used starting from any text files in standard BED, BroadPeak, NarrowPeak, GTF, or general tab-delimited format, containing numerical features of genomic regions; metadata can be provided as text files in tab-delimited attribute-value format. GeMSE allows interactive analytics, consisting of on-the-fly cycling among steps of data exploration, analysis and visualization that help biologists and bioinformaticians in making sense of heterogeneous genomic datasets. By means of an explorative interaction support, users can trace past activities and quickly recover their results, seamlessly going backward and forward in the analysis steps and comparative visualizations of heatmaps. Conclusions: GeMSE's effective application and practical usefulness are demonstrated through significant use cases of biological interest. GeMSE is available at http://www.bioinformatics.deib.polimi.it/GeMSE/, and its source code is available at https://github.com/Genometric/GeMSE under the GPLv3 open-source license.

    Genome-wide analysis identifies novel susceptibility loci for myocardial infarction

    AIMS: While most patients with myocardial infarction (MI) have underlying coronary atherosclerosis, not all patients with coronary artery disease (CAD) develop MI. We sought to address the hypothesis that some of the genetic factors which establish atherosclerosis may be distinct from those that predispose to vulnerable plaques and thrombus formation. METHODS AND RESULTS: We carried out a genome-wide association study for MI in the UK Biobank (n∼472 000), followed by a meta-analysis with summary statistics from the CARDIoGRAMplusC4D Consortium (n∼167 000). Multiple independent replication analyses and functional approaches were used to prioritize loci and evaluate positional candidate genes. Eight novel regions were identified for MI at the genome-wide significance level, of which effect sizes at six loci were more robust for MI than for CAD without the presence of MI. Confirmatory evidence for association of a locus on chromosome 1p21.3 harbouring choline-like transporter 3 (SLC44A3) with MI in the context of CAD, but not with coronary atherosclerosis itself, was obtained in Biobank Japan (n∼165 000) and 16 independent angiography-based cohorts (n∼27 000). Follow-up analyses did not reveal association of the SLC44A3 locus with CAD risk factors, biomarkers of coagulation, other thrombotic diseases, or plasma levels of a broad array of metabolites, including choline, trimethylamine N-oxide, and betaine. However, aortic expression of SLC44A3 was increased in carriers of the MI risk allele at chromosome 1p21.3, increased in ischaemic (vs. non-diseased) coronary arteries, up-regulated in human aortic endothelial cells treated with interleukin-1β (vs. vehicle), and associated with smooth muscle cell migration in vitro. CONCLUSIONS: A large-scale analysis comprising ∼831 000 subjects revealed novel genetic determinants of MI and implicated SLC44A3 in the pathophysiology of vulnerable plaques.

    Global Analysis of the Impact of Environmental Perturbation on cis-Regulation of Gene Expression

    Genetic variants altering cis-regulation of normal gene expression (cis-eQTLs) have been extensively mapped in human cells and tissues, but the extent to which controlled environmental perturbation influences cis-eQTLs is unclear. We carried out large-scale induction experiments using primary human bone cells derived from unrelated donors of Swedish origin treated with 18 different stimuli (7 treatments and 2 controls, each assessed at 2 time points). The treatments with the largest impact on the transcriptome, verified on two independent expression arrays, included BMP-2 (t = 2h), dexamethasone (DEX) (t = 24h), and PGE2 (t = 24h). Using these treatments and control, we performed expression profiling for 18,144 RefSeq transcripts on biological replicates of the complete study cohort of 113 individuals (n_total = 782) and combined it with genome-wide SNP-genotyping data in order to map treatment-specific cis-eQTLs (defined as SNPs located within the gene ±250 kb). We found that 93% of cis-eQTLs at 1% FDR were observed in at least one additional treatment, and in fact, on average, only 1.4% of the cis-eQTLs were considered treatment-specific at high confidence. The relative invariability of cis-regulation following perturbation was reiterated independently by genome-wide allelic expression tests, where only a small proportion of variance could be attributed to treatment. Treatment-specific cis-regulatory effects were, however, 2- to 6-fold more abundant among genes differentially expressed upon treatment. We further followed up and validated the DEX-specific cis-regulation of the MYO6 and TNC loci and found top cis-regulatory variants located 180 kb and 250 kb upstream of the transcription start sites, respectively. Our results suggest that, as opposed to tissue-specificity of cis-eQTLs, the interactions between cellular environment and cis-variants are relatively rare (∼1.5%), but that detection of such specific interactions can be achieved by a combination of functional genomic approaches as described here.

    Treatment- and Population-Dependent Activity Patterns of Behavioral and Expression QTLs

    Genetic control of gene expression and higher-order phenotypes is almost invariably dependent on environment and experimental conditions. We use two families of recombinant inbred strains of mice (LXS and BXD) to study treatment- and genotype-dependent control of hippocampal gene expression and behavioral phenotypes. We analyzed responses to all combinations of two experimental perturbations, ethanol and restraint stress, in both families, allowing for comparisons across 8 combinations of treatment and population. We introduce the concept of QTL activity patterns to characterize how associations between genomic loci and traits vary across treatments. We identified several significant behavioral QTLs and many expression QTLs (eQTLs). The behavioral QTLs are highly dependent on treatment and population. We classified eQTLs into three groups: cis-eQTLs (expression variation that maps to within 5 Mb of the cognate gene), syntenic trans-eQTLs (the gene and the QTL are on the same chromosome but not within 5 Mb), and non-syntenic trans-eQTLs (the gene and the QTL are on different chromosomes). We found that most non-syntenic trans-eQTLs were treatment-specific whereas both classes of syntenic eQTLs were more conserved across treatments. We also found there was a correlation between regions along the genome enriched for eQTLs and SNPs that were conserved across the LXS and BXD families. Genes with eQTLs that co-localized with the behavioral QTLs and displayed similar QTL activity patterns were identified as potential candidate genes associated with the phenotypes, yielding identification of novel genes as well as genes that have been previously associated with responses to ethanol
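
The three-way eQTL grouping described in this abstract can be sketched as a small rule. This is only an illustration, not the authors' code: the `Locus` type, the `classify_eqtl` function, the use of a single base-pair coordinate per gene and per QTL, and the inclusive treatment of the 5 Mb boundary are all assumptions made for the example.

```python
from dataclasses import dataclass

# 5 Mb, the distance threshold quoted in the abstract.
CIS_WINDOW_BP = 5_000_000


@dataclass(frozen=True)
class Locus:
    """Hypothetical representation: chromosome name plus one coordinate in bp."""
    chrom: str
    pos_bp: int


def classify_eqtl(gene: Locus, qtl: Locus, window_bp: int = CIS_WINDOW_BP) -> str:
    """Assign an eQTL to one of the three groups named in the abstract."""
    # Different chromosomes: non-syntenic trans-eQTL.
    if gene.chrom != qtl.chrom:
        return "non-syntenic trans-eQTL"
    # Same chromosome and within the window (boundary inclusive by assumption).
    if abs(gene.pos_bp - qtl.pos_bp) <= window_bp:
        return "cis-eQTL"
    # Same chromosome but farther away than the window.
    return "syntenic trans-eQTL"
```

For instance, a gene and a QTL 2 Mb apart on the same chromosome would fall in the cis group under these assumptions, while the same pair 10 Mb apart would be a syntenic trans-eQTL.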

    Differential endothelial cell gene expression by African Americans versus Caucasian Americans: a possible contribution to health disparity in vascular disease and cancer

    Background: Health disparities and the high prevalence of cardiovascular disease continue to be perplexing worldwide health challenges. This study addresses the possibility that genetic differences affecting the biology of the vascular endothelium could be a factor contributing to the increased burden of cardiovascular disease and cancer among African Americans (AA) compared to Caucasian Americans (CA). Methods: From self-identified, healthy, 20 to 29-year-old AA (n = 21) and CA (n = 17), we established cultures of blood outgrowth endothelial cells (BOEC) and applied microarray profiling. BOEC have never been exposed to in vivo influences, and their gene expression reflects culture conditions (meticulously controlled) and donor genetics. Significance Analysis of Microarray identified differential expression of single genes. Gene Set Enrichment Analysis examined expression of pre-determined gene sets that survey nine biological systems relevant to endothelial biology. Results: At the highly stringent threshold of False Discovery Rate (FDR) = 0, 31 single genes were differentially expressed in AA. PSPH exhibited the greatest fold-change (AA > CA), but this was entirely accounted for by a homolog (PSPHL) hidden within the PSPH probe set. Among other significantly different genes were: for AA > CA, SOS1, AMFR, FGFR3; and for AA < CA, ARVCF, BIN3, EIF4B. Many more (221 transcripts for 204 genes) were differentially expressed at the less stringent threshold of FDR < 0.05. Using the biological systems approach, we identified shear response biology as being significantly different for AA versus CA, showing an apparent tonic increase of expression (AA > CA) for 46/157 genes within that system. Conclusions: Many of the genes implicated here have substantial roles in endothelial biology. Shear stress response, a critical regulator of endothelial function and vascular homeostasis, may be different between AA and CA. These results potentially have direct implications for the role of endothelial cells in vascular disease (hypertension, stroke) and cancer (via angiogenesis). Also, they are consistent with our over-arching hypothesis that genetic influences stemming from ancestral continent-of-origin could impact upon endothelial cell biology and thereby contribute to disparity of vascular-related disease burden among AA. The method used here could be productively employed to bridge the gap between information from structural genomics (for example, disease association) and cell function and pathophysiology.

    Biological heterogeneity in idiopathic pulmonary arterial hypertension identified through unsupervised transcriptomic profiling of whole blood

    Idiopathic pulmonary arterial hypertension (IPAH) is a rare but fatal disease diagnosed by right heart catheterisation and the exclusion of other forms of pulmonary arterial hypertension, producing a heterogeneous population with varied treatment response. Here we show unsupervised machine learning identification of three major patient subgroups that account for 92% of the cohort, each with unique whole blood transcriptomic and clinical feature signatures. These subgroups are associated with poor, moderate, and good prognosis. The poor prognosis subgroup is associated with upregulation of ALAS2 and downregulation of several immunoglobulin genes, while the good prognosis subgroup is defined by upregulation of the bone morphogenetic protein signalling regulator NOG, and the C/C variant of HLA-DPA1/DPB1 (independently associated with survival). These findings, which were independently validated, provide evidence for the existence of 3 major subgroups (endophenotypes) within the IPAH classification, could improve risk stratification, and provide molecular insights into the pathogenesis of IPAH.

    Overview of GeCo: A Project for Exploring and Integrating Signals from the Genome

    Next Generation Sequencing is a 10-year-old technology for reading DNA, capable of producing massive amounts of genomic data and, in turn, reshaping genomic computing. In particular, tertiary data analysis is concerned with the integration of heterogeneous regions of the genome; this is an emerging and increasingly important problem of genomic computing, because regions carry important signals and the creation of new biological or clinical knowledge requires the integration of these signals into meaningful messages. We specifically focus on how the GeCo project is contributing to tertiary data analysis, by overviewing the main results of the project so far and by describing its future scenarios.