
    Genomics Portals: integrative web-platform for mining genomics data

    Background: A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold tremendous potential for gaining new insights into the functioning of living systems. Results: The Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and a tool for effective mining of the large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc.), and in the integration with an extensive knowledge base that can be used in such analysis. Conclusion: The integrated access to primary genomics data, functional knowledge and analytical tools makes the Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals back-end databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.

    Evaluating the harmonisation potential of diverse cohort datasets

    Data discovery, the ability to find datasets relevant to an analysis, increases scientific opportunity, improves rigour and accelerates activity. Rapid growth in the depth, breadth, quantity and availability of data provides unprecedented opportunities and challenges for data discovery. Data harmonisation is a potential tool for increasing the efficiency of data discovery, particularly across multiple datasets. A set of 124 variables, identified as being of broad interest to neurodegeneration, were harmonised using the C-Surv data model. The harmonisation strategies used were simple calibration, algorithmic transformation and standardisation to the Z-distribution. Widely used data conventions, optimised for inclusiveness rather than aetiological precision, were used as harmonisation rules. The harmonisation scheme was applied to data from four diverse population cohorts. Of the 120 variables that were found in the datasets, correspondence between the harmonised data schema and cohort-specific data models was complete or close for 111 (93%). For the remainder, harmonisation was possible with a marginal loss of granularity. Although harmonisation is not an exact science, sufficient comparability across datasets was achieved to enable data discovery with relatively little loss of informativeness. This provides a basis for further work extending harmonisation to a larger variable list, applying the harmonisation to further datasets, and incentivising the development of data discovery tools.
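The third strategy named in the abstract above, standardisation to the Z-distribution, can be illustrated with a minimal sketch. The abstract does not say whether scores were standardised within each cohort or against a reference population, so the within-cohort approach here is an assumption. The function name `z_standardise` and the two example cohorts are hypothetical and are not taken from the C-Surv data model.

```python
from statistics import mean, pstdev


def z_standardise(values):
    """Convert raw values to z-scores using the cohort's own mean and SD.

    Assumption: standardisation is done within each cohort, using the
    population standard deviation. The source does not specify this.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardise a constant variable")
    return [(v - mu) / sigma for v in values]


# Hypothetical example: the same construct measured on two different scales.
cohort_a = [24.0, 27.0, 30.0]    # e.g. a test scored out of 30
cohort_b = [80.0, 90.0, 100.0]   # e.g. a test scored out of 100

z_a = z_standardise(cohort_a)
z_b = z_standardise(cohort_b)
print(z_a)
print(z_b)
```

After standardisation the two cohorts share a common unit (standard deviations from the cohort mean), which is what makes the variable comparable for discovery purposes even though the raw scales differ.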

    Genome-Wide Signatures of Transcription Factor Activity: Connecting Transcription Factors, Disease, and Small Molecules

    Identifying transcription factors (TFs) involved in producing a genome-wide transcriptional profile is an essential step in building a mechanistic model that can explain observed gene expression data. We developed a statistical framework for constructing genome-wide signatures of TF activity, and for using such signatures in the analysis of gene expression data produced by complex transcriptional regulatory programs. Our framework integrates ChIP-seq data and appropriately matched gene expression profiles to identify True REGulatory (TREG) TF-gene interactions. It provides genome-wide quantification of the likelihood of regulatory TF-gene interaction that can be used either to identify regulated genes, or as a genome-wide signature of TF activity. To use ChIP-seq data effectively, we introduce a novel statistical model that integrates information from all binding “peaks” within a 2 Mb window around a gene's transcription start site (TSS), and provides gene-level binding scores and probabilities of regulatory interaction. In the second step we integrate these binding scores and regulatory probabilities with gene expression data to assess the likelihood of TREG TF-gene interactions. We demonstrate the advantages of the TREG framework in identifying genes regulated by two TFs with widely different distributions of functional binding events (ERα and E2f1). We also show that TREG signatures of TF activity vastly improve our ability to detect the involvement of ERα in producing complex disease-related transcriptional profiles. Through a large study of disease-related transcriptional signatures and transcriptional signatures of drug activity, we demonstrate that the increase in statistical power associated with the use of TREG signatures makes the crucial difference in identifying key targets for treatment, and drugs to use for treatment. All methods are implemented in an open-source R package, treg. The package also contains all data used in the analysis, including 494 TREG binding profiles based on ENCODE ChIP-seq data. The treg package can be downloaded at http://GenomicsPortals.org.

    Relative statistical significance of the association between ChIP-seq and differential gene expression data for different window sizes.

    The ratio of −log10(p-value of enrichment) of differentially expressed genes (FDR<0.1) among genes with high MPI scores to −log10(p-value of enrichment) of differentially expressed genes among genes with high TREG binding scores. The ratios related to E2f1 ChIP-seq data and the E2 differential gene expression profile are represented by the blue line. The ratios related to ERα ChIP-seq data are represented by the red line. Ratios smaller than 1 indicate higher significance of enrichment when using TREG scores, as opposed to maximum peak height within the given window.
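The ratio described in this caption can be written out as a short sketch. The function name `significance_ratio` and the example p-values are hypothetical and are not taken from the treg package or the underlying paper; only the formula itself comes from the caption.

```python
import math


def significance_ratio(p_mpi, p_treg):
    """Ratio of -log10(p) for MPI-based enrichment to -log10(p) for
    TREG-based enrichment, as described in the caption.

    Both p-values must lie strictly between 0 and 1 so that the
    logarithms are finite and the denominator is non-zero.
    """
    for p in (p_mpi, p_treg):
        if not 0.0 < p < 1.0:
            raise ValueError("p-values must lie strictly between 0 and 1")
    return (-math.log10(p_mpi)) / (-math.log10(p_treg))


# Hypothetical p-values: the TREG-based enrichment is the more significant one,
# so the ratio falls below 1.
ratio = significance_ratio(1e-4, 1e-8)
print(ratio)
```

A smaller p-value gives a larger −log10 value, so a ratio below 1 means the denominator (the TREG-based enrichment) is the more significant of the two, matching the interpretation given in the caption.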

    Distinctive functional roles of ERα and E2F1 targets.

    Top 10 enriched gene lists associated with Gene Ontology terms using the TREG signatures for enrichment analysis. A) Gene lists enriched with ERα-regulated genes only. B) Gene lists enriched with E2F1-regulated genes only.

    P-values for TREG concordance analysis between TREG binding profiles (E2f1 and ERα) and differential gene expression profiles (E2 and E2+CHX).
