974 research outputs found

    An approach for the identification of targets specific to bone metastasis using cancer genes interactome and gene ontology analysis

    Metastasis is one of the most enigmatic aspects of cancer pathogenesis and is a major cause of cancer-associated mortality. Secondary bone cancer (SBC) is a complex disease caused by metastasis of tumor cells from their primary site and is characterized by an intricate interplay of molecular interactions. Identification of targets for multifactorial diseases such as SBC, the most frequent complication of breast and prostate cancers, is a challenge. To identify targets specific to SBC, we constructed a 'Cancer Genes Network', a representative protein interactome of cancer genes. Using graph theoretical methods, we obtained a set of key genes that are relevant for generic mechanisms of cancers and have a role in biological essentiality. We also compiled a curated dataset of 391 SBC genes from published literature, which serves as a basis of ontological correlates of secondary bone cancer. Building on these results, we implement a strategy based on generic cancer genes, SBC genes and a gene ontology enrichment method to obtain a set of targets that are specific to bone metastasis. Through this study, we present an approach for probing one of the major complications in cancers, namely, metastasis. The results on genes that play generic roles in cancer phenotype, obtained by network analysis of the 'Cancer Genes Network', have broader implications in understanding the role of molecular regulators in mechanisms of cancers. Specifically, our study provides a set of potential targets that are of ontological and regulatory relevance to secondary bone cancer. Comment: 54 pages (19 pages main text; 11 Figures; 26 pages of supplementary information). Revised after critical reviews. Accepted for Publication in PLoS ON
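
The abstract does not say which graph theoretical measures were used to pick key genes. As an illustration only, the sketch below assumes the simplest one, degree centrality, and ranks nodes of a small interaction network by it. The function name `degree_centrality` and the placeholder node labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return normalized degree centrality for an undirected network.

    `edges` is an iterable of (node_a, node_b) pairs. Self-loops and
    duplicate edges are ignored. This is an assumed, illustrative measure,
    not necessarily the one used in the study.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # skip self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    # Divide by the maximum possible degree (n - 1) so scores lie in [0, 1].
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


if __name__ == "__main__":
    # Placeholder network; the labels are not real gene symbols.
    toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3")]
    scores = degree_centrality(toy_edges)
    for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(node, round(score, 3))
```

Nodes with the highest scores are the hubs of the network. A real analysis would likely combine several centrality measures rather than rely on degree alone.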

    Annotation Enrichment Analysis: An Alternative Method for Evaluating the Functional Properties of Gene Sets

    Gene annotation databases (compendiums maintained by the scientific community that describe the biological functions performed by individual genes) are commonly used to evaluate the functional properties of experimentally derived gene sets. Overlap statistics, such as Fisher's Exact Test (FET), are often employed to assess these associations, but don't account for non-uniformity in the number of genes annotated to individual functions or the number of functions associated with individual genes. We find FET is strongly biased toward over-estimating overlap significance if a gene set has an unusually high number of annotations. To correct for these biases, we develop Annotation Enrichment Analysis (AEA), which properly accounts for the non-uniformity of annotations. We show that AEA is able to identify biologically meaningful functional enrichments that are obscured by numerous false-positive enrichment scores in FET, and we therefore suggest it be used to more accurately assess the biological properties of gene sets
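
For context, the overlap statistic that the abstract criticizes can be sketched as a one-sided Fisher's Exact Test, which is equivalent to a hypergeometric tail probability. The function and parameter names below are illustrative assumptions. This is the baseline FET calculation only, not the AEA method, whose details the abstract does not give.

```python
from math import comb


def hypergeom_enrichment_p(overlap, set_size, annotated, universe):
    """One-sided enrichment p-value P(X >= overlap).

    X is the number of annotated genes seen when `set_size` genes are drawn
    without replacement from `universe` genes, of which `annotated` carry
    the function of interest.
    """
    upper = min(set_size, annotated)
    total = comb(universe, set_size)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, set_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 3-gene set, 2 of them annotated, 4 annotated genes among 10.
    print(hypergeom_enrichment_p(overlap=2, set_size=3, annotated=4, universe=10))
```

The test treats every gene as equally likely to carry the annotation. That is the uniformity assumption the abstract identifies as a source of bias when genes differ widely in how many annotations they have.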

    COMPUTATIONAL TOOLS FOR THE DYNAMIC CATEGORIZATION AND AUGMENTED UTILIZATION OF THE GENE ONTOLOGY

    Ontologies provide an organization of language, in the form of a network or graph, which is amenable to computational analysis while remaining human-readable. Although they are used in a variety of disciplines, ontologies in the biomedical field, such as Gene Ontology, are of interest for their role in organizing terminology used to describe—among other concepts—the functions, locations, and processes of genes and gene-products. Due to the consistency and level of automation that ontologies provide for such annotations, methods for finding enriched biological terminology from a set of differentially identified genes in a tissue or cell sample have been developed to aid in the elucidation of disease pathology and unknown biochemical pathways. However, despite their immense utility, biomedical ontologies have significant limitations and caveats. One major issue is that gene annotation enrichment analyses often result in many redundant, individually enriched ontological terms that are highly specific and weakly justified by statistical significance. These large sets of weakly enriched terms are difficult to interpret without manually sorting into appropriate functional or descriptive categories. Also, relationships that organize the terminology within these ontologies do not contain descriptions of semantic scoping or scaling among terms. Therefore, there exists some ambiguity, which complicates the automation of categorizing terms to improve interpretability. We emphasize that existing methods enable the danger of producing incorrect mappings to categories as a result of these ambiguities, unless simplified and incomplete versions of these ontologies are used which omit problematic relations. 
Such ambiguities could have a significant impact on term categorization, as we have calculated upper boundary estimates of potential false categorizations as high as 121,579 for the misinterpretation of a single scoping relation, has_part, which accounts for approximately 18% of the total possible mappings between terms in the Gene Ontology. However, the omission of problematic relationships results in a significant loss of retrievable information. In the Gene Ontology, this accounts for a 6% reduction for the omission of a single relation. However, this percentage should increase drastically when considering all relations in an ontology. To address these issues, we have developed methods which categorize individual ontology terms into broad, biologically-related concepts to improve the interpretability and statistical significance of gene-annotation enrichment studies, meanwhile addressing the lack of semantic scoping and scaling descriptions among ontological relationships so that annotation enrichment analyses can be performed across a more complete representation of the ontological graph. We show that, when compared to similar term categorization methods, our method produces categorizations that match hand-curated ones with similar or better accuracy, while not requiring the user to compile lists of individual ontology term IDs. Furthermore, our handling of problematic relations produces a more complete representation of ontological information from a scoping perspective, and we demonstrate instances where medically-relevant terms--and by extension putative gene targets--are identified in our annotation enrichment results that would be otherwise missed when using traditional methods. Additionally, we observed a marginal, yet consistent improvement of statistical power in enrichment results when our methods were used, compared to traditional enrichment analyses that utilize ontological ancestors. 
Finally, using scalable and reproducible data workflow pipelines, we have applied our methods to several genomic, transcriptomic, and proteomic collaborative projects.
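
The "traditional enrichment analyses that utilize ontological ancestors" mentioned above rest on one basic operation: collecting every ancestor of a term in the ontology graph. A minimal sketch of that traversal follows. It assumes the ontology is given as a plain mapping from each term to its direct parents, and the term IDs in the example are made up. It does not reproduce the authors' categorization method or their handling of relations such as has_part.

```python
def ancestors(term, parents):
    """Return the set of all ancestors of `term`.

    `parents` maps a term ID to an iterable of its direct parent IDs
    (for example, over is_a edges only). The term itself is not included.
    """
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


if __name__ == "__main__":
    # Hypothetical four-term ontology; these IDs are not real GO terms.
    toy_parents = {
        "T:leaf": ["T:mid1", "T:mid2"],
        "T:mid1": ["T:root"],
        "T:mid2": ["T:root"],
        "T:root": [],
    }
    print(sorted(ancestors("T:leaf", toy_parents)))
```

Which relation types feed the `parents` mapping determines how much of the ontology is reachable, which is the trade-off the abstract describes between omitting problematic relations and losing retrievable information.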

    Nuclear Organization in Breast Cancer: A Dissertation

    The nuclear matrix (NM) is a fibrogranular network of ribonucleoproteins upon which transcriptional complexes and regulatory genomic sequences are organized. A hallmark of cancer is the disorganization of nuclear architecture; however, the extent to which the NM is involved in malignancy is not well studied. The RUNX1 and RUNX2 proteins form complexes within the NM to promote hematopoiesis and osteoblastogenesis, respectively at the transcriptional level. RUNX1 and RUNX2 are both expressed in breast cancer cells (BrCCs); however, their genome-wide BrCC functions are unknown. RUNX1 and RUNX2 activate many tumor suppressor pathways in blood and bone lineages, respectively, including attenuation of protein synthesis and cell growth via suppression of ribosomal RNA (rRNA) transcription, which appears contrary to Runx-expression in highly proliferative BrCCs. To define roles for RUNX1 and RUNX2 in BrCC phenotype, we examined the involvement of RUNX1 and RUNX2 in rRNA transcription and generated a genome-wide model for RUNX1 and RUNX2-binding and transcriptional regulation. To validate gene expression patterns identified in our screen, we developed a Real-Time qPCR primer design program, which allows rapid, high-throughput design of primer pairs (FoxPrimer). In BrCCs, RUNX1 and RUNX2 regulate genes that promote invasiveness and do not affect rRNA transcription, protein synthesis, or cell growth. We have characterized in vitro functions of Runx proteins in BrCCs; however, the relationships between Runx expression and diagnostic/prognostic markers of breast cancer (BrCa) in patients are not well studied. Immunohistochemical detection of RUNX1 and RUNX2 in BrCa tissue microarrays reveals RUNX1 expression is associated with early, smaller tumors that are ER+ (estrogen receptor), HER2+, p53-, and correlated with androgen receptor (AR) expression; RUNX2 expression is associated with late-stage, larger tumors that are HER2+. 
These results show that the functions and expression patterns of NM-associated RUNX1 and RUNX2 are context-sensitive, which suggests potential disease-specific roles. Two functionally disparate genomic sequence types bind to the NM: matrix associated regions (MARs) are functionally associated with transcriptional repression and scaffold associated regions (SARs) are functionally associated with actively expressed genes. It is unknown whether malignant nuclear disorganization affects the functions of MARs/SARs in BrCC. We have refined a method to isolate nuclear matrix associated DNA (NM-DNA) from a structurally preserved NM and applied this protocol to normal mammary epithelial cells and BrCCs. To define transcriptional functions for NM-DNA, we developed a computational algorithm (PeaksToGenes), which statistically tests the associations of experimentally-defined NM-DNA regions and ChIP-seq-defined positional enrichment of several histone marks with transcriptome-wide gene expression data. In normal mammary epithelial cells, NM-DNA is enriched in both MARs and SARs, and the positional enrichment patterns of MARs and SARs are strongly associated with gene expression patterns, suggesting functional roles. In contrast, the BrCCs are significantly enriched in the silencing mark H3K27me3, and the NM-DNA is enriched in MARs and depleted of SARs. The MARs/SARs in the BrCCs are only weakly associated with gene expression patterns, suggesting that loss of normal DNA-matrix associations accompanies the disease state. Our results show that structural preservation of the in situ NM allows isolation of both MARs and SARs, and further demonstrate that in a disorganized, cancerous nucleus, normal transcriptional functions of NM-DNA are disrupted. Our studies on nuclear organization in BrCC, show that the disorganized phenotype of the cancer cell nucleus is accompanied by deregulated transcriptional functions of two constituents of the NM. 
These results reinforce the role of the NM as an important structure-function component of gene expression regulation.

    Processing genome-wide association studies within a repository of heterogeneous genomic datasets

    Background Genome Wide Association Studies (GWAS) are based on the observation of genome-wide sets of genetic variants – typically single-nucleotide polymorphisms (SNPs) – in different individuals that are associated with phenotypic traits. Research efforts have so far been directed to improving GWAS techniques rather than on making the results of GWAS interoperable with other genomic signals; this is currently hindered by the use of heterogeneous formats and uncoordinated experiment descriptions. Results To practically facilitate integrative use, we propose to include GWAS datasets within the META-BASE repository, exploiting an integration pipeline previously studied for other genomic datasets that includes several heterogeneous data types in the same format, queryable from the same systems. We represent GWAS SNPs and metadata by means of the Genomic Data Model and include metadata within a relational representation by extending the Genomic Conceptual Model with a dedicated view. To further reduce the gap with the descriptions of other signals in the repository of genomic datasets, we perform a semantic annotation of phenotypic traits. Our pipeline is demonstrated using two important data sources, initially organized according to different data models: the NHGRI-EBI GWAS Catalog and FinnGen (University of Helsinki). The integration effort finally allows us to use these datasets within multisample processing queries that respond to important biological questions. These are then made usable for multi-omic studies together with, e.g., somatic and reference mutation data, genomic annotations, epigenetic signals. Conclusions As a result of our work on GWAS datasets, we enable 1) their interoperable use with several other homogenized and processed genomic datasets in the context of the META-BASE repository; 2) their big data processing by means of the GenoMetric Query Language and associated system. 
Future large-scale tertiary data analysis may extensively benefit from the addition of GWAS results to inform several different downstream analysis workflows.

    Systems Biology Approach to Identify Gene Network Signatures for Colorectal Cancer

    In this work, we integrated prior knowledge from gene signatures and protein interactions with gene set enrichment analysis (GSEA) and gene/protein network modeling to identify gene network signatures from gene expression microarray data. We demonstrated how to apply this approach to discover gene network signatures for colorectal cancer (CRC) from microarray datasets. First, we used GSEA to analyze the microarray data by testing differential genes for enrichment in CRC-related gene sets from two publicly available, up-to-date gene set databases: the Molecular Signatures Database (MSigDB) and the Gene Signatures Database (GeneSigDB). Second, we compared the enriched gene sets by enrichment score, false-discovery rate, and nominal p-value. Third, we constructed an integrated protein–protein interaction (PPI) network by connecting these enriched genes through high-quality interactions from a human annotated and predicted protein interaction database, with a confidence score labeled for each interaction. Finally, we mapped differential gene expression onto the constructed network to build a comprehensive network model containing visualized transcriptome and proteome data. The results show that although MSigDB has more CRC-relevant gene sets than GeneSigDB, the integrated PPI network connecting the enriched genes from both MSigDB and GeneSigDB can provide a more complete view for discovering gene network signatures. We also found several important sub-network signatures for CRC, such as the TP53, PCNA, and IL8 sub-networks, corresponding to apoptosis, DNA repair, and immune response, respectively.

    Post-transcriptional knowledge in pathway analysis increases the accuracy of phenotypes classification

    Motivation: Prediction of phenotypes from high-dimensional data is a crucial task in precision biology and medicine. Many technologies employ genomic biomarkers to characterize phenotypes. However, such elements are not sufficient to explain the underlying biology. To improve this, pathway analysis techniques have been proposed. Nevertheless, such methods have shown a lack of accuracy in phenotype classification. Results: Here we propose a novel methodology called MITHrIL (Mirna enrIched paTHway Impact anaLysis) for the analysis of signaling pathways, which builds on the work of Tarca et al., 2009. MITHrIL extends pathways by adding missing regulatory elements, such as microRNAs, and their interactions with genes. The method takes as input the expression values of genes and/or microRNAs and returns a list of pathways sorted according to their degree of deregulation, together with the corresponding statistical significance (p-values). Our analysis shows that MITHrIL outperforms its competitors even in the worst case. In addition, our method is able to correctly classify sets of tumor samples drawn from TCGA. Availability: MITHrIL is freely available at the following URL: http://alpha.dmi.unict.it/mithril