76 research outputs found

    Two-dimensional enrichment analysis for mining high-level imaging genetic associations

    Enrichment analysis has been widely applied in genome-wide association studies (GWAS), where gene sets corresponding to biological pathways are examined for significant associations with a phenotype to help increase statistical power and improve biological interpretation. In this work, we expand the scope of enrichment analysis into brain imaging genetics, an emerging field that studies how genetic variation influences brain structure and function measured by neuroimaging quantitative traits (QT). Given the high dimensionality of both imaging and genetic data, we propose to study Imaging Genetic Enrichment Analysis (IGEA), a new enrichment analysis paradigm that jointly considers meaningful gene sets (GS) and brain circuits (BC) and examines whether any given GS-BC pair is enriched in a list of gene-QT findings. Using gene expression data from the Allen Human Brain Atlas and imaging genetics data from the Alzheimer's Disease Neuroimaging Initiative as test beds, we present an IGEA framework and conduct a proof-of-concept study. This empirical study identifies 12 significant high-level two-dimensional imaging genetics modules. Many of these modules are relevant to a variety of neurobiological pathways or neurodegenerative diseases, showing the promise of the proposed framework for providing insight into the mechanism of complex diseases.
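
The abstract above does not state which enrichment statistic IGEA uses. As a rough illustration only, the sketch below assumes a plain hypergeometric over-representation test on gene-QT pairs; the function names, the toy identifiers, and the choice of test are all assumptions made for this example, not the authors' method.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n pairs are drawn from N, of which K lie in the module."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def module_enrichment(findings, gene_set, circuit, all_genes, all_qts):
    """Over-representation of a gene-set x brain-circuit module among findings.

    findings: iterable of (gene, qt) pairs judged significant (hypothetical input).
    """
    findings = list(findings)
    N = len(all_genes) * len(all_qts)   # every possible gene-QT pair
    K = len(gene_set) * len(circuit)    # pairs that fall inside the module
    n = len(findings)                   # number of significant pairs
    k = sum(1 for g, q in findings if g in gene_set and q in circuit)
    return k, hypergeom_upper_tail(k, K, n, N)


# Toy data, invented for illustration.
all_genes = [f"g{i}" for i in range(1, 11)]
all_qts = [f"q{i}" for i in range(1, 6)]
findings = [("g1", "q1"), ("g2", "q1"), ("g1", "q2"), ("g5", "q4"), ("g7", "q3")]
overlap, p_value = module_enrichment(findings, {"g1", "g2"}, {"q1", "q2"},
                                     all_genes, all_qts)
print(overlap, p_value)
```

Treating gene-QT pairs as independent draws is a simplification; a real analysis would likely need to account for correlation among genes and among imaging traits.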

    Dissecting complex transcriptional responses using pathway-level scores based on prior information

    Background: The genomewide pattern of changes in mRNA expression measured using DNA microarrays is typically a complex superposition of the response of multiple regulatory pathways to changes in the environment of the cells. The use of prior information, either about the function of the protein encoded by each gene, or about the physical interactions between regulatory factors and the sequences controlling its expression, has emerged as a powerful approach for dissecting complex transcriptional responses. Results: We review two different approaches for combining the noisy expression levels of multiple individual genes into robust pathway-level differential expression scores. The first is based on a comparison between the distribution of expression levels of genes within a predefined gene set and those of all other genes in the genome. The second starts from an estimate of the strength of genomewide regulatory network connectivities based on sequence information or direct measurements of protein-DNA interactions, and uses regression analysis to estimate the activity of gene regulatory pathways. The statistical methods used are explained in detail. Conclusion: By avoiding the thresholding of individual genes, pathway-level analysis of differential expression based on prior information can be considerably more sensitive to subtle changes in gene expression than gene-level analysis. The methods are technically straightforward and yield results that are easily interpretable, both biologically and statistically.
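
The first approach mentioned in this abstract, comparing expression within a gene set against all other genes without thresholding individual genes, can be illustrated with a Welch-style t-statistic. This is a minimal sketch under that assumption; the abstract does not name the exact statistic, and `gene_set_score` and the toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def gene_set_score(expression, gene_set):
    """Welch t-statistic: genes inside the set versus all remaining genes.

    expression: dict mapping gene name -> log-ratio (hypothetical input).
    Each group needs at least two genes for the variance to be defined.
    """
    inside = [v for g, v in expression.items() if g in gene_set]
    outside = [v for g, v in expression.items() if g not in gene_set]
    std_err = sqrt(variance(inside) / len(inside)
                   + variance(outside) / len(outside))
    return (mean(inside) - mean(outside)) / std_err


# Toy log-ratios, invented for illustration.
expression = {"a": 1.0, "b": 1.2, "c": 0.8,
              "d": 0.0, "e": 0.1, "f": -0.1, "g": 0.0}
score = gene_set_score(expression, {"a", "b", "c"})
print(score)
```

Every gene contributes to the score, so a set of genes that each shift only slightly can still stand out, which is the sensitivity argument the abstract makes.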

    Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges

    Pathway analysis has become the first choice for gaining insight into the underlying biology of differentially expressed genes and proteins, as it reduces complexity and has increased explanatory power. We discuss the evolution of knowledge base–driven pathway analysis over its first decade, distinctly divided into three generations. We also discuss the limitations that are specific to each generation, and how they are addressed by successive generations of methods. We identify a number of annotation challenges that must be addressed to enable development of the next generation of pathway analysis methods. Furthermore, we identify a number of methodological challenges that the next generation of methods must tackle to take advantage of the technological advances in genomics and proteomics in order to improve specificity, sensitivity, and relevance of pathway analysis

    PhenoFam-gene set enrichment analysis through protein structural information

    Background: With the current technological advances in high-throughput biology, the necessity to develop tools that help to analyse the massive amount of data being generated is evident. A powerful method of inspecting large-scale data sets is gene set enrichment analysis (GSEA) and investigation of protein structural features can guide determining the function of individual genes. However, a convenient tool that combines these two features to aid in high-throughput data analysis has not been developed yet. In order to fill this niche, we developed the user-friendly, web-based application, PhenoFam. Results: PhenoFam performs gene set enrichment analysis by employing structural and functional information on families of protein domains as annotation terms. Our tool is designed to analyse complete sets of results from quantitative high-throughput studies (gene expression microarrays, functional RNAi screens, etc.) without prior pre-filtering or hits-selection steps. PhenoFam utilizes Ensembl databases to link a list of user-provided identifiers with protein features from the InterPro database, and assesses whether results associated with individual domains differ significantly from the overall population. To demonstrate the utility of PhenoFam we analysed a genome-wide RNA interference screen and discovered a novel function of plexins containing the cytoplasmic RasGAP domain. Furthermore, a PhenoFam analysis of breast cancer gene expression profiles revealed a link between breast carcinoma and altered expression of PX domain containing proteins. Conclusions: PhenoFam provides a user-friendly, easily accessible web interface to perform GSEA based on high-throughput data sets and structural-functional protein information, and therefore aids in functional annotation of genes.

    MicroRNA-Integrated and Network-Embedded Gene Selection with Diffusion Distance

    Gene network information has been used to improve gene selection in microarray-based studies by selecting marker genes based both on their expression and the coordinate expression of genes within their gene network under a given condition. Here we propose a new network-embedded gene selection model. In this model, we first address the limitations of microarray data. Microarray data, although widely used for gene selection, measures only mRNA abundance, which does not always reflect the ultimate gene phenotype, since it does not account for post-transcriptional effects. To overcome this important (in certain cases critical) limitation, which almost all existing studies ignore, we design a new strategy to integrate microarray data with information on microRNA, the major post-transcriptional regulatory factor. We also handle the challenges posed by the gene collaboration mechanism. To incorporate the biological facts that genes without direct interactions may work closely due to signal transduction and that two genes may be functionally connected through multiple paths, we adopt the concept of diffusion distance. This concept permits us to simulate biological signal propagation and therefore to estimate the collaboration probability for all gene pairs, directly or indirectly connected, according to the multiple paths connecting them. We demonstrate, using type 2 diabetes (DM2) as an example, that the proposed strategies can enhance the identification of functional gene partners, which is the key issue in a network-embedded gene selection model. More importantly, we show that our gene selection model outperforms related ones. Genes selected by our model 1) have improved classification capability; 2) agree with biological evidence of DM2-association; and 3) are involved in many well-known DM2-associated pathways.
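
To make the idea of diffusion distance concrete, the sketch below uses the standard random-walk definition: two nodes are close when t-step walks started from them end up in similar places, weighted by the stationary distribution. It assumes an undirected graph with no isolated nodes; the helper names and the toy network are invented here, and the paper's exact formulation may differ.

```python
def transition_matrix(adj):
    """Row-normalise an adjacency matrix into random-walk probabilities."""
    return [[w / sum(row) for w in row] for row in adj]


def mat_mul(a, b):
    """Plain matrix product for small dense matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def diffusion_distance(adj, i, j, steps=3):
    """Distance between the t-step walk distributions started at i and at j."""
    p = transition_matrix(adj)
    pt = p
    for _ in range(steps - 1):
        pt = mat_mul(pt, p)
    total = sum(sum(row) for row in adj)
    # For an undirected graph the stationary distribution is degree-proportional.
    stationary = [sum(row) / total for row in adj]
    return sum((pt[i][k] - pt[j][k]) ** 2 / stationary[k]
               for k in range(len(adj))) ** 0.5


# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
# Row 4 above has a typo-free intent: node 4 links to 3 and 5 only.
adj[4] = [0, 0, 0, 1, 0, 1]
print(diffusion_distance(adj, 0, 1), diffusion_distance(adj, 0, 5))
```

Nodes in the same triangle come out closer than nodes in different triangles, because all paths of length up to `steps` contribute rather than just direct edges.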

    Comparisons of seven algorithms for pathway analysis using the WTCCC Crohn's Disease dataset

    Background: Though rooted in genomic expression studies, pathway analysis for genome-wide association studies (GWAS) has gained increasing popularity, since it has the potential to discover hidden disease pathogenic mechanisms by combining statistical methods with biological knowledge. Generally, algorithms or programs proposed recently can be categorized by different types of input data, null hypothesis or counts of analysis stages. Due to complexity caused by SNP, gene and pathway relationships, re-sampling strategies like permutation are always utilized to derive an empirical distribution for test statistics for evaluating the significance of candidate pathways. However, evaluation of these algorithms on real GWAS datasets and real biological pathway databases needs to be addressed before we apply them widely with confidence. Findings: Two algorithms which use summary statistics from GWAS as input were implemented in KGG, a novel and user-friendly software tool for GWAS pathway analysis. Comparisons of these two algorithms as well as the other five selected algorithms were conducted by analyzing the WTCCC Crohn's Disease dataset utilizing the MsigDB canonical pathways. As a result of using permutation to obtain empirical p-value, most of these methods could control Type I error rate well, although some are conservative. However, the methods varied greatly in terms of power and running time, with the PLINK truncated set-based test being the most powerful and KGG being the fastest. Conclusions: Raw data-based algorithms, such as those implemented in PLINK, are preferable for GWAS pathway analysis as long as computational capacity is available. It may be worthwhile to apply two or more pathway analysis algorithms on the same GWAS dataset, since the methods differ greatly in their outputs and might provide complementary findings for the studied complex disease.
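
The permutation step this abstract refers to can be sketched generically: shuffle the case/control labels, recompute the statistic, and report the fraction of permuted statistics at least as extreme as the observed one. The example below is an assumption-laden illustration using a difference in group means as the statistic; it is not the procedure implemented in KGG or PLINK, and the function name and scores are hypothetical.

```python
import random


def permutation_p_value(case_scores, control_scores, n_perm=1000, seed=0):
    """Empirical two-sided p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_case = len(case_scores)
    pooled = list(case_scores) + list(control_scores)

    def mean_gap(values):
        cases, controls = values[:n_case], values[n_case:]
        return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

    observed = mean_gap(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break any link between label and score
        if mean_gap(pooled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never zero.
    return (hits + 1) / (n_perm + 1)


# Toy pathway scores, invented for illustration.
p_separated = permutation_p_value([10.0, 11.0, 12.0, 13.0, 14.0],
                                  [0.0, 1.0, 2.0, 3.0, 4.0])
p_identical = permutation_p_value([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(p_separated, p_identical)
```

The smallest attainable p-value is 1 / (n_perm + 1), so the number of permutations bounds the resolution of the test, which is one reason running time differs so much between methods.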

    Purine Nucleoside Phosphorylase mediated molecular chemotherapy and conventional chemotherapy: A tangible union against chemoresistant cancer

    Background: Late-stage ovarian cancer is essentially incurable, primarily due to late diagnosis and its inherent heterogeneity. Single-agent treatments are inadequate and generally lead to severe side effects at therapeutic doses. It is crucial to develop clinically relevant novel combination regimens involving synergistic modalities that target a wider repertoire of cells and lead to lowered individual doses. Stemming from this premise, this is the first report of two- and three-way synergies between Adenovirus-mediated Purine Nucleoside Phosphorylase based gene directed enzyme prodrug therapy (PNP-GDEPT), docetaxel and/or carboplatin in multidrug-resistant ovarian cancer cells. Methods: The effects of PNP-GDEPT on different cellular processes were determined using shotgun proteomics analyses. The in vitro cell growth inhibition in differentially treated drug-resistant human ovarian cancer cell lines was established using a cell-viability assay. The extent of synergy, additivity, or antagonism between treatments was evaluated using CalcuSyn statistical analyses. The involvement of apoptosis and implicated proteins in the effects of different treatments was established using flow cytometry based detection of M30 (an early marker of apoptosis), cell cycle analyses and finally western blot based analyses. Results: Efficacy of the trimodal treatment was significantly greater than that achieved with bimodal or individual treatments, with potential for 10-50 fold dose reduction compared to that required for individual treatments. Of note was the marked enhancement in apoptosis that specifically accompanied the combinations that included PNP-GDEPT and accordingly correlated with a shift in the expression of anti- and pro-apoptotic proteins. PNP-GDEPT mediated enhancement of apoptosis was reinforced by cell cycle analyses. Proteomic analyses of PNP-GDEPT treated cells indicated a downregulation of proteins involved in oncogenesis or cancer drug resistance, with accompanying upregulation of apoptotic and tumour-suppressor proteins. Conclusion: Inclusion of PNP-GDEPT in regular chemotherapy regimens can lead to significant enhancement of the cancer cell susceptibility to the combined treatment. Overall, these data will underpin the development of regimens that can benefit patients with late-stage ovarian cancer, leading to significantly improved efficacy and increased quality of life.

    Functional Analysis: Evaluation of Response Intensities - Tailoring ANOVA for Lists of Expression Subsets

    Background: Microarray data is frequently used to characterize the expression profile of a whole genome and to compare the characteristics of that genome under several conditions. Geneset analysis methods have been described previously to analyze the expression values of several genes related by known biological criteria (metabolic pathway, pathology signature, co-regulation by a common factor, etc.) at the same time and the cost of these methods allows for the use of more values to help discover the underlying biological mechanisms. Results: As several methods assume different null hypotheses, we propose to reformulate the main question that biologists seek to answer. To determine which genesets are associated with expression values that differ between two experiments, we focused on three ad hoc criteria: expression levels, the direction of individual gene expression changes (up or down regulation), and correlations between genes. We introduce the FAERI methodology, tailored from a two-way ANOVA to examine these criteria. The significance of the results was evaluated according to the self-contained null hypothesis, using label sampling or by inferring the null distribution from normally distributed random data. Evaluations performed on simulated data revealed that FAERI outperforms currently available methods for each type of set tested. We then applied the FAERI method to analyze three real-world datasets on hypoxia response. FAERI was able to detect more genesets than other methodologies, and the genesets selected were coherent with current knowledge of cellular response to hypoxia. Moreover, the genesets selected by FAERI were confirmed when the analysis was repeated on two additional related datasets. Conclusions: The expression values of genesets are associated with several biological effects. The underlying mathematical structure of the genesets allows for analysis of data from several genes at the same time. Focusing on expression levels, the direction of the expression changes, and correlations, we showed that two-step data reduction allowed us to significantly improve the performance of geneset analysis using a modified two-way ANOVA procedure, and to detect genesets that current methods fail to detect.

    High-level integration of murine intestinal transcriptomics data highlights the importance of the complement system in mucosal homeostasis.

    BACKGROUND: The mammalian intestine is a complex biological system that exhibits functional plasticity in its response to diverse stimuli to maintain homeostasis. To improve our understanding of this plasticity, we performed a high-level data integration of 14 whole-genome transcriptomics datasets from samples of intestinal mouse mucosa. We used the tool Centrality based Pathway Analysis (CePa), along with information from the Reactome database. RESULTS: The results show an integrated response of the mouse intestinal mucosa to challenges with agents introduced orally that were expected to perturb homeostasis. We observed that a common set of pathways responds to different stimuli, of which the most reactive was the Regulation of Complement Cascade pathway. Altered expression of the Regulation of Complement Cascade pathway was verified in mouse organoids challenged with different stimuli in vitro. CONCLUSIONS: Results of the integrated transcriptomics analysis and data-driven experiment suggest an important role of epithelial production of complement and host complement defence factors in the maintenance of homeostasis.