PGS: a tool for association study of high-dimensional microRNA expression data with repeated measures

Author: Andrea A. Baccarelli
Justin B. Starren
Lei Liu
Lifang Hou
Wei Zhang
Yi Li
Yinan Zheng
Zhe Fei
Publication venue: Collection of Biostatistics Research Archive
Publication date: 03/06/2014
Field of study

Motivation: MicroRNAs (miRNAs) are short single-stranded non-coding molecules that usually function as negative regulators to silence or suppress gene expression. Owing to interest in the dynamic nature of miRNAs and to reduced microarray and sequencing costs, a growing number of researchers are now measuring high-dimensional miRNA expression data using repeated or multiple measures, in which each individual has more than one sample collected and measured over time. However, the commonly used site-by-site multiple testing may impair the value of repeated or multiple measures data by ignoring the inherent dependence structure, which leads to problems including underpowered results after multiple comparison correction using false discovery rate (FDR) estimation and less biologically meaningful results. Hence, new methods are needed to tackle these issues. Results: We propose a penalized regression model incorporating a grid search method (PGS) for association studies of high-dimensional microRNA expression data with repeated measures. The development of this analytical framework was motivated by a real-world miRNA dataset. Comparisons between PGS and site-by-site testing revealed that PGS provided smaller phenotype prediction errors and higher enrichment of phenotype-related biological pathways than site-by-site testing. A simulation study showed that PGS provided more accurate estimates and higher sensitivity than site-by-site testing, with comparable specificity. Availability: R source code for the PGS algorithm, an implementation example, and the simulation study are available for download at https://github.com/feizhe/PGS
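The general idea of combining a penalized regression with a grid search over the penalty can be sketched as follows. This is not the authors' PGS code (which is in R at the URL above) and it ignores the repeated-measures correlation that PGS is designed to handle; it assumes a plain lasso fitted by coordinate descent, a single held-out validation split, and synthetic data. All function names and the lambda grid are hypothetical.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # covariance of column j with the residual that excludes feature j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta


def mse(X, y, beta):
    """Mean squared prediction error of a fitted coefficient vector."""
    preds = [sum(x * b for x, b in zip(row, beta)) for row in X]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


def grid_search(X_train, y_train, X_val, y_val, lambdas):
    """Fit one lasso per penalty and keep the one with the lowest validation error."""
    best = None
    for lam in lambdas:
        beta = lasso_cd(X_train, y_train, lam)
        err = mse(X_val, y_val, beta)
        if best is None or err < best[1]:
            best = (lam, err, beta)
    return best


# Synthetic example (assumption): 8 features, only features 0 and 3 matter.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(60)]
y = [2.0 * row[0] - 1.5 * row[3] + rng.gauss(0, 0.1) for row in X]
X_train, y_train, X_val, y_val = X[:40], y[:40], X[40:], y[40:]

lambdas = [1.0, 0.3, 0.1, 0.03, 0.01]
best_lam, best_err, best_beta = grid_search(X_train, y_train, X_val, y_val, lambdas)
```

A single validation split keeps the sketch short; with repeated measures, any split would need to keep all samples from one individual on the same side.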

Collection Of Biostatistics Research Archive

Assessment of protein set coherence using functional annotations

Author: Carazo Jose M
Chagoyen Monica
Pascual-Montano Alberto
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

12 pages, 5 figures. PMID: 18937846 [PubMed]. PMCID: PMC2588600. Additional information available: File 1: Coherence score and significance measures of random sets. File 2: Functional analysis of 'Module 39' obtained by Pu et al. [37] using various approaches. [Background] Analysis of large-scale experimental datasets frequently produces one or more sets of proteins that are subsequently mined for functional interpretation and validation. To this end, a number of computational methods have been devised that rely on the analysis of functional annotations. Although current methods provide valuable information (e.g. significantly enriched annotations, pairwise functional similarities), they do not specifically measure the degree of homogeneity of a protein set. [Results] In this work we present a method that scores the degree of functional homogeneity, or coherence, of a set of proteins on the basis of the global similarity of their functional annotations. The method uses statistical hypothesis testing to assess the significance of the set in the context of the functional space of a reference set. As such, it can be used as a first step in the validation of sets expected to be homogeneous prior to further functional interpretation. [Conclusions] We evaluate our method by analysing known biologically relevant sets as well as random ones. The known relevant sets comprise macromolecular complexes, cellular components and pathways described for Saccharomyces cerevisiae, which are mostly significantly coherent. Finally, we illustrate the usefulness of our approach for validating ‘functional modules’ obtained from computational analysis of protein-protein interaction networks. Matlab code and supplementary data are available at: http://www.cnb.csic.es/~monica/coherence/ This work has been partially funded by the Spanish grants BIO2007-67150-C03-02, S-Gen-0166/2006, CYTED-505PI0058, TIN2005-5619, PR27/05-13964-BSCH. APM acknowledges the support of the Spanish Ramón y Cajal program. Peer reviewed
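Scoring a set by the similarity of its annotations and testing that score against random sets from a reference can be illustrated with a small sketch. The abstract does not say which similarity measure or null model the authors use, so this assumes mean pairwise Jaccard similarity and an empirical p-value from random sets of the same size; the protein and term identifiers are made up.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of annotation terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coherence(proteins, annotations):
    """Mean pairwise annotation similarity of a protein set (needs >= 2 proteins)."""
    pairs = list(combinations(proteins, 2))
    return sum(jaccard(annotations[p], annotations[q]) for p, q in pairs) / len(pairs)


def empirical_p_value(proteins, annotations, n_random=1000, seed=0):
    """Share of same-size random sets from the reference that score at least as high."""
    rng = random.Random(seed)
    observed = coherence(proteins, annotations)
    reference = sorted(annotations)
    hits = 0
    for _ in range(n_random):
        sample = rng.sample(reference, len(proteins))
        if coherence(sample, annotations) >= observed:
            hits += 1
    # add-one correction so the estimate is never exactly zero
    return observed, (hits + 1) / (n_random + 1)


# Toy reference set (assumption): four proteins share terms, sixteen do not.
annotations = {f"P{i}": {"GO:a", "GO:b"} for i in range(4)}
annotations.update({f"P{i}": {f"GO:x{i}"} for i in range(4, 20)})

score, p_value = empirical_p_value(["P0", "P1", "P2", "P3"], annotations)
```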

Digital.CSIC

WholePathwayScope: a comprehensive pathway-based analysis tool for high-throughput data

Author: Cohen Jonathan C
Hobbs Helen H
Horton Jay D
Stephens Robert M
Yi Ming
Publication venue: BioMed Central
Publication date: 01/01/2006
Field of study

BACKGROUND: Analysis of High Throughput (HTP) Data such as microarray and proteomics data has provided a powerful methodology to study patterns of gene regulation at genome scale. A major unresolved problem in the post-genomic era is to assemble the large amounts of data generated into a meaningful biological context. We have developed a comprehensive software tool, WholePathwayScope (WPS), for deriving biological insights from analysis of HTP data. RESULT: WPS extracts gene lists with shared biological themes through color cue templates. WPS statistically evaluates global functional category enrichment of gene lists and pathway-level pattern enrichment of data. WPS incorporates well-known biological pathways from KEGG (Kyoto Encyclopedia of Genes and Genomes) and Biocarta, GO (Gene Ontology) terms as well as user-defined pathways or relevant gene clusters or groups, and explores gene-term relationships within the derived gene-term association networks (GTANs). WPS simultaneously compares multiple datasets within biological contexts either as pathways or as association networks. WPS also integrates Genetic Association Database and Partial MedGene Database for disease-association information. We have used this program to analyze and compare microarray and proteomics datasets derived from a variety of biological systems. Application examples demonstrated the capacity of WPS to significantly facilitate the analysis of HTP data for integrative discovery. CONCLUSION: This tool represents a pathway-based platform for discovery integration to maximize analysis power. The tool is freely available at
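A common way to evaluate functional category enrichment of a gene list is a one-sided hypergeometric test. The abstract does not state which statistic WPS uses, so the sketch below is a generic illustration under that assumption; the function name and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric: N genes in total, K annotated
    with the category, n genes in the list, k annotated genes in the list."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Toy counts (assumption): 10 genes, 3 in the category, a list of 2 genes.
p_both = hypergeom_enrichment_p(10, 3, 2, 2)         # both listed genes in the category
p_at_least_one = hypergeom_enrichment_p(10, 3, 2, 1)  # at least one listed gene in it
```

When many categories are tested at once, the resulting p-values would normally be adjusted for multiple comparisons.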

From genes to functional classes in the study of biological systems

Author: Al-Shahrour Fátima
Arbiza Leonardo
Dopazo Hernán
Dopazo Joaquín
Huerta-Cepas Jaime
Montaner David
Mínguez Pablo
Publication venue: BioMed Central
Publication date: 01/01/2007
Field of study

BACKGROUND: With the popularisation of high-throughput techniques, the need for procedures that help in the biological interpretation of results has increased enormously. Recently, new procedures inspired by systems biology criteria have started to be developed. RESULTS: Here we present FatiScan, a web-based program which implements a threshold-independent test for the functional interpretation of large-scale experiments that does not depend on the pre-selection of genes based on the multiple application of independent tests to each gene. The test implemented aims to directly test the behaviour of blocks of functionally related genes, instead of focusing on single genes. In addition, the test does not depend on the type of the data used for obtaining significance values, and consequently different types of biologically informative terms (gene ontology, pathways, functional motifs, transcription factor binding sites or regulatory sites from CisRed) can be applied to different classes of genome-scale studies. We exemplify its application in microarray gene expression, evolution and interactomics. CONCLUSION: Methods for gene set enrichment which, in addition, are independent of the original data and experimental design constitute a promising alternative for the functional profiling of genome-scale experiments. A web server that performs the test described and other similar ones can be found at:
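One way to test a block of functionally related genes without fixing a single cutoff is to scan several partitions of a ranked gene list and test, at each one, whether the term is over-represented in the upper part. The sketch below assumes that scheme with a one-sided hypergeometric tail at each cut; it is an illustration only, not FatiScan's implementation, and the gene names, term membership, and number of partitions are hypothetical.

```python
from math import comb


def upper_tail(N, K, n, k):
    """P(X >= k) for a hypergeometric draw of n genes from N, K of them annotated."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def scan_partitions(ranked_genes, term_genes, n_partitions=4):
    """Return (cut, annotated_in_top, p_value) for evenly spaced cuts of the ranking."""
    N = len(ranked_genes)
    K = sum(1 for g in ranked_genes if g in term_genes)
    results = []
    for step in range(1, n_partitions + 1):
        cut = round(N * step / (n_partitions + 1))
        k = sum(1 for g in ranked_genes[:cut] if g in term_genes)
        results.append((cut, k, upper_tail(N, K, cut, k)))
    return results


# Toy ranking (assumption): 20 genes; the term covers four top genes and one mid-ranked gene.
ranked = [f"g{i}" for i in range(20)]
term = {"g0", "g1", "g2", "g3", "g10"}

results = scan_partitions(ranked, term)
best_cut, best_k, best_p = min(results, key=lambda r: r[2])
```

Because several cuts are tested per term, the smallest p-value is optimistic on its own and would need a multiple-testing adjustment before interpretation.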

Genome-wide expression profiling and bioinformatics analysis of diurnally regulated genes in the mouse prefrontal cortex

Author
Publication venue: BioMed Central
Publication date: 20/11/2007
Field of study

Eigengene networks for studying the relationships between co-expression modules

Author: Peter Langfelder
Steve Horvath
Publication venue: BioMed Central
Publication date: 01/11/2007
Field of study

Background: There is evidence that genes and their protein products are organized into functional modules according to cellular processes and pathways. Gene co-expression networks have been used to describe the relationships between gene transcripts. Ample literature exists on how to detect biologically meaningful modules in networks but there is a need for methods that allow one to study the relationships between modules. Results: We show that network methods can also be used to describe the relationships between co-expression modules and present the following methodology. First, we describe several methods for detecting modules that are shared by two or more networks (referred to as consensus modules). We represent the gene expression profiles of each module by an eigengene. Second, we propose a method for constructing an eigengene network, where the edges are undirected but maintain information on the sign of the co-expression information. Third, we propose methods for differential eigengene network analysis that allow one to assess the preservation of network properties across different data sets. We illustrate the value of eigengene networks in studying the relationships between consensus modules in human and chimpanzee brains; the relationships between consensus modules in brain, muscle, liver, and adipose mouse tissues; and the relationships between male-female mouse consensus modules and clinical traits. In some applications, we find that module eigengenes can be organized into higher level clusters which we refer to as meta-modules. Conclusion: Eigengene networks can be effective and biologically meaningful tools for studying the relationships between modules of a gene co-expression network. The proposed methods may reveal a higher order organization of the transcriptome. R software tutorials, the data, and supplementary material can be found at the following webpage: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/EigengeneNetwork.
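The two central objects, a module eigengene and a sign-preserving edge between eigengenes, can be sketched in a few lines. The sketch assumes the eigengene is the first principal component of the standardized module expression (found here by power iteration) and that the signed adjacency is (1 + correlation) / 2; the function names and toy expression values are hypothetical, and the authors' R tutorials remain the reference implementation.

```python
from math import sqrt


def standardize(x):
    """Center to mean 0 and scale to unit sample variance."""
    m = sum(x) / len(x)
    sd = sqrt(sum((v - m) ** 2 for v in x) / (len(x) - 1))
    return [(v - m) / sd for v in x]


def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def module_eigengene(genes, n_iter=500):
    """First principal component across samples of a module (genes x samples)."""
    Z = [standardize(g) for g in genes]
    n_samples = len(Z[0])
    v = [float(s + 1) for s in range(n_samples)]  # arbitrary non-constant start
    for _ in range(n_iter):
        scores = [sum(z[s] * v[s] for s in range(n_samples)) for z in Z]   # Z v
        v = [sum(Z[g][s] * scores[g] for g in range(len(Z))) for s in range(n_samples)]  # Z^T (Z v)
        norm = sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    # fix the arbitrary sign so the eigengene follows the mean module profile
    mean_profile = [sum(z[s] for z in Z) / len(Z) for s in range(n_samples)]
    if pearson(v, mean_profile) < 0:
        v = [-a for a in v]
    return v


def eigengene_adjacency(e1, e2):
    """Signed adjacency in [0, 1]: near 1 for positive, near 0 for negative correlation."""
    return (1 + pearson(e1, e2)) / 2


# Toy modules (assumption): module A rises across six samples, module B falls.
module_a = [[1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 13], [1, 3, 2, 5, 4, 6]]
module_b = [[6, 5, 4, 3, 2, 1], [12, 9, 8, 6, 5, 2], [5, 6, 3, 4, 1, 2]]

e_a = module_eigengene(module_a)
e_b = module_eigengene(module_b)
adj_ab = eigengene_adjacency(e_a, e_b)
```

Mapping the correlation to (1 + r) / 2 keeps the edge undirected while letting strongly anti-correlated modules be told apart from strongly correlated ones, which an absolute-value adjacency would not.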

A stochastic model for identifying differential gene pair co-expression patterns in prostate cancer progression

Author: Fu Xu Ping
Guo Feng Hua
Han Xiao Tian
Huang Yan
Li Yao
Mao Yu Min
Mo Wen Juan
Xie Yi
Yang Guang Yuan
Zhang Ji Gang
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

Background: The identification of differential gene co-expression patterns between cancer stages is a newly developing approach to revealing the underlying molecular mechanisms of carcinogenesis. Most research on this subject lacks an algorithm for assessing statistical significance in a way that accounts for cancer progression. Without such an algorithm, it is difficult to identify precisely the gene pairs that correlate with cancer progression. Results: In this investigation we studied gene pair co-expression change by using a stochastic process model to approximate the underlying dynamics of co-expression change during cancer progression. We also present a novel analytical method named 'Stochastic process model for Identifying differentially co-expressed Gene pair' (SIG method). This method was applied to two well-known prostate cancer data sets: hormone sensitive versus hormone resistant, and healthy versus cancerous. From these data sets, 428,582 gene pairs and 303,992 gene pairs were identified, respectively. We then applied to the same data sets two existing statistical methods that were developed to identify differential gene pair co-expression but do not consider cancer progression. We compared the results from three perspectives: progression analysis, gene pair identification effectiveness analysis, and pathway enrichment analysis. Statistical measures were used to quantify performance from each perspective: Re-identification Scale (RS) and Progression Score (PS) in progression analysis, True Positive Rate (TPR) in gene pair analysis, and Pathway Enrichment Score (PES) in pathway analysis. Our results show small values of RS and large values of PS, TPR, and PES, suggesting that gene pairs identified by the SIG method are highly correlated with cancer progression and highly enriched in disease-specific pathways. Several gene interaction networks inferred from this research could provide clues to the mechanism of prostate cancer progression. Conclusion: The SIG method reliably identifies gene pairs correlated with cancer progression, and performs well both in gene pair ontology analysis and in pathway enrichment analysis. This method provides an effective means of understanding the molecular mechanism of carcinogenesis by appropriately tracking the process of cancer progression.

Correlation-maximizing surrogate gene space for visual mining of gene expression patterns in developing barley endosperm tissue

Author: Björn Usadel
Marc Strickert
Nese Sreenivasulu
Udo Seiffert
Publication venue: BioMed Central
Publication date: 01/01/2007
Field of study

Background: Micro- and macroarray technologies help acquire thousands of gene expression patterns covering important biological processes during plant ontogeny. Particularly, faithful visualization methods are beneficial for revealing interesting gene expression patterns and functional relationships of coexpressed genes. Such screening helps to gain deeper insights into regulatory behavior and cellular responses, as will be discussed for expression data of developing barley endosperm tissue. For that purpose, high-throughput multidimensional scaling (HiT-MDS), a recent method for similarity-preserving data embedding, is substantially refined and used for (a) assessing the quality and reliability of centroid gene expression patterns, and for (b) derivation of functional relationships of coexpressed genes of endosperm tissue during barley grain development (0–26 days after flowering). Results: Temporal expression profiles of 4824 genes at 14 time points are faithfully embedded into two-dimensional displays. Thereby, similar shapes of coexpressed genes get closely grouped by a correlation-based similarity measure. As a main result, by using power transformation of correlation terms, a characteristic cloud of points with bipolar sandglass shape is obtained that is inherently connected to expression patterns of pre-storage, intermediate and storage phase of endosperm development. Conclusion: The new HiT-MDS-2 method helps to create global views of expression patterns and to validate centroids obtained from clustering programs. Furthermore, functional gene annotation for developing endosperm barley tissue is successfully mapped to the visualization, making easy localization of major centroids of enriched functional categories possible.