17 research outputs found

    Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information

    Motivation: Reconstruction of gene regulatory networks (GRNs), which explicitly represent the causality of developmental or regulatory processes, is of utmost interest and has become a challenging computational problem for understanding the complex regulatory mechanisms of cellular systems. However, every existing method for inferring GRNs from gene expression profiles has its strengths and weaknesses. In particular, many properties of GRNs, such as topological sparseness and non-linear dependence, are generally present in regulatory mechanisms but are seldom taken into account simultaneously by a single computational method. Results: In this work, we present a novel method for inferring GRNs from gene expression data that accounts for both the non-linear dependence and the topological structure of GRNs by employing a path consistency algorithm (PCA) based on conditional mutual information (CMI). In this algorithm, the conditional dependence between a pair of genes is represented by the CMI between them. Under the common assumption that gene expression data follow a Gaussian distribution, the CMI between a pair of genes is computed by a concise formula involving the covariance matrices of the related gene expression profiles. The method is validated on benchmark GRNs from the DREAM challenge and on the widely used SOS DNA repair network in Escherichia coli. Cross-validation results confirmed the effectiveness of our method (PCA-CMI), which significantly outperforms previous methods. Besides its high accuracy, our method is able to distinguish direct (or causal) interactions from indirect associations.
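For jointly Gaussian variables, the covariance-based CMI mentioned above has the standard form CMI(X;Y|Z) = 1/2 log(|C(X,Z)| |C(Y,Z)| / (|C(Z)| |C(X,Y,Z)|)), where |C(.)| is the determinant of the covariance matrix of the listed variables. The Python below is a minimal sketch of that formula plus a path-consistency style pruning loop, not the authors' implementation: the function names, the fixed `threshold`, and the rule of conditioning only on common neighbours up to a hypothetical `max_order` are assumptions made for illustration.

```python
import itertools

import numpy as np


def gaussian_cmi(data, i, j, cond=()):
    """CMI(X_i; X_j | X_cond) in nats, assuming jointly Gaussian expression.

    data has shape (n_samples, n_genes). With an empty conditioning set the
    result reduces to the mutual information -0.5 * log(1 - r^2).
    """
    cond = list(cond)

    def logdet(cols):
        # log-determinant of the covariance of the selected genes; the empty
        # covariance matrix is treated as having determinant 1
        if not cols:
            return 0.0
        cov = np.atleast_2d(np.cov(data[:, cols], rowvar=False))
        return np.linalg.slogdet(cov)[1]

    # CMI = 1/2 log(|C(X,Z)| |C(Y,Z)| / (|C(Z)| |C(X,Y,Z)|))
    return 0.5 * (logdet([i] + cond) + logdet([j] + cond)
                  - logdet(cond) - logdet([i, j] + cond))


def path_consistency(data, threshold=0.05, max_order=2):
    """Prune a complete graph: order 0 uses MI, order L conditions on L common neighbours."""
    n_genes = data.shape[1]
    adj = {g: set(range(n_genes)) - {g} for g in range(n_genes)}
    for order in range(max_order + 1):
        for i, j in itertools.combinations(range(n_genes), 2):
            if j not in adj[i]:
                continue
            common = sorted(adj[i] & adj[j])
            for cond in itertools.combinations(common, order):
                if gaussian_cmi(data, i, j, cond) < threshold:
                    # conditional independence detected: drop the edge
                    adj[i].discard(j)
                    adj[j].discard(i)
                    break
    return {(i, j) for i in adj for j in adj[i] if i < j}


if __name__ == "__main__":
    # Simulated chain gene0 -> gene1 -> gene2 (hypothetical data)
    rng = np.random.default_rng(0)
    x = rng.normal(size=2000)
    z = x + 0.5 * rng.normal(size=2000)
    y = z + 0.5 * rng.normal(size=2000)
    data = np.column_stack([x, z, y])
    print(path_consistency(data))  # the indirect edge (0, 2) should be pruned
```

In the chain example, genes 0 and 2 have clearly positive MI, but their CMI given gene 1 is close to zero, so the order-1 step removes the indirect edge while keeping the two direct ones.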

    Detecting Differentially Co-Expressed Gene Modules Via The Edge-Count Test

    Background: Gene expression profiling by microarray has been used to uncover molecular variations in many different diseases. Complementary to conventional differential expression analysis, differential co-expression analysis can identify gene markers at a systematic and granular level. Differential co-expression network analysis has three aspects: global topological comparison of networks, identification of differentially co-expressed clusters, and identification of differentially co-expressed genes and gene pairs. To date, most available methods still rely on Pearson's correlation coefficient despite its insensitivity to nonlinear relationships. Results: Here we present an approach that is robust to nonlinearity by using the edge-count test for differential co-expression analysis. The new approach was tested on synthetic data and yielded significant results. For real data, we used a human cervical cancer data set prepared from 29 pairs of cervical tumor and matched normal tissue samples. Hierarchical cluster analysis identified clusters containing differentially co-expressed genes associated with the regulation of cervical cancer. Conclusion: The proposed approach targets all types of differential co-expression and is sensitive to nonlinear relationships. It is easy to implement and can be applied to any sequencing data to identify gene co-expression differences between multiple conditions.
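The abstract does not spell out the test statistic, so the Python below is only a minimal sketch of a generic graph-based edge-count test in the Friedman-Rafsky style: pool the samples of two conditions, build a Euclidean minimum spanning tree (MST), and count the edges that join samples from different conditions, with a permutation p-value. Treating each sample as a point whose coordinates are the expression values of a gene pair or module, and z-scoring each gene within each condition so that mean and scale shifts are removed, are assumptions; the paper's exact graph and statistic may differ, and all function names are hypothetical.

```python
import numpy as np


def mst_edges(points):
    """Edges of a Euclidean minimum spanning tree (Prim's algorithm, O(n^2))."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()            # cheapest known link from each node to the tree
    parent = np.zeros(n, dtype=int)  # tree node providing that link
    edges = []
    for _ in range(n - 1):
        k = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[k]), k))
        in_tree[k] = True
        closer = dist[k] < best
        best = np.where(closer, dist[k], best)
        parent = np.where(closer, k, parent)
    return edges


def standardize(x):
    """Z-score each gene (column) within one condition."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def edge_count_test(a, b, n_perm=999, seed=0):
    """Friedman-Rafsky-style two-sample test on the pooled MST.

    a, b: (samples, genes) matrices for two conditions. Returns the number of
    MST edges joining the two conditions and a one-sided permutation p-value;
    unusually few cross edges suggest the joint distributions differ.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([standardize(a), standardize(b)])
    labels = np.repeat([0, 1], [len(a), len(b)])
    edges = np.array(mst_edges(pooled))

    def cross_edges(lab):
        return int(np.sum(lab[edges[:, 0]] != lab[edges[:, 1]]))

    observed = cross_edges(labels)
    null = np.array([cross_edges(rng.permutation(labels)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null <= observed)) / (n_perm + 1)
    return observed, float(p_value)
```

Because the graph is built from distances rather than from a correlation coefficient, a gene pair that is tightly (even nonlinearly) co-expressed in one condition and unrelated in the other occupies different regions of the pooled space, which lowers the cross-edge count.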

    Multivariate conditional mutual information measures (Các độ đo thông tin tương hỗ đa biến có điều kiện)

    Mutual information (MI) between two variables measures the relationship between them: the larger the measure, the stronger the dependence, and vice versa. However, MI does not indicate whether the relationship between the variables is direct or indirect. To detect direct relationships, one can use conditional mutual information (CMI) with respect to a third variable. In previous studies, we proposed mutual information measures for multiple variables. There are many such measures for more than two variables, and each is sensitive to a particular kind of relationship that may exist among them. However, like two-variable MI, these multivariate measures only indicate whether a multivariate relationship exists, not whether it is direct or indirect. In this paper, we propose new multivariate conditional mutual information measures and use them to determine, through the conditioning variables, whether multivariate relationships are direct or indirect.
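The new multivariate measures are not defined in the abstract, so the Python below only sketches the standard three-variable building block they extend: the plug-in estimate I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z) for discrete data. The binary chain X -> Z -> Y and the helper `noisy_copy` are hypothetical illustrations of an indirect relationship.

```python
import math
import random
from collections import Counter


def entropy(*columns):
    """Plug-in Shannon entropy (nats) of the joint distribution of discrete columns."""
    counts = Counter(zip(*columns))
    n = sum(counts.values())
    return -sum(c / n * math.log(c / n) for c in counts.values())


def mutual_information(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)
    return entropy(x, z) + entropy(y, z) - entropy(z) - entropy(x, y, z)


def noisy_copy(values, flip_prob, rng):
    """Copy a binary sequence, flipping each bit with probability flip_prob."""
    return [v ^ (rng.random() < flip_prob) for v in values]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.randint(0, 1) for _ in range(20000)]
    z = noisy_copy(x, 0.1, rng)  # X -> Z
    y = noisy_copy(z, 0.1, rng)  # Z -> Y: X and Y are related only through Z
    print(mutual_information(x, y))                 # clearly positive
    print(conditional_mutual_information(x, y, z))  # close to zero
```

Here MI alone reports a dependence between X and Y, while conditioning on Z reveals that the dependence is indirect.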

    MICRAT: A Novel Algorithm for Inferring Gene Regulatory Networks Using Time Series Gene Expression Data

    Background: Reconstruction of gene regulatory networks (GRNs), also known as reverse engineering of GRNs, aims to infer the potential regulatory relationships between genes. With the development of biotechnologies such as gene chip microarrays and RNA sequencing, the resulting high-throughput data provide more opportunities to infer gene-gene interactions from gene expression data and hence to understand the mechanisms underlying biological processes. Gene regulatory networks are known to exhibit multiple interaction mechanisms, including functional and non-functional as well as linear and non-linear relationships. Moreover, regulatory interactions between genes and gene products are not instantaneous: the various processes involved in producing fully functional and measurable concentrations of transcription factors/proteins delay gene regulation. Many approaches to reconstructing GRNs have been proposed, but existing approaches such as probabilistic Boolean networks and dynamic Bayesian networks have various limitations and relatively low accuracy. Inferring GRNs from time series microarray or RNA-sequencing data remains a very challenging inverse problem because of nonlinearity, high dimensionality, sparse and noisy data, and significant computational cost, which motivates us to develop more effective inference methods. Results: We developed a novel algorithm, MICRAT (Maximal Information coefficient with Conditional Relative Average entropy and Time-series mutual information), for inferring GRNs from time series gene expression data. The maximal information coefficient (MIC) is an effective measure of dependence for two-variable relationships. It captures a wide range of associations, both functional and non-functional, and thus performs well in measuring the dependence between two genes. Our approach consists of two main procedures.
Firstly, it employs the maximal information coefficient to construct an undirected graph representing the underlying relationships between genes. Secondly, it directs the edges of this graph to infer regulators and their targets. In this step, the conditional relative average entropies of each pair of nodes (genes) are used to indicate edge directions. Since time delays may exist between the expression of regulators and that of their target genes, time series mutual information is combined with these entropies to direct the edges and infer the potential regulators and their targets. We evaluated MICRAT on synthetic datasets as well as real gene expression data and compared it with other GRN inference methods. We inferred five 10-gene and five 100-gene networks from the DREAM4 challenge, which were generated with the gene expression simulator GeneNetWeaver (GNW). MICRAT was also used to reconstruct GRNs from real gene expression data, including part of the DNA damage response pathway (the SOS DNA repair network) and an experimental dataset in E. coli. The results showed that MICRAT significantly improved inference accuracy compared with other inference methods such as TDBN. Conclusion: In this work, we proposed MICRAT, a novel algorithm for inferring GRNs from time series gene expression data that takes into account both the dependence and the time delay between the expression of a regulator and its target genes. The approach uses maximal information coefficients to reconstruct an undirected graph representing the underlying relationships between genes, and directs its edges by combining conditional relative average entropy with time course mutual information of gene pairs. The algorithm was evaluated on the benchmark GRNs provided by the DREAM4 challenge and on part of the real SOS DNA repair network in E. coli.
The experimental study showed that our approach was comparable to other methods on the 10-gene datasets and outperformed them on the 100-gene datasets for GRN inference from time series data.
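The abstract does not give MICRAT's exact formulas for time series mutual information or conditional relative average entropy, so the Python below is only a hedged sketch of the general idea behind using a time delay to orient an edge: estimate the mutual information between one gene's expression at time t and another's at time t + lag, and compare the two directions. The quantile binning, the plug-in estimator, `max_lag`, and all function names are assumptions, not MICRAT's definitions.

```python
import numpy as np


def discretize(series, bins=4):
    """Map values to equal-frequency bins (quantile discretization)."""
    edges = np.quantile(series, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(series, edges)


def mutual_information(a, b):
    """Plug-in mutual information (nats) between two discrete integer arrays."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1)
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])))


def lagged_mi(x, y, lag, bins=4):
    """I(x_t ; y_{t+lag}): how much x's past says about y's future."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    return mutual_information(dx[:len(dx) - lag], dy[lag:])


def direction_score(x, y, max_lag=3, bins=4):
    """Positive if x appears to lead y, negative if y appears to lead x (heuristic)."""
    forward = max(lagged_mi(x, y, k, bins) for k in range(1, max_lag + 1))
    backward = max(lagged_mi(y, x, k, bins) for k in range(1, max_lag + 1))
    return forward - backward
```

For a simulated regulator `x` whose target `y` follows it with a two-step delay (for example `y[t] = x[t - 2] + noise`), the lag-2 mutual information dominates and the direction score is positive; such a score would only be applied to edges already kept in the undirected graph.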

    Exploring conserved mRNA-miRNA interactions in colon and lung cancers

    Aim: The main goal of this analysis was to prioritize co-expressed genes and miRNAs that are thought to have important influences on the pathogenesis of colon and lung cancers. Background: MicroRNAs (miRNAs) are small endogenous noncoding RNAs that regulate gene expression by repressing mRNA translation or decreasing mRNA stability, and they have proven pivotal roles in different types of cancer. Accumulating evidence indicates that miRNAs act in a wide range of biological processes, from oncogenesis and tumor suppression to tumor progression. Colon and lung cancers are common and challenging types of cancer; therefore, exploring the interplay between underlying biological units such as miRNAs and mRNAs will probably lead to the identification of promising biomarkers involved in these malignancies. Methods: Colon cancer and lung cancer expression data were downloaded from the Firehose and TCGA databases, and variable genes extracted with the DCGL software were used to build two gene regulatory networks with the parmigene R package. Afterwards, a network-driven integrative analysis was performed to explore prognostic genes, miRNAs and underlying pathways. Results: A total of 192 differentially expressed miRNAs and their target genes within the gene regulatory networks were derived by the ARACNE algorithm. Further screening identified BTF3, TP53, MYC, CALR, NEM2, miR-29b-3p and miR-145 as bottleneck nodes, which were enriched in Gene Ontology (GO) terms and pathways chiefly related to biosynthesis and signaling. Conclusion: Our study uncovered correlated alterations in gene expression that may be related to colon and lung cancers and highlighted potential common biomarker candidates for the two diseases.
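ARACNE prunes a mutual-information network with the data processing inequality (DPI): in any fully connected triplet, the weakest edge is treated as indirect and removed. The Python below is a minimal sketch of that pruning step, not the parmigene implementation. Estimating MI from Pearson correlation under a Gaussian assumption, and the additive tolerance `eps`, are simplifications chosen for illustration; the function names are hypothetical.

```python
import itertools

import numpy as np


def gaussian_mi_matrix(data):
    """Pairwise MI under a Gaussian assumption: -0.5 * log(1 - r^2)."""
    r = np.corrcoef(data, rowvar=False)
    np.fill_diagonal(r, 0.0)
    return -0.5 * np.log(1.0 - r ** 2)


def aracne_dpi(mi, threshold=0.0, eps=0.0):
    """Return a boolean adjacency matrix after thresholding and DPI pruning."""
    n = mi.shape[0]
    keep = mi > threshold
    np.fill_diagonal(keep, False)
    remove = np.zeros_like(keep)
    for i, j, k in itertools.combinations(range(n), 3):
        if not (keep[i, j] and keep[i, k] and keep[j, k]):
            continue
        weights = {(i, j): mi[i, j], (i, k): mi[i, k], (j, k): mi[j, k]}
        weakest_edge = min(weights, key=weights.get)
        others = [w for e, w in weights.items() if e != weakest_edge]
        # DPI: in a chain a-b-c, I(a;c) <= min(I(a;b), I(b;c)); drop the weakest edge
        if weights[weakest_edge] < min(others) - eps:
            a, b = weakest_edge
            remove[a, b] = remove[b, a] = True
    # edges are marked first and removed together, so the result does not
    # depend on the order in which triplets are visited
    return keep & ~remove
```

On a simulated chain gene0 -> gene1 -> gene2, all three pairs have positive MI, but the 0-2 edge is the weakest in the triplet and is removed, leaving only the direct interactions.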

    Conditional Distance Correlation Test for Gene Expression Level, DNA Methylation Level and Copy Number

    Over the past years, considerable effort has been devoted to genome-wide analysis of genetic and epigenetic profiles to better understand the biological mechanisms underlying complex diseases such as cancer. It is of great importance to unravel the complex dependence structure among biological factors, and many conditional dependence tests have been developed to meet this need. The traditional partial correlation method captures only linear partial correlation, not nonlinear correlation. To overcome this limitation, we propose to use the innovative conditional distance correlation (CDC), which measures the conditional dependence between random vectors and can detect nonlinear relations. In this thesis, the CDC measure is applied to the rich ovarian cancer data from The Cancer Genome Atlas (TCGA), and we identify a list of interesting genes with nonlinear features. We integrate three important types of molecular features, namely gene expression, DNA methylation and copy number variation, and implement the partial correlation test and the CDC test to infer the relations among the three measurements for each gene. Out of 196 candidate oncogenes and tumor suppressors, we identify 19 genes in which two of the molecular features are nonlinearly dependent given the third. Many of these 19 genes have been reported in the literature to be associated with ovarian or breast cancer. Our findings could shed new light on the biological relations among these three molecular aspects. The thesis is structured as follows. Chapter 1 briefly introduces ovarian cancer, TCGA data, the three molecular measurements, and the two testing methods. Chapter 2 reviews different statistical methods, including Pearson's partial correlation and conditional distance correlation. In Chapter 3, we conduct an extensive simulation study to compare the empirical performance of different methods.
In Chapter 4, we apply the new method to the TCGA ovarian data. We conclude the thesis with future directions in Chapter 5.
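CDC itself requires kernel-weighted conditional distance covariances and is not reproduced here. As a minimal sketch of why a distance-based measure can detect what partial correlation misses, the Python below contrasts the Pearson partial correlation of two variables given a third with the ordinary (unconditional) sample distance correlation of Székely et al. The names `expr`, `methyl` and `cnv` and the simulated data are hypothetical stand-ins for gene expression, DNA methylation and copy number.

```python
import numpy as np


def partial_correlation(x, y, z):
    """Pearson partial correlation of x and y given one variable z."""
    r = np.corrcoef(np.vstack([x, y, z]))
    rxy, rxz, ryz = r[0, 1], r[0, 2], r[1, 2]
    return float((rxy - rxz * ryz) / np.sqrt((1 - rxz ** 2) * (1 - ryz ** 2)))


def distance_correlation(x, y):
    """Sample (V-statistic) distance correlation of two 1-D samples."""
    def double_centered(v):
        d = np.abs(v[:, None] - v[None, :])
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

    a = double_centered(np.asarray(x, dtype=float))
    b = double_centered(np.asarray(y, dtype=float))
    dcov2 = max((a * b).mean(), 0.0)
    dvar_x, dvar_y = (a * a).mean(), (b * b).mean()
    return float(np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.normal(size=250)
    expr = np.concatenate([u, -u])  # symmetric around zero (hypothetical)
    methyl = expr ** 2              # nonlinear function of expr, yet uncorrelated with it
    cnv = rng.normal(size=500)      # unrelated conditioning variable
    print(partial_correlation(expr, methyl, cnv))  # near zero
    print(distance_correlation(expr, methyl))      # clearly positive
```

The partial correlation is essentially zero because the relationship is purely quadratic, while distance correlation, which is zero only under independence, reports a clear dependence; CDC extends this property to the conditional setting.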