
    Differential modeling for cancer microarray data

    Capturing the changes between two biological phenotypes is a crucial task in understanding the mechanisms of various diseases. Most of the existing computational approaches depend on testing the changes in the expression levels of each single gene individually. In this work, we propose novel computational approaches to identify the differential genes between two phenotypes. These approaches aim to quantitatively characterize the differences between two phenotypes and can provide better insights and understanding of various diseases. The purpose of this thesis is three-fold. Firstly, we review the state-of-the-art approaches for differential analysis of gene expression data. Secondly, we propose a novel differential network analysis approach that is composed of two algorithms, namely, DiffRank and DiffSubNet, to identify differential hubs and differential subnetworks, respectively. In this approach, two datasets are represented as two networks, and the problem of identifying differential genes is then transformed into the problem of comparing two networks to identify the most differential network components. Studying such networks can provide valuable knowledge about the data. The DiffRank algorithm ranks the nodes of two networks based on their differential behavior using two novel differential measures: differential connectivity and differential betweenness centrality for each node. These measures are propagated through the network and are optimized to capture the local and global structural changes between two networks. Then, we integrated the results of this algorithm into the proposed differential subnetwork algorithm, which is called DiffSubNet. This algorithm aims to identify sets of differentially connected nodes.
We demonstrated the effectiveness of these algorithms on synthetic datasets and real-world applications and showed that these algorithms identified meaningful and valuable information compared to some of the baseline methods that can be used for such a task. Thirdly, we propose a novel differential co-clustering approach to efficiently find arbitrarily positioned differential (or discriminative) co-clusters from large datasets. The goal of this approach is to discover a distinguishing set of gene patterns that are highly correlated in a subset of the samples (subspace co-expressions) in one phenotype but not in the other. This approach is useful when the biological samples are assumed to be heterogeneous or have multiple subtypes. To achieve this goal, we propose a novel co-clustering algorithm, Ranking-based Arbitrarily Positioned Overlapping Co-Clustering (RAPOCC), to efficiently extract significant co-clusters. This algorithm optimizes a novel ranking-based objective function to find arbitrarily positioned co-clusters, and it can extract large and overlapping co-clusters containing both positively and negatively correlated genes. Then, we extend this algorithm to discover discriminative co-clusters by incorporating the class information into the co-cluster search process. The novel discriminative co-clustering algorithm, called Discriminative RAPOCC (Di-RAPOCC), efficiently extracts the discriminative co-clusters from labeled datasets. We also characterize the discriminative co-clusters and propose three novel measures that can be used to evaluate the performance of any discriminative subspace algorithm. We evaluated the proposed algorithms on several synthetic and real gene expression datasets, and our experimental results showed that the proposed algorithms outperformed several existing algorithms available in the literature.
The shift from single gene analysis to differential gene network analysis and differential co-clustering can play a crucial role in future analysis of gene expression and can help in understanding the mechanisms of various diseases.
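
To make the idea of a per-node differential measure concrete, here is a minimal sketch. It is not the DiffRank algorithm, whose exact definitions and propagation step are not given in the abstract. It assumes that each phenotype's network is a dictionary of undirected weighted edges and scores a node by the summed absolute change in the weights of its incident edges. The function name and the data layout are hypothetical.

```python
def differential_connectivity(net_a, net_b):
    """Score each node by how much its incident edge weights change.

    net_a, net_b: dicts mapping frozenset({u, v}) -> edge weight.
    An edge missing from one network is treated as weight 0.0
    (an assumption made for this illustration).
    """
    all_edges = set(net_a) | set(net_b)
    scores = {}
    for edge in all_edges:
        delta = abs(net_a.get(edge, 0.0) - net_b.get(edge, 0.0))
        for node in edge:
            scores[node] = scores.get(node, 0.0) + delta
    return scores


# Toy example: the g2-g3 edge is lost and a g1-g3 edge is gained.
normal = {frozenset(("g1", "g2")): 1.0, frozenset(("g2", "g3")): 1.0}
tumor = {frozenset(("g1", "g2")): 1.0, frozenset(("g1", "g3")): 1.0}
scores = differential_connectivity(normal, tumor)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy case `g3` changes both of its edges and therefore ranks first. A real differential ranking would also need the betweenness-based measure and the propagation described in the abstract.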

    GeneRank: Using search engine technology for the analysis of microarray experiments

    Copyright @ 2005 Morrison et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Background: Interpretation of simple microarray experiments is usually based on the fold-change of gene expression between a reference and a "treated" sample where the treatment can be of many types from drug exposure to genetic variation. Interpretation of the results usually combines lists of differentially expressed genes with previous knowledge about their biological function. Here we evaluate a method – based on the PageRank algorithm employed by the popular search engine Google – that tries to automate some of this procedure to generate prioritized gene lists by exploiting biological background information. Results: GeneRank is an intuitive modification of PageRank that maintains many of its mathematical properties. It combines gene expression information with a network structure derived from gene annotations (gene ontologies) or expression profile correlations. Using both simulated and real data we find that the algorithm offers an improved ranking of genes compared to pure expression change rankings. Conclusion: Our modification of the PageRank algorithm provides an alternative method of evaluating microarray experimental results which combines prior knowledge about the underlying network. GeneRank offers an improvement compared to assessing the importance of a gene based on its experimentally observed fold-change alone and may be used as a basis for further analytical developments.
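
The following sketch illustrates how a PageRank-style update can blend expression change with network structure. It is written in the spirit of the method described above and is not the authors' implementation. It assumes an undirected network in which every gene has at least one neighbour and every adjacency list is symmetric. The normalised absolute expression change plays the role of PageRank's "teleport" term. The function name, the damping value `d=0.5`, and the fixed iteration count are illustrative choices.

```python
def generank_sketch(adjacency, expression_change, d=0.5, iterations=100):
    """PageRank-style gene ranking (illustrative sketch).

    adjacency: dict gene -> list of neighbouring genes (symmetric).
    expression_change: dict gene -> expression change (e.g. log fold-change);
        at least one value must be non-zero.
    d: weight given to the network versus the gene's own expression change.
    """
    genes = list(expression_change)
    total = sum(abs(v) for v in expression_change.values())
    # Normalised absolute expression change, summing to 1.
    ex = {g: abs(expression_change[g]) / total for g in genes}
    degree = {g: len(adjacency[g]) for g in genes}

    rank = dict(ex)
    for _ in range(iterations):
        new_rank = {}
        for g in genes:
            # Each neighbour shares its current rank equally among its links.
            incoming = sum(rank[n] / degree[n] for n in adjacency[g])
            new_rank[g] = (1 - d) * ex[g] + d * incoming
        rank = new_rank
    return rank


# Toy chain a - b - c in which only gene "a" changes expression.
network = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
change = {"a": 2.0, "b": 0.0, "c": 0.0}
ranks = generank_sketch(network, change)
```

With `d=0` the ranking reduces to the normalised expression change alone. With `d>0`, gene `b` receives a non-zero rank purely because it is connected to a strongly changing gene.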

    Differential expression analysis with global network adjustment

    Background: Large-scale chromosomal deletions or other non-specific perturbations of the transcriptome can alter the expression of hundreds or thousands of genes, and it is of biological interest to understand which genes are most profoundly affected. We present a method for predicting a gene’s expression as a function of other genes thereby accounting for the effect of transcriptional regulation that confounds the identification of genes differentially expressed relative to a regulatory network. The challenge in constructing such models is that the number of possible regulator transcripts within a global network is on the order of thousands, and the number of biological samples is typically on the order of 10. Nevertheless, there are large gene expression databases that can be used to construct networks that could be helpful in modeling transcriptional regulation in smaller experiments. Results: We demonstrate a type of penalized regression model that can be estimated from large gene expression databases, and then applied to smaller experiments. The ridge parameter is selected by minimizing the cross-validation error of the predictions in the independent out-sample. This tends to increase the model stability and leads to a much greater degree of parameter shrinkage, but the resulting biased estimation is mitigated by a second round of regression. Nevertheless, the proposed computationally efficient “over-shrinkage” method outperforms previously used LASSO-based techniques. In two independent datasets, we find that the median proportion of explained variability in expression is approximately 25%, and this results in a substantial increase in the signal-to-noise ratio allowing more powerful inferences on differential gene expression leading to biologically intuitive findings.
We also show that a large proportion of gene dependencies are conditional on the biological state, which would be impossible with standard differential expression methods. Conclusions: By adjusting for the effects of the global network on individual genes, both the sensitivity and reliability of differential expression measures are greatly improved.
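
As a small illustration of the shrinkage that a ridge penalty induces, the sketch below computes the ridge coefficient for a single centred predictor, for which the closed form is `sum(x*y) / (sum(x*x) + lam)`. This is only the textbook one-predictor case. It does not reproduce the authors' multi-gene model, their cross-validated choice of the ridge parameter, or the second round of regression. The function name is hypothetical.

```python
def ridge_coefficient(x, y, lam):
    """Ridge estimate for one predictor without an intercept.

    x, y: equal-length sequences of (assumed centred) values.
    lam: non-negative ridge penalty; lam = 0 gives ordinary least squares.
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


regulator = [1.0, 2.0, 3.0]
target = [2.0, 4.0, 6.0]
unpenalised = ridge_coefficient(regulator, target, 0.0)
shrunk = ridge_coefficient(regulator, target, 14.0)
```

Increasing `lam` pulls the coefficient toward zero, which is the bias that the abstract says a second regression round is used to mitigate.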

    Pathway relevance ranking for tumor samples through network-based data integration

    The study of cancer, a highly heterogeneous disease with different causes and clinical outcomes, requires a multi-angle approach and the collection of large multi-omics datasets that, ideally, should be analyzed simultaneously. We present a new pathway relevance ranking method that is able to prioritize pathways according to the information contained in any combination of tumor related omics datasets. Key to the method is the conversion of all available data into a single comprehensive network representation containing not only genes but also individual patient samples. Additionally, all data are linked through a network of previously identified molecular interactions. We demonstrate the performance of the new method by applying it to breast and ovarian cancer datasets from The Cancer Genome Atlas. By integrating gene expression, copy number, mutation and methylation data, the method's potential to identify key pathways involved in breast cancer development shared by different molecular subtypes is illustrated. Interestingly, certain pathways were ranked equally important for different subtypes, even when the underlying (epi)-genetic disturbances were diverse. Next to prioritizing universally high-scoring pathways, the pathway ranking method was able to identify subtype-specific pathways. Often the score of a pathway could not be motivated by a single mutation, copy number or methylation alteration, but rather by a combination of genetic and epi-genetic disturbances, stressing the need for a network-based data integration approach. The analysis of ovarian tumors, as a function of survival-based subtypes, demonstrated the method's ability to correctly identify key pathways, irrespective of tumor subtype. A differential analysis of survival-based subtypes revealed several pathways with higher importance for the bad-outcome patient group than for the good-outcome patient group. 
Many of the pathways exhibiting higher importance for the bad-outcome patient group could be related to ovarian tumor proliferation and survival.

    Increased entropy of signal transduction in the cancer metastasis phenotype

    Studies into the statistical properties of biological networks have led to important biological insights, such as the presence of hubs and hierarchical modularity. There is also a growing interest in studying the statistical properties of networks in the context of cancer genomics. However, relatively little is known as to what network features differ between the cancer and normal cell physiologies, or between different cancer cell phenotypes. Based on the observation that frequent genomic alterations underlie a more aggressive cancer phenotype, we asked if such an effect could be detectable as an increase in the randomness of local gene expression patterns. Using a breast cancer gene expression data set and a model network of protein interactions we derive constrained weighted networks defined by a stochastic information flux matrix reflecting expression correlations between interacting proteins. Based on this stochastic matrix we propose and compute an entropy measure that quantifies the degree of randomness in the local pattern of information flux around single genes. By comparing the local entropies in the non-metastatic versus metastatic breast cancer networks, we here show that breast cancers that metastasize are characterised by a small yet significant increase in the degree of randomness of local expression patterns. We validate this result in three additional breast cancer expression data sets and demonstrate that local entropy better characterises the metastatic phenotype than other non-entropy based measures. We show that increases in entropy can be used to identify genes and signalling pathways implicated in breast cancer metastasis. Further exploration of such integrated cancer expression and protein interaction networks will therefore be a fruitful endeavour. Comment: 5 figures, 2 Supplementary Figures and Table
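
A local entropy of the kind described above can be sketched as the Shannon entropy of one row of a stochastic matrix. The construction below is an assumption for illustration only. It row-normalises non-negative edge weights around a gene into probabilities and divides the entropy by `log(k)`, where `k` is the number of neighbours with positive weight, so that the result lies between 0 and 1. The abstract does not specify how the weights are derived from expression correlations. The function name is hypothetical.

```python
import math


def local_entropy(neighbour_weights):
    """Normalised Shannon entropy of the flux distribution around one gene.

    neighbour_weights: dict neighbour -> non-negative weight.
    Returns 0.0 when fewer than two neighbours carry positive weight.
    """
    total = sum(neighbour_weights.values())
    if total <= 0:
        return 0.0
    # Row-normalise to obtain one row of a stochastic matrix.
    probs = [w / total for w in neighbour_weights.values() if w > 0]
    k = len(probs)
    if k < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(k)


focused = local_entropy({"p1": 0.9, "p2": 0.05, "p3": 0.05})
diffuse = local_entropy({"p1": 1.0, "p2": 1.0, "p3": 1.0})
```

A gene whose flux is concentrated on one partner scores low, while a gene whose flux is spread evenly scores close to 1. This is the sense in which higher local entropy means a more random local pattern.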

    Rare and common epilepsies converge on a shared gene regulatory network providing opportunities for novel antiepileptic drug discovery

    Background: The relationship between monogenic and polygenic forms of epilepsy is poorly understood, and the extent to which the genetic and acquired epilepsies share common pathways is unclear. Here, we use an integrated systems-level analysis of brain gene expression data to identify molecular networks disrupted in epilepsy. Results: We identify a co-expression network of 320 genes (M30), which is significantly enriched for non-synonymous de novo mutations ascertained from patients with monogenic epilepsy, and for common variants associated with polygenic epilepsy. The genes in the M30 network are expressed widely in the human brain under tight developmental control, and encode physically interacting proteins involved in synaptic processes. The most highly connected proteins within the M30 network are preferentially disrupted by deleterious de novo mutations for monogenic epilepsy, in line with the centrality-lethality hypothesis. Analysis of M30 expression revealed consistent down-regulation in the epileptic brain in heterogeneous forms of epilepsy including human temporal lobe epilepsy, a mouse model of acquired temporal lobe epilepsy, and a mouse model of monogenic Dravet (SCN1A) disease. These results suggest functional disruption of M30 via gene mutation or altered expression as a convergent mechanism regulating susceptibility to epilepsy broadly. Using the large collection of drug-induced gene expression data from Connectivity Map, several drugs were predicted to preferentially restore the down-regulation of M30 in epilepsy toward health, most notably valproic acid, whose effect on M30 expression was replicated in neurons. Conclusions: Taken together, our results suggest targeting the expression of M30 as a potential new therapeutic strategy in epilepsy.

    A bovine lymphosarcoma cell line infected with Theileria annulata exhibits an irreversible reconfiguration of host cell gene expression

    Theileria annulata, an intracellular parasite of bovine lymphoid cells, induces substantial phenotypic alterations to its host cell including continuous proliferation, cytoskeletal changes and resistance to apoptosis. While parasite-induced modulation of host cell signal transduction pathways and NFκB activation are established, there remains considerable speculation on the complexities of the parasite-directed control mechanisms that govern these radical changes to the host cell. Our objectives in this study were to provide a comprehensive analysis of the global changes to host cell gene expression with emphasis on those that result from direct intervention by the parasite. By using comparative microarray analysis of an uninfected bovine cell line and its Theileria-infected counterpart, in conjunction with use of the specific parasiticidal agent, buparvaquone, we have identified a large number of host cell gene expression changes that result from parasite infection. Our results indicate that the viable parasite can irreversibly modify the transformed phenotype of a bovine cell line. Fifty percent of genes with altered expression failed to show a reversible response to parasite death, a possible contributing factor to initiation of host cell apoptosis. The genes that did show an early predicted response to loss of parasite viability highlighted a sub-group of genes that are likely to be under direct control by parasite infection. Network and pathway analysis demonstrated that this sub-group is significantly enriched for genes involved in regulation of chromatin modification and gene expression. The results provide evidence that the Theileria parasite has the regulatory capacity to generate widespread change to host cell gene expression in a complex and largely irreversible manner.

    A critical evaluation of network and pathway based classifiers for outcome prediction in breast cancer

    Recently, several classifiers that combine primary tumor data, like gene expression data, and secondary data sources, such as protein-protein interaction networks, have been proposed for predicting outcome in breast cancer. In these approaches, new composite features are typically constructed by aggregating the expression levels of several genes. The secondary data sources are employed to guide this aggregation. Although many studies claim that these approaches improve classification performance over single gene classifiers, the gain in performance is difficult to assess. This stems mainly from the fact that different breast cancer data sets and validation procedures are employed to assess the performance. Here we address these issues by employing a large cohort of six breast cancer data sets as a benchmark set and by performing an unbiased evaluation of the classification accuracies of the different approaches. Contrary to previous claims, we find that composite feature classifiers do not outperform simple single gene classifiers. We investigate the effect of (1) the number of selected features; (2) the specific gene set from which features are selected; (3) the size of the training set and (4) the heterogeneity of the data set on the performance of composite feature and single gene classifiers. Strikingly, we find that randomization of secondary data sources, which destroys all biological information in these sources, does not result in a deterioration in performance of composite feature classifiers. Finally, we show that when a proper correction for gene set size is performed, the stability of single gene sets is similar to the stability of composite feature sets. Based on these results there is currently no reason to prefer prognostic classifiers based on composite features over single gene classifiers for predicting outcome in breast cancer.
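
The composite features discussed above can be illustrated with the simplest possible aggregation, the mean expression of the genes in a set. This is a sketch under that assumption. The abstract does not say which aggregation functions the evaluated classifiers use, and the function name and toy gene identifiers are hypothetical.

```python
from statistics import mean


def composite_feature(expression, gene_set):
    """Average the expression of the genes in a set for one sample.

    expression: dict gene -> expression level for a single sample.
    gene_set: iterable of genes, e.g. members of a pathway or subnetwork.
    Genes that were not measured in the sample are skipped.
    """
    values = [expression[g] for g in gene_set if g in expression]
    if not values:
        raise ValueError("none of the genes in the set were measured")
    return mean(values)


sample = {"geneA": 1.0, "geneB": 3.0, "geneC": 10.0}
pathway_score = composite_feature(sample, ["geneA", "geneB"])
```

A single gene classifier would use values such as `sample["geneA"]` directly as features, whereas a composite feature classifier would use `pathway_score`. The abstract reports that the latter did not outperform the former in the authors' benchmark.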