
    A critical evaluation of network and pathway based classifiers for outcome prediction in breast cancer

    Recently, several classifiers that combine primary tumor data, like gene expression data, and secondary data sources, such as protein-protein interaction networks, have been proposed for predicting outcome in breast cancer. In these approaches, new composite features are typically constructed by aggregating the expression levels of several genes. The secondary data sources are employed to guide this aggregation. Although many studies claim that these approaches improve classification performance over single gene classifiers, the gain in performance is difficult to assess. This stems mainly from the fact that different breast cancer data sets and validation procedures are employed to assess the performance. Here we address these issues by employing a large cohort of six breast cancer data sets as a benchmark set and by performing an unbiased evaluation of the classification accuracies of the different approaches. Contrary to previous claims, we find that composite feature classifiers do not outperform simple single gene classifiers. We investigate the effect of (1) the number of selected features; (2) the specific gene set from which features are selected; (3) the size of the training set; and (4) the heterogeneity of the data set on the performance of composite feature and single gene classifiers. Strikingly, we find that randomization of secondary data sources, which destroys all biological information in these sources, does not result in a deterioration in performance of composite feature classifiers. Finally, we show that when a proper correction for gene set size is performed, the stability of single gene sets is similar to the stability of composite feature sets. Based on these results there is currently no reason to prefer prognostic classifiers based on composite features over single gene classifiers for predicting outcome in breast cancer.
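
The composite-feature idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which aggregation the evaluated methods use, so plain averaging is assumed here, and the function names `composite_features` and `randomize_gene_sets` are hypothetical, chosen only for illustration. The second function mimics the randomization control by replacing each gene set with random genes while keeping the set size.

```python
import random
from statistics import mean


def composite_features(expression, gene_sets):
    """Average the expression of each gene set's members, per sample.

    expression: dict mapping gene name -> list of per-sample values
    gene_sets:  dict mapping set name -> list of gene names
    Genes absent from `expression` are skipped; sets with no measured
    genes are dropped. Averaging is an assumption, not the paper's method.
    """
    n_samples = len(next(iter(expression.values())))
    features = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in expression]
        if not present:
            continue
        features[name] = [
            mean(expression[g][i] for g in present) for i in range(n_samples)
        ]
    return features


def randomize_gene_sets(gene_sets, all_genes, seed=0):
    """Replace every set with randomly drawn genes, preserving set sizes."""
    rng = random.Random(seed)
    pool = sorted(all_genes)  # fixed order so the seed gives repeatable draws
    return {name: rng.sample(pool, len(genes)) for name, genes in gene_sets.items()}


# Toy data: three genes measured in two samples.
expression = {"A": [1.0, 3.0], "B": [3.0, 5.0], "C": [0.0, 0.0]}
gene_sets = {"s1": ["A", "B"], "s2": ["C", "X"]}
features = composite_features(expression, gene_sets)
shuffled = randomize_gene_sets(gene_sets, expression.keys(), seed=1)
```

Feeding `shuffled` back into `composite_features` and training the same classifier on both feature tables is one way to set up the kind of randomized comparison the abstract describes.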

    Deterministic Classifiers Accuracy Optimization for Cancer Microarray Data

    The objective of this study was to improve classification accuracy on cancer microarray gene expression data using a collection of machine learning algorithms available in WEKA. State-of-the-art deterministic classification methods, such as Kernel Logistic Regression, Support Vector Machine, Stochastic Gradient Descent and Logistic Model Trees, were applied to publicly available cancer microarray datasets, aiming to discover regularities that help improve the characterization and diagnostic correctness of each cancer type. The implemented models, evaluated with 10-fold cross-validation and parameterized to enhance accuracy, reached accuracy above 90%. Moreover, despite the variety of methodologies, no statistically significant differences were registered between them at the 0.05 significance level, confirming that all the selected methods are effective for this type of analysis.
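
The 10-fold cross-validation mentioned in this abstract can be sketched as follows. The study itself used WEKA; this is only a plain-Python illustration of how samples are partitioned, and the function name `k_fold_indices` is hypothetical. It does not reproduce WEKA's stratification or any of the study's classifiers.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once, dealt into k folds of near-equal size,
    and each fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 25 samples split into 10 folds: five folds of 3 and five folds of 2.
splits = list(k_fold_indices(25, k=10))
```

Each classifier would be trained on `train` and scored on `test` for every split, with the ten scores averaged to give the reported accuracy.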

    Mining and state-space modeling and verification of sub-networks from large-scale biomolecular networks

    Background: Biomolecular networks dynamically respond to stimuli and implement cellular function. Understanding these dynamic changes is the key challenge for cell biologists. As biomolecular networks grow in size and complexity, the model of a biomolecular network must become more rigorous to keep track of all the components and their interactions. In general this presents the need for computer simulation to manipulate and understand the biomolecular network model. Results: In this paper, we present a novel method to model the regulatory system which executes a cellular function and can be represented as a biomolecular network. Our method consists of two steps. First, a novel scale-free network clustering approach is applied to the large-scale biomolecular network to obtain various sub-networks. Second, a state-space model is generated for the sub-networks and simulated to predict their behavior in the cellular context. The modeling results represent hypotheses that are tested against high-throughput data sets (microarrays and/or genetic screens) for both the natural system and perturbations. Notably, the dynamic modeling component of this method depends on the automated network structure generation of the first component and the sub-network clustering, which are both essential to make the solution tractable. Conclusion: Experimental results on time series gene expression data for the human cell cycle indicate our approach is promising for sub-network mining and simulation from large-scale biomolecular networks.

    A comparison of genetic network models

    The inference of genetic interactions from measured expression data is one of the most challenging tasks of modern functional genomics. When successful, the learned network of regulatory interactions yields a wealth of useful information. An inferred genetic network contains information about the pathway to which a gene belongs and which genes it interacts with. Furthermore, it explains the gene's function in terms of how it influences other genes and indicates which genes are pathway initiators and therefore potential drug targets. Obviously, such wealth comes at a price and that of genetic network modeling is that it is an extremely complex task. Therefore, it is necessary to develop sophisticated computational tools that are able to extract relevant information from a limited set of microarray measurements and integrate this with different information sources, to come up with reliable hypotheses of a genetic regulatory network. Thus far, a multitude of modeling approaches has been proposed for discovering genetic networks. However, it is unclear what the advantages and disadvantages of each of the different approaches are and how their results can be compared. In this review, genetic network models are put in a historical perspective that explains why certain models were introduced. Various modeling assumptions and their consequences are also highlighted. In addition, an overview of the principal differences and similarities between the approaches is given by considering the qualitative properties of the chosen models and their learning strategies. In pharmacogenomics and related areas, a lot of research is directed towards discovering, understanding and/or controlling the outcome of some particular biological pathway.
Numerous examples exist where the manipulation of a key enzyme in such a pathway did not lead to the desired effect. We know that the structure of complex genetic and biochemical networks lies hidden in the sequence information of our DNA but it is far from trivial to predict gene expression from the sequence code alone. The current availability of microarray measurements of thousands of gene expression levels during the course of an experiment or after the knockout of a gene provides a wealth of complementary information that may be exploited to unravel the complex interplay between genes. It now becomes possible to start answering some of the truly challenging questions in systems biology. For example, is it possible to model these genetic interactions as a large network of interacting elements and can these interactions be effectively learned from measured expression data? Since Kauffman [...] Although the behavior and properties of artificial networks match the observations made in real biological systems well, the field of genetic network modeling has yet to reach its full maturity. The automatic discovery of genetic networks from expression data alone is far from trivial because of the combinatorial nature of the problem and the poor information content of [...] (Footnote 1: For reasons of brevity, the authors consistently refer only to the first author of each reference.)

    Androgen receptor profiling predicts prostate cancer outcome

    Prostate cancer is the second most prevalent malignancy in men. Biomarkers for outcome prediction are urgently needed, so that high-risk patients could be monitored more closely postoperatively. To identify prognostic markers and to determine causal players in prostate cancer progression, we assessed changes in chromatin state during tumor development and progression. Based on this, we assessed genomewide androgen receptor/chromatin binding and identified a distinct androgen receptor/chromatin binding profile between primary prostate cancers and tumors with an acquired resistance to therapy. These differential androgen receptor/chromatin interactions dictated expression of a distinct gene signature with strong prognostic potential. Further refinement of the signature provided us with a concise list of nine genes that hallmark prostate cancer outcome in multiple independent validation series. In this report, we identified a novel gene expression signature for prostate cancer outcome through generation of multilevel genomic data on chromatin accessibility and transcriptional regulation and integration with publicly available transcriptomic and clinical datastreams. By combining existing technologies, we propose a novel pipeline for biomarker discovery that is easily implementable in other fields of oncology.