67 research outputs found

    A genetic algorithm for interpretable model extraction from decision tree ensembles

    Models obtained by decision tree induction techniques excel in being interpretable. However, they can be prone to overfitting, which results in low predictive performance. Ensemble techniques provide a solution to this problem and are hence able to achieve higher accuracies. However, this comes at the cost of losing the excellent interpretability of the resulting model, making ensemble techniques impractical in applications where decision support, instead of decision making, is crucial. To bridge this gap, we present the genesim algorithm, which uses a genetic algorithm to transform an ensemble of decision trees into a single decision tree with enhanced predictive performance while maintaining interpretability. We compared genesim to prevalent decision tree induction algorithms, ensemble techniques and a similar technique, called ism, using twelve publicly available data sets. The results show that genesim achieves better predictive performance on most of these data sets than decision tree induction techniques and ism. The results also show that genesim's predictive performance is in the same order of magnitude as that of the ensemble techniques. However, the resulting model of genesim outperforms the ensemble techniques regarding interpretability, as it has a very low complexity.

    A history of opioid exposure in females increases the risk of metabolic disorders in their future male offspring

    © 2019 Society for the Study of Addiction. Worldwide consumption of opioids remains at historic levels. Preclinical studies report intergenerational effects on the endogenous opioid system of future progeny following preconception morphine exposure. Given the role of endogenous opioids in energy homeostasis, such effects could impact metabolism in the next generation. Thus, we examined diet-induced modifications in F1 male progeny of morphine-exposed female rats (MORF1). When fed a high fat-sugar diet (FSD) for 6 weeks, MORF1 males display features of emerging metabolic syndrome; they consume more food, gain more weight, and develop fasting-induced hyperglycemia and hyperinsulinemia. In the hypothalamus, proteins involved in energy homeostasis are modified, and RNA sequencing revealed down-regulation of genes associated with neuronal plasticity, coupled with up-regulation of genes associated with immune, inflammatory, and metabolic processes that are specific to FSD-maintained MORF1 males. Thus, limited preconception morphine exposure in female rats increases the risk of metabolic syndrome/type 2 diabetes in the next generation.

    Principal components analysis based methodology to identify differentially expressed genes in time-course microarray data

    Background: Time-course microarray experiments are increasingly used to characterize dynamic biological processes. In these experiments, the goal is to identify genes that are differentially expressed in time-course data measured under different biological conditions. These differentially expressed genes can reveal the changes in biological processes caused by the change in condition, which is essential for understanding differences in dynamics. Results: In this paper, we propose a novel method for finding differentially expressed genes in time-course data across biological conditions (say C1 and C2). We model the expression at C1 using Principal Component Analysis (PCA) and represent the expression profile of each gene as a linear combination of the dominant Principal Components (PCs). The expression data from C2 are then projected onto the PCA model built from C1, and scores are extracted. The difference between the scores is evaluated using a hypothesis test to quantify the significance of differential expression. We evaluate the proposed method in two case studies: (1) the heat shock response of wild-type and HSF1 knockout mice, and (2) the cell cycle in wild-type and Fkh1/Fkh2 knockout yeast strains. Conclusion: In both cases, the proposed method identified biologically significant genes.
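
    The steps described in this abstract (fit PCA on C1, project C2 onto the same components, test the score differences) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which hypothesis test is used, so the robust z-test with a Bonferroni correction below is an assumption, as are the function names and the choice of two components.

```python
import math
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA to a genes x time-points matrix; return (mean profile, components)."""
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, ordered by variance explained.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, components):
    """Scores of each gene's time profile on the retained principal components."""
    return (X - mean) @ components.T

def differential_pvalues(X1, X2, n_components=2):
    """Per-gene p-values for a change in PC scores between conditions C1 and C2."""
    # The PCA model is built from C1 only; C2 is projected onto that same model.
    mean, comps = fit_pca(X1, n_components)
    diff = project(X2, mean, comps) - project(X1, mean, comps)
    # ASSUMPTION (not from the abstract): a robust z-test. The null spread of the
    # score differences is estimated per component from the median and the MAD.
    centre = np.median(diff, axis=0)
    spread = 1.4826 * np.median(np.abs(diff - centre), axis=0)
    z = np.abs(diff - centre) / spread
    # Two-sided normal p-value per component, Bonferroni-corrected across components.
    p = np.vectorize(math.erfc)(z / math.sqrt(2.0))
    return np.minimum(1.0, n_components * p.min(axis=1))
```

    Both inputs are expected as arrays with one row per gene and one column per time point, with genes and time points in the same order in both conditions. Genes with small p-values are those whose scores moved more than the bulk of genes did between the two conditions.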

    Carboplatin-induced gene expression changes in vitro are prognostic of survival in epithelial ovarian cancer

    Background: We performed a time-course microarray experiment to define the transcriptional response to carboplatin in vitro, and to correlate this with clinical outcome in epithelial ovarian cancer (EOC). RNA was isolated from carboplatin and control-treated 36M2 ovarian cancer cells at several time points, followed by oligonucleotide microarray hybridization. Carboplatin-induced changes in gene expression were assessed at the single-gene level as well as at the pathway level. Clinical validation was performed in publicly available microarray datasets using disease-free and overall survival endpoints. Results: Time-course and pathway analyses identified 317 genes and 40 pathways (designated time-course and pathway signatures) deregulated following carboplatin exposure. Both types of signatures were validated in two separate platinum-treated ovarian and NSCLC cell lines using published microarray data. Expression of time-course and pathway signature genes distinguished between patients with unfavorable and favorable survival in two independent ovarian cancer datasets. Among the pathways most highly induced by carboplatin in vitro, the NRF2, NF-kB, and cytokine and inflammatory response pathways were also found to be upregulated prior to chemotherapy exposure in poor prognosis tumors. Conclusion: Dynamic assessment of gene expression following carboplatin exposure in vitro can identify both genes and pathways that are correlated with clinical outcome. The functional relevance of this observation for better understanding the mechanisms of drug resistance in EOC will require further evaluation.

    Dynamic Changes in Protein Functional Linkage Networks Revealed by Integration with Gene Expression Data

    The response of cells to changing environmental conditions is governed by the dynamics of intricate biomolecular interactions. Because proteins are the dominant macromolecules that carry out routine cellular functions, it is reasonable to assume that understanding the dynamics of protein:protein interactions might yield useful insights into cellular responses. Large-scale protein interaction data sets are, however, unable to capture changes in the profile of protein:protein interactions. To understand how these interactions change dynamically, we have constructed conditional protein linkages for Escherichia coli by integrating functional linkages and gene expression information. As a case study, we have chosen to analyze UV exposure in wild-type and SOS-deficient E. coli at 20 minutes post-irradiation. Although the global topological properties of the conditional networks are similar, many subtle local changes are observed, which are suggestive of the cellular response to the perturbations. Some such changes correspond to differences in the path lengths among the nodes of carbohydrate metabolism, correlating with its loss in efficiency in the UV-treated cells. Similarly, expression of hubs under unique conditions reflects the importance of these genes. Various centrality measures applied to the networks indicate increased importance for replication, repair, and other stress proteins for the cells under UV treatment, as anticipated. We thus propose a novel approach for studying an organism at the systems level by integrating genome-wide functional linkages and gene expression data.
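
    The idea of a condition-specific network, and the path-length and centrality comparisons mentioned in this abstract, can be illustrated with a toy sketch. The abstract does not state the integration rule, so the rule used here (keep a functional linkage only when both genes are expressed in the condition) is an assumption, and all function and gene names are hypothetical.

```python
from collections import deque

def conditional_network(linkages, expressed):
    """Build an undirected graph from the linkages whose two genes are both expressed.

    ASSUMPTION: this filtering rule is illustrative, not taken from the abstract.
    """
    adjacency = {gene: set() for gene in expressed}
    for a, b in linkages:
        if a != b and a in expressed and b in expressed:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

def degree_centrality(adjacency):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adjacency)
    if n < 2:
        return {gene: 0.0 for gene in adjacency}
    return {gene: len(nbrs) / (n - 1) for gene, nbrs in adjacency.items()}

def path_length(adjacency, source, target):
    """Shortest path length by breadth-first search; None if there is no path."""
    if source not in adjacency or target not in adjacency:
        return None
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distance[node]
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return None

# Toy data with made-up gene labels: gene "b" is not expressed after treatment.
linkages = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "e"), ("e", "c")]
control = conditional_network(linkages, {"a", "b", "c", "d", "e"})
treated = conditional_network(linkages, {"a", "c", "d", "e"})
print(path_length(control, "a", "c"), path_length(treated, "a", "c"))
```

    In this toy case, losing one expressed gene lengthens the shortest route between two nodes, which is the kind of local change in path length the abstract refers to.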

    Impairment of circulating endothelial progenitors in Down syndrome

    Background: Pathological angiogenesis represents a critical issue in the progression of many diseases. Down syndrome is postulated to be a systemic anti-angiogenesis disease model, possibly due to increased expression of anti-angiogenic regulators on chromosome 21. The aim of our study was to elucidate some features of circulating endothelial progenitor cells in the context of this syndrome. Methods: Circulating endothelial progenitors of individuals with Down syndrome were isolated, cultured in vitro and analyzed by confocal and transmission electron microscopy. ELISA was performed to measure SDF-1α plasma levels in Down syndrome and euploid individuals. Moreover, qRT-PCR was used to quantify expression levels of the CXCL12 gene and of its receptor in progenitor cells. The functional impairment of Down progenitors was evaluated through their susceptibility to hydroperoxide-induced oxidative stress, using a BODIPY assay, and their greater vulnerability to infection with human pathogens. The differential expression of crucial genes in Down progenitor cells was evaluated by microarray analysis. Results: We detected a marked decrease in the number of progenitors in young Down individuals compared with euploid individuals, an increase in cell size and some major detrimental morphological changes. Moreover, Down syndrome patients also exhibited decreased SDF-1α plasma levels, and their progenitors had reduced expression of the gene encoding SDF-1α and of its membrane receptor. We further demonstrated that their progenitor cells are more susceptible to hydroperoxide-induced oxidative stress and to infection with Bartonella henselae. Further, we observed that most of the differentially expressed genes belong to angiogenesis, immune response and inflammation pathways, and that infected progenitors with trisomy 21 have a more pronounced perturbation of immune response genes than infected euploid cells. Conclusions: Our data provide evidence for a reduced number and altered morphology of endothelial progenitor cells in Down syndrome, and also show their higher susceptibility to oxidative stress and to pathogen infection compared with euploid cells, thereby confirming the angiogenesis and immune response deficit observed in Down syndrome individuals.

    Making Informed Choices about Microarray Data Analysis

    This article describes the typical stages in the analysis of microarray data for non-specialist researchers in systems biology and medicine. Particular attention is paid to significant data analysis issues that are commonly encountered among practitioners, some of which need wider airing. The issues addressed include experimental design, quality assessment, normalization, and summarization of multiple-probe data. This article is based on the ISMB 2008 tutorial on microarray data analysis. An expanded version of the material in this article and the slides from the tutorial can be found at http://www.people.vcu.edu/~mreimers/OGMDA/index.html

    Multiclass classification of microarray data samples with a reduced number of genes

    Background: Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in bioinformatics research. The problem gets harder as the number of classes increases. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates of the maximum number of genes that can be handled by any classification algorithm. Lack of such estimates may lead to either computationally demanding explorations of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained. Results: A novel bound on the maximum number of genes that can be handled by binary classifiers in binary mediated multiclass classification algorithms of microarray data samples is presented. The bound suggests that high-dimensional binary output domains might favor the existence of accurate and sparse binary mediated multiclass classifiers for microarray data samples. Conclusions: Comprehensive experimental work shows that the bound is indeed useful for inducing accurate and sparse multiclass classifiers for microarray data samples.