
    Multi-test Decision Tree and its Application to Microarray Data Classification

    Objective: A desirable property of tools used to investigate biological data is that their models and predictive decisions are easy to understand. Decision trees are particularly promising in this regard due to their comprehensible nature, which resembles the hierarchical process of human decision making. However, existing algorithms for learning decision trees tend to underfit gene expression data. The main aim of this work is to improve the performance and stability of decision trees with only a small increase in their complexity. Methods: We propose a multi-test decision tree (MTDT); our main contribution is the application of several univariate tests in each non-terminal node of the decision tree. We also search for alternative, lower-ranked features in order to obtain more stable and reliable predictions. Results: Experimental validation was performed on several real-life gene expression datasets. Comparison results with eight classifiers show that MTDT has a statistically significantly higher accuracy than popular decision tree classifiers, and it was highly competitive with ensemble learning algorithms. The proposed solution managed to outperform its baseline algorithm on 14 datasets by an average of 6 percent. A study performed on one of the datasets showed that the discovered genes used in the MTDT classification model are supported by biological evidence in the literature. Conclusion: This paper introduces a new type of decision tree which is more suitable for solving biological problems. MTDTs are relatively easy to analyze and much more powerful in modeling high-dimensional microarray data than their popular counterparts.
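To make the idea of "several univariate tests in each non-terminal node" concrete, here is a minimal sketch of how one such node might route a sample. It assumes the tests are combined by simple majority vote and that ties go to the left branch; the abstract does not state either rule, and the names `UnivariateTest` and `multi_test_branch` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Each test is (feature_index, threshold); it votes "left" when
# x[feature_index] <= threshold. This encoding is an assumption of the sketch.
UnivariateTest = Tuple[int, float]


def multi_test_branch(x: Sequence[float], tests: List[UnivariateTest]) -> str:
    """Route a sample through one node by majority vote over univariate tests."""
    left_votes = sum(1 for feature, threshold in tests if x[feature] <= threshold)
    # Ties go left; this tie-breaking rule is an assumption, not from the abstract.
    return "left" if 2 * left_votes >= len(tests) else "right"


# Three hypothetical gene-expression tests applied to one sample.
sample = [0.2, 5.1, 3.3]
tests = [(0, 0.5), (1, 4.0), (2, 3.5)]
branch = multi_test_branch(sample, tests)
```

With several tests voting, one noisy gene measurement is less likely to flip the branch decision, which is one plausible reading of the stability gain the abstract describes.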

    Stochastic Modeling of Expression Kinetics Identifies Messenger Half-Lives and Reveals Sequential Waves of Co-ordinated Transcription and Decay

    The transcriptome in a cell is finely regulated by a large number of molecular mechanisms able to control the balance between mRNA production and degradation. Recent experimental findings have evidenced that fine and specific regulation of degradation is needed for proper orchestration of a global cell response to environmental conditions. We developed a computational technique based on stochastic modeling, to infer condition-specific individual mRNA half-lives directly from gene expression time-courses. Predictions from our method were validated by experimentally measured mRNA decay rates during the intraerythrocytic developmental cycle of Plasmodium falciparum. We then applied our methodology to publicly available data on the reproductive and metabolic cycle of budding yeast. Strikingly, our analysis revealed, in all cases, the presence of periodic changes in decay rates of sequentially induced genes and co-ordination strategies between transcription and degradation, thus suggesting a general principle for the proper coordination of transcription and degradation machinery in response to internal and/or external stimuli. Citation: Cacace F, Paci P, Cusimano V, Germani A, Farina L (2012) Stochastic Modeling of Expression Kinetics Identifies Messenger Half-Lives and Reveals Sequential Waves of Co-ordinated Transcription and Decay. PLoS Comput Biol 8(11): e1002772. doi:10.1371/journal.pcbi.100277
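As background for the term "half-life", the sketch below shows the standard relation between a constant first-order decay rate and a half-life, t_half = ln(2) / k. This is a deterministic textbook simplification and not the stochastic inference method of the paper; the function names and the two-point estimate (which assumes transcription is switched off between measurements) are illustrative assumptions.

```python
import math


def half_life(decay_rate: float) -> float:
    """Half-life of a species that decays at a constant first-order rate."""
    if decay_rate <= 0:
        raise ValueError("decay_rate must be positive")
    return math.log(2) / decay_rate


def decay_rate_from_two_points(m0: float, m1: float, dt: float) -> float:
    """Estimate the decay rate from two abundances measured dt apart.

    Assumes pure exponential decay (no new transcription) between the
    two measurements, which is a simplification made for this sketch.
    """
    if m0 <= 0 or m1 <= 0 or dt <= 0:
        raise ValueError("abundances and dt must be positive")
    return math.log(m0 / m1) / dt


# Hypothetical numbers: abundance halves from 100 to 50 over 10 time units.
rate = decay_rate_from_two_points(100.0, 50.0, 10.0)
t_half = half_life(rate)
```

When production and degradation act at the same time, as in an expression time-course, the rate cannot be read off two points like this, which is why a model-based inference such as the one described above is needed.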

    CAMUR: Knowledge extraction from RNA-seq cancer data through equivalent classification rules

    Methods for extracting knowledge from Next Generation Sequencing data are in high demand. In this work, we focus on RNA-seq gene expression analysis and specifically on case-control studies with rule-based supervised classification algorithms that build a model able to discriminate cases from controls. State-of-the-art algorithms compute a single classification model that contains few features (genes). In contrast, our goal is to elicit more knowledge by computing many classification models, and thereby to identify most of the genes related to the predicted class.
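One simple way to obtain many classification models instead of one is to refit after removing the genes the previous model used, stopping once accuracy drops. The sketch below illustrates that general idea with a toy one-gene threshold rule; it is an assumption-laden illustration and not CAMUR's actual procedure, and all names, the toy data, and the `min_accuracy` cutoff are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, float]


def best_single_gene_rule(
    rows: Sequence[Row], labels: Sequence[bool], genes: Sequence[str]
) -> Tuple[float, Optional[str], float]:
    """Find the best rule of the form 'gene > threshold => case'.

    Returns (training accuracy, gene, threshold); gene is None if no
    candidate genes were supplied.
    """
    best: Tuple[float, Optional[str], float] = (0.0, None, 0.0)
    for gene in genes:
        for threshold in sorted({row[gene] for row in rows}):
            correct = sum((row[gene] > threshold) == y for row, y in zip(rows, labels))
            accuracy = correct / len(rows)
            if accuracy > best[0]:
                best = (accuracy, gene, threshold)
    return best


def equivalent_rules(
    rows: Sequence[Row], labels: Sequence[bool], min_accuracy: float = 0.9
) -> List[Tuple[str, float, float]]:
    """Repeatedly fit a rule, record it, and remove its gene from the pool."""
    remaining = set(rows[0].keys())
    found: List[Tuple[str, float, float]] = []
    while remaining:
        accuracy, gene, threshold = best_single_gene_rule(rows, labels, sorted(remaining))
        if gene is None or accuracy < min_accuracy:
            break  # no remaining gene separates the classes well enough
        found.append((gene, threshold, accuracy))
        remaining.discard(gene)
    return found


# Toy data: G1 and G2 both separate cases (True) from controls; G3 does not.
rows = [
    {"G1": 1.0, "G2": 0.9, "G3": 5.0},
    {"G1": 1.2, "G2": 1.1, "G3": 1.0},
    {"G1": 3.0, "G2": 2.8, "G3": 4.0},
    {"G1": 3.5, "G2": 3.1, "G3": 2.0},
]
labels = [False, False, True, True]
rules = equivalent_rules(rows, labels)
```

On this toy data a single-model approach would report only one of the two informative genes, whereas the loop recovers both before stopping at the uninformative one.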

    Interpretable Categorization of Heterogeneous Time Series Data

    Understanding heterogeneous multivariate time series data is important in many applications ranging from smart homes to aviation. Learning models of heterogeneous multivariate time series that are also human-interpretable is challenging and not adequately addressed by the existing literature. We propose grammar-based decision trees (GBDTs) and an algorithm for learning them. GBDTs extend decision trees with a grammar framework. Logical expressions derived from a context-free grammar are used for branching in place of simple thresholds on attributes. The added expressivity enables support for a wide range of data types while retaining the interpretability of decision trees. In particular, when a grammar based on temporal logic is used, we show that GBDTs can be used for the interpretable classification of high-dimensional and heterogeneous time series data. Furthermore, we show how GBDTs can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply GBDTs to analyze the classic Australian Sign Language dataset as well as data on near mid-air collisions (NMACs). The NMAC data comes from aircraft simulations used in the development of the next-generation Airborne Collision Avoidance System (ACAS X). Comment: 9 pages, 5 figures, 2 tables, SIAM International Conference on Data Mining (SDM) 201

    Pulsed Feedback Defers Cellular Differentiation

    Environmental signals induce diverse cellular differentiation programs. In certain systems, cells defer differentiation for extended time periods after the signal appears, proliferating through multiple rounds of cell division before committing to a new fate. How can cells set a deferral time much longer than the cell cycle? Here we study Bacillus subtilis cells that respond to sudden nutrient limitation with multiple rounds of growth and division before differentiating into spores. A well-characterized genetic circuit controls the concentration and phosphorylation of the master regulator Spo0A, which rises to a critical concentration to initiate sporulation. However, it remains unclear how this circuit enables cells to defer sporulation for multiple cell cycles. Using quantitative time-lapse fluorescence microscopy of Spo0A dynamics in individual cells, we observed pulses of Spo0A phosphorylation at a characteristic cell cycle phase. Pulse amplitudes grew systematically and cell-autonomously over multiple cell cycles leading up to sporulation. This pulse growth required a key positive feedback loop involving the sporulation kinases, without which the deferral of sporulation became ultrasensitive to kinase expression. Thus, deferral is controlled by a pulsed positive feedback loop in which kinase expression is activated by pulses of Spo0A phosphorylation. This pulsed positive feedback architecture provides a more robust mechanism for setting deferral times than constitutive kinase expression. Finally, using mathematical modeling, we show how pulsing and time delays together enable “polyphasic” positive feedback, in which different parts of a feedback loop are active at different times. Polyphasic feedback can enable more accurate tuning of long deferral times. Together, these results suggest that Bacillus subtilis uses a pulsed positive feedback loop to implement a “timer” that operates over timescales much longer than a cell cycle

    Quantitative single-cell splicing analysis reveals an ‘economy of scale’ filter for gene expression

    In eukaryotic cells, splicing affects the fate of each pre-mRNA transcript, helping to determine whether it is ultimately processed into an mRNA, or degraded. The efficiency of splicing plays a key role in gene expression. However, because it depends on the levels of multiple isoforms at the same transcriptional active site (TAS) in the same cell, splicing efficiency has been challenging to measure. Here, we introduce a quantitative single-molecule FISH-based method that enables determination of the absolute abundances of distinct RNA isoforms at individual TASs. Using this method, we discovered that splicing efficiency behaves in an unexpected ‘economy of scale’ manner, increasing, rather than decreasing, with gene expression levels, opposite to a standard enzymatic process. This behavior could result from an observed correlation between splicing efficiency and spatial proximity to nuclear speckles. Economy of scale splicing represents a non-linear filter that amplifies the expression of genes when they are more strongly transcribed. This method will help to reveal the roles of splicing in the quantitative control of gene expression