Multi-test Decision Tree and its Application to Microarray Data Classification
Objective:
A desirable property of tools used to investigate biological data is that they yield easy-to-understand models and predictive decisions.
Decision trees are particularly promising in this regard due to their comprehensible nature, which resembles the hierarchical process of human decision making. However, existing algorithms for learning decision trees have a tendency to underfit gene expression data. The main aim of this work is to improve the performance and stability of decision trees with only a small increase in their complexity.
Methods:
We propose a multi-test decision tree (MTDT); our main contribution is the application of several univariate tests in each non-terminal node of the decision tree. We also search for alternative, lower-ranked features in order to obtain more stable and reliable predictions.
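To make the idea of several univariate tests in one node concrete, here is a minimal Python sketch. It assumes, as a hypothetical simplification, that the tests in a node vote on the branch by simple majority and that ties fall back to the first (primary) test; the class names `UnivariateTest` and `MultiTest` are illustrative and not taken from the MTDT implementation, and the way tests are selected during learning is not shown.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class UnivariateTest:
    """A single threshold test on one feature (gene): x[feature] <= threshold."""
    feature: int
    threshold: float

    def goes_left(self, x: Sequence[float]) -> bool:
        return x[self.feature] <= self.threshold


@dataclass(frozen=True)
class MultiTest:
    """Hypothetical multi-test node: several univariate tests vote on the branch.

    tests[0] plays the role of the primary split; the remaining tests stand in
    for alternative, lower-ranked features. Ties are broken by the primary test.
    """
    tests: tuple[UnivariateTest, ...]

    def __post_init__(self) -> None:
        if not self.tests:
            raise ValueError("a multi-test needs at least one univariate test")

    def goes_left(self, x: Sequence[float]) -> bool:
        votes = [t.goes_left(x) for t in self.tests]
        left = sum(votes)            # number of tests voting for the left branch
        right = len(votes) - left    # number voting for the right branch
        if left != right:
            return left > right
        return votes[0]              # tie: defer to the primary test


node = MultiTest((UnivariateTest(0, 1.5), UnivariateTest(3, 0.2), UnivariateTest(7, 4.0)))
sample = [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 5.0]
print(node.tests[0].goes_left(sample), node.goes_left(sample))
```

For `sample`, the primary test alone would send it left, but the two alternative tests outvote it and the node sends it right; this illustrates how extra tests can keep a single noisy gene from deciding a split on its own.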
Results:
Experimental validation was performed on several real-life gene expression datasets. Comparison with eight classifiers shows that MTDT has a statistically significantly higher accuracy than popular decision tree classifiers, and it was highly competitive with ensemble learning algorithms. The proposed solution managed to outperform its baseline algorithm on datasets by an average percent. A study performed on one of the datasets showed that the discovered genes used in the MTDT classification model are supported by biological evidence in the literature.
Conclusion:
This paper introduces a new type of decision tree which is more suitable for solving biological problems.
MTDTs are relatively easy to analyze and much more powerful in modeling high-dimensional microarray data than their popular counterparts.
Stochastic Modeling of Expression Kinetics Identifies Messenger Half-Lives and Reveals Sequential Waves of Co-ordinated Transcription and Decay
The transcriptome in a cell is finely regulated by a large number of molecular mechanisms able to control the balance between mRNA production and degradation. Recent experimental findings have shown that fine and specific regulation of degradation is needed for proper orchestration of a global cell response to environmental conditions. We developed a computational technique based on stochastic modeling to infer condition-specific individual mRNA half-lives directly from gene expression time-courses. Predictions from our method were validated against experimentally measured mRNA decay rates during the intraerythrocytic developmental cycle of Plasmodium falciparum. We then applied our methodology to publicly available data on the reproductive and metabolic cycle of budding yeast. Strikingly, our analysis revealed, in all cases, the presence of periodic changes in decay rates of sequentially induced genes and co-ordination strategies between transcription and degradation, thus suggesting a general principle for the proper coordination of transcription and degradation machinery in response to internal and/or external stimuli.
Citation: Cacace F, Paci P, Cusimano V, Germani A, Farina L (2012) Stochastic Modeling of Expression Kinetics Identifies Messenger Half-Lives and Reveals Sequential Waves of Co-ordinated Transcription and Decay. PLoS Comput Biol 8(11): e1002772. doi:10.1371/journal.pcbi.100277
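The link between a decay rate and a half-life is standard first-order kinetics: if degradation is proportional to the current mRNA level x, then dx/dt = u(t) - kx, and with production u switched off the level halves every ln 2 / k time units. The Python sketch below only illustrates that relation, fitting k by least squares on log-transformed levels; it is not the stochastic modeling technique described above, which infers half-lives from expression time-courses while transcription is ongoing. The function names `fit_decay_rate` and `half_life` are hypothetical.

```python
import math
from typing import Sequence


def fit_decay_rate(times: Sequence[float], levels: Sequence[float]) -> float:
    """Estimate k for pure first-order decay, x(t) = x0 * exp(-k t).

    Assumes production is switched off, so ln x(t) = ln x0 - k t is a line;
    k is minus the least-squares slope of ln(level) against time.
    """
    if len(times) != len(levels) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    if any(v <= 0 for v in levels):
        raise ValueError("levels must be positive to take logarithms")
    logs = [math.log(v) for v in levels]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    var = sum((t - t_mean) ** 2 for t in times)
    if var == 0:
        raise ValueError("time points must not all be equal")
    return -cov / var


def half_life(k: float) -> float:
    """Half-life of first-order decay: t_1/2 = ln 2 / k."""
    if k <= 0:
        raise ValueError("decay rate must be positive")
    return math.log(2) / k


times = [0.0, 1.0, 2.0, 3.0, 4.0]
levels = [100.0 * math.exp(-0.3 * t) for t in times]
k = fit_decay_rate(times, levels)
print(k, half_life(k))
```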
CAMUR: Knowledge extraction from RNA-seq cancer data through equivalent classification rules
Nowadays, knowledge extraction methods for Next Generation Sequencing data are in high demand. In this work, we focus on RNA-seq gene expression analysis, and specifically on case-control studies, using rule-based supervised classification algorithms that build a model able to discriminate cases from controls. State-of-the-art algorithms compute a single classification model that contains few features (genes). In contrast, our goal is to elicit a larger amount of knowledge by computing many classification models, and therefore to identify most of the genes related to the predicted class.
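Computing many models instead of one can be sketched as an iterative loop: fit a rule, remove the gene it uses, refit on the remaining genes, and continue while the new rule stays about as accurate. The Python below is a simplified illustration under that assumption. A single-gene threshold rule (`best_stump`, hypothetical) stands in for a rule-based classifier, and accuracy is measured on the training data for brevity; this is not CAMUR's actual classifier or stopping criterion.

```python
def best_stump(X: list[list[float]], y: list[int], excluded: set[int]):
    """Hypothetical stand-in classifier: the best single-gene threshold rule.

    A rule (feature f, threshold thr, direction d) predicts class 1 ("case")
    when d * (x[f] - thr) > 0. Returns (accuracy, f, thr, d) or None.
    """
    best = None
    for f in range(len(X[0])):
        if f in excluded:
            continue
        for thr in sorted({row[f] for row in X}):
            for direction in (1, -1):
                preds = [1 if direction * (row[f] - thr) > 0 else 0 for row in X]
                # Training accuracy only, to keep the sketch short.
                acc = sum(p == t for p, t in zip(preds, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, f, thr, direction)
    return best


def equivalent_rules(X: list[list[float]], y: list[int], min_accuracy: float):
    """Fit a rule, exclude its gene, and refit, collecting every rule whose
    accuracy stays at or above min_accuracy."""
    excluded: set[int] = set()
    rules = []
    while len(excluded) < len(X[0]):
        found = best_stump(X, y, excluded)
        if found is None or found[0] < min_accuracy:
            break
        rules.append(found)
        excluded.add(found[1])   # force the next model to use other genes
    return rules


# Toy data: genes 0 and 1 both separate cases from controls; gene 2 is constant.
X = [[0.1, 5.0, 1.0], [0.2, 6.0, 1.0], [0.9, 1.0, 1.0], [0.8, 0.5, 1.0]]
y = [0, 0, 1, 1]
print(equivalent_rules(X, y, min_accuracy=0.9))
```

On the toy data the loop first finds a rule on gene 0, then an equally accurate rule on gene 1, and stops when only the uninformative gene 2 remains, so both informative genes are reported rather than just one.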
Interpretable Categorization of Heterogeneous Time Series Data
Understanding heterogeneous multivariate time series data is important in many applications ranging from smart homes to aviation. Learning models of heterogeneous multivariate time series that are also human-interpretable is challenging and not adequately addressed by the existing literature. We propose grammar-based decision trees (GBDTs) and an algorithm for learning them. GBDTs extend decision trees with a grammar framework. Logical expressions derived from a context-free grammar are used for branching in place of simple thresholds on attributes. The added expressivity enables support for a wide range of data types while retaining the interpretability of decision trees. In particular, when a grammar based on temporal logic is used, we show that GBDTs can be used for the interpretable classification of high-dimensional and heterogeneous time series data. Furthermore, we show how GBDTs can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply GBDTs to analyze the classic Australian Sign Language dataset as well as data on near mid-air collisions (NMACs). The NMAC data comes from aircraft simulations used in the development of the next-generation Airborne Collision Avoidance System (ACAS X).
Comment: 9 pages, 5 figures, 2 tables, SIAM International Conference on Data Mining (SDM) 201
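In a GBDT, each internal node branches on a logical expression rather than a threshold. The Python sketch below hand-writes one such expression using the temporal-logic operators F ("eventually") and G ("always") and shows how it splits a set of multivariate time series. The classes `Gt`, `Eventually`, `Always`, `And` and the function `split` are hypothetical, and the grammar-driven search that GBDTs use to learn expressions is not shown.

```python
from dataclasses import dataclass
from typing import Sequence, Union

# series[t][v] is the value of variable v at time step t.
Series = Sequence[Sequence[float]]


@dataclass(frozen=True)
class Gt:
    """Atomic predicate: variable `var` exceeds `value` at time step t."""
    var: int
    value: float

    def holds_at(self, series: Series, t: int) -> bool:
        return series[t][self.var] > self.value


@dataclass(frozen=True)
class Eventually:
    """F(p): the predicate holds at some time step."""
    pred: Gt

    def evaluate(self, series: Series) -> bool:
        return any(self.pred.holds_at(series, t) for t in range(len(series)))


@dataclass(frozen=True)
class Always:
    """G(p): the predicate holds at every time step."""
    pred: Gt

    def evaluate(self, series: Series) -> bool:
        return all(self.pred.holds_at(series, t) for t in range(len(series)))


@dataclass(frozen=True)
class And:
    """Conjunction of two temporal formulas."""
    left: "Formula"
    right: "Formula"

    def evaluate(self, series: Series) -> bool:
        return self.left.evaluate(series) and self.right.evaluate(series)


Formula = Union[Eventually, Always, And]


def split(formula: Formula, dataset: Sequence[Series]) -> tuple[list[int], list[int]]:
    """Branch a node: indices of series that satisfy / violate the formula."""
    true_idx: list[int] = []
    false_idx: list[int] = []
    for i, s in enumerate(dataset):
        (true_idx if formula.evaluate(s) else false_idx).append(i)
    return true_idx, false_idx


s0 = [[0.0, 1.0], [2.0, 1.0]]
s1 = [[0.0, 1.0], [0.5, 0.0]]
rule = And(Eventually(Gt(0, 1.0)), Always(Gt(1, 0.5)))
print(split(rule, [s0, s1]))
```

Because the branching rule is itself a readable formula (here, "variable 0 eventually exceeds 1.0 and variable 1 always stays above 0.5"), it could also double as the kind of per-cluster explanation the abstract describes for categorization.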
Pulsed Feedback Defers Cellular Differentiation
Environmental signals induce diverse cellular differentiation programs. In certain systems, cells defer differentiation for extended time periods after the signal appears, proliferating through multiple rounds of cell division before committing to a new fate. How can cells set a deferral time much longer than the cell cycle? Here we study Bacillus subtilis cells that respond to sudden nutrient limitation with multiple rounds of growth and division before differentiating into spores. A well-characterized genetic circuit controls the concentration and phosphorylation of the master regulator Spo0A, which rises to a critical concentration to initiate sporulation. However, it remains unclear how this circuit enables cells to defer sporulation for multiple cell cycles. Using quantitative time-lapse fluorescence microscopy of Spo0A dynamics in individual cells, we observed pulses of Spo0A phosphorylation at a characteristic cell cycle phase. Pulse amplitudes grew systematically and cell-autonomously over multiple cell cycles leading up to sporulation. This pulse growth required a key positive feedback loop involving the sporulation kinases, without which the deferral of sporulation became ultrasensitive to kinase expression. Thus, deferral is controlled by a pulsed positive feedback loop in which kinase expression is activated by pulses of Spo0A phosphorylation. This pulsed positive feedback architecture provides a more robust mechanism for setting deferral times than constitutive kinase expression. Finally, using mathematical modeling, we show how pulsing and time delays together enable “polyphasic” positive feedback, in which different parts of a feedback loop are active at different times. Polyphasic feedback can enable more accurate tuning of long deferral times. Together, these results suggest that Bacillus subtilis uses a pulsed positive feedback loop to implement a “timer” that operates over timescales much longer than a cell cycle.
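A toy calculation shows why growing pulses can act as a timer: if positive feedback makes each cell cycle's pulse a fixed factor (1 + g) larger than the previous one, the deferral is the smallest number of cycles n with A0(1 + g)^n ≥ T, where A0 is the starting amplitude and T the sporulation threshold, and n can span many cell cycles. The Python below is a deliberately simplified counter under that multiplicative-growth assumption; it is not the mathematical model used in the study, and `deferral_cycles` and its parameters are hypothetical.

```python
def deferral_cycles(a0: float, gain: float, threshold: float, max_cycles: int = 1000) -> int:
    """Count cell cycles until the pulse amplitude first reaches `threshold`.

    Toy assumption: each pulse boosts kinase expression so that the next
    cycle's pulse is (1 + gain) times larger (multiplicative positive feedback).
    """
    if a0 <= 0 or gain <= 0:
        raise ValueError("a0 and gain must be positive")
    amplitude, cycles = a0, 0
    while amplitude < threshold:
        if cycles >= max_cycles:
            raise RuntimeError("threshold not reached within max_cycles")
        amplitude *= 1 + gain   # pulse grows through the feedback loop
        cycles += 1
    return cycles


print(deferral_cycles(1.0, 1.0, 8.0), deferral_cycles(1.0, 0.1, 8.0))
```

In this toy, weaker feedback lengthens the deferral: with A0 = 1 and T = 8, a gain of 1.0 reaches the threshold after 3 cycles, while a gain of 0.1 takes 22.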
The tumor-promoting functions of Ataxia-telangiectasia mutated (ATM) in cancer cells
Ataxia-telangiectasia mutated (ATM) protein kinase regulates the DNA damage response (DDR) and is associated with cancer suppression by protecting cells from DNA double-strand breaks (DSBs). However, how ATM functions outside of DSB signaling is less clearly understood. Here, we report a new cancer-promoting role for ATM in stimulating cell migration and invasion independently of DSB signaling or induction. We used two highly metastatic human breast cancer cell lines to corroborate that ATM is required for cell migration and invasion. Microarray analysis of cells depleted for ATM identified interleukin-8 (IL-8) as a target, since the exogenous addition of IL-8 rescued migration and invasion defects in ATM-deficient cells. Finally, ATM depletion in human cancer cells reduced lung metastasis in a mouse xenograft model. These findings shed light on tumor-promoting functions of ATM. Therefore, in addition to its canonical roles in tumor suppression, ATM promotes tumor progression as well.
Cellular and Molecular Biolog
Quantitative single-cell splicing analysis reveals an ‘economy of scale’ filter for gene expression
In eukaryotic cells, splicing affects the fate of each pre-mRNA transcript, helping to determine whether it is ultimately processed into an mRNA or degraded. The efficiency of splicing plays a key role in gene expression. However, because it depends on the levels of multiple isoforms at the same transcriptional active site (TAS) in the same cell, splicing efficiency has been challenging to measure. Here, we introduce a quantitative single-molecule FISH-based method that enables determination of the absolute abundances of distinct RNA isoforms at individual TASs. Using this method, we discovered that splicing efficiency behaves in an unexpected ‘economy of scale’ manner, increasing, rather than decreasing, with gene expression levels, the opposite of a standard enzymatic process. This behavior could result from an observed correlation between splicing efficiency and spatial proximity to nuclear speckles. Economy of scale splicing represents a non-linear filter that amplifies the expression of genes when they are more strongly transcribed. This method will help to reveal the roles of splicing in the quantitative control of gene expression.
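The "economy of scale" claim is a statement about the sign of a relationship: splicing efficiency at a TAS should rise with expression level rather than fall. The Python sketch below assumes, for illustration only, that efficiency at a site is the spliced fraction spliced / (spliced + unspliced) of isoform counts and that expression level is the total count at that site; a positive Pearson correlation between the two would then indicate economy-of-scale behavior, and a negative one the decreasing trend the abstract associates with a standard enzymatic process. The names `SiteCounts`, `pearson` and `scale_correlation` are hypothetical, and this is not the paper's quantification pipeline.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SiteCounts:
    """Assumed isoform counts at one transcriptional active site (TAS)."""
    spliced: int
    unspliced: int

    @property
    def total(self) -> int:
        return self.spliced + self.unspliced

    @property
    def efficiency(self) -> float:
        """Assumed definition: fraction of transcripts at the site that are spliced."""
        if self.total == 0:
            raise ValueError("no transcripts at this site")
        return self.spliced / self.total


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def scale_correlation(sites: Sequence[SiteCounts]) -> float:
    """Correlation between expression level (total) and splicing efficiency."""
    return pearson([s.total for s in sites], [s.efficiency for s in sites])


sites = [SiteCounts(1, 3), SiteCounts(3, 3), SiteCounts(6, 2)]
print([s.efficiency for s in sites], scale_correlation(sites))
```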
- …