145 research outputs found
Analysis of regulatory network involved in mechanical induction of embryonic stem cell differentiation
Embryonic stem cells are conventionally differentiated by modulating specific growth factors in the cell culture media. Recently the effect of cellular mechanical microenvironment in inducing phenotype specific differentiation has attracted considerable attention. We have shown the possibility of inducing endoderm differentiation by culturing the stem cells on fibrin substrates of specific stiffness [1]. Here, we analyze the regulatory network involved in such mechanically induced endoderm differentiation under two different experimental configurations of 2-dimensional and 3-dimensional culture, respectively. Mouse embryonic stem cells are differentiated on an array of substrates of varying mechanical properties and analyzed for relevant endoderm markers. The experimental data set is further analyzed for identification of co-regulated transcription factors across different substrate conditions using the technique of bi-clustering. Overlapped bi-clusters are identified following an optimization formulation, which is solved using an evolutionary algorithm. While typically such analysis is performed at the mean value of expression data across experimental repeats, the variability of stem cell systems reduces the confidence on such analysis of mean data. Bootstrapping technique is thus integrated with the bi-clustering algorithm to determine sets of robust bi-clusters, which is found to differ significantly from corresponding bi-clusters at the mean data value. Analysis of robust bi-clusters reveals an overall similar network interaction as has been reported for chemically induced endoderm or endodermal organs but with differences in patterning between 2-dimensional and 3-dimensional culture. Such analysis sheds light on the pathway of stem cell differentiation indicating the prospect of the two culture configurations for further maturation. © 2012 Zhang et al
ParBiBit: Parallel tool for binary biclustering on modern distributed-memory systems
[Abstract]: Biclustering techniques are gaining attention in the analysis of large-scale datasets as they identify two-dimensional submatrices where both rows and columns are correlated. In this work we present ParBiBit, a parallel tool to accelerate the search of interesting biclusters on binary datasets, which are very popular on different fields such as genetics, marketing or text mining. It is based on the state-of-the-art sequential Java tool BiBit, which has been proved accurate by several studies, especially on scenarios that result on many large biclusters. ParBiBit uses the same methodology as BiBit (grouping the binary information into patterns) and provides the same results. Nevertheless, our tool significantly improves performance thanks to an efficient implementation based on C++11 that includes support for threads and MPI processes in order to exploit the compute capabilities of modern distributed-memory systems, which provide several multicore CPU nodes interconnected through a network. Our performance evaluation with 18 representative input datasets on two different eight-node systems shows that our tool is significantly faster than the original BiBit. Source code in C++ and MPI running on Linux systems as well as a reference manual are available at https://sourceforge.net/projects/parbibit/.This work was supported by the Ministry of Economy, Industry and Competitiveness of Spain and FEDER funds of the European Union [grant TIN2016-75845-P (AEI/FEDER/UE)], as well as by Xunta de Galicia (Centro Singular de Investigacion de Galicia accreditation 2016-2019, ref. EDG431G/01).Xunta de Galicia; EDG431G/0
PLoS One
Embryonic stem cells are conventionally differentiated by modulating specific growth factors in the cell culture media. Recently the effect of cellular mechanical microenvironment in inducing phenotype specific differentiation has attracted considerable attention. We have shown the possibility of inducing endoderm differentiation by culturing the stem cells on fibrin substrates of specific stiffness. Here, we analyze the regulatory network involved in such mechanically induced endoderm differentiation under two different experimental configurations of 2-dimensional and 3-dimensional culture, respectively. Mouse embryonic stem cells are differentiated on an array of substrates of varying mechanical properties and analyzed for relevant endoderm markers. The experimental data set is further analyzed for identification of co-regulated transcription factors across different substrate conditions using the technique of bi-clustering. Overlapped bi-clusters are identified following an optimization formulation, which is solved using an evolutionary algorithm. While typically such analysis is performed at the mean value of expression data across experimental repeats, the variability of stem cell systems reduces the confidence on such analysis of mean data. Bootstrapping technique is thus integrated with the bi-clustering algorithm to determine sets of robust bi-clusters, which is found to differ significantly from corresponding bi-clusters at the mean data value. Analysis of robust bi-clusters reveals an overall similar network interaction as has been reported for chemically induced endoderm or endodermal organs but with differences in patterning between 2-dimensional and 3-dimensional culture. Such analysis sheds light on the pathway of stem cell differentiation indicating the prospect of the two culture configurations for further maturation.DP2 116520/DP/NCCDPHP CDC HHS/United StatesUL1 RR024153/RR/NCRR NIH HHS/United StatesUL1 TR000005/TR/NCATS NIH HHS/United States22558203PMC333871
Statistical Algorithms and Bioinformatics Tools Development for Computational Analysis of High-throughput Transcriptomic Data
Next-Generation Sequencing technologies allow for a substantial increase in the amount of data available for various biological studies. In order to effectively and efficiently analyze this data, computational approaches combining mathematics, statistics, computer science, and biology are implemented. Even with the substantial efforts devoted to development of these approaches, numerous issues and pitfalls remain. One of these issues is mapping uncertainty, in which read alignment results are biased due to the inherent difficulties associated with accurately aligning RNA-Sequencing reads. GeneQC is an alignment quality control tool that provides insight into the severity of mapping uncertainty in each annotated gene from alignment results. GeneQC used feature extraction to identify three levels of information for each gene and implements elastic net regularization and mixture model fitting to provide insight in the severity of mapping uncertainty and the quality of read alignment. In combination with GeneQC, the Ambiguous Reads Mapping (ARM) algorithm works to re-align ambiguous reads through the integration of motif prediction from metabolic pathways to establish coregulatory gene modules for re-alignment using a negative binomial distribution-based probabilistic approach. These two tools work in tandem to address the issue of mapping uncertainty and provide more accurate read alignments, and thus more accurate expression estimates. Also presented in this dissertation are two approaches to interpreting the expression estimates. The first is IRIS-EDA, an integrated shiny web server that combines numerous analyses to investigate gene expression data generated from RNASequencing data. The second is ViDGER, an R/Bioconductor package that quickly generates high-quality visualizations of differential gene expression results to assist users in comprehensive interpretations of their differential gene expression results, which is a non-trivial task. These four presented tools cover a variety of aspects of modern RNASeq analyses and aim to address bottlenecks related to algorithmic and computational issues, as well as more efficient and effective implementation methods
Multi-layered model of individual HIV infection progression and mechanisms of phenotypical expression
Cite as: Perrin, Dimitri (2008) Multi-layered model of individual HIV infection progression and mechanisms of phenotypical expression. PhD thesis, Dublin City University
Transcription factor target prediction using multiple short expression time series from Arabidopsis thaliana
BACKGROUND: The central role of transcription factors (TFs) in higher eukaryotes has led to much interest in deciphering transcriptional regulatory interactions. Even in the best case, experimental identification of TF target genes is error prone, and has been shown to be improved by considering additional forms of evidence such as expression data. Previous expression based methods have not explicitly tried to associate TFs with their targets and therefore largely ignored the treatment specific and time dependent nature of transcription regulation. RESULTS: In this study we introduce CERMT, Covariance based Extraction of Regulatory targets using Multiple Time series. Using simulated and real data we show that using multiple expression time series, selecting treatments in which the TF responds, allowing time shifts between TFs and their targets and using covariance to identify highly responding genes appear to be a good strategy. We applied our method to published TF - target gene relationships determined using expression profiling on TF mutants and show that in most cases we obtain significant target gene enrichment and in half of the cases this is sufficient to deliver a usable list of high-confidence target genes. CONCLUSION: CERMT could be immediately useful in refining possible target genes of candidate TFs using publicly available data, particularly for organisms lacking comprehensive TF binding data. In the future, we believe its incorporation with other forms of evidence may improve integrative genome-wide predictions of transcriptional networks
Recommended from our members
Collective analysis of multiple high-throughput gene expression datasets
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University LondonModern technologies have resulted in the production of numerous high-throughput biological datasets. However, the pace of development of capable computational methods does not cope with the pace of generation of new high-throughput datasets. Amongst the most popular biological high-throughput datasets are gene expression datasets (e.g. microarray datasets). This work targets this aspect by proposing a suite of computational methods which can analyse multiple gene expression datasets collectively. The focal method in this suite is the unification of clustering results from multiple datasets using external specifications (UNCLES). This method applies clustering to multiple heterogeneous datasets which measure the expression of the same set of genes separately and then combines the resulting partitions in accordance to one of two types of external specifications; type A identifies the subsets of genes that are consistently co-expressed in all of the given datasets while type B identifies the subsets of genes that are consistently co-expressed in a subset of datasets while being poorly co-expressed in another subset of datasets. This contributes to the types of questions which can addressed by computational methods because existing clustering, consensus clustering, and biclustering methods are inapplicable to address the aforementioned objectives. Moreover, in order to assist in setting some of the parameters required by UNCLES, the M-N scatter plots technique is proposed. These methods, and less mature versions of them, have been validated and applied to numerous real datasets from the biological contexts of budding yeast, bacteria, human red blood cells, and malaria. While collaborating with biologists, these applications have led to various biological insights. In yeast, the role of the poorly-understood gene CMR1 in the yeast cell-cycle has been further elucidated. Also, a novel subset of poorly understood yeast genes has been discovered with an expression profile consistently negatively correlated with the well-known ribosome biogenesis genes. Bacterial data analysis has identified two clusters of negatively correlated genes. Analysis of data from human red blood cells has produced some hypotheses regarding the regulation of the pathways producing such cells. On the other hand, malarial data analysis is still at a preliminary stage. Taken together, this thesis provides an original integrative suite of computational methods which scrutinise multiple gene expression datasets collectively to address previously unresolved questions, and provides the results and findings of many applications of these methods to real biological datasets from multiple contexts.National Institute for Health Research (NIHR) and the Brunel College of Engineering, Design and Physical Science
Mining localized co-expressed gene patterns from microarray data
Ph.DDOCTOR OF PHILOSOPH
Bayesian nonparametric clusterings in relational and high-dimensional settings with applications in bioinformatics.
Recent advances in high throughput methodologies offer researchers the ability to understand complex systems via high dimensional and multi-relational data. One example is the realm of molecular biology where disparate data (such as gene sequence, gene expression, and interaction information) are available for various snapshots of biological systems. This type of high dimensional and multirelational data allows for unprecedented detailed analysis, but also presents challenges in accounting for all the variability. High dimensional data often has a multitude of underlying relationships, each represented by a separate clustering structure, where the number of structures is typically unknown a priori. To address the challenges faced by traditional clustering methods on high dimensional and multirelational data, we developed three feature selection and cross-clustering methods: 1) infinite relational model with feature selection (FIRM) which incorporates the rich information of multirelational data; 2) Bayesian Hierarchical Cross-Clustering (BHCC), a deterministic approximation to Cross Dirichlet Process mixture (CDPM) and to cross-clustering; and 3) randomized approximation (RBHCC), based on a truncated hierarchy. An extension of BHCC, Bayesian Congruence Measuring (BCM), is proposed to measure incongruence between genes and to identify sets of congruent loci with identical evolutionary histories. We adapt our BHCC algorithm to the inference of BCM, where the intended structure of each view (congruent loci) represents consistent evolutionary processes. We consider an application of FIRM on categorizing mRNA and microRNA. The model uses latent structures to encode the expression pattern and the gene ontology annotations. We also apply FIRM to recover the categories of ligands and proteins, and to predict unknown drug-target interactions, where latent categorization structure encodes drug-target interaction, chemical compound similarity, and amino acid sequence similarity. BHCC and RBHCC are shown to have improved predictive performance (both in terms of cluster membership and missing value prediction) compared to traditional clustering methods. Our results suggest that these novel approaches to integrating multi-relational information have a promising future in the biological sciences where incorporating data related to varying features is often regarded as a daunting task
- …