Analysis of regulatory network involved in mechanical induction of embryonic stem cell differentiation
Embryonic stem cells are conventionally differentiated by modulating specific growth factors in the cell culture media. Recently, the effect of the cellular mechanical microenvironment in inducing phenotype-specific differentiation has attracted considerable attention. We have shown the possibility of inducing endoderm differentiation by culturing the stem cells on fibrin substrates of specific stiffness [1]. Here, we analyze the regulatory network involved in such mechanically induced endoderm differentiation under two different experimental configurations, 2-dimensional and 3-dimensional culture. Mouse embryonic stem cells are differentiated on an array of substrates of varying mechanical properties and analyzed for relevant endoderm markers. The experimental data set is further analyzed to identify co-regulated transcription factors across different substrate conditions using bi-clustering. Overlapping bi-clusters are identified following an optimization formulation, which is solved using an evolutionary algorithm. While such analysis is typically performed at the mean value of expression data across experimental repeats, the variability of stem cell systems reduces the confidence in such analysis of mean data. A bootstrapping technique is thus integrated with the bi-clustering algorithm to determine sets of robust bi-clusters, which are found to differ significantly from the corresponding bi-clusters at the mean data value. Analysis of the robust bi-clusters reveals an overall network interaction similar to that reported for chemically induced endoderm or endodermal organs, but with differences in patterning between 2-dimensional and 3-dimensional culture. Such analysis sheds light on the pathway of stem cell differentiation, indicating the prospect of the two culture configurations for further maturation. © 2012 Zhang et al.
A rough set based rational clustering framework for determining correlated genes
Cluster analysis plays a central role in identifying groups of genes that show similar behavior under a set of experimental conditions. Several clustering algorithms have been proposed to identify gene behaviors and understand their significance. The principal aim of this work is to develop an intelligent rough clustering technique that efficiently removes the irrelevant dimensions in a high-dimensional space and obtains meaningful clusters. This paper proposes a novel biclustering technique based on rough set theory. The proposed algorithm uses the correlation coefficient as a similarity measure to simultaneously cluster both the rows and columns of a gene expression data matrix, and the mean squared residue to generate the initial biclusters. The biclusters are then refined to form the lower and upper boundaries by determining the membership of the genes in the clusters using the mean squared residue. The algorithm is illustrated with yeast gene expression data, and the experiment demonstrates the effectiveness of the method. The main advantage is that it overcomes the problem of selecting initial clusters and also the restriction of one object belonging to only one cluster, by allowing biclusters to overlap.
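The mean squared residue mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard Cheng and Church definition (residue of each cell relative to its row mean, column mean, and overall mean within the bicluster); the abstract does not spell out the exact formula it uses, and the function name and toy matrix below are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the bicluster given by row and column indices.

    Assumes the Cheng and Church definition; a score of 0 means the
    selected genes vary perfectly coherently across the selected conditions.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue * residue
    return total / (n_rows * n_cols)


# Hypothetical toy expression matrix: rows are genes, columns are conditions.
expression = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [4.0, 5.0, 6.0],
    [9.0, 1.0, 5.0],  # this gene does not follow the shared pattern
]
coherent = mean_squared_residue(expression, [0, 1, 2], [0, 1, 2])
noisy = mean_squared_residue(expression, [0, 1, 2, 3], [0, 1, 2])
print(coherent, noisy)
```

The first three genes differ only by a constant shift, so their bicluster scores essentially zero, while adding the fourth gene raises the score. A biclustering method of this kind would favor low-scoring submatrices.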
PLoS One
Grants: DP2 116520/DP/NCCDPHP CDC HHS/United States; UL1 RR024153/RR/NCRR NIH HHS/United States; UL1 TR000005/TR/NCATS NIH HHS/United States. PMID 22558203; PMC333871
Collective analysis of multiple high-throughput gene expression datasets
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London.
Modern technologies have resulted in the production of numerous high-throughput biological datasets. However, the development of capable computational methods has not kept pace with the generation of new high-throughput datasets. Amongst the most popular biological high-throughput datasets are gene expression datasets (e.g. microarray datasets). This work targets this aspect by proposing a suite of computational methods which can analyse multiple gene expression datasets collectively. The focal method in this suite is the unification of clustering results from multiple datasets using external specifications (UNCLES). This method applies clustering separately to multiple heterogeneous datasets which measure the expression of the same set of genes, and then combines the resulting partitions in accordance with one of two types of external specifications: type A identifies the subsets of genes that are consistently co-expressed in all of the given datasets, while type B identifies the subsets of genes that are consistently co-expressed in a subset of datasets while being poorly co-expressed in another subset of datasets. This extends the types of questions which can be addressed by computational methods, because existing clustering, consensus clustering, and biclustering methods cannot address the aforementioned objectives. Moreover, in order to assist in setting some of the parameters required by UNCLES, the M-N scatter plots technique is proposed. These methods, and less mature versions of them, have been validated and applied to numerous real datasets from the biological contexts of budding yeast, bacteria, human red blood cells, and malaria. Through collaboration with biologists, these applications have led to various biological insights.
In yeast, the role of the poorly understood gene CMR1 in the yeast cell cycle has been further elucidated. Also, a novel subset of poorly understood yeast genes has been discovered with an expression profile consistently negatively correlated with the well-known ribosome biogenesis genes. Bacterial data analysis has identified two clusters of negatively correlated genes. Analysis of data from human red blood cells has produced some hypotheses regarding the regulation of the pathways producing such cells. Malarial data analysis, on the other hand, is still at a preliminary stage. Taken together, this thesis provides an original integrative suite of computational methods which scrutinise multiple gene expression datasets collectively to address previously unresolved questions, and provides the results and findings of many applications of these methods to real biological datasets from multiple contexts.
Funding: National Institute for Health Research (NIHR) and the Brunel College of Engineering, Design and Physical Science
Model-free reconstruction of neuronal network connectivity from calcium imaging signals
A systematic assessment of global neural network connectivity through direct
electrophysiological assays has remained technically unfeasible even in
dissociated neuronal cultures. We introduce an improved algorithmic approach
based on Transfer Entropy to reconstruct approximations to network structural
connectivities from network activity monitored through calcium fluorescence
imaging. Based on information theory, our method requires no prior assumptions
on the statistics of neuronal firing and neuronal connections. The performance
of our algorithm is benchmarked on surrogate time-series of calcium
fluorescence generated by the simulated dynamics of a network with known
ground-truth topology. We find that the effective network topology revealed by
Transfer Entropy depends qualitatively on the time-dependent dynamic state of
the network (e.g., bursting or non-bursting). We thus demonstrate how
conditioning with respect to the global mean activity improves the performance
of our method. [...] Compared to other reconstruction strategies such as
cross-correlation or Granger Causality methods, our method based on improved
Transfer Entropy is remarkably more accurate. In particular, it provides a good
reconstruction of the network clustering coefficient, allowing one to discriminate
between weakly and strongly clustered topologies, whereas an
approach based on cross-correlations would invariably detect artificially high
levels of clustering. Finally, we present the applicability of our method to
real recordings of in vitro cortical cultures. We demonstrate that these
networks are characterized by an elevated level of clustering compared to a
random graph (although not extreme) and by a markedly non-local connectivity.
Comment: 54 pages, 8 figures (+9 supplementary figures), 1 table; submitted
for publication
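As a rough illustration of the quantity this abstract builds on, the sketch below estimates plain Transfer Entropy between two discrete time series with a history length of one step, using simple frequency counts. It is not the improved, conditioned variant the authors describe, and the function name and synthetic signals are assumptions made for the example.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate (in bits) of Transfer Entropy from source to target.

    Uses a history of one time step for both series, which must have
    equal length and take discrete values.
    """
    n = len(target) - 1
    triples = Counter()       # (y_next, y_now, x_now)
    next_and_now = Counter()  # (y_next, y_now)
    now_and_src = Counter()   # (y_now, x_now)
    now_only = Counter()      # y_now
    for t in range(n):
        y_next, y_now, x_now = target[t + 1], target[t], source[t]
        triples[(y_next, y_now, x_now)] += 1
        next_and_now[(y_next, y_now)] += 1
        now_and_src[(y_now, x_now)] += 1
        now_only[y_now] += 1
    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        # p(y_next | y_now, x_now) versus p(y_next | y_now)
        p_with_source = count / now_and_src[(y_now, x_now)]
        p_without_source = next_and_now[(y_next, y_now)] / now_only[y_now]
        te += p_joint * math.log2(p_with_source / p_without_source)
    return te


# Synthetic example: y copies x with a delay of one step, so information
# flows from x to y but not from y to x.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(2000)]
y = [0] + x[:-1]
te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
print(te_xy, te_yx)
```

In this toy case the estimate from x to y should be much larger than the estimate from y to x, which is the asymmetry a reconstruction method would use to infer directed links. Real calcium fluorescence signals are continuous and would need binning or another discretization step first.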
Mitmemõõtmeliste andmete statistiline analüüs bioinformaatikas (Statistical analysis of multivariate data in bioinformatics)
The electronic version of the dissertation does not include the publications.
Proteins are among the most important building blocks of an organism. By investigating the abundance of different proteins and the relations between them, it is possible to obtain information about the current state of the organism. Modern technologies make it possible to collect a large amount of protein-related data in a short period of time. Analysing such data is quite complicated, however, and has given rise to a new field of science called bioinformatics.
The aim of the dissertation is to describe problems and solutions related to the statistical analysis of multivariate data. It is shown how this type of data can be presented as a matrix. An overview of data sources and analysis methods is given, and it is shown how they can be used in practice. The pan-European cancer research project PREDECT is described, in which many organizations contribute to developing better cancer models. An overview is given of collecting metadata from multiple partners and of the web tools created for initial data analysis. An analysis concerning a novel breast cancer model is described, and tissue slices are compared under different cultivation conditions. A freely available web tool for exploratory data analysis is introduced. The following chapters describe data analysis in various projects. Multiple novel genes with allele-specific expression were found in the human placenta. Molecular mechanisms of atopic dermatitis were examined, more specifically the influence of the protein interferon-gamma on the disease. MicroRNAs were found that can be used as markers for endometriosis, and a classifier was built to distinguish people with endometriosis from healthy people.
AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number
Background: Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally requires knowledge about cluster number and geometry.
Results: We integrated strategies from machine learning, cartography, and graph theory into a new informatics method for automatically clustering self-organizing map ensembles of high-dimensional data. Our new method, called AutoSOME, readily identifies discrete and fuzzy data clusters without prior knowledge of cluster number or structure in diverse datasets including whole genome microarray data. Visualization of AutoSOME output using network diagrams and differential heat maps reveals unexpected variation among well-characterized cancer cell lines. Co-expression analysis of data from human embryonic and induced pluripotent stem cells using AutoSOME identifies >3400 up-regulated genes associated with pluripotency, and indicates that a recently identified protein-protein interaction network characterizing pluripotency was underestimated by a factor of four.
Conclusions: By effectively extracting important information from high-dimensional microarray data without prior knowledge or the need for data filtration, AutoSOME can yield systems-level insights from whole genome microarray expression studies. Due to its generality, this new method should also have practical utility for a variety of data-intensive applications, including the results of deep sequencing experiments. AutoSOME is available for download at http://jimcooperlab.mcdb.ucsb.edu/autosome.