126 research outputs found

    Validating module network learning algorithms using simulated data

    In recent years, several authors have used probabilistic graphical models to learn expression modules and their regulatory programs from gene expression data. Here, we demonstrate the use of the synthetic data generator SynTReN for the purpose of testing and comparing module network learning algorithms. We introduce a software package for learning module networks, called LeMoNe, which incorporates a novel strategy for learning regulatory programs. Novelties include the use of a bottom-up Bayesian hierarchical clustering to construct the regulatory programs, and the use of a conditional entropy measure to assign regulators to the regulation program nodes. Using SynTReN data, we test the performance of LeMoNe in a completely controlled situation and assess the effect of the methodological changes we made with respect to an existing software package, namely Genomica. Additionally, we assess the effect of various parameters, such as the size of the data set and the amount of noise, on the inference performance. Overall, application of Genomica and LeMoNe to simulated data sets gave comparable results. However, LeMoNe offers some advantages, one of them being that the learning process is considerably faster for larger data sets. Additionally, we show that the location of the regulators in the LeMoNe regulation programs and their conditional entropy may be used to prioritize regulators for functional validation, and that the combination of the bottom-up clustering strategy with the conditional entropy-based assignment of regulators improves the handling of missing or hidden regulators. Comment: 13 pages, 6 figures + 2 pages, 2 figures supplementary information
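The abstract above mentions a conditional entropy measure for assigning regulators but does not define it here. As a rough illustration only, the sketch below computes the generic conditional entropy H(labels | states) of a discrete split given a discretized regulator state. The function name and the choice of inputs are assumptions for illustration, not LeMoNe's actual scoring function.

```python
from collections import Counter
from math import log2


def conditional_entropy(labels, states):
    """Return H(labels | states) in bits for two equal-length discrete sequences.

    Hypothetical helper: 'labels' could be the side of a split each experiment
    falls on, 'states' a discretized regulator expression level.
    """
    if len(labels) != len(states):
        raise ValueError("sequences must have equal length")
    n = len(labels)
    joint = Counter(zip(states, labels))   # counts of (state, label) pairs
    marginal = Counter(states)             # counts of each state
    h = 0.0
    for (state, _label), count in joint.items():
        # p(state, label) * log2 p(label | state)
        h -= (count / n) * log2(count / marginal[state])
    return h
```

A lower value means the regulator state explains the split better; a value of zero means the state determines the split exactly.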

    Time-Course Analysis of Cyanobacterium Transcriptome: Detecting Oscillatory Genes

    The microarray technique allows the simultaneous measurement of the expression levels of thousands of mRNAs. By mining these data one can identify the dynamics of the gene expression time series. The detection of genes that are periodically expressed is an important step that allows us to study the regulatory mechanisms associated with the circadian cycle. The problem of finding periodicity in biological time series poses many challenges. These challenges arise because the observed time series usually exhibit non-idealities, such as noise, short length, outliers and unevenly sampled time points. Consequently, the method for finding periodicity should preferably be robust against such anomalies in the data. In this paper, we propose a general and robust procedure for identifying genes with a periodic signature at a given significance level. This identification method is based on autoregressive models and information theory. Using simulated data, we show that the suggested method is capable of identifying rhythmic profiles even in the presence of noise and when the number of data points is small. Applying our analysis, we uncover the circadian rhythmic patterns underlying the gene expression profiles of the cyanobacterium Synechocystis.

    TEAD and YAP regulate the enhancer network of human embryonic pancreatic progenitors.

    The genomic regulatory programmes that underlie human organogenesis are poorly understood. Pancreas development, in particular, has pivotal implications for pancreatic regeneration, cancer and diabetes. We have now characterized the regulatory landscape of embryonic multipotent progenitor cells that give rise to all pancreatic epithelial lineages. Using human embryonic pancreas and embryonic-stem-cell-derived progenitors we identify stage-specific transcripts and associated enhancers, many of which are co-occupied by transcription factors that are essential for pancreas development. We further show that TEAD1, a Hippo signalling effector, is an integral component of the transcription factor combinatorial code of pancreatic progenitor enhancers. TEAD and its coactivator YAP activate key pancreatic signalling mediators and transcription factors, and regulate the expansion of pancreatic progenitors. This work therefore uncovers a central role for TEAD and YAP as signal-responsive regulators of multipotent pancreatic progenitors, and provides a resource for the study of embryonic development of the human pancreas

    HAMSTER: visualizing microarray experiments as a set of minimum spanning trees

    Background: Visualization tools allow researchers to obtain a global view of the interrelationships between the probes or experiments of a gene expression (e.g. microarray) data set. Some existing methods include hierarchical clustering and k-means. In recent years, others have proposed applying minimum spanning trees (MST) for microarray clustering. Although MST-based clustering is formally equivalent to the dendrograms produced by hierarchical clustering under certain conditions, visually they can be quite different. Methods: HAMSTER (Helpful Abstraction using Minimum Spanning Trees for Expression Relations) is an open source system for generating a set of MSTs from the experiments of a microarray data set. While previous works have generated a single MST from a data set for data clustering, we recursively merge experiments and repeat this process to obtain a set of MSTs for data visualization. Depending on the parameters chosen, each tree is analogous to a snapshot of one step of the hierarchical clustering process. We scored and ranked these trees using one of three proposed schemes. HAMSTER is implemented in C++ and makes use of Graphviz for laying out each MST. Results: We report on the running time of HAMSTER and demonstrate, using data sets from the NCBI Gene Expression Omnibus (GEO), that the images created by HAMSTER offer insights that differ from the dendrograms of hierarchical clustering. In addition to the C++ program, which is available as open source, we also provide a web-based version (HAMSTER+) which allows users to apply our system through a web browser without any computer programming knowledge. Conclusion: Researchers may find it helpful to include HAMSTER in their microarray analysis workflow as it can offer insights that differ from hierarchical clustering. We believe that HAMSTER would be useful for certain types of gradient data sets (e.g. time-series data) and data that indicate relationships between cells/tissues. Both the source and the web server variant of HAMSTER are available from http://hamster.cbrc.jp/.
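For readers unfamiliar with minimum spanning trees, the following is a minimal sketch of building a single MST from a pairwise distance matrix between experiments, using Prim's algorithm. It is not HAMSTER's C++ implementation and does not cover the recursive merging or tree scoring described above; the function name and the input format are assumptions.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a symmetric distance matrix (list of lists).

    Returns a list of (i, j, distance) edges. Assumed input: dist[i][j] is
    the dissimilarity between experiments i and j.
    """
    n = len(dist)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known distance from each node to the tree
    parent = [-1] * n           # tree node that achieves that distance
    best[0] = 0.0               # start growing the tree from node 0
    edges = []
    for _ in range(n):
        # pick the node outside the tree that is closest to it
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, dist[parent[u]][u]))
        # update the cheapest connection for the remaining nodes
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges
```

For n experiments the result has n - 1 edges, which a layout tool such as Graphviz could then draw.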

    A comparison of four clustering methods for brain expression microarray data

    Background: DNA microarrays, which determine the expression levels of tens of thousands of genes from a sample, are an important research tool. However, the volume of data they produce can be an obstacle to interpretation of the results. Clustering the genes on the basis of similarity of their expression profiles can simplify the data, and potentially provides an important source of biological inference, but these methods have not been tested systematically on datasets from complex human tissues. In this paper, four clustering methods, CRC, k-means, ISA and memISA, are applied to three brain expression datasets. The results are compared on speed, gene coverage and GO enrichment. The effects of combining the clusters produced by each method are also assessed. Results: k-means outperforms the other methods, with 100% gene coverage and GO enrichments only slightly exceeded by memISA and ISA. Those two methods produce greater GO enrichments on the datasets used, but at the cost of much lower gene coverage, fewer clusters produced, and speed. The clusters they find are largely different to those produced by k-means. Combining clusters produced by k-means and memISA or ISA leads to increased GO enrichment and number of clusters produced (compared to k-means alone), without negatively impacting gene coverage. memISA can also find potentially disease-related clusters. In two independent dorsolateral prefrontal cortex datasets, it finds three overlapping clusters that are either enriched for genes associated with schizophrenia, genes differentially expressed in schizophrenia, or both. Two of these clusters are enriched for genes of the MAP kinase pathway, suggesting a possible role for this pathway in the aetiology of schizophrenia. Conclusion: Considered alone, k-means clustering is the most effective of the four methods on typical microarray brain expression datasets. However, memISA and ISA can add extra high-quality clusters to the set produced by k-means, so combining these three methods is the approach of choice.

    Pathogenic Bacillus anthracis in the progressive gene losses and gains in adaptive evolution

    Background: Sequence mutations represent a driving force of adaptive evolution in bacterial pathogens. This is especially evident in reductive genome evolution, where bacteria underwent lifestyle shifts from a free-living to a strictly intracellular or host-dependent life. This resulted in loss-of-function mutations and/or the acquisition of virulence gene clusters. Bacillus anthracis shares a common soil bacterial ancestor with its closely related Bacillus species but is the only obligate, causative agent of inhalation anthrax within the genus Bacillus. The anthrax-causing Bacillus anthracis experienced similar lifestyle changes. We thus hypothesized that the bacterial pathogen would follow a compatible evolution path. Results: In this study, a cluster-based evolution scheme was devised to analyze genes that are gained by or lost from B. anthracis. The study detected gene losses/gains at two separate evolutionary stages. Stage I is when B. anthracis and its sister species within the Bacillus cereus group diverged from other species in genus Bacillus. Stage II is when B. anthracis differentiated from its two closest relatives: B. cereus and B. thuringiensis. Many genes gained at these stages are homologues of known pathogenic factors such as those for internalin, B. anthracis-specific toxins and large groups of surface proteins and lipoproteins. Conclusion: The analysis presented here allowed us to portray a progressive evolutionary process during the lifestyle shift of B. anthracis, thus providing new insights into how B. anthracis evolved and holding promise for finding drug and vaccine targets for this strategically important pathogen.

    Cross-Mapping Events in miRNAs Reveal Potential miRNA-Mimics and Evolutionary Implications

    MicroRNAs (miRNAs) have important roles in various biological processes. miRNA cross-mapping is a prevalent phenomenon in which a miRNA sequence originating from one genomic region is mapped to another location. To gain a better understanding of this phenomenon in the human genome, we performed a detailed analysis using public miRNA high-throughput sequencing data and all known human miRNAs. By analyzing the high-throughput sequencing data, we observed widespread cross-mapping events between miRNA precursors (pre-miRNAs), other non-coding RNAs (ncRNAs) and the opposite strands of pre-miRNAs. Computational analysis of all known human miRNAs also confirmed that many of them could be involved in cross-mapping events. The processing or decay of both ncRNAs and pre-miRNA opposite strand transcripts may contribute to miRNA enrichment, although some might be miRNA-mimics due to miRNA mis-annotation. Compared with canonical miRNAs, miRNAs involved in cross-mapping events between pre-miRNAs and other ncRNAs normally had shorter lengths (17–19 nt) and lower prediction scores, and were classified as pseudo miRNA precursors. Notably, 4.9% of all human miRNAs could be accurately mapped to the opposite strands of pre-miRNAs, which showed that both strands of the same genomic region had the potential to produce mature miRNAs and simultaneously implied some potential miRNA precursors. We propose that cross-mapping events are more complex than previously thought. Sequence similarity between other ncRNAs and pre-miRNAs and the specific stem-loop structures of pre-miRNAs may provide evolutionary implications.

    Bayesian profiling of molecular signatures to predict event times

    BACKGROUND: It is of particular interest to identify cancer-specific molecular signatures for early diagnosis, monitoring effects of treatment and predicting patient survival time. Molecular information about patients is usually generated from high throughput technologies such as microarray and mass spectrometry. Statistically, we are challenged by the large number of candidates but only a small number of patients in the study, and the right-censored clinical data further complicate the analysis. RESULTS: We present a two-stage procedure to profile molecular signatures for survival outcomes. Firstly, we group closely-related molecular features into linkage clusters, each portraying either similar or opposite functions and playing similar roles in prognosis; secondly, a Bayesian approach is developed to rank the centroids of these linkage clusters and provide a list of the main molecular features closely related to the outcome of interest. A simulation study showed the superior performance of our approach. When it was applied to data on diffuse large B-cell lymphoma (DLBCL), we were able to identify some new candidate signatures for disease prognosis. CONCLUSION: This multivariate approach provides researchers with a more reliable list of molecular features profiled in terms of their prognostic relationship to the event times, and generates dependable information for subsequent identification of prognostic molecular signatures through either biological procedures or further data analysis

    Identifying significant genetic regulatory networks in the prostate cancer from microarray data based on transcription factor analysis and conditional independency

    Background: Prostate cancer is a leading cancer worldwide and is characterized by its aggressive metastasis. Owing to its clinical heterogeneity, prostate cancer displays different stages and grades related to the aggressive metastatic disease. Although numerous studies have used microarray analysis and traditional clustering methods to identify individual genes involved in the disease processes, the important gene regulations remain unclear. We present a computational method for automatically inferring genetic regulatory networks from microarray data, using transcription factor analysis and conditional independence testing to explore the potentially significant gene regulatory networks that are correlated with cancer, tumor grade and stage in prostate cancer. Results: To deal with missing values in microarray data, we used a K-nearest-neighbors (KNN) algorithm to estimate the missing expression values. We applied web services technology to wrap the bioinformatics toolkits and databases, to automatically extract the promoter regions of DNA sequences, and to predict the transcription factors that regulate gene expression. We adopted a microarray dataset consisting of 62 primary tumors and 41 normal prostate tissues from the Stanford Microarray Database (SMD) as a target dataset to evaluate our method. The predicted results identified possible biomarker genes related to cancer and indicated that androgen functions and processes may be involved in the development of prostate cancer and promote cell death in the cell cycle. Our predicted results showed that sub-networks of genes SREBF1, STAT6 and PBX1 are strongly related to a high extent while ETS transcription factors ELK1, JUN and EGR2 are related to a low extent. Gene SLC22A3 may explain clinically the differentiation associated with high grade cancer compared with low grade cancer. Enhancer of Zeste Homolog 2 (EZH2), regulated by RUNX1 and STAT3, is correlated with the pathological stage. Conclusions: We provide a computational framework to reconstruct the genetic regulatory network from microarray data using biological knowledge and constraint-based inferences. Our method is helpful in verifying possible interaction relations in gene regulatory networks and filtering out incorrect relations inferred by imperfect methods. We not only predicted individual genes related to cancer but also discovered significant gene regulation networks. Our method is also validated against several published papers and databases, and the significant gene regulatory networks perform critical biological functions and processes including cell adhesion molecules, androgen and estrogen metabolism, smooth muscle contraction, and GO-annotated processes. Those significant gene regulations and the critical concept of tumor progression are useful for understanding cancer biology and disease treatment.
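The abstract above states that a K-nearest-neighbors algorithm was used for missing values but gives no details. The sketch below shows one common form of KNN imputation under stated assumptions: rows are gene expression profiles, missing entries are `None`, distance is root-mean-square difference over the columns both rows have observed, and the fill value is the unweighted mean over the k nearest rows. The function name, the default `k`, and these choices are illustrative and may differ from the authors' procedure.

```python
from math import sqrt


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows."""

    def distance(a, b):
        # compare only the columns observed in both rows
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return float("inf")
        return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]   # copy so the input is left unchanged
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # neighbours must have an observed value in column j
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] != float("inf")]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:   # leave the gap if no usable neighbour exists
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled
```

Entries with no comparable neighbour stay `None` rather than being guessed.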