132 research outputs found

    Multi-layered model of individual HIV infection progression and mechanisms of phenotypical expression

    Cite as: Perrin, Dimitri (2008) Multi-layered model of individual HIV infection progression and mechanisms of phenotypical expression. PhD thesis, Dublin City University

    Dynamic biclustering of microarray data by multi-objective immune optimization

    Background: New microarray technologies yield large-scale datasets. Microarray datasets are usually presented as 2D matrices, where rows represent genes and columns represent experimental conditions. Systematic analysis of those datasets provides an increasing amount of information, which is urgently needed in the post-genomic era. Biclustering, a technique developed to allow simultaneous clustering of the rows and columns of a dataset, might be useful for extracting more accurate information from those datasets. Biclustering requires the optimization of two conflicting objectives (residue and volume), and a multi-objective artificial immune system capable of performing a multi-population search. As a heuristic search technique, artificial immune systems (AISs) can be considered a new computational paradigm inspired by the immunological system of vertebrates and designed to solve a wide range of optimization problems. During biclustering, several objectives in conflict with each other have to be optimized simultaneously, so a multi-objective optimization model is suitable for solving the biclustering problem. Results: Based on a dynamic population, this paper proposes a novel dynamic multi-objective immune optimization biclustering (DMOIOB) algorithm to mine coherent patterns from microarray data. Experimental results on two common, public datasets of gene expression profiles show that the approach can effectively find significant localized structures related to sets of genes that show consistent expression patterns across subsets of experimental conditions. The mined patterns show significant biological relevance in terms of related biological processes, components and molecular functions in a species-independent manner. Conclusions: The proposed DMOIOB algorithm is an efficient tool for analyzing large microarray datasets. It achieves good diversity and rapid convergence.

    Biclustering on expression data: A review

    Biclustering has become a popular technique for the study of gene expression data, especially for discovering functionally related gene sets under different subsets of experimental conditions. Most biclustering approaches use a measure or cost function that determines the quality of biclusters. In such cases, the development of both a suitable heuristic and a good measure for guiding the search is essential for discovering interesting biclusters in an expression matrix. Nevertheless, not all existing biclustering approaches base their search on evaluation measures for biclusters. There exists a diverse set of biclustering tools that follow different strategies and algorithmic concepts which guide the search towards meaningful results. In this paper we present an extensive survey of biclustering approaches, classifying them into two categories according to whether or not they use evaluation metrics within the search method: biclustering algorithms based on evaluation measures and non-metric-based biclustering algorithms. In both cases, they have been classified according to the type of meta-heuristics on which they are based. Ministerio de Economía y Competitividad TIN2011-2895

    Integrating biclustering techniques with de novo gene regulatory network discovery using RNA-seq from skeletal tissues

    In order to improve upon stem cell therapy for osteoarthritis, it is necessary to understand the molecular and cellular processes behind bone development and the differences from cartilage formation. To further elucidate these processes would provide a means to analyze the relatedness of bone and cartilage tissue by determining genes that are expressed and regulated for stem cells to differentiate into skeletal tissues. It would also contribute to the classification of differences in normal skeletogenesis and degenerative conditions involving these tissues. The three predominant skeletal tissues of interest are bone, immature cartilage and mature cartilage. Analysis of the transcriptome of these skeletal tissues using RNA-seq technology was performed using differential expression, clustering and biclustering algorithms, to detect similarly expressed genes, which provides evidence for genes potentially interacting together to produce a particular phenotype. Identifying key regulators in the gene regulatory networks (GRNs) driving cartilage and bone development and the differences in the GRNs they drive will facilitate a means to make comparisons between the tissues at the transcriptomic level. Due to a small number of available samples for gene expression data in bone, immature and mature cartilage, it is necessary to determine how the number of samples influences the ability to make accurate GRN predictions. Machine learning techniques for GRN prediction that can incorporate multiple data types have not been well evaluated for complex organisms, nor has RNA-seq data been used often for evaluating these methods. Therefore, techniques identified to work well with microarray data were applied to RNA-seq data from mouse embryonic stem cells, where more samples are available for evaluation compared to the skeletal tissue RNA-seq samples. 
The RNA-seq data was combined with ChIP-seq data to determine if the machine learning methods outperform simple, correlation-based methods that have been evaluated using RNA-seq data alone. Two of the best performing GRN prediction algorithms from previous large-scale evaluations, which are incapable of incorporating data beyond expression data, were used as a baseline to determine if the addition of multiple data types could help reduce the number of gene expression samples. It was also necessary to identify a biclustering algorithm that could identify potentially biologically relevant modules. Publicly available ChIP-seq and RNA-seq samples from embryonic stem cells were used to measure the performance and consistency of each method, as there was a well-established network in mouse embryonic stem cells to compare results. The methods were then compared to cMonkey2, a biclustering method used in conjunction with ChIP-seq for two important transcription factors in the embryonic stem cell network. This was done to determine if any of these GRN prediction methods could potentially use the small number of skeletal tissue samples available to determine transcription factors orchestrating the expression of other genes driving cartilage and bone formation. Using the embryonic stem cell RNA-seq samples, it was found that sample size, if above 10, does not have a significant impact on the number of true positives in the top predicted interactions. Random forest methods outperform correlation-based methods when using RNA-seq, with area under ROC (AUROC) for evaluation, but the number of true positive interactions predicted when compared to a literature network was similar when using a strict cut-off. Using a limited set of ChIP-seq data was found not to improve the confidence in the transcription factor interactions and had no obvious effect on biclustering results. 
Correlation-based methods are likely the safest option when judged on the consistency of results over multiple runs, but there is still the challenge of determining an appropriate cut-off for the predictions. To predict the skeletal tissue GRNs, cMonkey was used as an initial feature selection method to identify important genes in skeletal tissues and compared with other biclustering methods that do not use ChIP-seq. The predicted skeletal tissue GRNs will be utilized in future analyses of skeletal tissues, focussing on the evolutionary relationship between the GRNs driving skeletal tissue development.

    Unsupervised Algorithms for Microarray Sample Stratification

    The amount of data made available by microarrays gives researchers the opportunity to delve into the complexity of biological systems. However, the noisy and extremely high-dimensional nature of this kind of data poses significant challenges. Microarrays allow for the parallel measurement of thousands of molecular objects spanning different layers of interactions. In order to be able to discover hidden patterns, the most disparate analytical techniques have been proposed. Here, we describe the basic methodologies to approach the analysis of microarray datasets that focus on the task of (sub)group discovery. Peer reviewed.

    EDISA: extracting biclusters from multiple time-series of gene expression profiles

    Background: Cells dynamically adapt their gene expression patterns in response to various stimuli. This response is orchestrated into a number of gene expression modules consisting of co-regulated genes. A growing pool of publicly available microarray datasets allows the identification of modules by monitoring expression changes over time. These time-series datasets can be searched for gene expression modules by one of the many clustering methods published to date. For an integrative analysis, several time-series datasets can be joined into a three-dimensional gene-condition-time dataset, to which standard clustering or biclustering methods are, however, not applicable. We thus devise a probabilistic clustering algorithm for gene-condition-time datasets. Results: In this work, we present the EDISA (Extended Dimension Iterative Signature Algorithm), a novel probabilistic clustering approach for 3D gene-condition-time datasets. Based on mathematical definitions of gene expression modules, the EDISA samples initial modules from the dataset which are then refined by removing genes and conditions until they comply with the module definition. A subsequent extension step ensures gene and condition maximality. We applied the algorithm to a synthetic dataset and were able to successfully recover the implanted modules over a range of background noise intensities. Analysis of microarray datasets has led us to define three biologically relevant module types: 1) We found modules with independent response profiles to be the most prevalent ones. These modules comprise genes which are co-regulated under several conditions, yet with a different response pattern under each condition. 2) Coherent modules with similar responses under all conditions occurred frequently, too, and were often contained within these modules. 3) A third module type, which covers a response specific to a single condition, was also detected, but rarely. All of these modules are essentially different types of biclusters. Conclusion: We successfully applied the EDISA to different 3D datasets. While previous studies were mostly aimed at detecting coherent modules only, our results show that coherent responses are often part of a more general module type with independent response profiles under different conditions. Our approach thus allows for a more comprehensive view of the gene expression response. After subsequent analysis of the resulting modules, the EDISA helped to shed light on the global organization of transcriptional control. An implementation of the algorithm is available at http://www-ra.informatik.uni-tuebingen.de/software/IAGEN/.

    Clustering Algorithms: Their Application to Gene Expression Data

    Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a major opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the number of genes involved make it harder to comprehend and interpret the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful in revealing the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure.

    Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies

    Background: The analysis of large-scale data sets via clustering techniques is utilized in a number of applications. Biclustering in particular has emerged as an important problem in the analysis of gene expression data since genes may only jointly respond over a subset of conditions. Biclustering algorithms also have important applications in sample classification where, for instance, tissue samples can be classified as cancerous or normal. Many of the methods for biclustering, and clustering algorithms in general, utilize simplified models or heuristic strategies for identifying the "best" grouping of elements according to some metric and cluster definition and thus result in suboptimal clusters. Results: In this article, we present a rigorous approach to biclustering, OREO, which is based on the Optimal RE-Ordering of the rows and columns of a data matrix so as to globally minimize the dissimilarity metric. The physical permutations of the rows and columns of the data matrix can be modeled as either a network flow problem or a traveling salesman problem. Cluster boundaries in one dimension are used to partition and re-order the other dimensions of the corresponding submatrices to generate biclusters. The performance of OREO is tested on (a) metabolite concentration data, (b) an image reconstruction matrix, (c) synthetic data with implanted biclusters, and gene expression data for (d) colon cancer data, (e) breast cancer data, as well as (f) yeast segregant data to validate the ability of the proposed method and compare it to existing biclustering and clustering methods. Conclusion: We demonstrate that this rigorous global optimization method for biclustering produces clusters with more insightful groupings of similar entities, such as genes or metabolites sharing common functions, than other clustering and biclustering algorithms and can reconstruct underlying fundamental patterns in the data for several distinct sets of data matrices arising in important biological applications.

    Molecular analysis of menadione-induced resistance against biotic stress in Arabidopsis

    19 pages, 6 figures, 2 tables. Menadione sodium bisulphite (MSB) is a water-soluble derivative of vitamin K3, or menadione, and has been previously demonstrated to function as a plant defence activator against several pathogens in several plant species. However, there are no reports of the role of this vitamin in the induction of resistance in the plant model Arabidopsis thaliana. In the current study, we demonstrate that MSB induces resistance by priming in Arabidopsis against the virulent strain Pseudomonas syringae pv. tomato DC3000 (Pto) without inducing necrosis or visible damage. Changes in gene expression in response to 0.2 mM MSB were analysed in Arabidopsis at 3, 6 and 24 h post-treatment using microarray technology. In general, the treatment with MSB does not correlate with other publicly available data; thus MSB produces a unique molecular footprint. We observed 158 differentially regulated genes among all the possible trends. More up-regulated genes are included in categories such as 'response to stress' than the background, and the behaviour of these genes in different treatments confirms their role in response to biotic and abiotic stress. In addition, there is an over-representation of the G-box in their promoters. Some interesting functions are represented among the individual up-regulated genes, such as glutathione S-transferases, transcription factors (including putative regulators of the G-box) and cytochrome P450s. This work provides a wide insight into the molecular cues underlying the effect of MSB as a plant resistance inducer. This work was partially funded by an INVESCAN, S.L. grant (No.OTT2001438) to the CSIC and by a BIO2006-02168 grant of MICINN to PT. The microarrays were funded in part by the “Genome España” Foundation. MER was supported by a research contract (ID-TF-06/002) from the Consejería de Industria, Comercio y Nuevas Tecnologías (Gobierno de Canarias). The authors thank CajaCanarias for their research support. We also thank Lorena Perales for her help in performing the bacterial growth curves, Dr. Héctor Cabrera for his useful advice on writing the manuscript, the English translation service of the Universidad Politécnica de Valencia and Mrs. Pauline Agnew, who endeavoured to edit the English translation of this paper. Peer reviewed.