14 research outputs found

    Discovering Coherent Biclusters from Gene Expression Data Using Zero-Suppressed Binary Decision Diagrams

    The biclustering method can be a very useful analysis tool when some genes have multiple functions and experimental conditions are diverse in gene expression measurement. This is because the biclustering approach, in contrast to conventional clustering techniques, focuses on finding a subset of the genes and a subset of the experimental conditions that together exhibit coherent behavior. However, the biclustering problem is inherently intractable, and it is often computationally costly to find biclusters with high levels of coherence. In this work, we propose a novel biclustering algorithm that exploits the zero-suppressed binary decision diagram (ZBDD) data structure to cope with the computational challenges. Our method can find all biclusters that satisfy specific input conditions, and it is scalable to practical gene expression data. We also present experimental results confirming the effectiveness of our approach.
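    The abstract above describes using ZBDDs to tame an exponential search space. As a rough illustration of what that structure compresses, here is a naive Python enumeration of all-ones biclusters in a binary gene-by-condition matrix; the function name and toy matrix are invented for illustration, and a real ZBDD-based method avoids this explicit enumeration by sharing common sub-structure between subsets.

```python
from itertools import combinations

def enumerate_biclusters(matrix):
    """Naively enumerate biclusters of 1s in a binary gene x condition
    matrix: for every non-empty subset of conditions, collect the genes
    that are 1 under all of them.  This is the exponential search space
    a ZBDD represents compactly."""
    n_genes, n_conds = len(matrix), len(matrix[0])
    found = set()
    for k in range(1, n_conds + 1):
        for conds in combinations(range(n_conds), k):
            genes = tuple(g for g in range(n_genes)
                          if all(matrix[g][c] for c in conds))
            if genes:
                found.add((genes, conds))
    return found

# Toy matrix: rows = genes, columns = conditions.
m = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
biclusters = enumerate_biclusters(m)
```

    Even this 3x3 example yields several overlapping biclusters; the exponential growth in conditions is what motivates the compressed representation.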

    Discernment of possible mechanisms of hepatotoxicity via biological processes over-represented by co-expressed genes

    Background: Hepatotoxicity is a form of liver injury caused by exposure to stressors. Genomic-based approaches have been used to detect changes in transcription in response to hepatotoxicants. However, there are no straightforward ways of using co-expressed genes, anchored to a phenotype or constrained by the experimental design, to discern the mechanisms of a biological response.

    Results: Through the analysis of a gene expression dataset containing 318 liver samples from rats exposed to hepatotoxicants, and leveraging alanine aminotransferase (ALT), a serum enzyme indicative of liver injury, as the phenotypic marker, we identified biological processes and molecular pathways that may be associated with mechanisms of hepatotoxicity. Our analysis used an approach called Coherent Co-expression Biclustering (cc-Biclustering), which clusters a subset of genes through a coherence (consistency) measure within each group of samples representing a subset of experimental conditions. Supervised biclustering identified 87 genes co-expressed and correlated with ALT across all the samples exposed to the chemicals; none of the over-represented pathways, however, was related to liver injury. In contrast, biclusters restricted to samples exposed to one of the 7 hepatotoxicants, but not to a non-toxic isomer, contained co-expressed genes that represented pathways related to a stress response. Unsupervised biclustering of the data resulted in 1) four to five times more genes within the bicluster containing all the samples exposed to the chemicals, 2) biclusters whose co-expressed genes discerned 1,4-dichlorobenzene (a non-toxic isomer at low and mid doses) from the other chemicals and revealed pathways and biological processes that underlie liver injury, and 3) a bicluster with genes up-regulated in an early response to toxic exposure.

    Conclusion: We obtained clusters of co-expressed genes that over-represented biological processes and molecular pathways related to hepatotoxicity in the rat. The mechanisms involved in the response of the liver to exposure to 1,4-dichlorobenzene suggest non-genotoxicity, whereas exposure to the hepatotoxicants could be DNA-damaging, leading to overall genomic instability and activation of cell-cycle checkpoint signaling. In addition, key pathways and biological processes representative of an inflammatory response, energy production and apoptosis were impacted by the hepatotoxicant exposures that manifested liver injury in the rat.
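    The supervised step described above correlates gene profiles with the ALT phenotype. A minimal sketch of that filtering idea, using a plain Pearson correlation and invented data (this is not the authors' exact coherence measure, only the anchoring-to-a-phenotype idea):

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: ALT readings per sample and two gene profiles.
alt = [30.0, 85.0, 120.0, 40.0, 150.0]
genes = {
    "gene_a": [1.1, 2.9, 4.2, 1.4, 5.0],   # tracks the phenotype
    "gene_b": [2.0, 1.9, 2.1, 2.0, 1.8],   # roughly flat
}
# Keep genes whose |correlation| with the phenotype exceeds a cutoff.
selected = [g for g, prof in genes.items() if abs(pearson(prof, alt)) > 0.9]
```

    Genes surviving such a filter across all chemical-exposed samples would then be candidates for the phenotype-anchored bicluster.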

    Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies

    Background: The analysis of large-scale data sets via clustering techniques is utilized in a number of applications. Biclustering in particular has emerged as an important problem in the analysis of gene expression data, since genes may only jointly respond over a subset of conditions. Biclustering algorithms also have important applications in sample classification where, for instance, tissue samples can be classified as cancerous or normal. Many of the methods for biclustering, and clustering algorithms in general, utilize simplified models or heuristic strategies for identifying the "best" grouping of elements according to some metric and cluster definition, and thus result in suboptimal clusters.

    Results: In this article, we present a rigorous approach to biclustering, OREO, which is based on the Optimal RE-Ordering of the rows and columns of a data matrix so as to globally minimize the dissimilarity metric. The physical permutations of the rows and columns of the data matrix can be modeled as either a network flow problem or a traveling salesman problem. Cluster boundaries in one dimension are used to partition and re-order the other dimensions of the corresponding submatrices to generate biclusters. The performance of OREO is tested on (a) metabolite concentration data, (b) an image reconstruction matrix, (c) synthetic data with implanted biclusters, and gene expression data for (d) colon cancer, (e) breast cancer, and (f) yeast segregants, to validate the ability of the proposed method and compare it to existing biclustering and clustering methods.

    Conclusion: We demonstrate that this rigorous global optimization method for biclustering produces clusters with more insightful groupings of similar entities, such as genes or metabolites sharing common functions, than other clustering and biclustering algorithms, and can reconstruct underlying fundamental patterns in the data for several distinct sets of data matrices arising in important biological applications.
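    OREO formulates the re-ordering exactly, as a network flow or traveling salesman problem. The objective it optimizes, placing similar rows next to each other, can be sketched with a much simpler greedy nearest-neighbour heuristic; the function and data below are invented for illustration and do not reproduce the paper's global optimization.

```python
def greedy_reorder(rows):
    """Greedy nearest-neighbour ordering of matrix rows: start from row 0
    and repeatedly append the unplaced row closest (in squared Euclidean
    distance) to the last placed one.  OREO solves this re-ordering
    globally; this heuristic only illustrates the objective."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    remaining = list(range(1, len(rows)))
    order = [0]
    while remaining:
        last = rows[order[-1]]
        nxt = min(remaining, key=lambda i: dist(rows[i], last))
        order.append(nxt)
        remaining.remove(nxt)
    return order

data = [
    [0.0, 0.1],   # row 0
    [5.0, 5.2],   # row 1: far from row 0
    [0.1, 0.0],   # row 2: near row 0
    [5.1, 5.0],   # row 3: near row 1
]
order = greedy_reorder(data)  # similar profiles end up adjacent
```

    Once rows are ordered, cluster boundaries along that dimension can be used, as in the abstract, to partition and re-order the columns of each submatrix.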

    Identification of coherent patterns in gene expression data using an efficient biclustering algorithm and parallel coordinate visualization

    Background: DNA microarray technology allows the measurement of expression levels of thousands of genes under tens to hundreds of different conditions. In microarray data, genes with similar functions usually co-express under certain conditions only [1]. Thus biclustering, which clusters genes and conditions simultaneously, is preferred over traditional clustering techniques for discovering these coherent genes. Various biclustering algorithms have been developed using different bicluster formulations. Unfortunately, many useful formulations result in NP-complete problems. In this article, we investigate an efficient method for identifying a popular type of bicluster, the additive model. Furthermore, parallel coordinate (PC) plots are used for bicluster visualization and analysis.

    Results: We develop a novel and efficient biclustering algorithm which can be regarded as a greedy version of an existing algorithm known as the pCluster algorithm. By relaxing the homogeneity constraint, the proposed algorithm has polynomial-time complexity in the worst case instead of the exponential-time complexity of the pCluster algorithm. Experiments on artificial datasets verify that our algorithm can identify both additive-related and multiplicative-related biclusters in the presence of overlap and noise. Biologically significant biclusters have been validated on the yeast cell-cycle expression dataset using Gene Ontology annotations. A comparative study shows that the proposed approach outperforms several existing biclustering algorithms. We also provide an interactive exploratory tool based on PC plot visualization for determining the parameters of our biclustering algorithm.

    Conclusion: We have proposed a novel biclustering algorithm which works with PC plots for an interactive exploratory analysis of gene expression data. Experiments show that the biclustering algorithm is efficient and capable of detecting co-regulated genes. The interactive analysis enables optimal parameter determination in the biclustering algorithm so as to achieve the best result. In the future, we will adapt the proposed algorithm to other bicluster models, such as the coherent evolution model.
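    In the additive model mentioned above, each entry of a bicluster is approximately a base value plus a row effect plus a column effect, so every row is a shifted copy of every other row. A small sketch of a coherence check under that model follows; the residue below is the common mean-residue construction, which is not necessarily the paper's exact homogeneity constraint.

```python
def additive_residue(sub):
    """Mean absolute residue of a submatrix under the additive model
    a[i][j] ~ mean + row_effect[i] + col_effect[j]; a residue of zero
    means a perfectly coherent additive bicluster."""
    nr, nc = len(sub), len(sub[0])
    total = sum(sum(r) for r in sub) / (nr * nc)
    row_m = [sum(r) / nc for r in sub]
    col_m = [sum(sub[i][j] for i in range(nr)) / nr for j in range(nc)]
    res = [abs(sub[i][j] - row_m[i] - col_m[j] + total)
           for i in range(nr) for j in range(nc)]
    return sum(res) / len(res)

# Perfect additive bicluster: each row is a shifted copy of the others.
perfect = [[1, 3, 5],
           [2, 4, 6],
           [0, 2, 4]]
noisy = [[1, 9, 5],
         [2, 4, 6],
         [0, 2, 4]]
```

    In a PC plot, the rows of `perfect` draw parallel polylines across the condition axes, which is why parallel coordinates are a natural visualization for this bicluster type.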

    Graphical Model approaches for Biclustering

    In many scientific areas, it is crucial to group (cluster) a set of objects based on a set of observed features. This operation, widely known as clustering, has been exploited in the most diverse scenarios, ranging from economics to biology to psychology. Going a step further, there are contexts where it is crucial to group objects and simultaneously identify the features that distinguish those objects from the others. In gene expression analysis, for instance, the identification of subsets of genes showing a coherent pattern of expression in subsets of objects/samples can provide crucial information about active biological processes. Such information, which cannot be retrieved by classical clustering approaches, can be extracted with so-called biclustering, a class of approaches which aim at simultaneously clustering both the rows and the columns of a given data matrix (where each row corresponds to a different object/sample and each column to a different feature). The biclustering problem, also known as co-clustering, has recently been exploited in a wide range of scenarios such as bioinformatics, market segmentation, data mining, text analysis and recommender systems. Many approaches have been proposed to address the biclustering problem, each characterized by different properties such as interpretability, effectiveness or computational complexity. A recent trend involves the exploitation of sophisticated computational models (Graphical Models) to face the intrinsic complexity of biclustering and to retrieve very accurate solutions. Graphical Models represent the decomposition of a global objective function into a set of smaller, local functions defined over subsets of variables.

    The advantage of using Graphical Models lies in the fact that the graphical representation can highlight useful hidden properties of the considered objective function; moreover, the analysis of smaller local problems requires less computational effort. Due to the difficulties in obtaining a representative and solvable model, and since biclustering is a complex and challenging problem, there are few promising approaches in the literature that use Graphical Models for biclustering. This thesis is set in the above-mentioned scenario, and it investigates the exploitation of Graphical Models to face the biclustering problem. We explored different types of Graphical Models, in particular Factor Graphs and Bayesian Networks. We present three novel algorithms (with extensions) and evaluate these techniques on available benchmark datasets. All the models have been compared with state-of-the-art competitors, and the results show that Factor Graph approaches lead to solid and efficient solutions for datasets of moderate size, whereas Bayesian Networks can manage huge datasets, with the drawback that setting their parameters can be non-trivial. As another contribution of the thesis, we widen the range of biclustering applications by studying the suitability of these approaches to some computer vision problems where biclustering had never been adopted before. Summarizing, with this thesis we provide evidence that Graphical Model techniques can have a significant impact in the biclustering scenario. Moreover, we demonstrate that biclustering techniques are flexible and can produce effective solutions in the most diverse fields of application.

    Designing Micro- and Nanosystems for a Safer and Healthier Tomorrow


    Circuits and Systems for High-Throughput Biology

    The beginning of this millennium has been marked by some remarkable scientific events, notably the completion of the first objective of the Human Genome Project (HGP), i.e., the decoding of the 3 billion bases that compose the human genome. This success has been made possible by the advancement of bio-engineering, data processing, and the collaboration of scientists from academic institutions and private companies in many countries. The availability of biological information through web-accessible open databases has stirred further research and enthusiasm. More interestingly, this has changed the way in which molecular biology is approached today, since the newly available large amounts of data require a tight interaction between information technology and life science, in a way not appreciated before. Still, much has to be accomplished to realize the potential impact of using the knowledge that we have acquired. Several grand challenges remain open, such as diagnosing and treating a number of diseases, understanding details of the complex mechanisms that regulate life, and predicting and controlling the evolution of several biological processes. Nevertheless, there is now unprecedented room to reach these objectives, because the underlying technologies that we master have been exploited only to a limited extent. High-throughput biological data acquisition and processing technologies have shifted the focus of biological research from the realm of traditional experimental science (wet biology) to that of information science (in silico biology). Powerful computation and communication means can be applied to the very large amount of apparently incoherent data coming from biomedical research. The technical challenges that lie ahead include the interfacing between the information in biological samples and its abstraction in terms of the mathematical models and binary data that computer engineers are used to handling.

    For example, how can we automate costly, repetitive and time-consuming processes for the analysis of data that must cover the information contained in a whole organism's genome? How can we design a drug that triggers a specific answer? Anyone wearing the hat of a Circuits and Systems engineer would immediately realize that one important issue is the interfacing of the biological to the electrical world, which is often realized by microscopic probes able to capture and manipulate bio-materials at the molecular level. A portion of the costly and time-consuming experiments and tests that we used to do in vitro and/or in vivo can now be done in silico. The concept of the Laboratory (Lab) on Chip (LoC) is the natural evolution of the System on Chip (SoC), realized with an array of heterogeneous technologies. Whether LoCs will be realized on a monolithic chip or as a combination of modules is just a technicality. The revolution brought by Labs on Chips is related to the rationalization of bio-analysis, the drastic reduction of sample quantities, and their portability to various environments. We have witnessed the widespread distribution of complex electronic systems due to their low manufacturing costs. In this case as well, LoC costs will be key to their acceptance. But it is easy to foresee that LoCs may be mass produced with post-silicon manufacturing technologies, where large production volumes correlate with competitive costs. At the same time, the reduction of size, weight and human intervention will limit operating costs and make LoCs competitive. Labs on Chips at medical points of care will fulfill the desire for faster and more accurate diagnosis. Moreover, diagnosis at home and/or at mass transit facilities (e.g., airports) can have a significant impact on the overall population's health. LoCs for processing environmental data (e.g., pollution) may be coupled with wireless sensor networks to better monitor the planet.

    The use of the information produced by the Human Genome Project (marking the beginning of the Genomic Era) and its further refinement and understanding (the post-Genomic Era), as well as the consideration of the moral and legal implications for the betterment of society, has just started. In fact, the decoding of the human genome paved the way to a different approach to molecular biology, in that it is now possible to observe the interrelations among whole bodies of molecules such as genes, proteins, transcripts and metabolites in parallel (the so-called omic data: genomes, proteomes, transcriptomes, metabolomes, etc.), rather than observe and characterize a single chain of a cascade of events (i.e., perform genomic vs. genetic analyses). In other words, molecular biology underwent an important shift in its research paradigm, from a reductionist to a more systemic approach (systems biology), for which models developed in engineering will be of primary importance.

    Finding Co-Clusters of Genes and Clinical Parameters

    For a better understanding of the genetic mechanisms underlying clinical observations, we often want to determine which genes and clinical traits are interrelated. We introduce a computational method that can find co-clusters, or groups of genes and clinical parameters that are believed to be closely related to each other, based upon given empirical information. The proposed method was tested with data from an Acute Myelogenous Leukemia (AML) study and identified statistically significant co-clusters of genes and clinical traits. The validation of our results with Gene Ontology (GO) annotations as well as the literature suggests that the proposed method can provide biologically meaningful co-clusters of genes and traits.

    Adaptive Compressed Sensing for Support Recovery of Structured Sparse Sets

    This paper investigates the problem of recovering the support of structured signals via adaptive compressive sensing. We examine several classes of structured support sets, and characterize the fundamental limits of accurately recovering such sets through compressive measurements, while simultaneously providing adaptive support recovery protocols that perform near-optimally for these classes. We show that by adaptively designing the sensing matrix we can attain significant performance gains over non-adaptive protocols. These gains arise from the fact that adaptive sensing can (i) better mitigate the effects of noise, and (ii) better capitalize on the structure of the support sets. (To appear in IEEE Transactions on Information Theory.)
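    The noise-mitigation gain of adaptivity can be sketched with a toy two-stage scheme: spend part of the measurement budget taking one noisy look at every coordinate, then concentrate the remaining looks on the most promising half and average them. All names, thresholds and data below are invented for illustration; the paper's protocols and measurement model are more general than this coordinate-wise toy.

```python
import random

def two_stage_support_recovery(signal, noise_sd, budget, seed=0):
    """Toy adaptive scheme: stage 1 spends one noisy look per coordinate;
    stage 2 concentrates the remaining budget on the most promising half
    and averages the repeated looks to suppress noise."""
    rng = random.Random(seed)
    n = len(signal)

    def look(i):
        return signal[i] + rng.gauss(0.0, noise_sd)

    stage1 = [look(i) for i in range(n)]
    # Keep the half with the largest first-stage magnitude.
    survivors = sorted(range(n), key=lambda i: -abs(stage1[i]))[: n // 2]
    repeats = max(1, (budget - n) // max(1, len(survivors)))
    refined = {i: sum(look(i) for _ in range(repeats)) / repeats
               for i in survivors}
    # Declare support where the averaged estimate is clearly nonzero.
    return {i for i, v in refined.items() if abs(v) > 0.5}

signal = [0.0] * 8
signal[2] = signal[5] = 1.0          # true support is {2, 5}
support = two_stage_support_recovery(signal, noise_sd=0.3, budget=64)
```

    A non-adaptive scheme with the same budget would spread its looks evenly over all coordinates; the adaptive version averages many more looks on each survivor, which is the noise-mitigation effect the abstract refers to.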