3,290 research outputs found

    Approximation Algorithms for Bregman Co-clustering and Tensor Clustering

    In the past few years powerful generalizations to the Euclidean k-means problem have been made, such as Bregman clustering [7], co-clustering (i.e., simultaneous clustering of rows and columns of an input matrix) [9,18], and tensor clustering [8,34]. Like k-means, these more general problems also suffer from the NP-hardness of the associated optimization. Researchers have developed approximation algorithms of varying degrees of sophistication for k-means, k-medians, and more recently also for Bregman clustering [2]. However, there seem to be no approximation algorithms for Bregman co- and tensor clustering. In this paper we derive the first (to our knowledge) guaranteed methods for these increasingly important clustering settings. Going beyond Bregman divergences, we also prove an approximation factor for tensor clustering with arbitrary separable metrics. Through extensive experiments we evaluate the characteristics of our method, and show that it also has practical impact. Comment: 18 pages; improved metric cas

    Techniques for clustering gene expression data

    Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state-of-the-art applications that recognise these limitations and implement procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered.

    A rough set based rational clustering framework for determining correlated genes

    Cluster analysis plays a central role in identifying groups of genes that show similar behavior under a set of experimental conditions. Several clustering algorithms have been proposed for identifying gene behaviors and understanding their significance. The principal aim of this work is to develop an intelligent rough clustering technique that efficiently removes the irrelevant dimensions in a high-dimensional space and obtains meaningful clusters. This paper proposes a novel biclustering technique based on rough set theory. The proposed algorithm uses the correlation coefficient as a similarity measure to simultaneously cluster both the rows and columns of a gene expression data matrix, and the mean squared residue to generate the initial biclusters. The biclusters are then refined to form the lower and upper boundaries by determining the membership of the genes in the clusters using the mean squared residue. The algorithm is illustrated with yeast gene expression data, and the experiment proves the effectiveness of the method. The main advantage is that it overcomes the problem of selecting initial clusters and also the restriction of one object belonging to only one cluster, by allowing biclusters to overlap.
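    The abstract above relies on the mean squared residue without defining it. As an illustration only, the sketch below uses the commonly cited Cheng and Church definition (the residue of an entry is its value minus its row mean and column mean, plus the overall bicluster mean); whether the paper uses exactly this form is an assumption. The function name `mean_squared_residue` and the toy `expression` matrix are hypothetical and not taken from the paper.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the bicluster picked out by `rows` and `cols`.

    Assumes the Cheng and Church definition; `matrix` is a list of lists.
    """
    # Extract the submatrix that forms the candidate bicluster.
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])

    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: deviation from a purely additive row + column pattern.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Toy data (hypothetical): the first two rows differ by a constant shift.
expression = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 1.0, 9.0],
]
coherent = mean_squared_residue(expression, rows=[0, 1], cols=[0, 1, 2])
noisy = mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2])
```

    A low score indicates that the selected genes vary coherently across the selected conditions, which is why the measure is useful both for seeding biclusters and for testing membership.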

    Co-clustering algorithm for the identification of cancer subtypes from gene expression data

    Cancer has been classified as a heterogeneous genetic disease comprising various subtypes based on gene expression data. Early diagnosis and prognosis of cancer type have become an essential requirement in cancer informatics research because they help the clinical treatment of patients. In addition, gene network interaction, which is significant for understanding the cellular and progressive mechanisms of cancer, has barely been considered in current research. Hence, the application of machine learning methods has become an important area for researchers to explore in order to categorize cancer genes into high- and low-risk groups or subtypes. Co-clustering is presently an extensively used data mining technique for analyzing gene expression data. This paper presents an improved network-assisted co-clustering for the identification of cancer subtypes (iNCIS), which combines gene network information with gene expression data to obtain co-clusters. The effectiveness of iNCIS was evaluated on large-scale Breast Cancer (BRCA) and Glioblastoma Multiforme (GBM) data. The weighted co-clustering approach in iNCIS delivers a distinctive result by integrating the gene network into the clustering procedure.

    Constrained Co-clustering of Gene Expression Data

    Extracting biologically significant patterns from short time series gene expression data

    Background: Time series gene expression data analysis is used widely to study the dynamics of various cell processes. Most of the time series data available today consist of only a few time points, which makes the application of standard clustering techniques difficult. Results: We developed two new algorithms that are capable of extracting biological patterns from short time series gene expression data. The two algorithms, ASTRO and MiMeSR, are inspired by the rank order preserving framework and the minimum mean squared residue approach, respectively. However, ASTRO and MiMeSR differ from previous approaches in that they take advantage of the relatively small number of time points in order to reduce the problem from NP-hard to linear. Tested on well-defined short time expression data, we found that our approaches are robust to noise, as well as to random patterns, and that they can correctly detect the temporal expression profile of relevant functional categories. Evaluation of our methods was performed using Gene Ontology (GO) annotations and chromatin immunoprecipitation (ChIP-chip) data. Conclusion: Our approaches generally outperform both standard clustering algorithms and algorithms designed specifically for clustering of short time series gene expression data. Both algorithms are available at http://www.benoslab.pitt.edu/astro/.

    An Archived Multi Objective Simulated Annealing Method to Discover Biclusters in Microarray Data

    With the advent of microarray technology it has become possible to measure thousands of gene expression values in a single experiment. Analysis of large-scale genomics data, notably gene expression, initially focused on clustering methods. Recently, biclustering techniques were proposed for revealing submatrices showing unique patterns. Biclustering, or simultaneous clustering of both genes and conditions, is challenging, particularly for the analysis of high-dimensional gene expression data in information retrieval, knowledge discovery, and data mining. In biclustering of microarray data, several objectives have to be optimized simultaneously, and these objectives are often in conflict with each other. A multi-objective model is well suited to this problem. We propose an algorithm based on multi-objective Simulated Annealing for discovering biclusters in gene expression data. Experimental results on a benchmark database show a significant improvement in the overlap among biclusters, the coverage of elements in the gene expression data, and the quality of the biclusters.