370 research outputs found

    Binary Particle Swarm Optimization based Biclustering of Web usage Data

    Full text link
    Web mining is the nontrivial process to discover valid, novel, potentially useful knowledge from web data using the data mining techniques or methods. It may give information that is useful for improving the services offered by web portals and information access and retrieval tools. With the rapid development of biclustering, more researchers have applied the biclustering technique to different fields in recent years. When biclustering approach is applied to the web usage data it automatically captures the hidden browsing patterns from it in the form of biclusters. In this work, swarm intelligent technique is combined with biclustering approach to propose an algorithm called Binary Particle Swarm Optimization (BPSO) based Biclustering for Web Usage Data. The main objective of this algorithm is to retrieve the global optimal bicluster from the web usage data. These biclusters contain relationships between web users and web pages which are useful for the E-Commerce applications like web advertising and marketing. Experiments are conducted on real dataset to prove the efficiency of the proposed algorithms

    A New Heuristic for Feature Selection by Consistent Biclustering

    Full text link
    Given a set of data, biclustering aims at finding simultaneous partitions in biclusters of its samples and of the features which are used for representing the samples. Consistent biclusterings allow to obtain correct classifications of the samples from the known classification of the features, and vice versa, and they are very useful for performing supervised classifications. The problem of finding consistent biclusterings can be seen as a feature selection problem, where the features that are not relevant for classification purposes are removed from the set of data, while the total number of features is maximized in order to preserve information. This feature selection problem can be formulated as a linear fractional 0-1 optimization problem. We propose a reformulation of this problem as a bilevel optimization problem, and we present a heuristic algorithm for an efficient solution of the reformulated problem. Computational experiments show that the presented algorithm is able to find better solutions with respect to the ones obtained by employing previously presented heuristic algorithms

    Analysis of regulatory network involved in mechanical induction of embryonic stem cell differentiation

    Get PDF
    Embryonic stem cells are conventionally differentiated by modulating specific growth factors in the cell culture media. Recently the effect of cellular mechanical microenvironment in inducing phenotype specific differentiation has attracted considerable attention. We have shown the possibility of inducing endoderm differentiation by culturing the stem cells on fibrin substrates of specific stiffness [1]. Here, we analyze the regulatory network involved in such mechanically induced endoderm differentiation under two different experimental configurations of 2-dimensional and 3-dimensional culture, respectively. Mouse embryonic stem cells are differentiated on an array of substrates of varying mechanical properties and analyzed for relevant endoderm markers. The experimental data set is further analyzed for identification of co-regulated transcription factors across different substrate conditions using the technique of bi-clustering. Overlapped bi-clusters are identified following an optimization formulation, which is solved using an evolutionary algorithm. While typically such analysis is performed at the mean value of expression data across experimental repeats, the variability of stem cell systems reduces the confidence on such analysis of mean data. Bootstrapping technique is thus integrated with the bi-clustering algorithm to determine sets of robust bi-clusters, which is found to differ significantly from corresponding bi-clusters at the mean data value. Analysis of robust bi-clusters reveals an overall similar network interaction as has been reported for chemically induced endoderm or endodermal organs but with differences in patterning between 2-dimensional and 3-dimensional culture. Such analysis sheds light on the pathway of stem cell differentiation indicating the prospect of the two culture configurations for further maturation. © 2012 Zhang et al

    Pairwise gene GO-based measures for biclustering of high-dimensional expression data

    Get PDF
    Background: Biclustering algorithms search for groups of genes that share the same behavior under a subset of samples in gene expression data. Nowadays, the biological knowledge available in public repositories can be used to drive these algorithms to find biclusters composed of groups of genes functionally coherent. On the other hand, a distance among genes can be defined according to their information stored in Gene Ontology (GO). Gene pairwise GO semantic similarity measures report a value for each pair of genes which establishes their functional similarity. A scatter search-based algorithm that optimizes a merit function that integrates GO information is studied in this paper. This merit function uses a term that addresses the information through a GO measure. Results: The effect of two possible different gene pairwise GO measures on the performance of the algorithm is analyzed. Firstly, three well known yeast datasets with approximately one thousand of genes are studied. Secondly, a group of human datasets related to clinical data of cancer is also explored by the algorithm. Most of these data are high-dimensional datasets composed of a huge number of genes. The resultant biclusters reveal groups of genes linked by a same functionality when the search procedure is driven by one of the proposed GO measures. Furthermore, a qualitative biological study of a group of biclusters show their relevance from a cancer disease perspective. Conclusions: It can be concluded that the integration of biological information improves the performance of the biclustering process. The two different GO measures studied show an improvement in the results obtained for the yeast dataset. However, if datasets are composed of a huge number of genes, only one of them really improves the algorithm performance. This second case constitutes a clear option to explore interesting datasets from a clinical point of view.Ministerio de EconomĂ­a y Competitividad TIN2014-55894-C2-

    Profile Likelihood Biclustering

    Full text link
    Biclustering, the process of simultaneously clustering the rows and columns of a data matrix, is a popular and effective tool for finding structure in a high-dimensional dataset. Many biclustering procedures appear to work well in practice, but most do not have associated consistency guarantees. To address this shortcoming, we propose a new biclustering procedure based on profile likelihood. The procedure applies to a broad range of data modalities, including binary, count, and continuous observations. We prove that the procedure recovers the true row and column classes when the dimensions of the data matrix tend to infinity, even if the functional form of the data distribution is misspecified. The procedure requires computing a combinatorial search, which can be expensive in practice. Rather than performing this search directly, we propose a new heuristic optimization procedure based on the Kernighan-Lin heuristic, which has nice computational properties and performs well in simulations. We demonstrate our procedure with applications to congressional voting records, and microarray analysis.Comment: 40 pages, 11 figures; R package in development at https://github.com/patperry/biclustp
    • 

    corecore