370 research outputs found
Binary Particle Swarm Optimization based Biclustering of Web usage Data
Web mining is the nontrivial process of discovering valid, novel, and potentially
useful knowledge from web data using data mining techniques. It can yield
information that is useful for improving the services offered by web portals
and by information access and retrieval tools. With the rapid development of
biclustering, more researchers have applied the technique to different fields
in recent years. When biclustering is applied to web usage data, it
automatically captures hidden browsing patterns in the form of biclusters. In
this work, a swarm intelligence technique is combined with biclustering to
propose an algorithm called Binary Particle Swarm Optimization (BPSO) based
Biclustering for Web Usage Data. The main objective of this algorithm is to
retrieve the globally optimal bicluster from the web usage data. These
biclusters contain relationships between web users and web pages that are
useful for E-Commerce applications such as web advertising and marketing.
Experiments are conducted on a real dataset to demonstrate the efficiency of
the proposed algorithm.
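The abstract does not give the algorithm's details, so the following is only a minimal sketch of how a binary PSO search over biclusters might look. It assumes a bit-string encoding (one bit per user row, then one bit per page column), the standard sigmoid transfer rule for binary PSO, and a fitness that rewards bicluster volume subject to a mean squared residue threshold. The function names, the fitness, and the parameters `delta`, `w`, `c1`, `c2`, and the velocity clamp are all illustrative assumptions, not the authors' method.

```python
import math
import random


def msr(data, rows, cols):
    """Mean squared residue of the submatrix data[rows][cols] (assumed coherence measure)."""
    if not rows or not cols:
        return float("inf")
    sub = [[data[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_mean = [sum(r) / n_c for r in sub]
    col_mean = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    total = sum(row_mean) / n_r
    residue = sum(
        (sub[i][j] - row_mean[i] - col_mean[j] + total) ** 2
        for i in range(n_r)
        for j in range(n_c)
    )
    return residue / (n_r * n_c)


def decode(bits, n_rows):
    """Split a bit string into selected row indices and selected column indices."""
    rows = [i for i in range(n_rows) if bits[i]]
    cols = [j - n_rows for j in range(n_rows, len(bits)) if bits[j]]
    return rows, cols


def fitness(data, bits, delta):
    """Assumed fitness: bicluster volume if it is coherent enough, otherwise 0."""
    rows, cols = decode(bits, len(data))
    if len(rows) < 2 or len(cols) < 2 or msr(data, rows, cols) > delta:
        return 0.0
    return float(len(rows) * len(cols))


def bpso(data, delta, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: velocities are real-valued, positions are resampled bits."""
    rng = random.Random(seed)
    n = len(data) + len(data[0])
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(data, p, delta) for p in pos]
    g = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(n):
                v = (
                    w * vel[k][d]
                    + c1 * rng.random() * (pbest[k][d] - pos[k][d])
                    + c2 * rng.random() * (gbest[d] - pos[k][d])
                )
                v = max(-4.0, min(4.0, v))  # clamp so the sigmoid never saturates
                vel[k][d] = v
                # Sigmoid transfer: the velocity sets the probability that the bit is 1.
                pos[k][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0
            f = fitness(data, pos[k], delta)
            if f > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], f
            if f > gbest_fit:
                gbest, gbest_fit = pos[k][:], f
    return decode(gbest, len(data)), gbest_fit
```

With a small user-by-page hit matrix `data`, a call such as `(rows, cols), score = bpso(data, delta=0.01)` would return the selected user and page indices of the best bicluster found, with `score` equal to its volume (or 0 if no coherent bicluster was found).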
A New Heuristic for Feature Selection by Consistent Biclustering
Given a set of data, biclustering aims at finding simultaneous partitions in
biclusters of its samples and of the features which are used for representing
the samples. Consistent biclusterings make it possible to obtain correct classifications
of the samples from the known classification of the features, and vice versa,
and they are very useful for performing supervised classifications. The problem
of finding consistent biclusterings can be seen as a feature selection problem,
where the features that are not relevant for classification purposes are
removed from the set of data, while the total number of features is maximized
in order to preserve information. This feature selection problem can be
formulated as a linear fractional 0-1 optimization problem. We propose a
reformulation of this problem as a bilevel optimization problem, and we present
a heuristic algorithm for an efficient solution of the reformulated problem.
Computational experiments show that the presented algorithm is able to find
better solutions with respect to the ones obtained by employing previously
presented heuristic algorithms.
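As an illustration of the consistency idea described in this abstract, the sketch below checks whether a given sample classification is recovered after a round trip through the features. It assumes the usual centroid-based reading of consistency: each feature is assigned to the sample class in which its mean value is largest, and each sample is then assigned to the feature class in which its mean value is largest. The function names are hypothetical, and the code only tests consistency; it does not implement the feature selection heuristic or the bilevel reformulation from the paper.

```python
def _best_class(centroids):
    """Index of the largest centroid (ties resolved in favour of the lowest index)."""
    return max(range(len(centroids)), key=centroids.__getitem__)


def _mean_or_minus_inf(values):
    """Mean of a list, or -inf for an empty class so that it is never chosen."""
    return sum(values) / len(values) if values else float("-inf")


def classify_features(data, sample_class, n_classes):
    """Assign each feature (row) to the sample class where its mean is largest."""
    result = []
    for row in data:
        centroids = [
            _mean_or_minus_inf([row[j] for j in range(len(row)) if sample_class[j] == k])
            for k in range(n_classes)
        ]
        result.append(_best_class(centroids))
    return result


def classify_samples(data, feature_class, n_classes):
    """Assign each sample (column) to the feature class where its mean is largest."""
    result = []
    for j in range(len(data[0])):
        centroids = [
            _mean_or_minus_inf(
                [data[i][j] for i in range(len(data)) if feature_class[i] == k]
            )
            for k in range(n_classes)
        ]
        result.append(_best_class(centroids))
    return result


def is_consistent(data, sample_class, n_classes):
    """True if the sample classes are recovered from the induced feature classes."""
    feature_class = classify_features(data, sample_class, n_classes)
    return classify_samples(data, feature_class, n_classes) == list(sample_class)
```

For a features-by-samples matrix with two clear blocks, such as `[[5, 4, 1, 0], [4, 5, 0, 1], [0, 1, 5, 4], [1, 0, 4, 5]]`, the sample classes `[0, 0, 1, 1]` pass the check, while `[0, 1, 0, 1]` do not.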
Analysis of regulatory network involved in mechanical induction of embryonic stem cell differentiation
Embryonic stem cells are conventionally differentiated by modulating specific growth factors in the cell culture media. Recently, the effect of the cellular mechanical microenvironment in inducing phenotype-specific differentiation has attracted considerable attention. We have shown the possibility of inducing endoderm differentiation by culturing the stem cells on fibrin substrates of specific stiffness [1]. Here, we analyze the regulatory network involved in such mechanically induced endoderm differentiation under two different experimental configurations of 2-dimensional and 3-dimensional culture, respectively. Mouse embryonic stem cells are differentiated on an array of substrates of varying mechanical properties and analyzed for relevant endoderm markers. The experimental data set is further analyzed for identification of co-regulated transcription factors across different substrate conditions using the technique of bi-clustering. Overlapped bi-clusters are identified following an optimization formulation, which is solved using an evolutionary algorithm. While such analysis is typically performed at the mean value of expression data across experimental repeats, the variability of stem cell systems reduces the confidence in such analysis of mean data. A bootstrapping technique is thus integrated with the bi-clustering algorithm to determine sets of robust bi-clusters, which are found to differ significantly from the corresponding bi-clusters at the mean data value. Analysis of robust bi-clusters reveals an overall similar network interaction as has been reported for chemically induced endoderm or endodermal organs but with differences in patterning between 2-dimensional and 3-dimensional culture. Such analysis sheds light on the pathway of stem cell differentiation, indicating the prospect of the two culture configurations for further maturation. © 2012 Zhang et al.
Pairwise gene GO-based measures for biclustering of high-dimensional expression data
Background: Biclustering algorithms search for groups of genes that share the same
behavior under a subset of samples in gene expression data. Nowadays, the biological
knowledge available in public repositories can be used to drive these algorithms to
find biclusters composed of functionally coherent groups of genes. On the other hand,
a distance among genes can be defined according to their information stored in Gene
Ontology (GO). Gene pairwise GO semantic similarity measures report a value for each
pair of genes which establishes their functional similarity. A scatter search-based
algorithm that optimizes a merit function that integrates GO information is studied in
this paper. This merit function uses a term that addresses the information through a GO
measure.
Results: The effect of two possible different gene pairwise GO measures on the
performance of the algorithm is analyzed. Firstly, three well known yeast datasets with
approximately one thousand genes are studied. Secondly, a group of human
datasets related to clinical data of cancer is also explored by the algorithm. Most of
these data are high-dimensional datasets composed of a huge number of genes. The
resultant biclusters reveal groups of genes linked by the same functionality when the
search procedure is driven by one of the proposed GO measures. Furthermore, a
qualitative biological study of a group of biclusters shows their relevance from a cancer
disease perspective.
Conclusions: It can be concluded that the integration of biological information
improves the performance of the biclustering process. The two different GO measures
studied show an improvement in the results obtained for the yeast dataset. However, if
datasets are composed of a huge number of genes, only one of them really improves
the algorithm performance. This second case constitutes a clear option to explore
interesting datasets from a clinical point of view.
Ministerio de Economía y Competitividad TIN2014-55894-C2-
Profile Likelihood Biclustering
Biclustering, the process of simultaneously clustering the rows and columns
of a data matrix, is a popular and effective tool for finding structure in a
high-dimensional dataset. Many biclustering procedures appear to work well in
practice, but most do not have associated consistency guarantees. To address
this shortcoming, we propose a new biclustering procedure based on profile
likelihood. The procedure applies to a broad range of data modalities,
including binary, count, and continuous observations. We prove that the
procedure recovers the true row and column classes when the dimensions of the
data matrix tend to infinity, even if the functional form of the data
distribution is misspecified. The procedure requires computing a combinatorial
search, which can be expensive in practice. Rather than performing this search
directly, we propose a new heuristic optimization procedure based on the
Kernighan-Lin heuristic, which has nice computational properties and performs
well in simulations. We demonstrate our procedure with applications to
congressional voting records and microarray analysis.
Comment: 40 pages, 11 figures; R package in development at
https://github.com/patperry/biclustp