Improved biclustering on expression data through overlapping control
Purpose: The purpose of this paper is to present a novel control mechanism for avoiding overlap
among biclusters in expression data.
Design/methodology/approach: Biclustering is a technique used in the analysis of microarray data.
One of the most popular biclustering algorithms was introduced by Cheng and Church (2000) (Ch&Ch).
Even if this heuristic is successful at finding interesting biclusters, it presents several drawbacks. The
main shortcoming is that it introduces random values in the expression matrix to control the
overlapping. The overlapping control method presented in this paper is based on a matrix of weights,
that is used to estimate the overlapping of a bicluster with already found ones. In this way, the algorithm
is always working on real data and so the biclusters it discovers contain only original data.
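The idea in the preceding paragraph can be illustrated with a minimal Python sketch. The mean squared residue is the bicluster score used by Cheng and Church; the weight-matrix bookkeeping (`register_bicluster`, `overlap_estimate`) is a hypothetical reading of the abstract, not the paper's actual scheme, and the toy matrix is invented for illustration. The point it shows is that the expression matrix is never modified: overlap is tracked in a separate matrix of weights.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue H(I, J) of the bicluster with rows I and columns J."""
    n_rows, n_cols = len(rows), len(cols)
    row_means = {i: sum(matrix[i][j] for j in cols) / n_cols for i in rows}
    col_means = {j: sum(matrix[i][j] for i in rows) / n_rows for j in cols}
    overall = sum(row_means.values()) / n_rows
    total = 0.0
    for i in rows:
        for j in cols:
            residue = matrix[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


def register_bicluster(weights, rows, cols):
    """Assumed bookkeeping: count how many found biclusters cover each cell."""
    for i in rows:
        for j in cols:
            weights[i][j] += 1


def overlap_estimate(weights, rows, cols):
    """Assumed estimate: average coverage of the candidate's cells."""
    return sum(weights[i][j] for i in rows for j in cols) / (len(rows) * len(cols))


# Toy expression matrix (invented values); it is read but never altered.
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [4.0, 5.0, 6.0, 7.0],
    [8.0, 1.0, 5.0, 2.0],
]
weights = [[0] * 4 for _ in range(4)]

first_rows, first_cols = [0, 1, 2], [0, 1, 2]
first_score = mean_squared_residue(expression, first_rows, first_cols)
register_bicluster(weights, first_rows, first_cols)

candidate_rows, candidate_cols = [1, 2, 3], [1, 2, 3]
candidate_overlap = overlap_estimate(weights, candidate_rows, candidate_cols)
```

A search procedure could then penalise candidates with a high `candidate_overlap` instead of overwriting already covered cells with random values.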
Findings: The paper shows that the original algorithm wrongly estimates the quality of the
biclusters after some iterations, due to the random values that it introduces. The empirical results show that
the proposed approach is effective at improving the heuristic. It is also important to highlight that
many interesting biclusters found by using our approach would not have been obtained using the
original algorithm.
Originality/value: The original algorithm proposed by Ch&Ch is one of the most successful
algorithms for discovering biclusters in microarray data. However, it presents some limitations, the
most relevant being the substitution phase adopted in order to avoid overlapping among biclusters.
The modified version of the algorithm proposed in this paper improves the original one, as proven in the
experimentation.
Ministerio de Ciencia y Tecnología TIN2007-68084-C02- 0
Pairwise gene GO-based measures for biclustering of high-dimensional expression data
Background: Biclustering algorithms search for groups of genes that share the same
behavior under a subset of samples in gene expression data. Nowadays, the biological
knowledge available in public repositories can be used to drive these algorithms to
find biclusters composed of functionally coherent groups of genes. On the other hand,
a distance among genes can be defined according to their information stored in Gene
Ontology (GO). Gene pairwise GO semantic similarity measures report a value for each
pair of genes which establishes their functional similarity. A scatter search-based
algorithm that optimizes a merit function that integrates GO information is studied in
this paper. This merit function uses a term that addresses the information through a GO
measure.
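As a rough illustration of a merit function with a GO term, the Python sketch below adds a penalty based on mean pairwise GO similarity to an expression-coherence score. Everything here is an assumption made for illustration: the function names, the additive form, the `go_weight` parameter, the convention that lower merit is better, and the similarity values. The abstract does not specify the paper's actual formula.

```python
from itertools import combinations


def mean_pairwise_similarity(genes, similarity):
    """Average GO similarity over all gene pairs; unknown pairs count as 0."""
    pairs = list(combinations(sorted(genes), 2))
    if not pairs:
        return 0.0
    return sum(similarity.get(pair, 0.0) for pair in pairs) / len(pairs)


def merit(expression_term, genes, similarity, go_weight=1.0):
    """Hypothetical merit to minimise: coherence score plus a GO penalty."""
    go_term = 1.0 - mean_pairwise_similarity(genes, similarity)
    return expression_term + go_weight * go_term


# Invented pairwise similarities in [0, 1], keyed by sorted gene pairs.
go_similarity = {
    ("g1", "g2"): 0.9,
    ("g1", "g3"): 0.8,
    ("g2", "g3"): 0.7,
    ("g1", "g4"): 0.1,
}

# Same expression score, different functional coherence.
coherent = merit(0.5, {"g1", "g2", "g3"}, go_similarity)
mixed = merit(0.5, {"g1", "g2", "g4"}, go_similarity)
```

With equal expression scores, the functionally coherent gene set receives the lower (better) merit, which is the kind of pressure a GO-driven search would exert.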
Results: The effect of two possible different gene pairwise GO measures on the
performance of the algorithm is analyzed. Firstly, three well-known yeast datasets with
approximately one thousand genes are studied. Secondly, a group of human
datasets related to clinical data of cancer is also explored by the algorithm. Most of
these data are high-dimensional datasets composed of a huge number of genes. The
resultant biclusters reveal groups of genes linked by the same functionality when the
search procedure is driven by one of the proposed GO measures. Furthermore, a
qualitative biological study of a group of biclusters shows their relevance from a cancer
disease perspective.
Conclusions: It can be concluded that the integration of biological information
improves the performance of the biclustering process. The two different GO measures
studied show an improvement in the results obtained for the yeast dataset. However, if
datasets are composed of a huge number of genes, only one of them really improves
the algorithm performance. This second case constitutes a clear option to explore
interesting datasets from a clinical point of view.
Ministerio de Economía y Competitividad TIN2014-55894-C2-
TriGen: A genetic algorithm to mine triclusters in temporal gene expression data
Analyzing microarray data represents a computational challenge due to the characteristics of these data. Clustering
techniques are widely applied to create groups of genes that exhibit a similar behavior under the conditions tested.
Biclustering emerges as an improvement of classical clustering since it relaxes the constraints for grouping genes to
be evaluated only under a subset of the conditions and not under all of them. However, this technique is not
appropriate for the analysis of longitudinal experiments in which the genes are evaluated under certain conditions at
several time points. We present the TriGen algorithm, a genetic algorithm that finds triclusters of gene expression that
take into account the experimental conditions and the time points simultaneously. We have used TriGen to mine
datasets related to synthetic data, yeast (Saccharomyces cerevisiae) cell cycle and human inflammation and host
response to injury experiments. TriGen has proved to be capable of extracting groups of genes with similar patterns in
subsets of conditions and times, and these groups have been shown to be related in terms of their functional annotations
extracted from the Gene Ontology.
Ministerio de Ciencia y Tecnología TIN2011-28956-C00
Ministerio de Ciencia y Tecnología TIN2009-13950
Junta de Andalucía TIC-752
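To make the notion of a tricluster concrete, the following Python sketch represents one as three index sets (genes, conditions, time points) and extracts the corresponding sub-cube from a three-dimensional dataset. The `Tricluster` class, the `extract` helper, and the synthetic data are hypothetical and are not taken from TriGen; a genetic algorithm would evolve a population of such index sets under some fitness function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tricluster:
    """Hypothetical individual: selected genes, conditions and time points."""
    genes: tuple
    conditions: tuple
    times: tuple


def extract(data, tri):
    """Return the sub-cube data[gene][condition][time] selected by tri."""
    return [
        [[data[g][c][t] for t in tri.times] for c in tri.conditions]
        for g in tri.genes
    ]


# Synthetic cube: 5 genes x 3 conditions x 4 time points (invented values).
data = [
    [[100 * g + 10 * c + t for t in range(4)] for c in range(3)]
    for g in range(5)
]

tri = Tricluster(genes=(0, 3), conditions=(1, 2), times=(0, 2, 3))
sub = extract(data, tri)
```

Unlike a bicluster, the selection here constrains the time axis as well, so genes are grouped only over the chosen conditions and time points.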