SUBIC: A Supervised Bi-Clustering Approach for Precision Medicine
Traditional medicine typically applies one-size-fits-all treatment for the
entire patient population whereas precision medicine develops tailored
treatment schemes for different patient subgroups. The fact that some factors
may be more significant for a specific patient subgroup motivates clinicians
and medical researchers to develop new approaches to subgroup detection and
analysis, which is an effective strategy to personalize treatment. In this
study, we propose a novel patient subgroup detection method, called Supervised
Biclustering (SUBIC) using convex optimization and apply our approach to detect
patient subgroups and prioritize risk factors for hypertension (HTN) in a
vulnerable demographic subgroup (African-American). Our approach not only finds
patient subgroups with the guidance of a clinically relevant target variable but
also identifies and prioritizes risk factors by pursuing sparsity of the input
variables and encouraging similarity among the input variables and between the
input and target variable.
An effective measure for assessing the quality of biclusters
Biclustering is becoming a popular technique for the study of gene expression data. This is mainly due to the capability of biclustering to address the data using various dimensions simultaneously, as opposed to clustering, which can use only one dimension at a time. Different heuristics have been proposed to discover interesting biclusters in data. Such heuristics have one common characteristic: they are guided by a measure that determines the quality of biclusters. It follows that defining such a measure is probably the most important aspect. One popular quality measure is the mean squared residue (MSR). However, it has been proven that MSR fails to identify some kinds of patterns. This motivates us to introduce a novel measure, called virtual error (VE), that overcomes this limitation. Results obtained by using VE confirm that it can identify interesting patterns that could not be found by MSR.
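To make the limitation of MSR concrete, the following minimal sketch computes the mean squared residue in its usual form (the mean over all cells of the squared value a_ij - rowmean_i - colmean_j + overallmean). It is an illustration only and does not implement the virtual error measure from the abstract. The function name and the toy matrices are assumptions introduced here. A purely shifted pattern scores approximately zero, while a purely scaled pattern, a commonly cited example of what MSR misses, does not.

```python
def mean_squared_residue(matrix):
    """Mean squared residue of a bicluster given as a list of rows.

    Hypothetical helper for illustration; a lower value means the rows
    follow each other more closely in the additive (shifting) sense.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = matrix[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


# Toy data (assumed): one base profile, then shifted and scaled copies of it.
base = [1.0, 2.0, 4.0]
shifted = [[v + s for v in base] for s in (0.0, 3.0, 5.0)]  # additive pattern
scaled = [[v * k for v in base] for k in (1.0, 2.0, 3.0)]   # multiplicative pattern

print("MSR, shifted pattern:", mean_squared_residue(shifted))
print("MSR, scaled pattern: ", mean_squared_residue(scaled))
```

Both toy biclusters are perfectly coherent in the sense that every row is a transformed copy of the same profile, yet only the shifted one is rewarded by MSR.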
Biclustering on expression data: A review
Biclustering has become a popular technique for the study of gene expression data, especially for discovering functionally related gene sets under different subsets of experimental conditions. Most biclustering approaches use a measure or cost function that determines the quality of biclusters. In such cases, the development of both a suitable heuristic and a good measure for guiding the search is essential for discovering interesting biclusters in an expression matrix. Nevertheless, not all existing biclustering approaches base their search on evaluation measures for biclusters. There exists a diverse set of biclustering tools that follow different strategies and algorithmic concepts which guide the search towards meaningful results. In this paper we present an extensive survey of biclustering approaches, classifying them into two categories according to whether or not they use evaluation metrics within the search method: biclustering algorithms based on evaluation measures and non-metric-based biclustering algorithms. In both cases, they have been classified according to the type of meta-heuristics on which they are based. Ministerio de Economía y Competitividad TIN2011-2895
Pairwise gene GO-based measures for biclustering of high-dimensional expression data
Background: Biclustering algorithms search for groups of genes that share the same
behavior under a subset of samples in gene expression data. Nowadays, the biological
knowledge available in public repositories can be used to drive these algorithms to
find biclusters composed of functionally coherent groups of genes. On the other hand,
a distance among genes can be defined according to their information stored in Gene
Ontology (GO). Gene pairwise GO semantic similarity measures report a value for each
pair of genes which establishes their functional similarity. A scatter search-based
algorithm that optimizes a merit function that integrates GO information is studied in
this paper. This merit function uses a term that addresses the information through a GO
measure.
Results: The effect of two possible different gene pairwise GO measures on the
performance of the algorithm is analyzed. Firstly, three well known yeast datasets with
approximately one thousand genes are studied. Secondly, a group of human
datasets related to clinical data of cancer is also explored by the algorithm. Most of
these data are high-dimensional datasets composed of a huge number of genes. The
resultant biclusters reveal groups of genes linked by the same functionality when the
search procedure is driven by one of the proposed GO measures. Furthermore, a
qualitative biological study of a group of biclusters shows their relevance from a cancer
disease perspective.
Conclusions: It can be concluded that the integration of biological information
improves the performance of the biclustering process. The two different GO measures
studied show an improvement in the results obtained for the yeast dataset. However, if
datasets are composed of a huge number of genes, only one of them really improves
the algorithm performance. This second case constitutes a clear option to explore
interesting datasets from a clinical point of view. Ministerio de Economía y Competitividad TIN2014-55894-C2-
Evolutionary Search of Biclusters by Minimal Intrafluctuation
Biclustering techniques aim at extracting significant
subsets of genes and conditions from microarray gene
expression data. Algorithms of this kind rely mainly on two
key aspects: the way in which they deal with gene similarity
across the experimental conditions, which determines the quality
of biclusters; and the heuristic or search strategy used for
exploring the search space. A measure that is often adopted
for establishing the quality of biclusters is the mean squared
residue. This measure has been successfully used in many
approaches. However, it has been recently proven that the
mean squared residue fails to recognize some kinds of biclusters
as quality biclusters, mainly due to the difficulty of detecting
scaling patterns in data. In this work, we propose a novel
measure that aims to overcome this drawback. This measure
is based on the area between two curves. Such curves are
built from the maximum and minimum standardized expression
values exhibited for each experimental condition. In order
to test the proposed measure, we have incorporated it into
a multiobjective evolutionary algorithm. Experimental results
confirm the effectiveness of our approach. The combination of
the measure we propose with the mean squared residue yields
results that would not have been obtained if only the mean
squared residue had been used. Comisión Interministerial de Ciencia y Tecnología (CICYT) TIN2004-0015
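The idea of an area between the maximum and minimum curves can be sketched roughly as follows. This is not the authors' measure: the abstract does not state how the values are standardized or how the area is computed, so the per-gene z-score standardization, the trapezoidal rule with unit spacing between conditions, and all function names are assumptions made for illustration.

```python
from statistics import mean, pstdev


def standardize_rows(matrix):
    """Z-score each gene (row) across conditions; constant rows map to zeros.

    Assumed standardization, chosen only for this illustration.
    """
    standardized = []
    for row in matrix:
        mu = mean(row)
        sigma = pstdev(row)
        standardized.append([(v - mu) / sigma if sigma > 0 else 0.0 for v in row])
    return standardized


def area_between_extremes(matrix):
    """Area between the per-condition maximum and minimum curves.

    Uses the trapezoidal rule with unit spacing between conditions (assumed).
    """
    z = standardize_rows(matrix)
    n_cols = len(z[0])
    # Gap between the upper (max) and lower (min) curve at each condition.
    gaps = [max(row[j] for row in z) - min(row[j] for row in z)
            for j in range(n_cols)]
    return sum((gaps[j] + gaps[j + 1]) / 2 for j in range(n_cols - 1))


# Toy data (assumed): a scaling pattern and two genes with opposite trends.
scaled = [[1.0, 2.0, 4.0], [2.0, 4.0, 8.0], [3.0, 6.0, 12.0]]
crossed = [[1.0, 2.0, 4.0], [4.0, 2.0, 1.0]]

print("area, scaled pattern:", area_between_extremes(scaled))
print("area, crossed genes: ", area_between_extremes(crossed))
```

Under these assumptions, the scaled rows collapse onto a single standardized profile, so the two curves nearly coincide and the area is close to zero, whereas genes with opposite trends leave a clearly positive area.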
A biclustering algorithm based on a Bicluster Enumeration Tree: application to DNA microarray data
Background: In a number of domains, like in DNA microarray data analysis, we need to cluster simultaneously rows (genes) and columns (conditions) of a data matrix to identify groups of rows coherent with groups of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis. More effective biclustering algorithms are highly desirable and needed.
Methods: We introduce BiMine, a new enumeration algorithm for biclustering of DNA microarray data. The proposed algorithm is based on three original features. First, BiMine relies on a new evaluation function called Average Spearman's rho (ASR). Second, BiMine uses a new tree structure, called Bicluster Enumeration Tree (BET), to represent the different biclusters discovered during the enumeration process. Third, to avoid the combinatorial explosion of the search tree, BiMine introduces a parametric rule that allows the enumeration process to cut tree branches that cannot lead to good biclusters.
Results: The performance of the proposed algorithm is assessed using both synthetic and real DNA microarray data. The experimental results show that BiMine competes well with several other biclustering methods. Moreover, we test the biological significance using a gene annotation web-tool to show that our proposed method is able to produce biologically relevant biclusters. The software is available upon request from the authors to academic users.