125 research outputs found
An effective measure for assessing the quality of biclusters
Biclustering is becoming a popular technique for the study of gene expression data. This is mainly due to the capability of biclustering to address the data using several dimensions simultaneously, as opposed to clustering, which can use only one dimension at a time. Different heuristics have been proposed in order to discover interesting biclusters in data. Such heuristics have one common characteristic: they are guided by a measure that determines the quality of biclusters. It follows that defining such a measure is probably the most important aspect. One popular quality measure is the mean squared residue (MSR). However, it has been proven that MSR fails to identify some kinds of patterns. This motivates us to introduce a novel measure, called virtual error (VE), that overcomes this limitation. Results obtained by using VE confirm that it can identify interesting patterns that could not be found by MSR.
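For readers unfamiliar with the mean squared residue, the following is its commonly cited definition (due to Cheng and Church), written here as a reminder rather than quoted from the abstract. The symbols are assumptions of this note: a_{ij} is the expression value of gene i under condition j, I and J are the selected gene and condition sets, and a_{iJ}, a_{Ij}, a_{IJ} are the row mean, column mean and overall mean of the bicluster.

```latex
H(I,J) = \frac{1}{|I|\,|J|} \sum_{i \in I} \sum_{j \in J}
         \left( a_{ij} - a_{iJ} - a_{Ij} + a_{IJ} \right)^{2}
```

A value of zero means every entry is explained by its row and column means alone, which is the case for purely additive patterns.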
Virtual Error: A New Measure for Evolutionary Biclustering
Many heuristics used for finding biclusters in microarray data use the mean squared residue as a way of evaluating the quality of biclusters. This has led to the discovery of interesting biclusters. Recently, it has been proven that the mean squared residue may fail to identify some interesting biclusters. This motivates us to introduce a new measure, called Virtual Error, for assessing the quality of biclusters in microarray data. In order to test the validity of the proposed measure, we include it within an evolutionary algorithm. Experimental results show that the use of this novel measure is effective for finding interesting biclusters that could not have been discovered with the use of the mean squared residue.
SUBIC: A Supervised Bi-Clustering Approach for Precision Medicine
Traditional medicine typically applies one-size-fits-all treatment for the
entire patient population whereas precision medicine develops tailored
treatment schemes for different patient subgroups. The fact that some factors
may be more significant for a specific patient subgroup motivates clinicians
and medical researchers to develop new approaches to subgroup detection and
analysis, which is an effective strategy to personalize treatment. In this
study, we propose a novel patient subgroup detection method, called Supervised
Biclustering (SUBIC), using convex optimization, and apply our approach to detect
patient subgroups and prioritize risk factors for hypertension (HTN) in a
vulnerable demographic subgroup (African-American). Our approach not only finds
patient subgroups with guidance of a clinically relevant target variable but
also identifies and prioritizes risk factors by pursuing sparsity of the input
variables and encouraging similarity among the input variables and between the
input and target variable.
Evolutionary Search of Biclusters by Minimal Intrafluctuation
Biclustering techniques aim at extracting significant
subsets of genes and conditions from microarray gene
expression data. Algorithms of this kind rely mainly on two
key aspects: the way in which they deal with gene similarity
across the experimental conditions, which determines the quality
of biclusters; and the heuristic or search strategy used for
exploring the search space. A measure that is often adopted
for establishing the quality of biclusters is the mean squared
residue. This measure has been successfully used in many
approaches. However, it has been recently proven that the
mean squared residue fails to recognize some kinds of biclusters
as quality biclusters, mainly due to the difficulty of detecting
scaling patterns in data. In this work, we propose a novel
measure that aims to overcome this drawback. This measure
is based on the area between two curves. Such curves are
built from the maximum and minimum standardized expression
values exhibited for each experimental condition. In order
to test the proposed measure, we have incorporated it into
a multiobjective evolutionary algorithm. Experimental results
confirm the effectiveness of our approach. The combination of
the measure we propose with the mean squared residue yields
results that would not have been obtained if only the mean
squared residue had been used. Comisión Interministerial de Ciencia y Tecnología (CICYT) TIN2004-0015
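The idea of an area between two curves can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how values are standardized or how the area is computed, so the per-gene z-score, the unit spacing between conditions, the trapezoidal rule and the function name `area_between_extremes` are all assumptions made for illustration.

```python
import numpy as np

def area_between_extremes(bicluster):
    """Area between the per-condition max and min curves of a bicluster.

    Rows are genes, columns are experimental conditions.
    Assumption: each gene is standardized to zero mean and unit variance.
    """
    a = np.asarray(bicluster, dtype=float)
    std = a.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # leave constant genes at zero instead of dividing by 0
    z = (a - a.mean(axis=1, keepdims=True)) / std

    upper = z.max(axis=0)  # curve of maximum standardized values
    lower = z.min(axis=0)  # curve of minimum standardized values
    gap = upper - lower

    # Trapezoidal rule with unit spacing between consecutive conditions.
    return float(np.sum((gap[:-1] + gap[1:]) / 2.0))

# Hypothetical data: three genes that are scaled copies of one profile.
base = np.array([1.0, 2.0, 4.0, 8.0])
scaling = base * np.array([[1.0], [2.0], [5.0]])
# The same genes plus one gene with an opposite trend.
mixed = np.vstack([scaling, [8.0, 4.0, 2.0, 1.0]])

area_scaling = area_between_extremes(scaling)
area_mixed = area_between_extremes(mixed)
```

Under these assumptions the scaled genes collapse onto one standardized curve, so the area is essentially zero, while the gene with the opposite trend opens a visible gap between the two curves.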
Measuring the Quality of Shifting and Scaling Patterns in Biclusters
The most widespread biclustering algorithms use the Mean Squared Residue (MSR) as a measure for assessing the quality of biclusters. MSR can correctly identify shifting patterns, but fails to discover biclusters presenting scaling patterns. Virtual Error (VE) is a measure that improves on MSR in this sense, since it recognizes biclusters containing shifting patterns or scaling patterns as quality biclusters. However, VE presents some drawbacks when the biclusters present both kinds of patterns simultaneously. In this paper, we propose an improvement of VE that can be integrated in any heuristic to discover biclusters with shifting and scaling patterns simultaneously. Ministerio de Ciencia y Tecnología TIN2007-68084-C02-0
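The contrast between shifting and scaling patterns can be checked numerically. The sketch below uses the standard definition of the mean squared residue (each entry minus its row mean and column mean, plus the overall mean, squared and averaged); the data and the function name `mean_squared_residue` are hypothetical and not taken from the paper.

```python
import numpy as np

def mean_squared_residue(bicluster):
    """Mean squared residue of a genes-by-conditions submatrix."""
    a = np.asarray(bicluster, dtype=float)
    row_means = a.mean(axis=1, keepdims=True)
    col_means = a.mean(axis=0, keepdims=True)
    residue = a - row_means - col_means + a.mean()
    return float(np.mean(residue ** 2))

# Hypothetical base profile over four conditions.
base = np.array([1.0, 2.0, 4.0, 8.0])

# Shifting pattern: each gene is the base profile plus a constant.
shifting = base + np.array([[0.0], [3.0], [10.0]])
# Scaling pattern: each gene is the base profile times a constant.
scaling = base * np.array([[1.0], [2.0], [5.0]])

msr_shifting = mean_squared_residue(shifting)
msr_scaling = mean_squared_residue(scaling)
```

On this toy data the shifting bicluster has a residue of zero up to rounding, whereas the scaling bicluster, although just as coherent, receives a clearly positive residue and would be penalized by a heuristic that minimizes MSR.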
Finding large average submatrices in high dimensional data
The search for sample-variable associations is an important problem in the
exploratory analysis of high dimensional data. Biclustering methods search for
sample-variable associations in the form of distinguished submatrices of the
data matrix. (The rows and columns of a submatrix need not be contiguous.) In
this paper we propose and evaluate a statistically motivated biclustering
procedure (LAS) that finds large average submatrices within a given real-valued
data matrix. The procedure operates in an iterative-residual fashion, and is
driven by a Bonferroni-based significance score that effectively trades off
between submatrix size and average value. We examine the performance and
potential utility of LAS, and compare it with a number of existing methods,
through an extensive three-part validation study using two gene expression
datasets. The validation study examines quantitative properties of biclusters,
biological and clinical assessments using auxiliary information, and
classification of disease subtypes using bicluster membership. In addition, we
carry out a simulation study to assess the effectiveness and noise sensitivity
of the LAS search procedure. These results suggest that LAS is an effective
exploratory tool for the discovery of biologically relevant structures in high
dimensional data. Software is available at https://genome.unc.edu/las/. Comment: Published at http://dx.doi.org/10.1214/09-AOAS239 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
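To make the trade-off between submatrix size and average value concrete, here is a minimal sketch of a Bonferroni-style score. The abstract does not state the formula, so the form used here is an assumption: the negative log of the number of k-by-l submatrices of an m-by-n matrix times the Gaussian tail probability of observing the given average under a standard normal null model. The function names are hypothetical.

```python
import math

def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def las_style_score(m, n, k, l, avg):
    """Bonferroni-style significance score for a k-by-l submatrix.

    Assumed form: -log( C(m,k) * C(n,l) * P(Z >= avg * sqrt(k*l)) ),
    where Z is standard normal and the data matrix is m-by-n.
    """
    z = avg * math.sqrt(k * l)
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper Gaussian tail
    if tail == 0.0:
        # Plain floats underflow for very large z; a log-space tail
        # approximation would be needed there.
        raise ValueError("tail probability underflowed")
    return -(log_binom(m, k) + log_binom(n, l) + math.log(tail))

# Hypothetical 100-by-50 data matrix and a 10-by-5 candidate submatrix.
score_high_avg = las_style_score(100, 50, 10, 5, 1.0)
score_low_avg = las_style_score(100, 50, 10, 5, 0.5)
```

In this form, a higher average raises the score through the tail probability, while the binomial terms charge a penalty for the number of submatrices of the same size that could have been examined.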