SUBIC: A Supervised Bi-Clustering Approach for Precision Medicine
Traditional medicine typically applies one-size-fits-all treatment for the
entire patient population, whereas precision medicine develops tailored
treatment schemes for different patient subgroups. The fact that some factors
may be more significant for a specific patient subgroup motivates clinicians
and medical researchers to develop new approaches to subgroup detection and
analysis, an effective strategy for personalizing treatment. In this
study, we propose a novel patient subgroup detection method, called Supervised
Biclustering (SUBIC), that uses convex optimization, and we apply it to detect
patient subgroups and prioritize risk factors for hypertension (HTN) in a
vulnerable demographic subgroup (African-American). Our approach not only finds
patient subgroups under the guidance of a clinically relevant target variable but
also identifies and prioritizes risk factors by pursuing sparsity of the input
variables and encouraging similarity among the input variables and between the
input and target variables.
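The abstract describes the SUBIC objective only in words: sparsity of the input variables, similarity among the inputs, and similarity between the inputs and the target. A minimal Python sketch of one convex objective with those ingredients follows. It assumes uniform fusion weights, a group-lasso sparsity term, and the hypothetical names `biclustering_objective`, `gamma`, and `lam`. It illustrates fusion-penalized convex biclustering in general, not the published SUBIC formulation, and it only evaluates the objective rather than minimizing it.

```python
import math


def l2(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def column(M, j):
    """Column j of a matrix stored as a list of rows."""
    return [row[j] for row in M]


def biclustering_objective(X, y, U, gamma, lam):
    """Evaluate an illustrative convex objective (hypothetical, NOT the published SUBIC model).

    X: n x p matrix (patients by risk factors), y: length-n target,
    U: n x (p + 1) candidate fit whose last column approximates y.
    """
    n, p = len(X), len(X[0])
    # Treat the target as an extra column so it can be fused with the inputs.
    Z = [list(X[i]) + [y[i]] for i in range(n)]
    fit = 0.5 * sum((z - u) ** 2 for zr, ur in zip(Z, U) for z, u in zip(zr, ur))
    # Row fusion: pulling patient rows together yields patient subgroups.
    row_fusion = sum(l2([a - b for a, b in zip(U[i], U[k])])
                     for i in range(n) for k in range(i + 1, n))
    # Column fusion: pulls inputs toward each other and toward the target column.
    cols = [column(U, j) for j in range(p + 1)]
    col_fusion = sum(l2([a - b for a, b in zip(cols[j], cols[k])])
                     for j in range(p + 1) for k in range(j + 1, p + 1))
    # Group-lasso sparsity on input columns only; the target column is not shrunk.
    sparsity = sum(l2(cols[j]) for j in range(p))
    return fit + gamma * (row_fusion + col_fusion) + lam * sparsity
```

Each term is either a squared loss or a norm of a linear function of `U`, so the sum is convex in `U` for nonnegative `gamma` and `lam`. Minimizing it would require a dedicated convex solver, which is not shown.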
Techniques for clustering gene expression data
Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state-of-the-art applications that recognise these limitations and implement procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly, and selected examples are presented for the clustering methods considered.
On bicluster aggregation and its benefits for enumerative solutions
Biclustering involves the simultaneous clustering of objects and their
attributes, thus defining local two-way clustering models. Recently, efficient
algorithms were conceived to enumerate all biclusters in real-valued datasets.
In this case, the solution comprises a complete set of maximal and non-redundant
biclusters. However, the ability to enumerate biclusters revealed a challenging
scenario: in noisy datasets, each true bicluster may become highly fragmented
and highly overlapping, which prevents direct analysis of the
obtained results. To reverse the fragmentation, we propose two approaches
for properly aggregating the whole set of enumerated biclusters: one based on
single linkage and the other directly exploiting the rate of overlap. Both
proposals were compared with each other and with the current state of the art in
several experiments; they not only significantly reduced the number of
biclusters but also consistently increased the quality of the solution.
Comment: 15 pages, will be published by Springer Verlag in the LNAI Series in
the book Advances in Data Minin
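The two aggregation ideas can be illustrated together in a short Python sketch: fragments whose overlap rate reaches a threshold are linked, and linked groups are merged by single linkage using a union-find structure. The overlap measure (shared cells divided by the size of the smaller bicluster), the threshold, and the names `overlap_rate` and `aggregate` are assumptions for illustration, not the procedures evaluated in the paper.

```python
def overlap_rate(a, b):
    """Fraction of the smaller bicluster's cells shared with the other (assumed measure)."""
    rows_a, cols_a = a
    rows_b, cols_b = b
    # Shared cells form the sub-matrix (rows_a & rows_b) x (cols_a & cols_b).
    shared = len(rows_a & rows_b) * len(cols_a & cols_b)
    smaller = min(len(rows_a) * len(cols_a), len(rows_b) * len(cols_b))
    return shared / smaller if smaller else 0.0


def aggregate(biclusters, threshold):
    """Single-linkage merge of (row set, column set) fragments via union-find."""
    parent = list(range(len(biclusters)))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of fragments whose overlap reaches the threshold.
    for i in range(len(biclusters)):
        for j in range(i + 1, len(biclusters)):
            if overlap_rate(biclusters[i], biclusters[j]) >= threshold:
                parent[find(i)] = find(j)

    # Each connected group becomes one bicluster: union of its rows and columns.
    groups = {}
    for i, (rows, cols) in enumerate(biclusters):
        r, c = groups.setdefault(find(i), (set(), set()))
        r |= rows
        c |= cols
    return [(frozenset(r), frozenset(c)) for r, c in groups.values()]
```

Because single linkage chains fragments transitively, a low threshold can merge two distinct biclusters through a sequence of weakly overlapping fragments; the threshold trades residual fragmentation against this kind of over-merging.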
Profile Likelihood Biclustering
Biclustering, the process of simultaneously clustering the rows and columns
of a data matrix, is a popular and effective tool for finding structure in a
high-dimensional dataset. Many biclustering procedures appear to work well in
practice, but most do not have associated consistency guarantees. To address
this shortcoming, we propose a new biclustering procedure based on profile
likelihood. The procedure applies to a broad range of data modalities,
including binary, count, and continuous observations. We prove that the
procedure recovers the true row and column classes when the dimensions of the
data matrix tend to infinity, even if the functional form of the data
distribution is misspecified. The procedure requires a combinatorial
search, which can be expensive in practice. Rather than performing this search
directly, we propose a new heuristic optimization procedure based on the
Kernighan-Lin heuristic, which has favorable computational properties and performs
well in simulations. We demonstrate our procedure with applications to
congressional voting records and microarray analysis.
Comment: 40 pages, 11 figures; R package in development at
https://github.com/patperry/biclustp
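For Gaussian data with a common variance, profiling out the block means leaves a log-likelihood that is a decreasing function of the within-block sum of squares, so maximizing the profile likelihood amounts to minimizing that sum. The Python sketch below covers only this Gaussian special case and pairs it with a simplified Kernighan-Lin-style pass: every row is moved exactly once, the best available move is taken even when it worsens the objective, and the best labeling seen along the way is kept. The names `block_sse`, `kl_pass`, and `bicluster` and the alternating row/column scheme are assumptions; this is not the procedure or API of the biclustp package.

```python
def block_sse(X, row_labels, col_labels, K, L):
    """Within-block sum of squares; minimizing it maximizes the Gaussian profile likelihood."""
    sums = [[0.0] * L for _ in range(K)]
    counts = [[0] * L for _ in range(K)]
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            sums[row_labels[i]][col_labels[j]] += x
            counts[row_labels[i]][col_labels[j]] += 1
    total = 0.0
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            k, l = row_labels[i], col_labels[j]
            # The block containing this cell is non-empty, so counts[k][l] > 0.
            total += (x - sums[k][l] / counts[k][l]) ** 2
    return total


def kl_pass(X, labels, other, K, L):
    """One Kernighan-Lin-style pass over the row labels of X (column labels fixed)."""
    labels = list(labels)
    best = block_sse(X, labels, other, K, L)
    best_labels = labels[:]
    unlocked = set(range(len(labels)))
    while unlocked:
        move = None  # (objective, row index, new class)
        for i in sorted(unlocked):
            old = labels[i]
            for k in range(K):
                if k == old:
                    continue
                labels[i] = k
                value = block_sse(X, labels, other, K, L)
                if move is None or value < move[0]:
                    move = (value, i, k)
            labels[i] = old
        if move is None:  # K == 1: nothing to move
            break
        # Apply the best move even if it worsens the objective, then lock that row.
        value, i, k = move
        labels[i] = k
        unlocked.remove(i)
        if value < best - 1e-12:
            best, best_labels = value, labels[:]
    return best_labels, best


def transpose(X):
    return [list(col) for col in zip(*X)]


def bicluster(X, row_labels, col_labels, K, L, max_iter=20):
    """Alternate row and column passes until a full round brings no improvement."""
    obj = block_sse(X, row_labels, col_labels, K, L)
    for _ in range(max_iter):
        row_labels, _ = kl_pass(X, row_labels, col_labels, K, L)
        # Columns of X are rows of its transpose, with the roles of K and L swapped.
        col_labels, new_obj = kl_pass(transpose(X), col_labels, row_labels, L, K)
        if new_obj >= obj - 1e-12:
            break
        obj = new_obj
    return row_labels, col_labels, obj
```

Accepting worsening moves inside a pass lets the search escape some local optima where a purely greedy reassignment would stop. The result still depends on the starting labels, so it can help to try several initial labelings.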