Search CORE

5,830 research outputs found

A framework for generalized subspace pattern mining in high-dimensional datasets

Author: A Ben-Dor
A Berkenblit
A Prelic
A Tanay
A Tanay
B Pontes
C Curtis
C Huttenhower
CM Perou
Edward WJ Curry
G Chen
G Li
GD Aletti
J Hartigan
K Eren
KD Hansen
L Lazzeroni
M Gerlinger
MB Eisen
RW Tothill
S Bergmann
S Hochreiter
SC Madeira
T Murali
Y Cheng
Y Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

A framework for generalized subspace pattern mining in high-dimensional datasets

Author: Curry EWJ
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/10/2014
Field of study

Background A generalized notion of biclustering involves the identification of patterns across subspaces within a data matrix. This approach is particularly well-suited to analysis of heterogeneous molecular biology datasets, such as those collected from populations of cancer patients. Different definitions of biclusters will offer different opportunities to discover information from datasets, making it pertinent to tailor the desired patterns to the intended application. This paper introduces ‘GABi’, a customizable framework for subspace pattern mining suited to large heterogeneous datasets. Most existing biclustering algorithms discover biclusters of only a few distinct structures. However, by enabling definition of arbitrary bicluster models, the GABi framework enables the application of biclustering to tasks for which no existing algorithm could be used. Results First, a series of artificial datasets were constructed to represent three clearly distinct scenarios for applying biclustering. With a bicluster model created for each distinct scenario, GABi is shown to recover the correct solutions more effectively than a panel of alternative approaches, where the bicluster model may not reflect the structure of the desired solution. Secondly, the GABi framework is used to integrate clinical outcome data with an ovarian cancer DNA methylation dataset, leading to the discovery that widespread dysregulation of DNA methylation associates with poor patient prognosis, a result that has not previously been reported. This illustrates a further benefit of the flexible bicluster definition of GABi, which is that it enables incorporation of multiple sources of data, with each data source treated in a specific manner, leading to a means of intelligent integrated subspace pattern mining across multiple datasets. Conclusions The GABi framework enables discovery of biologically relevant patterns of any specified structure from large collections of genomic data. An R implementation of the GABi framework is available through CRAN (http://cran.r-project.org/web/packages/GABi/index.html)

Springer - Publisher Connector

PubMed Central

Spiral - Imperial College Digital Repository

A Survey on Soft Subspace Clustering

Author: Choi Kup-Sze
Deng Zhaohong
Jiang Yizhang
Wang Jun
Wang Shitong
Publication venue: 'Elsevier BV'
Publication date: 07/04/2016
Field of study

Subspace clustering (SC) is a promising clustering technology to identify clusters based on their associations with subspaces in high dimensional spaces. SC can be classified into hard subspace clustering (HSC) and soft subspace clustering (SSC). While HSC algorithms have been extensively studied and well accepted by the scientific community, SSC algorithms are relatively new but gaining more attention in recent years due to better adaptability. In the paper, a comprehensive survey on existing SSC algorithms and the recent development are presented. The SSC algorithms are classified systematically into three main categories, namely, conventional SSC (CSSC), independent SSC (ISSC) and extended SSC (XSSC). The characteristics of these algorithms are highlighted and the potential future development of SSC is also discussed.Comment: This paper has been published in Information Sciences Journal in 201

arXiv.org e-Print Archive

The Hong Kong Polytechnic University Pao Yue-kong Library

PolyU Institutional Repository

Manifold Elastic Net: A Unified Framework for Sparse Dimension Reduction

Author: A D’aspremont
B Efron
C Fyfe
CHQ Ding
CM Bishop
D Cai
D Tao
D Tao
D Tao
Dacheng Tao
DB Graham
DL Donoho
E Candes
EJ Candes
GH Golub
GM James
H Hotelling
H Zou
H Zou
H Zou
H Zou
HP Kriegel
J Fan
J Fan
J Lv
JB Tenenbaum
M Belkin
M Belkin
PJ Phillips
PN Belhumeur
R Tibshirani
RA Fisher
S Yan
ST Roweis
T Li
T Zhang
Tianyi Zhou
X He
X Li
Xindong Wu
Z Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/07/2010
Field of study

It is difficult to find the optimal sparse solution of a manifold learning based dimensionality reduction algorithm. The lasso or the elastic net penalized manifold learning based dimensionality reduction is not directly a lasso penalized least square problem and thus the least angle regression (LARS) (Efron et al. \cite{LARS}), one of the most popular algorithms in sparse learning, cannot be applied. Therefore, most current approaches take indirect ways or have strict settings, which can be inconvenient for applications. In this paper, we proposed the manifold elastic net or MEN for short. MEN incorporates the merits of both the manifold learning based dimensionality reduction and the sparse learning based dimensionality reduction. By using a series of equivalent transformations, we show MEN is equivalent to the lasso penalized least square problem and thus LARS is adopted to obtain the optimal sparse solution of MEN. In particular, MEN has the following advantages for subsequent classification: 1) the local geometry of samples is well preserved for low dimensional data representation, 2) both the margin maximization and the classification error minimization are considered for sparse projection calculation, 3) the projection matrix of MEN improves the parsimony in computation, 4) the elastic net penalty reduces the over-fitting problem, and 5) the projection matrix of MEN can be interpreted psychologically and physiologically. Experimental evidence on face recognition over various popular datasets suggests that MEN is superior to top level dimensionality reduction algorithms.Comment: 33 pages, 12 figure

arXiv.org e-Print Archive

CiteSeerX

Crossref

OPUS - University of Technology Sydney

Low-Rank Matrices on Graphs: Generalized Recovery & Applications

Author: Perraudin Nathanael
Shahid Nauman
Vandergheynst Pierre
Publication venue
Publication date: 18/05/2016
Field of study

Many real world datasets subsume a linear or non-linear low-rank structure in a very low-dimensional space. Unfortunately, one often has very little or no information about the geometry of the space, resulting in a highly under-determined recovery problem. Under certain circumstances, state-of-the-art algorithms provide an exact recovery for linear low-rank structures but at the expense of highly inscalable algorithms which use nuclear norm. However, the case of non-linear structures remains unresolved. We revisit the problem of low-rank recovery from a totally different perspective, involving graphs which encode pairwise similarity between the data samples and features. Surprisingly, our analysis confirms that it is possible to recover many approximate linear and non-linear low-rank structures with recovery guarantees with a set of highly scalable and efficient algorithms. We call such data matrices as \textit{Low-Rank matrices on graphs} and show that many real world datasets satisfy this assumption approximately due to underlying stationarity. Our detailed theoretical and experimental analysis unveils the power of the simple, yet very novel recovery framework \textit{Fast Robust PCA on Graphs

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne