CUR Decompositions, Similarity Matrices, and Subspace Clustering
A general framework for solving the subspace clustering problem using the CUR
decomposition is presented. The CUR decomposition provides a natural way to
construct similarity matrices for data that come from a union of unknown
subspaces. The similarity
matrices thus constructed give the exact clustering in the noise-free case.
Additionally, this decomposition gives rise to many distinct similarity
matrices from a given set of data, which allow enough flexibility to perform
accurate clustering of noisy data. We also show that two known methods for
subspace clustering can be derived from the CUR decomposition. An algorithm
based on the theoretical construction of similarity matrices is presented, and
experiments on synthetic and real data are presented to test the method.
Additionally, an adaptation of our CUR based similarity matrices is utilized
to provide a heuristic algorithm for subspace clustering; this algorithm yields
the best overall performance to date for clustering the Hopkins155 motion
segmentation dataset.
Comment: Approximately 30 pages. Current version contains improved algorithm
and numerical experiments from the previous version.
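The abstract's two central claims (a CUR factorization reproduces the data, and it yields a similarity matrix that is exact for noise-free data) can be illustrated with a small numerical sketch. The code below is an illustration under stated assumptions, not the paper's algorithm: the function names `cur_factors` and `cur_similarity`, the tolerance, and the synthetic data are all hypothetical. It uses the standard fact that W = C U^+ R whenever the intersection U has the same rank as W, and it forms |Y^T Y| with Y = U^+ R. When every row and column is selected, Y is the orthogonal projector onto the row space of W, which is block diagonal for independent subspaces.

```python
import numpy as np

def cur_factors(W, rows, cols):
    """Return C (selected columns), U (their intersection), R (selected rows)."""
    C = W[:, cols]
    R = W[rows, :]
    U = W[np.ix_(rows, cols)]
    return C, U, R

def cur_similarity(W, rows, cols, tol=1e-10):
    """Hypothetical helper: |Y^T Y| with Y = pinv(U) @ R."""
    _, U, R = cur_factors(W, rows, cols)
    # An explicit cutoff keeps numerically zero singular values out of the pseudoinverse.
    Y = np.linalg.pinv(U, rcond=tol) @ R
    return np.abs(Y.T @ Y)

# Synthetic noise-free data: two independent subspaces of R^10 (dimensions 2 and 3).
rng = np.random.default_rng(0)
m, d1, d2, n1, n2 = 10, 2, 3, 6, 7
B1 = rng.standard_normal((m, d1))
B2 = rng.standard_normal((m, d2))
W = np.hstack([B1 @ rng.standard_normal((d1, n1)),
               B2 @ rng.standard_normal((d2, n2))])

# Partial row/column selection: the reconstruction is exact when rank(U) == rank(W).
rows = np.arange(8)
cols = np.array([0, 1, 2, 6, 7, 8, 9])
C, U, R = cur_factors(W, rows, cols)
recon_err = np.linalg.norm(W - C @ np.linalg.pinv(U, rcond=1e-10) @ R)

# Full selection: the similarity matrix has no cross-subspace entries.
S = cur_similarity(W, np.arange(m), np.arange(n1 + n2))
cross_block_max = S[:n1, n1:].max()
```

With noisy data the cross-subspace entries are no longer exactly zero, which is where the flexibility of choosing among many row and column selections, as described in the abstract, would come into play.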
Unsupervised Instance and Subnetwork Selection for Network Data
Unlike tabular data, features in network data are interconnected within a
domain-specific graph. Examples of this setting include gene expression
overlaid on a protein interaction network (PPI) and user opinions in a social
network. Network data is typically high-dimensional (large number of nodes) and
often contains outlier snapshot instances and noise. In addition, it is often
non-trivial and time-consuming to annotate instances with global labels (e.g.,
disease or normal). How can we jointly select discriminative subnetworks and
representative instances for network data without supervision? We address these
challenges within an unsupervised framework for joint subnetwork and instance
selection in network data, called UISS, via a convex self-representation
objective. Given an unlabeled network dataset, UISS identifies representative
instances while ignoring outliers. It outperforms state-of-the-art baselines on
both discriminative subnetwork selection and representative instance selection,
achieving up to 10% accuracy improvement on all real-world data sets we use for
evaluation. When employed for exploratory analysis in RNA-seq network samples
from multiple studies, it produces interpretable and informative summaries.
Self-adjustable domain adaptation in personalized ECG monitoring integrated with IR-UWB radar
To enhance electrocardiogram (ECG) monitoring systems for personalized detection, deep neural networks (DNNs) are applied to overcome individual differences by periodic retraining. As introduced previously [4], DNNs relieve individual differences by fusing ECG with impulse radio ultra-wide band (IR-UWB) radar. However, such a DNN-based ECG monitoring system tends to overfit to small personal datasets and is difficult to generalize to newly collected unlabeled data. This paper proposes a self-adjustable domain adaptation (SADA) strategy to prevent overfitting and exploit unlabeled data. Firstly, this paper enlarges the database of ECG and radar data with actual records acquired from 28 testers and expanded by data augmentation. Secondly, to utilize unlabeled data, SADA combines self-organizing maps with transfer learning in predicting labels. Thirdly, SADA integrates one-class classification with domain adaptation algorithms to reduce overfitting. Based on our enlarged database and standard databases, a large dataset of 73200 records and a small one of 1849 records are built to verify our proposal. Results show that SADA is effective in predicting labels and increases the sensitivity of DNNs by 14.4% compared with existing domain adaptation algorithms.
Convex and Network Flow Optimization for Structured Sparsity
We consider a class of learning problems regularized by a structured
sparsity-inducing norm defined as the sum of l_2- or l_infinity-norms over
groups of variables. Whereas much effort has been put in developing fast
optimization techniques when the groups are disjoint or embedded in a
hierarchy, we address here the case of general overlapping groups. To this end,
we present two different strategies: On the one hand, we show that the proximal
operator associated with a sum of l_infinity-norms can be computed exactly in
polynomial time by solving a quadratic min-cost flow problem, allowing the use
of accelerated proximal gradient methods. On the other hand, we use proximal
splitting techniques, and address an equivalent formulation with
non-overlapping groups, but in higher dimension and with additional
constraints. We propose efficient and scalable algorithms exploiting these two
strategies, which are significantly faster than alternative approaches. We
illustrate these methods with several problems such as CUR matrix
factorization, multi-task learning of tree-structured dictionaries, background
subtraction in video sequences, image denoising with wavelets, and topographic
dictionary learning of natural image patches.
Comment: to appear in the Journal of Machine Learning Research (JMLR)
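For context on the proximal operators mentioned in the abstract, the sketch below covers only the simpler disjoint-group case with l_2-norms, where the operator has a well-known closed form (group soft-thresholding). It is not the paper's min-cost flow algorithm for overlapping l_infinity groups. The function name `prox_group_l2` and the example values are assumptions made for illustration.

```python
import numpy as np

def prox_group_l2(v, groups, lam):
    """Proximal operator of lam * sum_g ||x_g||_2 for DISJOINT groups.

    Each group is shrunk toward zero by lam in Euclidean norm and is
    set exactly to zero when its norm does not exceed lam.
    """
    out = np.zeros_like(v, dtype=float)
    for g in groups:
        idx = np.asarray(g)
        norm = np.linalg.norm(v[idx])
        if norm > lam:
            out[idx] = (1.0 - lam / norm) * v[idx]
    return out

v = np.array([3.0, 4.0, 0.1, -0.2])
groups = [[0, 1], [2, 3]]  # hypothetical non-overlapping groups
x = prox_group_l2(v, groups, lam=1.0)
# First group has norm 5, so it is scaled by 0.8; second group is zeroed.
```

Once groups overlap, this per-group update is no longer valid because shared variables couple the groups. That is the setting the abstract addresses with network flow and proximal splitting techniques.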