27,097 research outputs found
Unsupervised Feature Selection with Adaptive Structure Learning
The problem of feature selection has raised considerable interests in the
past decade. Traditional unsupervised methods select the features which can
faithfully preserve the intrinsic structures of data, where the intrinsic
structures are estimated using all the input features of data. However, the
estimated intrinsic structures are unreliable/inaccurate when the redundant and
noisy features are not removed. Therefore, we face a dilemma here: one need the
true structures of data to identify the informative features, and one need the
informative features to accurately estimate the true structures of data. To
address this, we propose a unified learning framework which performs structure
learning and feature selection simultaneously. The structures are adaptively
learned from the results of feature selection, and the informative features are
reselected to preserve the refined structures of data. By leveraging the
interactions between these two essential tasks, we are able to capture accurate
structures and select more informative features. Experimental results on many
benchmark data sets demonstrate that the proposed method outperforms many state
of the art unsupervised feature selection methods
Active Semi-Supervised Learning Using Sampling Theory for Graph Signals
We consider the problem of offline, pool-based active semi-supervised
learning on graphs. This problem is important when the labeled data is scarce
and expensive whereas unlabeled data is easily available. The data points are
represented by the vertices of an undirected graph with the similarity between
them captured by the edge weights. Given a target number of nodes to label, the
goal is to choose those nodes that are most informative and then predict the
unknown labels. We propose a novel framework for this problem based on our
recent results on sampling theory for graph signals. A graph signal is a
real-valued function defined on each node of the graph. A notion of frequency
for such signals can be defined using the spectrum of the graph Laplacian
matrix. The sampling theory for graph signals aims to extend the traditional
Nyquist-Shannon sampling theory by allowing us to identify the class of graph
signals that can be reconstructed from their values on a subset of vertices.
This approach allows us to define a criterion for active learning based on
sampling set selection which aims at maximizing the frequency of the signals
that can be reconstructed from their samples on the set. Experiments show the
effectiveness of our method.Comment: 10 pages, 6 figures, To appear in KDD'1
- …