6,432 research outputs found
A Convex Formulation for Spectral Shrunk Clustering
Spectral clustering is a fundamental technique in the field of data mining
and information processing. Most existing spectral clustering algorithms
integrate dimensionality reduction into the clustering process assisted by
manifold learning in the original space. However, the manifold in
reduced-dimensional subspace is likely to exhibit altered properties in
contrast with the original space. Thus, applying manifold information obtained
from the original space to the clustering process in a low-dimensional subspace
is prone to inferior performance. Aiming to address this issue, we propose a
novel convex algorithm that mines the manifold structure in the low-dimensional
subspace. In addition, our unified learning process makes the manifold learning
particularly tailored for the clustering. Compared with other related methods,
the proposed algorithm results in more structured clustering result. To
validate the efficacy of the proposed algorithm, we perform extensive
experiments on several benchmark datasets in comparison with some
state-of-the-art clustering approaches. The experimental results demonstrate
that the proposed algorithm has quite promising clustering performance.Comment: AAAI201
Dimensionality Reduction for k-Means Clustering and Low Rank Approximation
We show how to approximate a data matrix with a much smaller
sketch that can be used to solve a general class of
constrained k-rank approximation problems to within error.
Importantly, this class of problems includes -means clustering and
unconstrained low rank approximation (i.e. principal component analysis). By
reducing data points to just dimensions, our methods generically
accelerate any exact, approximate, or heuristic algorithm for these ubiquitous
problems.
For -means dimensionality reduction, we provide relative
error results for many common sketching techniques, including random row
projection, column selection, and approximate SVD. For approximate principal
component analysis, we give a simple alternative to known algorithms that has
applications in the streaming setting. Additionally, we extend recent work on
column-based matrix reconstruction, giving column subsets that not only `cover'
a good subspace for \bv{A}, but can be used directly to compute this
subspace.
Finally, for -means clustering, we show how to achieve a
approximation by Johnson-Lindenstrauss projecting data points to just dimensions. This gives the first result that leverages the
specific structure of -means to achieve dimension independent of input size
and sublinear in
The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch
Recent and forthcoming advances in instrumentation, and giant new surveys,
are creating astronomical data sets that are not amenable to the methods of
analysis familiar to astronomers. Traditional methods are often inadequate not
merely because of the size in bytes of the data sets, but also because of the
complexity of modern data sets. Mathematical limitations of familiar algorithms
and techniques in dealing with such data sets create a critical need for new
paradigms for the representation, analysis and scientific visualization (as
opposed to illustrative visualization) of heterogeneous, multiresolution data
across application domains. Some of the problems presented by the new data sets
have been addressed by other disciplines such as applied mathematics,
statistics and machine learning and have been utilized by other sciences such
as space-based geosciences. Unfortunately, valuable results pertaining to these
problems are mostly to be found only in publications outside of astronomy. Here
we offer brief overviews of a number of concepts, techniques and developments,
some "old" and some new. These are generally unknown to most of the
astronomical community, but are vital to the analysis and visualization of
complex datasets and images. In order for astronomers to take advantage of the
richness and complexity of the new era of data, and to be able to identify,
adopt, and apply new solutions, the astronomical community needs a certain
degree of awareness and understanding of the new concepts. One of the goals of
this paper is to help bridge the gap between applied mathematics, artificial
intelligence and computer science on the one side and astronomy on the other.Comment: 24 pages, 8 Figures, 1 Table. Accepted for publication: "Advances in
Astronomy, special issue "Robotic Astronomy
Exploring the spectroscopic diversity of type Ia supernovae with DRACULA: a machine learning approach
The existence of multiple subclasses of type Ia supernovae (SNeIa) has been
the subject of great debate in the last decade. One major challenge inevitably
met when trying to infer the existence of one or more subclasses is the time
consuming, and subjective, process of subclass definition. In this work, we
show how machine learning tools facilitate identification of subtypes of SNeIa
through the establishment of a hierarchical group structure in the continuous
space of spectral diversity formed by these objects. Using Deep Learning, we
were capable of performing such identification in a 4 dimensional feature space
(+1 for time evolution), while the standard Principal Component Analysis barely
achieves similar results using 15 principal components. This is evidence that
the progenitor system and the explosion mechanism can be described by a small
number of initial physical parameters. As a proof of concept, we show that our
results are in close agreement with a previously suggested classification
scheme and that our proposed method can grasp the main spectral features behind
the definition of such subtypes. This allows the confirmation of the velocity
of lines as a first order effect in the determination of SNIa subtypes,
followed by 91bg-like events. Given the expected data deluge in the forthcoming
years, our proposed approach is essential to allow a quick and statistically
coherent identification of SNeIa subtypes (and outliers). All tools used in
this work were made publicly available in the Python package Dimensionality
Reduction And Clustering for Unsupervised Learning in Astronomy (DRACULA) and
can be found within COINtoolbox (https://github.com/COINtoolbox/DRACULA).Comment: 16 pages, 12 figures, accepted for publication in MNRA
- …