5,747 research outputs found
Simultaneous Spectral-Spatial Feature Selection and Extraction for Hyperspectral Images
In hyperspectral remote sensing data mining, it is important to take into
account of both spectral and spatial information, such as the spectral
signature, texture feature and morphological property, to improve the
performances, e.g., the image classification accuracy. In a feature
representation point of view, a nature approach to handle this situation is to
concatenate the spectral and spatial features into a single but high
dimensional vector and then apply a certain dimension reduction technique
directly on that concatenated vector before feed it into the subsequent
classifier. However, multiple features from various domains definitely have
different physical meanings and statistical properties, and thus such
concatenation hasn't efficiently explore the complementary properties among
different features, which should benefit for boost the feature
discriminability. Furthermore, it is also difficult to interpret the
transformed results of the concatenated vector. Consequently, finding a
physically meaningful consensus low dimensional feature representation of
original multiple features is still a challenging task. In order to address the
these issues, we propose a novel feature learning framework, i.e., the
simultaneous spectral-spatial feature selection and extraction algorithm, for
hyperspectral images spectral-spatial feature representation and
classification. Specifically, the proposed method learns a latent low
dimensional subspace by projecting the spectral-spatial feature into a common
feature space, where the complementary information has been effectively
exploited, and simultaneously, only the most significant original features have
been transformed. Encouraging experimental results on three public available
hyperspectral remote sensing datasets confirm that our proposed method is
effective and efficient
Simultaneous Learning of Nonlinear Manifold and Dynamical Models for High-dimensional Time Series
The goal of this work is to learn a parsimonious and informative representation for high-dimensional time series. Conceptually, this comprises two distinct yet tightly coupled tasks: learning a low-dimensional manifold and modeling the dynamical process. These two tasks have a complementary relationship as the temporal constraints provide valuable neighborhood information for dimensionality reduction and conversely, the low-dimensional space allows dynamics to be learnt efficiently. Solving these two tasks simultaneously allows important information to be exchanged mutually. If nonlinear models are required to capture the rich complexity of time series, then the learning problem becomes harder as the nonlinearities in both tasks are coupled. The proposed solution approximates the nonlinear manifold and dynamics using piecewise linear models. The interactions among the linear models are captured in a graphical model. By exploiting the model structure, efficient inference and learning algorithms are obtained without oversimplifying the model of the underlying dynamical process. Evaluation of the proposed framework with competing approaches is conducted in three sets of experiments: dimensionality reduction and reconstruction using synthetic time series, video synthesis using a dynamic texture database, and human motion synthesis, classification and tracking on a benchmark data set. In all experiments, the proposed approach provides superior performance.National Science Foundation (IIS 0308213, IIS 0329009, CNS 0202067
Multi-mode partitioning for text clustering to reduce dimensionality and noises
Co-clustering in text mining has been proposed to partition words and documents simultaneously. Although the
main advantage of this approach may improve interpretation of clusters on the data, there are still few proposals
on these methods; while one-way partition is even now widely utilized for information retrieval. In contrast to
structured information, textual data suffer of high dimensionality and sparse matrices, so it is strictly necessary
to pre-process texts for applying clustering techniques. In this paper, we propose a new procedure to reduce high
dimensionality of corpora and to remove the noises from the unstructured data. We test two different processes
to treat data applying two co-clustering algorithms; based on the results we present the procedure that provides
the best interpretation of the data
Protein docking refinement by convex underestimation in the low-dimensional subspace of encounter complexes
We propose a novel stochastic global optimization algorithm with applications to the refinement stage of protein docking prediction methods. Our approach can process conformations sampled from multiple clusters, each roughly corresponding to a different binding energy funnel. These clusters are obtained using a density-based clustering method. In each cluster, we identify a smooth “permissive” subspace which avoids high-energy barriers and then underestimate the binding energy function using general convex polynomials in this subspace. We use the underestimator to bias sampling towards its global minimum. Sampling and subspace underestimation are repeated several times and the conformations sampled at the last iteration form a refined ensemble. We report computational results on a comprehensive benchmark of 224 protein complexes, establishing that our refined ensemble significantly improves the quality of the conformations of the original set given to the algorithm. We also devise a method to enhance the ensemble from which near-native models are selected.Published versio
Randomized Dimension Reduction on Massive Data
Scalability of statistical estimators is of increasing importance in modern
applications and dimension reduction is often used to extract relevant
information from data. A variety of popular dimension reduction approaches can
be framed as symmetric generalized eigendecomposition problems. In this paper
we outline how taking into account the low rank structure assumption implicit
in these dimension reduction approaches provides both computational and
statistical advantages. We adapt recent randomized low-rank approximation
algorithms to provide efficient solutions to three dimension reduction methods:
Principal Component Analysis (PCA), Sliced Inverse Regression (SIR), and
Localized Sliced Inverse Regression (LSIR). A key observation in this paper is
that randomization serves a dual role, improving both computational and
statistical performance. This point is highlighted in our experiments on real
and simulated data.Comment: 31 pages, 6 figures, Key Words:dimension reduction, generalized
eigendecompositon, low-rank, supervised, inverse regression, random
projections, randomized algorithms, Krylov subspace method
- …