A Graph-Based Semi-Supervised k Nearest-Neighbor Method for Nonlinear Manifold Distributed Data Classification
k Nearest Neighbors (kNN) is one of the most widely used supervised
learning algorithms to classify Gaussian distributed data, but it does not
achieve good results when it is applied to nonlinear manifold distributed data,
especially when only a very limited number of labeled samples is available. In this
paper, we propose a new graph-based kNN algorithm which can effectively
handle both Gaussian distributed data and nonlinear manifold distributed data.
To achieve this goal, we first propose a constrained Tired Random Walk (TRW) by
constructing an -level nearest-neighbor strengthened tree over the graph,
and then compute a TRW matrix for similarity measurement purposes. After this,
the nearest neighbors are identified according to the TRW matrix and the class
label of a query point is determined by the sum of all the TRW weights of its
nearest neighbors. To deal with online situations, we also propose a new
algorithm to handle sequential samples based on local neighborhood
reconstruction. Comparison experiments are conducted on both synthetic data
sets and real-world data sets to demonstrate the validity of the proposed new
kNN algorithm and its improvements over other versions of kNN algorithms.
Given the widespread appearance of manifold structures in real-world problems
and the popularity of the traditional kNN algorithm, the proposed manifold
version of kNN shows promising potential for classifying manifold-distributed
data.
Comment: 32 pages, 12 figures, 7 tables
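The classification rule described in this abstract (rank neighbors by a random-walk similarity, then sum the weights per class) can be illustrated with a minimal sketch. It assumes the tired random walk matrix has the form (I - alpha * P)^(-1), where P is the row-normalized adjacency matrix of a kNN graph; that form is an assumption, not something stated in the abstract. The sketch omits the constrained tree construction and the online update, and the names `knn_graph`, `trw_matrix`, `classify`, and `alpha` are hypothetical.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric 0/1 adjacency matrix of the k-nearest-neighbor graph."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]       # k closest points per row
    W = np.zeros_like(d)
    W[np.arange(len(X))[:, None], idx] = 1.0
    return np.maximum(W, W.T)                # symmetrize

def trw_matrix(W, alpha=0.9):
    """Assumed TRW similarity: sum over t of (alpha * P)^t = (I - alpha * P)^-1."""
    P = W / W.sum(axis=1, keepdims=True)     # row-stochastic transition matrix
    return np.linalg.inv(np.eye(len(W)) - alpha * P)

def classify(T, labels, k):
    """Label each unlabeled node (label -1) from its k most similar labeled nodes."""
    classes = np.unique(labels[labels >= 0])
    labeled = np.flatnonzero(labels >= 0)
    pred = labels.copy()
    for i in np.flatnonzero(labels < 0):
        # k labeled nodes with the largest TRW similarity to node i
        nn = labeled[np.argsort(-T[i, labeled])[:k]]
        # class score = sum of TRW weights of the neighbors in that class
        scores = [T[i, nn[labels[nn] == c]].sum() for c in classes]
        pred[i] = classes[int(np.argmax(scores))]
    return pred
```

Because similarity is measured along the graph rather than by straight-line distance, two points on the same connected structure stay similar even when they are far apart in Euclidean terms.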
Hyperspectral Unmixing Overview: Geometrical, Statistical, and Sparse Regression-Based Approaches
Imaging spectrometers measure electromagnetic energy scattered in their
instantaneous field of view in hundreds or thousands of spectral channels with
higher spectral resolution than multispectral cameras. Imaging spectrometers
are therefore often referred to as hyperspectral cameras (HSCs). Higher
spectral resolution enables material identification via spectroscopic analysis,
which facilitates countless applications that require identifying materials in
scenarios unsuitable for classical spectroscopic analysis. Due to low spatial
resolution of HSCs, microscopic material mixing, and multiple scattering,
spectra measured by HSCs are mixtures of spectra of materials in a scene. Thus,
accurate estimation requires unmixing. Pixels are assumed to be mixtures of a
few materials, called endmembers. Unmixing involves estimating all or some of:
the number of endmembers, their spectral signatures, and their abundances at
each pixel. Unmixing is a challenging, ill-posed inverse problem because of
model inaccuracies, observation noise, environmental conditions, endmember
variability, and data set size. Researchers have devised and investigated many
models searching for robust, stable, tractable, and accurate unmixing
algorithms. This paper presents an overview of unmixing methods from the time
of Keshava and Mustard's unmixing tutorial [1] to the present. Mixing models
are first discussed. Signal-subspace, geometrical, statistical, sparsity-based,
and spatial-contextual unmixing algorithms are described. Mathematical problems
and potential solutions are described. Algorithm characteristics are
illustrated experimentally.
Comment: This work has been accepted for publication in IEEE Journal of
Selected Topics in Applied Earth Observations and Remote Sensing
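The abundance-estimation step described in this abstract can be sketched for a single pixel. The sketch assumes a linear mixing model, y = M a + noise, with known endmember signatures in the columns of M, nonnegative abundances, and abundances that sum to one. The abstract does not single out this model, so treat it as one illustrative choice. The sum-to-one constraint is enforced only approximately, by appending a heavily weighted row of ones before calling SciPy's nonnegative least-squares solver. The function name `unmix_pixel` and the weight `delta` are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def unmix_pixel(M, y, delta=1e3):
    """Estimate abundances a >= 0 with sum(a) close to 1 such that y ~= M @ a.

    M: (bands, endmembers) matrix of endmember spectra.
    y: (bands,) observed pixel spectrum.
    """
    # Extra row delta * [1, ..., 1] with target delta pulls sum(a) towards 1.
    M_aug = np.vstack([M, delta * np.ones((1, M.shape[1]))])
    y_aug = np.append(y, delta)
    a, _residual = nnls(M_aug, y_aug)
    return a
```

A larger `delta` enforces the sum-to-one constraint more strictly, at the cost of fitting the spectrum itself slightly less closely when the data are noisy.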
Topological Methods for Exploring Low-density States in Biomolecular Folding Pathways
Characterization of transient intermediate or transition states is crucial
for the description of biomolecular folding pathways, but it is
difficult in both experiments and computer simulations. Such transient states
typically have low populations in simulation samples. Even for simple systems
such as RNA hairpins, there has recently been mounting debate over the existence of
multiple intermediate states. In this paper, we develop a computational
approach to explore the relatively low populated transition or intermediate
states in biomolecular folding pathways, based on a topological data analysis
tool, Mapper, with simulation data from large-scale distributed computing. The
method is inspired by the classical Morse theory in mathematics which
characterizes the topology of high dimensional shapes via some functional level
sets. In this paper we exploit a conditional density filter which enables us to
focus on the structures on pathways, followed by clustering analysis on its
level sets, which helps separate low populated intermediates from high
populated uninteresting structures. The method is applied successfully to
a motivating example, an RNA hairpin with a GCAA tetraloop, where we are
able to provide structural evidence from computer simulations for multiple
intermediate states and to exhibit different pictures of the unfolding and
refolding pathways. The method is effective in dealing with a high degree of
heterogeneity in the distribution and in capturing structural features in multiple
pathways, and it is less sensitive to the distance metric than nonlinear
dimensionality reduction or geometric embedding methods. It provides a
systematic tool to explore the low-density intermediate states in complex
biomolecular folding systems.
Comment: 23 pages, 6 figures
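The Mapper construction mentioned in this abstract can be sketched in a few lines: cover the range of a filter function with overlapping intervals, cluster the points in each interval's level set, and connect clusters that share points. This is a generic illustration, not the paper's pipeline. The filter `f` here is any per-point value, whereas the abstract uses a conditional density filter. The clustering is a simple distance-threshold single linkage chosen for brevity, and all names and parameters (`threshold_clusters`, `mapper`, `n_intervals`, `overlap`, `eps`) are hypothetical.

```python
import numpy as np
from itertools import combinations

def threshold_clusters(X, members, eps):
    """Connected components of the graph linking points closer than eps."""
    unvisited = set(members)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        comp, stack = {seed}, [seed]
        while stack:
            p = stack.pop()
            near = [q for q in unvisited if np.linalg.norm(X[p] - X[q]) < eps]
            for q in near:
                unvisited.remove(q)
                comp.add(q)
                stack.append(q)
        clusters.append(frozenset(comp))
    return clusters

def mapper(X, f, n_intervals=4, overlap=0.25, eps=1.0):
    """Return (nodes, edges): nodes are point-index sets, edges join overlapping nodes."""
    lo, hi = f.min(), f.max()
    length = (hi - lo) / n_intervals
    nodes = []
    for j in range(n_intervals):
        # Interval j, widened on both sides so neighboring intervals overlap
        a = lo + j * length - overlap * length
        b = lo + (j + 1) * length + overlap * length
        members = np.flatnonzero((f >= a) & (f <= b)).tolist()
        nodes.extend(threshold_clusters(X, members, eps))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges
```

Small clusters that appear in low-density level sets show up as separate nodes in the resulting graph, which is what makes this kind of summary useful for spotting sparsely populated intermediates.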