Minimum Average Deviance Estimation for Sufficient Dimension Reduction
Sufficient dimension reduction reduces the dimensionality of data while
preserving relevant regression information. In this article, we develop Minimum
Average Deviance Estimation (MADE) methodology for sufficient dimension
reduction. It extends the Minimum Average Variance Estimation (MAVE) approach
of Xia et al. (2002) from continuous responses to exponential family
distributions to include Binomial and Poisson responses. Local likelihood
regression is used to learn the form of the regression function from the data.
The main parameter of interest is a dimension reduction subspace which projects
the covariates to a lower dimension while preserving their relationship with
the outcome. To estimate this parameter within its natural space, we consider
an iterative algorithm where one step utilizes a Stiefel manifold optimizer. We
empirically evaluate the performance of three prediction methods, two that are
intrinsic to local likelihood estimation and one that is based on the
Nadaraya-Watson estimator. Initial results show that, as expected, MADE can outperform MAVE when there is a departure from the assumption of additive errors.
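To make the Stiefel-manifold step concrete, below is a minimal numpy sketch (with illustrative names such as stiefel_step, not the authors' implementation) of one such update: the Euclidean gradient is projected onto the tangent space of the Stiefel manifold and the result is retracted back via a QR decomposition, so the estimated basis of the dimension reduction subspace stays orthonormal.

```python
import numpy as np

def stiefel_step(B, egrad, step=0.1):
    """One descent step on the Stiefel manifold St(p, d).

    B     : (p, d) basis of the dimension reduction subspace, B.T @ B = I
    egrad : (p, d) Euclidean gradient of the loss (e.g. average deviance)
    """
    # Project the Euclidean gradient onto the tangent space at B:
    # grad = egrad - B @ sym(B.T @ egrad).
    sym = (B.T @ egrad + egrad.T @ B) / 2
    rgrad = egrad - B @ sym
    # Take a descent step, then retract onto the manifold via QR.
    Q, R = np.linalg.qr(B - step * rgrad)
    return Q * np.sign(np.diag(R))  # sign fix keeps the retraction continuous

# The update preserves orthonormality: B_new.T @ B_new = I.
rng = np.random.default_rng(0)
B, _ = np.linalg.qr(rng.standard_normal((10, 2)))
B_new = stiefel_step(B, rng.standard_normal((10, 2)))
assert np.allclose(B_new.T @ B_new, np.eye(2))
```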
Locality Preserving Projections for Grassmann manifold
Learning on the Grassmann manifold has become popular in many computer vision tasks for its strong capability to extract discriminative information from image sets and videos. However, such learning algorithms, particularly on a high-dimensional Grassmann manifold, involve significant computational cost, which seriously limits the applicability of learning on the Grassmann manifold in wider areas. In this research, we propose an unsupervised dimensionality reduction algorithm on the Grassmann manifold based on the Locality Preserving Projections (LPP) criterion. LPP is a commonly used dimensionality reduction algorithm for vector-valued data, aiming to preserve the local structure of the data in the dimension-reduced space. The strategy is to construct a mapping from a higher-dimensional Grassmann manifold to a lower-dimensional one with more discriminative capability. The proposed method can be optimized as a basic eigenvalue problem. Its performance is assessed on several classification and clustering tasks, and the experimental results show clear advantages over other Grassmann-based algorithms.
Comment: Accepted by IJCAI 201
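For reference, classical vector-space LPP (the criterion the paper generalizes, not its Grassmann variant) already reduces to a generalized eigenvalue problem; a minimal numpy/scipy sketch with illustrative hyperparameters is given below.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def lpp(X, n_components=2, n_neighbors=5, t=1.0):
    """Classical LPP: returns an (n_features, n_components) projection matrix."""
    n = X.shape[0]
    sqd = cdist(X, X, "sqeuclidean")
    # Heat-kernel affinities on a k-nearest-neighbour graph.
    W = np.zeros((n, n))
    nbrs = np.argsort(sqd, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        W[i, nbrs[i]] = np.exp(-sqd[i, nbrs[i]] / t)
    W = np.maximum(W, W.T)              # symmetrise the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                           # graph Laplacian
    # Generalised eigenproblem X^T L X a = lambda X^T D X a; the smallest
    # eigenvalues give the locality-preserving projection directions.
    reg = 1e-8 * np.eye(X.shape[1])     # keeps the right-hand side definite
    _, A = eigh(X.T @ L @ X, X.T @ D @ X + reg)
    return A[:, :n_components]
```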
Effectiveness of landmark analysis for establishing locality in p2p networks
Locality to other nodes on a peer-to-peer overlay network can be established by means of a set of landmarks shared among the participating nodes. Each node independently collects a set of latency measures to the landmark nodes, which are used as a multi-dimensional feature vector. Each peer node uses the feature vector to generate a unique scalar index that is correlated with its topological locality. A popular dimensionality reduction technique is the space-filling Hilbert’s curve, as it possesses good locality-preserving properties. However, there exists little comparison between Hilbert’s curve and other techniques for dimensionality reduction. This work carries out a quantitative analysis of their properties. Linear and non-linear techniques for scaling the landmark vectors to a single dimension are investigated. Hilbert’s curve, Sammon’s mapping and Principal Component Analysis have been used to generate a 1-D space with locality-preserving properties. This work provides empirical evidence to support the use of Hilbert’s curve in the context of locality preservation when generating peer identifiers by means of landmark vector analysis. A comparative analysis is carried out with an artificial 2-D network model and with a realistic network topology model with a typical power-law distribution of node connectivity in the Internet. Nearest-neighbour analysis confirms Hilbert’s curve to be very effective in both artificial and realistic network topologies. Nevertheless, the results in the realistic network model show that there is scope for improvement, and better techniques to preserve locality information are required.
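To illustrate the mapping itself, the standard algorithm for converting 2-D grid coordinates to a 1-D Hilbert-curve index is sketched below; quantising a latency vector onto an integer grid first is an assumed preprocessing step, and the paper's own pipeline may differ.

```python
def hilbert_index(x, y, order=16):
    """Map integer grid coordinates (x, y) in [0, 2**order) to a 1-D
    Hilbert-curve index; nearby points tend to receive nearby indices."""
    d = 0
    s = 2 ** (order - 1)
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate the quadrant so consecutive cells stay adjacent.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

# Example: 1-D indices for two adjacent grid cells.
print(hilbert_index(3, 4), hilbert_index(3, 5))
```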
A Process for Topic Modelling Via Word Embeddings
This work combines algorithms based on word embeddings, dimensionality
reduction, and clustering. The objective is to obtain topics from a set of
unclassified texts. The word embeddings are obtained with the BERT model, a neural network architecture widely used in NLP tasks. Due to the high dimensionality of these embeddings, a dimensionality reduction technique called UMAP is applied. This
method manages to reduce the dimensions while preserving part of the local and
global information of the original data. K-Means is used as the clustering
algorithm to obtain the topics. Then, the topics are evaluated using TF-IDF statistics, Topic Diversity, and Topic Coherence to assess the meaning of the words in each cluster. The results of the process show good values, so this topic-modelling process is a viable option for classifying or clustering texts without labels.
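A minimal sketch of such a pipeline using common open-source libraries might look as follows; the embedding model, the cluster count k, and the dimensionality settings are illustrative assumptions, not details taken from the paper.

```python
# pip install sentence-transformers umap-learn scikit-learn
from sentence_transformers import SentenceTransformer
import umap
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

texts = [...]  # placeholder: the unlabelled corpus, a list of strings

# 1. BERT-based embeddings (the model name here is illustrative).
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(texts)

# 2. UMAP reduces dimensionality while keeping part of the local
#    and global structure of the original embedding space.
reduced = umap.UMAP(n_components=5, metric="cosine").fit_transform(embeddings)

# 3. K-Means assigns each text to one of k topics.
k = 10
labels = KMeans(n_clusters=k, n_init=10).fit_predict(reduced)

# 4. Rank candidate topic words by TF-IDF over the concatenated cluster texts.
topic_docs = [" ".join(t for t, c in zip(texts, labels) if c == i) for i in range(k)]
scores = TfidfVectorizer(stop_words="english").fit_transform(topic_docs)
```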