Sign Stable Projections, Sign Cauchy Projections and Chi-Square Kernels
The method of stable random projections is popular for efficiently computing
the Lp distances in high dimension (where 0<p<=2), using small space. Because
it adopts nonadaptive linear projections, this method is naturally suitable
when the data are collected in a dynamic streaming fashion (i.e., turnstile
data streams). In this paper, we propose to use only the signs of the projected
data and analyze the probability of collision (i.e., when the two signs
differ). We derive a bound of the collision probability which is exact when p=2
and becomes less sharp when p moves away from 2. Interestingly, when p=1 (i.e.,
Cauchy random projections), we show that the probability of collision can be
accurately approximated as functions of the chi-square similarity. For example,
when the (un-normalized) data are binary, the maximum approximation error of
the collision probability is smaller than 0.0192. In text and vision
applications, the chi-square similarity is a popular measure for nonnegative
data when the features are generated from histograms. Our experiments confirm
that the proposed method is promising for large-scale learning applications.
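As a rough illustration of the idea in this abstract, the sketch below estimates the sign-collision rate of Cauchy random projections by simulation and compares it with a chi-square similarity. Two things here are assumptions, not statements from the abstract: the chi-square similarity is taken as sum_i 2 u_i v_i / (u_i + v_i) on vectors rescaled to sum to 1, and the approximation is taken to have the form arccos(similarity) / pi. All function names are hypothetical.

```python
import math
import random


def cauchy(rng):
    # Standard Cauchy draw via the inverse CDF of a uniform variate.
    return math.tan(math.pi * (rng.random() - 0.5))


def sign_cauchy_collision_rate(u, v, k, seed=0):
    """Fraction of k Cauchy projections on which sign(x) != sign(y)."""
    rng = random.Random(seed)
    differ = 0
    for _ in range(k):
        r = [cauchy(rng) for _ in u]
        x = sum(ui * ri for ui, ri in zip(u, r))
        y = sum(vi * ri for vi, ri in zip(v, r))
        differ += (x >= 0) != (y >= 0)
    return differ / k


def chi_square_similarity(u, v):
    # ASSUMPTION: nonnegative inputs, each rescaled to sum to 1.
    su, sv = sum(u), sum(v)
    total = 0.0
    for ui, vi in zip(u, v):
        a, b = ui / su, vi / sv
        if a + b > 0:
            total += 2.0 * a * b / (a + b)
    return total


def approx_collision_from_chi2(rho_chi2):
    # ASSUMPTION: approximation of the form arccos(rho) / pi.
    clamped = min(1.0, max(-1.0, rho_chi2))
    return math.acos(clamped) / math.pi


if __name__ == "__main__":
    u = [3.0, 1.0, 0.0, 2.0, 4.0]
    v = [2.0, 2.0, 1.0, 1.0, 4.0]
    print("empirical:", sign_cauchy_collision_rate(u, v, k=20000))
    print("approx   :", approx_collision_from_chi2(chi_square_similarity(u, v)))
```

Only the signs of x and y are used, so each projection costs one bit of storage per vector.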
Sign-Full Random Projections
The method of 1-bit ("sign-sign") random projections has been a popular tool
for efficient search and machine learning on large datasets. Given two $D$-dim
data vectors $u$, $v \in \mathbb{R}^D$, one can generate $x = \sum_{i=1}^D u_i r_i$
and $y = \sum_{i=1}^D v_i r_i$, where $r_i \sim N(0,1)$ iid. The
"collision probability" is $\Pr(\mathrm{sgn}(x) = \mathrm{sgn}(y)) = 1 - \frac{\cos^{-1}\rho}{\pi}$,
where $\rho = \rho(u,v)$ is the cosine similarity.
We develop "sign-full" random projections by estimating $\rho$ from (e.g.,)
the expectation $E(\mathrm{sgn}(x)\,y) = \sqrt{2/\pi}\,\rho$, which can be further
substantially improved by normalizing $y$. For nonnegative data, we recommend
an interesting estimator based on $E(y_- 1_{x \ge 0} + y_+ 1_{x < 0})$
and its normalized version. The recommended estimator almost matches the
accuracy of the (computationally expensive) maximum likelihood estimator. At
high similarity ($\rho \to 1$), the asymptotic variance of the recommended
estimator is only $\frac{4}{3\pi} \approx 0.4$ of that of the estimator for sign-sign
projections. At small $k$ and high similarity, the improvement would be even
more substantial.
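The contrast between "sign-sign" and "sign-full" estimation can be sketched with a small simulation. It assumes unit-norm data vectors and uses two standard identities for jointly Gaussian projections with correlation rho: Pr(sgn(x) = sgn(y)) = 1 - arccos(rho)/pi, and E[sgn(x) y] = sqrt(2/pi) rho. The function names are hypothetical. This is the basic, unnormalized sign-full estimator only, not the estimator recommended in the abstract.

```python
import math
import random


def normalize(w):
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


def project(u, v, k, seed=0):
    """Return k pairs (x, y) sharing the same Gaussian projection vector."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        r = [rng.gauss(0.0, 1.0) for _ in u]
        x = sum(ui * ri for ui, ri in zip(u, r))
        y = sum(vi * ri for vi, ri in zip(v, r))
        pairs.append((x, y))
    return pairs


def sign(t):
    return 1.0 if t >= 0 else -1.0


def rho_sign_sign(pairs):
    # Invert Pr(sgn(x) = sgn(y)) = 1 - arccos(rho) / pi.
    agree = sum(sign(x) == sign(y) for x, y in pairs) / len(pairs)
    return math.cos(math.pi * (1.0 - agree))


def rho_sign_full(pairs):
    # Invert E[sgn(x) * y] = sqrt(2 / pi) * rho (unit-norm inputs assumed).
    mean = sum(sign(x) * y for x, y in pairs) / len(pairs)
    return math.sqrt(math.pi / 2.0) * mean


if __name__ == "__main__":
    u = normalize([1.0, 2.0, 3.0, 4.0])
    v = normalize([2.0, 1.0, 4.0, 3.0])
    pairs = project(u, v, k=5000)
    print("true rho :", sum(a * b for a, b in zip(u, v)))
    print("sign-sign:", rho_sign_sign(pairs))
    print("sign-full:", rho_sign_full(pairs))
```

In the sign-full setting one vector keeps only the sign of its projection while the other keeps the full value, which is why the second estimator reads `sign(x) * y`.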
Optical Mobius Strips in Three Dimensional Ellipse Fields: Lines of Linear Polarization
The minor axes of, and the normals to, the polarization ellipses that
surround singular lines of linear polarization in three dimensional optical
ellipse fields are shown to be organized into Mobius strips and into structures
we call rippled rings (r-rings). The Mobius strips have two full twists, and
can be either right- or left-handed. The major axes of the surrounding ellipses
generate cone-like structures. Three orthogonal projections that give rise to
15 indices are used to characterize the different structures. These indices, if
independent, could generate 839,808 geometrically and topologically distinct
lines; selection rules are presented that reduce the number of lines to 8,248,
some 5,562 of which have been observed in a computer simulation. Statistical
probabilities are presented for the most important index combinations in random
fields. It is argued that it is presently feasible to perform experimental
measurements of the Mobius strips, r-rings, and cones described here
theoretically.
Optical Mobius Strips in Three Dimensional Ellipse Fields: Lines of Circular Polarization
The major and minor axes of the polarization ellipses that surround singular
lines of circular polarization in three dimensional optical ellipse fields are
shown to be organized into Mobius strips. These strips can have either one or
three half-twists, and can be either right- or left-handed. The normals to the
surrounding ellipses generate cone-like structures. Two special projections,
one new geometrical, and seven new topological indices are developed to
characterize the rather complex structures of the Mobius strips and cones.
These eight indices, together with the two well-known indices used until now to
characterize singular lines of circular polarization, could, if independent,
generate 16,384 geometrically and topologically distinct lines. Geometric
constraints and 13 selection rules are discussed that reduce the number of
lines to 2,104, some 1,150 of which have been observed in practice; this number
of different C lines is ~ 350 times greater than the three types of lines
recognized previously. Statistical probabilities are presented for the most
important index combinations in random fields. It is argued that it is
presently feasible to perform experimental measurements of the Mobius strips
and cones described here theoretically.
Sharp generalization error bounds for randomly-projected classifiers
We derive sharp bounds on the generalization error of a generic linear classifier trained by empirical risk minimization on randomly projected data. We make no restrictive assumptions (such as sparsity or separability) on the data: instead we use the fact that, in a classification setting, the question of interest is really "what is the effect of random projection on the predicted class labels?" and we therefore derive the exact probability of "label flipping" under Gaussian random projection in order to quantify this effect precisely in our bounds.
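The "label flipping" event can be illustrated by Monte Carlo simulation; this does not reproduce the exact probability derived in the paper. The sketch assumes that a flip means the sign of (Rh)·(Rx) disagrees with the sign of h·x, where h is a linear classifier's normal vector, x is a data point, and R is a k-by-d matrix of iid N(0,1) entries. The function name and this setup are hypothetical.

```python
import math
import random


def flip_rate(h, x, k, trials=2000, seed=0):
    """Monte Carlo estimate of how often a k-dim Gaussian projection
    changes the sign of the inner product between h and x."""
    rng = random.Random(seed)
    d = len(h)
    original = sum(hi * xi for hi, xi in zip(h, x)) >= 0
    flips = 0
    for _ in range(trials):
        dot = 0.0
        for _ in range(k):
            # One row of R contributes (row . h) * (row . x) to (Rh) . (Rx).
            row = [rng.gauss(0.0, 1.0) for _ in range(d)]
            rh = sum(ri * hi for ri, hi in zip(row, h))
            rx = sum(ri * xi for ri, xi in zip(row, x))
            dot += rh * rx
        flips += (dot >= 0) != original
    return flips / trials


if __name__ == "__main__":
    h = [1.0, 0.0]
    x = [0.5, math.sqrt(3.0) / 2.0]  # 60 degrees away from h
    for k in (1, 5, 25):
        print(k, flip_rate(h, x, k))
```

With the angle between h and x held fixed, the estimated flip rate should shrink as the projection dimension k grows.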