
    Sign Stable Projections, Sign Cauchy Projections and Chi-Square Kernels

    The method of stable random projections is popular for efficiently computing the Lp distances in high dimension (where 0<p<=2), using small space. Because it adopts nonadaptive linear projections, this method is naturally suitable when the data are collected in a dynamic streaming fashion (i.e., turnstile data streams). In this paper, we propose to use only the signs of the projected data and analyze the probability of collision (i.e., when the two signs differ). We derive a bound of the collision probability which is exact when p=2 and becomes less sharp when p moves away from 2. Interestingly, when p=1 (i.e., Cauchy random projections), we show that the probability of collision can be accurately approximated as functions of the chi-square similarity. For example, when the (un-normalized) data are binary, the maximum approximation error of the collision probability is smaller than 0.0192. In text and vision applications, the chi-square similarity is a popular measure for nonnegative data when the features are generated from histograms. Our experiments confirm that the proposed method is promising for large-scale learning applications
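    The relationship described above can be explored numerically. The sketch below is illustrative only and is not the paper's method: it draws standard Cauchy projections (p=1), measures how often the two projected signs differ, and computes the chi-square similarity next to it. The function names, the normalization of each vector to sum to 1, and the similarity definition sum_i 2*u_i*v_i/(u_i+v_i) are assumptions made for this example. The abstract does not state the approximating function, so the code only reports the two quantities side by side.

```python
import math
import random


def standard_cauchy(rng):
    # Inverse-CDF sampling of a standard Cauchy variate.
    return math.tan(math.pi * (rng.random() - 0.5))


def chi2_similarity(u, v):
    """Chi-square similarity sum_i 2*u_i*v_i/(u_i+v_i) for nonnegative data.

    Assumption for this sketch: each vector is first scaled to sum to 1.
    """
    su, sv = sum(u), sum(v)
    total = 0.0
    for a, b in zip(u, v):
        a, b = a / su, b / sv
        if a + b > 0:
            total += 2.0 * a * b / (a + b)
    return total


def sign_disagreement_rate(u, v, k, rng):
    """Fraction of k Cauchy projections on which sgn(x) != sgn(y)."""
    diff = 0
    for _ in range(k):
        r = [standard_cauchy(rng) for _ in u]
        x = sum(a * ri for a, ri in zip(u, r))
        y = sum(b * ri for b, ri in zip(v, r))
        diff += (x >= 0) != (y >= 0)
    return diff / k


rng = random.Random(0)
u = [3.0, 0.0, 1.0, 2.0]  # hypothetical histogram counts
v = [2.0, 1.0, 1.0, 2.0]
print("chi-square similarity:", chi2_similarity(u, v))
print("sign disagreement rate:", sign_disagreement_rate(u, v, 5000, rng))
```

    Only the signs are stored, so each projection costs one bit per vector, and the disagreement rate is unaffected by rescaling u or v.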

    Sign-Full Random Projections

    The method of 1-bit ("sign-sign") random projections has been a popular tool for efficient search and machine learning on large datasets. Given two $D$-dim data vectors $u$, $v\in\mathbb{R}^D$, one can generate $x = \sum_{i=1}^D u_i r_i$ and $y = \sum_{i=1}^D v_i r_i$, where $r_i\sim N(0,1)$ iid. The "collision probability" is ${Pr}\left(sgn(x)=sgn(y)\right) = 1-\frac{\cos^{-1}\rho}{\pi}$, where $\rho = \rho(u,v)$ is the cosine similarity. We develop "sign-full" random projections by estimating $\rho$ from (e.g.,) the expectation $E(sgn(x)y)=\sqrt{\frac{2}{\pi}} \rho$, which can be further substantially improved by normalizing $y$. For nonnegative data, we recommend an interesting estimator based on $E\left(y_- 1_{x\geq 0} + y_+ 1_{x<0}\right)$ and its normalized version. The recommended estimator almost matches the accuracy of the (computationally expensive) maximum likelihood estimator. At high similarity ($\rho\rightarrow1$), the asymptotic variance of the recommended estimator is only $\frac{4}{3\pi} \approx 0.4$ of that of the estimator for sign-sign projections. At small $k$ and high similarity, the improvement would be even more substantial.
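    The two expectations quoted in this abstract can be checked with a short Monte Carlo sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names and the example vectors are hypothetical, and the sign-full estimate divides each projection by the vector norms (equivalent to using unit-length u and v) so that E(sgn(x)y) = sqrt(2/pi) * rho applies directly.

```python
import math
import random


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


def gaussian_projections(u, v, k, rng):
    """Yield k pairs (x, y) = (u.r, v.r) with r_i ~ N(0, 1) iid."""
    for _ in range(k):
        r = [rng.gauss(0.0, 1.0) for _ in u]
        yield (sum(a * ri for a, ri in zip(u, r)),
               sum(b * ri for b, ri in zip(v, r)))


def sign_sign_estimate(u, v, k, rng):
    """Estimate rho by inverting Pr(sgn(x)=sgn(y)) = 1 - arccos(rho)/pi."""
    hits = sum((x >= 0) == (y >= 0) for x, y in gaussian_projections(u, v, k, rng))
    return math.cos(math.pi * (1.0 - hits / k))


def sign_full_estimate(u, v, k, rng):
    """Estimate rho from E(sgn(x) y) = sqrt(2/pi) * rho.

    Assumption: y is rescaled by ||v|| so it has unit variance;
    sgn(x) is unaffected by the scale of u.
    """
    nv = math.sqrt(sum(b * b for b in v))
    total = 0.0
    for x, y in gaussian_projections(u, v, k, rng):
        total += (y / nv) if x >= 0 else -(y / nv)
    return math.sqrt(math.pi / 2.0) * total / k


rng = random.Random(0)
u = [1.0, 2.0, 0.0, 3.0]  # hypothetical data vectors
v = [2.0, 1.0, 1.0, 2.0]
print("true rho:      ", cosine(u, v))
print("sign-sign rho: ", sign_sign_estimate(u, v, 20000, rng))
print("sign-full rho: ", sign_full_estimate(u, v, 20000, rng))
```

    The sketch covers only the basic (unnormalized) sign-full estimator. The normalized version and the estimator recommended for nonnegative data are not reproduced here.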

    Optical Mobius Strips in Three Dimensional Ellipse Fields: Lines of Linear Polarization

    The minor axes of, and the normals to, the polarization ellipses that surround singular lines of linear polarization in three dimensional optical ellipse fields are shown to be organized into Mobius strips and into structures we call rippled rings (r-rings). The Mobius strips have two full twists, and can be either right- or left-handed. The major axes of the surrounding ellipses generate cone-like structures. Three orthogonal projections that give rise to 15 indices are used to characterize the different structures. These indices, if independent, could generate 839,808 geometrically and topologically distinct lines; selection rules are presented that reduce the number of lines to 8,248, some 5,562 of which have been observed in a computer simulation. Statistical probabilities are presented for the most important index combinations in random fields. It is argued that it is presently feasible to perform experimental measurements of the Mobius strips, r-rings, and cones described here theoretically

    Optical Mobius Strips in Three Dimensional Ellipse Fields: Lines of Circular Polarization

    The major and minor axes of the polarization ellipses that surround singular lines of circular polarization in three dimensional optical ellipse fields are shown to be organized into Mobius strips. These strips can have either one or three half-twists, and can be either right- or left-handed. The normals to the surrounding ellipses generate cone-like structures. Two special projections, one new geometrical, and seven new topological indices are developed to characterize the rather complex structures of the Mobius strips and cones. These eight indices, together with the two well-known indices used until now to characterize singular lines of circular polarization, could, if independent, generate 16,384 geometrically and topologically distinct lines. Geometric constraints and 13 selection rules are discussed that reduce the number of lines to 2,104, some 1,150 of which have been observed in practice; this number of different C lines is ~ 350 times greater than the three types of lines recognized previously. Statistical probabilities are presented for the most important index combinations in random fields. It is argued that it is presently feasible to perform experimental measurements of the Mobius strips and cones described here theoretically

    Sharp generalization error bounds for randomly-projected classifiers

    We derive sharp bounds on the generalization error of a generic linear classifier trained by empirical risk minimization on randomly projected data. We make no restrictive assumptions (such as sparsity or separability) on the data: instead we use the fact that, in a classification setting, the question of interest is really 'what is the effect of random projection on the predicted class labels?' and we therefore derive the exact probability of 'label flipping' under Gaussian random projection in order to quantify this effect precisely in our bounds.
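    The notion of 'label flipping' can be illustrated empirically. The sketch below is a simplification and does not reproduce the paper's closed-form probability: it assumes a fixed linear classifier with normal vector h, projects both h and a data point x with the same k-by-d Gaussian matrix, and counts how often the sign of the inner product changes. The names `flip_rate`, `h`, `x`, and the chosen dimensions are hypothetical.

```python
import math
import random


def project(rows, vec):
    """Multiply a k-by-d matrix (given as a list of rows) by a d-vector."""
    return [sum(r_j * v_j for r_j, v_j in zip(row, vec)) for row in rows]


def flip_rate(h, x, k, trials, rng):
    """Fraction of Gaussian projections R for which
    sign((Rh).(Rx)) differs from sign(h.x)."""
    d = len(h)
    original = sum(a * b for a, b in zip(h, x)) >= 0
    flips = 0
    for _ in range(trials):
        R = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
        ph, px = project(R, h), project(R, x)
        projected = sum(a * b for a, b in zip(ph, px)) >= 0
        flips += projected != original
    return flips / trials


rng = random.Random(0)
h = [1.0, 0.0, 0.0]                                         # classifier normal (assumed)
x = [math.cos(math.pi / 3), math.sin(math.pi / 3), 0.0]     # point at 60 degrees to h
for k in (1, 5, 20):
    print(k, flip_rate(h, x, k, 2000, rng))
```

    For k = 1 the flip event reduces to the two projected signs differing, which happens with probability angle/pi (here 1/3). Larger k makes flips rarer in this example.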