Locality-Sensitive Hashing with Margin Based Feature Selection
We propose a learning method with feature selection for Locality-Sensitive
Hashing. Locality-Sensitive Hashing converts feature vectors into bit arrays.
These bit arrays can be used to perform similarity searches and personal
authentication. The proposed method generates bit arrays longer than those
ultimately used for similarity search and other tasks, then learns which bits
to keep. We demonstrate that the method optimizes effectively in cases such as
fingerprint images, where the number of labels is large and very few samples
share a label, and we verify that it is also effective for natural images,
handwritten digits, and speech features. Comment: 9 pages, 6 figures, 3 tables
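The idea of hashing to more bits than needed and then keeping a learned subset can be illustrated with a minimal sketch. It assumes random-hyperplane (sign) hashing and a simple pairwise agreement score as a stand-in for the paper's margin-based criterion, which the abstract does not specify. All function names here are hypothetical.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    # One random Gaussian hyperplane per output bit (assumed hashing scheme).
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hash_bits(vector, hyperplanes):
    # Bit is 1 when the vector lies on the non-negative side of the hyperplane.
    return [1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
            for plane in hyperplanes]

def hamming(a, b):
    # Number of positions where two bit arrays differ.
    return sum(x != y for x, y in zip(a, b))

def select_bits(codes, labels, keep):
    # Stand-in score, NOT the paper's margin criterion: reward a bit when it
    # differs across labels, penalize it when it differs within a label.
    num_bits = len(codes[0])
    scores = [0] * num_bits
    for i in range(len(codes)):
        for j in range(i + 1, len(codes)):
            same_label = labels[i] == labels[j]
            for b in range(num_bits):
                if codes[i][b] != codes[j][b]:
                    scores[b] += -1 if same_label else 1
    ranked = sorted(range(num_bits), key=lambda b: scores[b], reverse=True)
    return sorted(ranked[:keep])

def project(code, selected):
    # Keep only the selected bit positions.
    return [code[b] for b in selected]
```

In use, one would hash every training vector with the long code, call `select_bits` once, and then compare `project`-ed codes with `hamming` at query time.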
Cache-Oblivious Selection in Sorted X+Y Matrices
Let X[0..n-1] and Y[0..m-1] be two sorted arrays, and define the m×n matrix A
by A[j][i]=X[i]+Y[j]. Frederickson and Johnson gave an efficient algorithm for
selecting the k-th smallest element from A. We show how to make this algorithm
IO-efficient. Our cache-oblivious algorithm performs O((m+n)/B) IOs, where B is
the block size of memory transfers.
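To make the selection problem concrete, here is a minimal sketch that finds the k-th smallest entry of A without materializing the matrix. It is a simpler stand-in, not the Frederickson and Johnson algorithm and not cache-oblivious: it binary-searches on the value and counts entries with a staircase walk, which costs O((m+n) log R) for value range R. It assumes integer inputs and a 1-based k; the function names are hypothetical.

```python
def count_at_most(X, Y, t):
    # Count pairs (i, j) with X[i] + Y[j] <= t using a staircase walk.
    # As x grows, the largest admissible index j can only shrink.
    count = 0
    j = len(Y) - 1
    for x in X:
        while j >= 0 and x + Y[j] > t:
            j -= 1
        if j < 0:
            break
        count += j + 1
    return count

def kth_smallest_sum(X, Y, k):
    # X and Y are sorted lists of integers; k is 1-based (assumption).
    if not 1 <= k <= len(X) * len(Y):
        raise ValueError("k out of range")
    lo, hi = X[0] + Y[0], X[-1] + Y[-1]
    # Find the smallest value v such that at least k entries are <= v.
    while lo < hi:
        mid = (lo + hi) // 2
        if count_at_most(X, Y, mid) >= k:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For example, with `X = [1, 3, 5]` and `Y = [2, 4]` the sorted sums are 3, 5, 5, 7, 7, 9, so `kth_smallest_sum(X, Y, 4)` returns 7.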
Assessing similarity of feature selection techniques in high-dimensional domains
Recent research efforts attempt to combine multiple feature selection techniques instead of using a single one. However, this combination is often made on an “ad hoc” basis, depending on the specific problem at hand, without considering the degree of diversity or similarity of the methods involved. Moreover, although it is recognized that different techniques may return quite dissimilar outputs, especially in high-dimensional, small-sample-size domains, few direct comparisons exist that quantify these differences and their implications for classification performance. This paper contributes in this direction by proposing a general methodology for assessing the similarity between the outputs of different feature selection methods in high-dimensional classification problems. Using the genomics domain as a benchmark, an empirical study was conducted to compare some of the most popular feature selection methods, yielding useful insight into their pattern of agreement.
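One simple way to quantify agreement between feature selection outputs is to compare the selected subsets pairwise. The sketch below uses the Jaccard index as the similarity measure; this is an assumption for illustration, since the abstract does not state which measure the methodology uses. The method names and feature indices are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    # Overlap of two feature subsets: |A ∩ B| / |A ∪ B|.
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)

def pairwise_similarity(selections):
    # selections maps a method name to the feature indices it selected.
    return {
        (m1, m2): jaccard(selections[m1], selections[m2])
        for m1, m2 in combinations(sorted(selections), 2)
    }

# Hypothetical outputs of three selection methods on the same dataset.
selections = {
    "method_a": [0, 1, 2, 3],
    "method_b": [2, 3, 4, 5],
    "method_c": [0, 1, 2, 3],
}
similarity = pairwise_similarity(selections)
```

Low pairwise values suggest the methods are diverse enough that combining them might add information, while values near 1 suggest they are largely redundant.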