Unsupervised Discovery of Co-occurrence in Sparse High Dimensional Data

Chum, Ondřej; Matas, Jiří

Authors: Ondřej Chum, Jiří Matas
Publication date: 1 June 2010
Publisher: 'Institute of Electrical and Electronics Engineers (IEEE)'
Abstract

An efficient min-Hash based algorithm for the discovery of dependencies in sparse high-dimensional data is presented. The dependencies are represented by sets of features that co-occur with high probability, called co-ocsets. Sparse high-dimensional descriptors, such as bag of words, have proven very effective in image retrieval. To maintain high efficiency even for very large data collections, features are assumed to be independent. We show experimentally that co-ocsets are not rare, i.e. the independence assumption is often violated, and that they may ruin retrieval performance if present in the query image. Two methods for managing co-ocsets in such cases are proposed. Both methods significantly outperform the state of the art in image retrieval; one is also significantly faster.
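To illustrate the general idea of using min-Hash to find co-occurring features, the following is a minimal sketch and not the authors' algorithm, whose details are not given in the abstract. It assumes each document is a set of feature labels, treats each feature as the set of documents that contain it, and uses random permutations of document indices as hash functions. Two features share a min-Hash value with probability equal to the Jaccard similarity of their document sets, so the collision rate over many permutations estimates how strongly they co-occur. The function name `minhash_cooccurrence` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def minhash_cooccurrence(docs, num_hashes=64, seed=0):
    """Estimate pairwise feature co-occurrence via min-Hash collisions.

    docs: list of sets of features (hypothetical input format).
    Returns {(f1, f2): estimate} for pairs that collided at least once,
    where estimate approximates the Jaccard similarity of the two
    features' document sets.
    """
    rng = random.Random(seed)

    # Inverted index: feature -> set of indices of documents containing it.
    inverted = defaultdict(set)
    for doc_id, features in enumerate(docs):
        for feature in features:
            inverted[feature].add(doc_id)

    collisions = defaultdict(int)
    for _ in range(num_hashes):
        # A random permutation of document indices acts as one hash function.
        rank = list(range(len(docs)))
        rng.shuffle(rank)

        # The min-Hash of a feature is the smallest rank among its documents.
        buckets = defaultdict(list)
        for feature, doc_ids in inverted.items():
            buckets[min(rank[d] for d in doc_ids)].append(feature)

        # Features in the same bucket collided under this hash function.
        for bucket in buckets.values():
            bucket.sort()
            for i in range(len(bucket)):
                for j in range(i + 1, len(bucket)):
                    collisions[(bucket[i], bucket[j])] += 1

    return {pair: count / num_hashes for pair, count in collisions.items()}


if __name__ == "__main__":
    docs = [{"a", "b", "c"}, {"a", "b"}, {"c", "d"}, {"a", "b", "d"}, {"e"}]
    for pair, estimate in sorted(minhash_cooccurrence(docs).items()):
        print(pair, round(estimate, 2))
```

In this toy input, "a" and "b" appear in exactly the same documents, so they collide under every permutation, while features with disjoint document sets never collide. Pairs with a high estimate are candidates for a co-occurring set; how such candidates are grouped and handled at query time is outside the scope of this sketch.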

Available versions: Crossref; Digital Library of the Czech Technical University in Prague (oai:dspace.cvut.cz:10467/9562)