2,505 research outputs found

    Using Apache Lucene to Search Vector of Locally Aggregated Descriptors

    Full text link
    Surrogate Text Representation (STR) is a profitable solution for efficient similarity search on metric spaces using conventional text search engines, such as Apache Lucene. The technique compares the permutations of a set of reference objects in place of the original metric distance. However, the Achilles' heel of the STR approach is the need to reorder the result set of the search according to the metric distance. This forces the use of a support database to store the original objects, which requires efficient random I/O on fast secondary memory (such as flash-based storage). In this paper, we propose extending the Surrogate Text Representation to specifically address a class of visual metric objects known as Vectors of Locally Aggregated Descriptors (VLAD). The approach represents the individual sub-vectors forming the VLAD vector with the STR, providing a finer representation of the vector and allowing us to dispense with the reordering phase. Experiments on a publicly available dataset show that the extended STR outperforms the baseline STR, achieving performance close to that obtained with the original VLAD vectors. Comment: In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2016) - Volume 4: VISAPP, p. 383-39
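    As a rough illustration of the STR idea (a hedged sketch, not the paper's exact encoding: the reference objects, distances, and repetition scheme below are illustrative assumptions), a vector can be turned into surrogate text by repeating the identifiers of its nearest reference objects, so that a text engine's term-frequency scoring approximates the permutation-based ranking:

```python
# Minimal sketch of Surrogate Text Representation (STR): illustrative only,
# not the paper's exact encoding or its Lucene integration.
import numpy as np

def str_encode(vector, references, prefix="r", keep=10):
    """Encode a vector as surrogate text from the permutation of its nearest
    reference objects; closer references are repeated more often, so that a
    text engine's term-frequency scoring approximates the ranking."""
    dists = np.linalg.norm(references - vector, axis=1)
    order = np.argsort(dists)[:keep]              # prefix of the permutation
    terms = []
    for rank, ref_id in enumerate(order):
        terms.extend([f"{prefix}{ref_id}"] * (keep - rank))
    return " ".join(terms)

rng = np.random.default_rng(0)
refs = rng.normal(size=(100, 128))                    # reference objects
doc_text = str_encode(rng.normal(size=128), refs)     # index this string
query_text = str_encode(rng.normal(size=128), refs)   # search with this string
```

    For the VLAD extension described above, each sub-vector of the VLAD descriptor would be encoded separately (e.g. with a distinct term prefix per sub-vector) rather than encoding the concatenated vector as a whole.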

    Entropy and Graph Based Modelling of Document Coherence using Discourse Entities: An Application

    Full text link
    We present two novel models of document coherence and their application to information retrieval (IR). Both models approximate document coherence using discourse entities, e.g. the subject or object of a sentence. Our first model views text as a Markov process generating sequences of discourse entities (entity n-grams); we use the entropy of these entity n-grams to approximate the rate at which new information appears in text, reasoning that as more new words appear, the topic increasingly drifts and text coherence decreases. Our second model extends the work of Guinaudeau & Strube [28], which represents text as a graph of discourse entities linked by different relations, such as their distance or adjacency in text. We use several graph topology metrics to approximate different aspects of the discourse flow that can indicate coherence, such as the average clustering or betweenness of discourse entities in text. Experiments with several instantiations of these models show that: (i) our models perform on a par with two other well-known models of text coherence even without any parameter tuning, and (ii) reranking retrieval results according to their coherence scores gives notable performance gains, confirming a relation between document coherence and relevance. This work contributes two novel models of document coherence, the application of which to IR complements recent work on the integration of document cohesiveness or comprehensibility into ranking [5, 56].
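    The first model's core quantity can be sketched as the Shannon entropy of the entity n-gram distribution (an illustrative simplification; the entity extraction step and the exact estimator used in the paper may differ):

```python
# Hedged sketch: entropy of discourse-entity n-grams as a coherence proxy;
# higher entropy is read as faster topic drift (illustrative, not the paper's
# exact estimator).
from collections import Counter
from math import log2

def entity_ngram_entropy(entities, n=2):
    """Shannon entropy of the distribution of entity n-grams in a document."""
    grams = [tuple(entities[i:i + n]) for i in range(len(entities) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Entities (e.g. sentence subjects/objects) listed in text order.
doc_entities = ["court", "ruling", "court", "appeal", "ruling", "judge"]
print(entity_ngram_entropy(doc_entities, n=2))
```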

    Efficient Nearest Neighbors Search for Large-Scale Landmark Recognition

    Full text link
    The problem of landmark recognition has achieved excellent results on small-scale datasets. When dealing with large-scale retrieval, issues that are irrelevant with small amounts of data quickly become fundamental for an efficient retrieval phase. In particular, computational time needs to be kept as low as possible, whilst retrieval accuracy has to be preserved as much as possible. In this paper we propose a novel multi-index hashing method called Bag of Indexes (BoI) for Approximate Nearest Neighbors (ANN) search. It drastically reduces the query time and outperforms state-of-the-art methods in accuracy for large-scale landmark recognition. We demonstrate that this family of algorithms can be applied to different embedding techniques, such as VLAD and R-MAC, obtaining excellent results in very short times on several public datasets: Holidays+Flickr1M, Oxford105k and Paris106k.
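    The abstract does not spell out the BoI algorithm; the following is only a generic multi-index voting sketch in the same spirit (several hash tables each vote for colliding items, and the top-voted candidates are re-ranked exactly), with the hashing scheme and all parameters being illustrative assumptions:

```python
# Generic multi-index voting sketch (NOT the paper's Bag of Indexes method):
# random-projection hash tables vote for colliding items; the top-voted items
# are re-ranked by exact distance.
import numpy as np
from collections import defaultdict

class MultiIndexVoting:
    def __init__(self, dim, n_tables=8, n_bits=12, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = [rng.normal(size=(n_bits, dim)) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _key(self, planes, x):
        return tuple((planes @ x > 0).astype(int))    # binary hash code

    def add(self, idx, x):
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, x)].append(idx)

    def query(self, x, data, k=5):
        votes = defaultdict(int)
        for planes, table in zip(self.planes, self.tables):
            for idx in table[self._key(planes, x)]:
                votes[idx] += 1                        # one vote per table hit
        cands = sorted(votes, key=votes.get, reverse=True)[:10 * k]
        cands.sort(key=lambda i: np.linalg.norm(data[i] - x))  # exact re-rank
        return cands[:k]
```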

    List of Clustered Permutations in Secondary Memory

    Get PDF
    Similarity search is a difficult problem, and various indexing schemes have been defined to process similarity queries efficiently in many applications, including multimedia databases and other repositories handling complex objects. Metric indices support efficient similarity searches, but most of them are designed for main memory. Thus, they can handle only small datasets, suffering serious performance degradation when the objects reside on disk. Most real-life database applications require indices able to work on secondary memory. Among a plethora of indices, the List of Clustered Permutations (LCP) has shown to be competitive in main memory, since it groups the permutations and establishes a criterion to discard whole clusters according to the permutations of their centers. We introduce a secondary-memory variant of the LCP, which maintains the low number of distance evaluations when comparing the permutations themselves, and also needs a low number of I/O operations during construction and searching. XII Workshop Bases de Datos y Minería de Datos (WBDDM), Red de Universidades con Carreras en Informática (RedUNCI).
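    The cluster-discarding idea can be sketched as follows (a simplification under assumed details: the Spearman footrule on pivot permutations and a fixed fraction of clusters to visit are illustrative choices, and the secondary-memory layout is not modelled):

```python
# Sketch of permutation-based cluster filtering in the spirit of LCP:
# compare the query's pivot permutation to each cluster center's permutation
# and visit only the most promising clusters (illustrative simplification).
import numpy as np

def permutation(obj, pivots):
    """Ranks of the pivots ordered by distance to obj."""
    d = np.linalg.norm(pivots - obj, axis=1)
    return np.argsort(np.argsort(d))

def spearman_footrule(p, q):
    return int(np.abs(p - q).sum())

def promising_clusters(query, centers, pivots, fraction=0.25):
    pq = permutation(query, pivots)
    scores = [spearman_footrule(pq, permutation(c, pivots)) for c in centers]
    keep = max(1, int(len(centers) * fraction))
    return np.argsort(scores)[:keep]       # clusters worth reading from disk
```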

    Re-ranking Permutation-Based Candidate Sets with the n-Simplex Projection

    Get PDF
    In the realm of metric search, permutation-based approaches have shown very good performance in indexing and supporting approximate search on large databases. These methods embed the metric objects into a permutation space where candidate results to a given query can be efficiently identified. Typically, to achieve high effectiveness, the permutation-based result set is refined by directly comparing each candidate object to the query object. Therefore, one drawback of these approaches is that the original dataset needs to be stored and then accessed during the refining step. We propose a refining approach based on a metric embedding, called n-Simplex projection, that can be used on metric spaces meeting the n-point property. The n-Simplex projection provides upper and lower bounds of the actual distance, derived using the distances between the data objects and a finite set of pivots. We propose to reuse the distances computed for building the data permutations to derive these bounds, and we show how to use them to improve the permutation-based results. Our approach is particularly advantageous in all the cases in which the traditional refining step is too costly, e.g. very large datasets or very expensive metric functions.
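    For intuition, the simplest pivot-derived bounds are the triangle-inequality ones sketched below; the n-Simplex projection derives tighter bounds from the same pivot distances for spaces satisfying the n-point property (the sketch illustrates only the bounding idea, not the paper's construction):

```python
# Pivot-based distance bounds used to prune candidates without accessing the
# original objects. These are plain triangle-inequality bounds; the paper's
# n-Simplex projection yields tighter bounds from the same distances.
import numpy as np

def distance_bounds(q_to_pivots, o_to_pivots):
    """Lower/upper bounds on d(q, o) from the distances to shared pivots."""
    lower = np.max(np.abs(q_to_pivots - o_to_pivots))
    upper = np.min(q_to_pivots + o_to_pivots)
    return lower, upper

def refine(candidates, q_to_pivots, cand_to_pivots, k):
    """Drop candidates whose lower bound exceeds the k-th smallest upper
    bound: they cannot be among the k nearest neighbours."""
    bounds = [distance_bounds(q_to_pivots, cand_to_pivots[c]) for c in candidates]
    kth_upper = sorted(u for _, u in bounds)[min(k, len(bounds)) - 1]
    return [c for c, (lo, _) in zip(candidates, bounds) if lo <= kth_upper]
```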

    Pattern vectors from algebraic graph theory

    Get PDF
    Graph structures have proven computationally cumbersome for pattern analysis. The reason for this is that, before graphs can be converted to pattern vectors, correspondences must be established between the nodes of structures which are potentially of different size. To overcome this problem, in this paper, we turn to the spectral decomposition of the Laplacian matrix. We show how the elements of the spectral matrix for the Laplacian can be used to construct symmetric polynomials that are permutation invariants. The coefficients of these polynomials can be used as graph features which can be encoded in a vectorial manner. We extend this representation to graphs in which there are unary attributes on the nodes and binary attributes on the edges by using the spectral decomposition of a Hermitian property matrix that can be viewed as a complex analogue of the Laplacian. To embed the graphs in a pattern space, we explore whether the vectors of invariants can be embedded in a low-dimensional space using a number of alternative strategies, including principal components analysis (PCA), multidimensional scaling (MDS), and locality preserving projection (LPP). Experimentally, we demonstrate that the embeddings result in well-defined graph clusters. Our experiments with the spectral representation involve both synthetic and real-world data. The experiments with synthetic data demonstrate that the distances between spectral feature vectors can be used to discriminate between graphs on the basis of their structure. The real-world experiments show that the method can be used to locate clusters of graphs.
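    The unattributed-graph case can be sketched as follows (an illustrative simplification of the construction: the scaling of the spectral matrix and the number of symmetric polynomials retained are assumptions, and the Hermitian property matrix for attributed graphs is not covered):

```python
# Sketch of permutation-invariant spectral features: elementary symmetric
# polynomials of the columns of a Laplacian spectral matrix (illustrative
# simplification of the paper's construction).
import numpy as np

def laplacian(adj):
    return np.diag(adj.sum(axis=1)) - adj

def elementary_symmetric(values, max_order=3):
    """First few elementary symmetric polynomials of the given values."""
    coeffs = np.poly(values)   # [1, -e1, e2, -e3, ...] for these roots
    signs = np.array([(-1.0) ** k for k in range(1, max_order + 1)])
    return signs * coeffs[1:max_order + 1]

def pattern_vector(adj, max_order=3):
    w, v = np.linalg.eigh(laplacian(adj))
    spectral = v * np.sqrt(np.clip(w, 0, None))   # columns scaled by sqrt(eigenvalue)
    feats = [elementary_symmetric(spectral[:, j], max_order)
             for j in range(adj.shape[0])]
    return np.concatenate(feats)   # invariant to node relabelling
```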

    List of clustered permutations in secondary memory for proximity searching

    Get PDF
    Similarity search is a difficult problem, and various indexing schemes have been defined to process similarity queries efficiently in many applications, including multimedia databases and other repositories handling complex objects. Metric indices support efficient similarity searches, but most of them are designed for main memory. Thus, they can handle only small datasets, suffering serious performance degradation when the objects reside on disk. Most real-life database applications require indices able to work on secondary memory. Among a plethora of indices, the List of Clustered Permutations (LCP) has shown to be competitive in main memory. We introduce a secondary-memory variant of the LCP, which maintains the low number of distance evaluations when comparing the permutations themselves, and also needs a low number of I/O operations during construction and searching. Facultad de Informática.
