32 research outputs found

    Preface

    7th International Conference on Similarity Search and Applications (SISAP). Los Cabos, México. 29-31 October 2014

    Improving metric access methods with bucket files

    Modern applications deal with complex data, for which retrieval by similarity plays an important role. Complex data whose primary comparison mechanisms are similarity predicates are usually embedded in metric spaces. Metric Access Methods (MAMs) exploit the properties of metric spaces to divide the space into regions and gain efficiency when processing similarity queries, such as range and k-nearest neighbor queries. Existing MAMs use homogeneous data structures to improve query execution, following the same techniques employed by traditional methods developed to retrieve scalar and multidimensional data. In this paper, we combine hashing and hierarchical ball partitioning approaches to obtain a hybrid index that is tuned to improve similarity queries targeting complex data sets, with search algorithms that reduce total execution time by aggressively reducing the number of distance calculations. We applied our technique to the Slim-tree and performed experiments over real data sets showing that the proposed technique is able to reduce the execution time of both range and k-nearest neighbor queries by at least half compared with the Slim-tree. Moreover, the technique is general enough to be applied to many existing MAMs.
    Funding: CAPES, CNPq, FAPESP. Venue: International Conference on Similarity Search and Applications - SISAP (8. 2015 Glasgow)
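To illustrate how ball partitioning can save distance calculations in a range query, here is a minimal Python sketch. It is not the hybrid index from the paper: the `Ball` class, the `build_balls` and `range_query` helpers, the fixed centers, and the Euclidean metric are all assumptions chosen for illustration. A ball is skipped whenever the triangle inequality shows that none of its members can fall within the query radius.

```python
from dataclasses import dataclass, field
from math import dist  # Euclidean distance, Python 3.8+


@dataclass
class Ball:
    """Hypothetical region: a center, a covering radius, and its members."""
    center: tuple
    radius: float = 0.0
    members: list = field(default_factory=list)


def build_balls(points, centers):
    """Assign each point to its closest center and track the covering radius."""
    balls = [Ball(c) for c in centers]
    for p in points:
        ball = min(balls, key=lambda b: dist(p, b.center))
        ball.members.append(p)
        ball.radius = max(ball.radius, dist(p, ball.center))
    return balls


def range_query(balls, q, r):
    """Return the points within distance r of q and the number of distances computed."""
    result, computed = [], 0
    for ball in balls:
        computed += 1
        # Triangle inequality: if d(q, center) > r + covering radius,
        # no member of this ball can lie within r of q, so skip the ball.
        if dist(q, ball.center) > r + ball.radius:
            continue
        for p in ball.members:
            computed += 1
            if dist(q, p) <= r:
                result.append(p)
    return result, computed


points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
balls = build_balls(points, centers=[(0, 0), (10, 10)])
hits, computed = range_query(balls, q=(0.5, 0.5), r=1.0)
print(sorted(hits), computed)
```

In this toy data set the second ball is pruned after a single distance calculation, so the query computes five distances instead of the six a sequential scan would need.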

    Compact distance histogram: a novel structure to boost k-nearest neighbor queries

    The k-Nearest Neighbor query (k-NNq) is one of the most useful similarity queries. Elaborate k-NNq algorithms depend on an initial radius to prune regions of the search space that cannot contribute to the answer. Therefore, estimating a suitable starting radius is of major importance to accelerate k-NNq execution. This paper presents a new technique to estimate a tight initial radius. Our approach, named CDH-kNN, relies on Compact Distance Histograms (CDHs), which are pivot-based histograms defined as piecewise linear functions. Such structures approximate the distance distribution and are compressed according to a given constraint, which can be a desired number of buckets and/or a maximum allowed error. The covering radius of a k-NNq is estimated based on the relationship between the query element and the CDHs' joint frequencies. The paper presents a complete specification of CDH-kNN, including the construction of CDHs and radius estimation. Extensive experiments on both real and synthetic datasets highlighted the efficiency of our approach, showing that it was up to 72% faster than existing algorithms, outperforming every competitor in all the setups evaluated. In fact, the experiments showed that our proposal was just 20% slower than the theoretical lower bound.
    Funding: FAPESP, CNPq, Capes, SticAMSU
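The role of an initial radius can be sketched with a much simpler structure than a CDH. The Python example below is an assumption-laden illustration, not the CDH-kNN algorithm: it uses a single pivot, an equi-width histogram of pivot distances, and the triangle-inequality bound d(q, x) <= d(q, pivot) + d(pivot, x) to obtain a radius that is guaranteed to cover at least k elements. The names `build_histogram`, `initial_radius`, and `knn_within` are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate
from math import dist  # Euclidean distance, Python 3.8+


def build_histogram(points, pivot, n_buckets):
    """Equi-width histogram of distances to the pivot: upper edges and cumulative counts."""
    distances = [dist(pivot, p) for p in points]
    top = max(distances)
    edges = [top * (i + 1) / n_buckets for i in range(n_buckets)]
    edges[-1] = top  # make the last edge exact so every distance fits a bucket
    counts = [0] * n_buckets
    for d in distances:
        counts[bisect_left(edges, d)] += 1  # first bucket whose upper edge is >= d
    return edges, list(accumulate(counts))


def initial_radius(edges, cumulative, pivot, q, k):
    """Upper bound on the k-NN radius of q.

    At least k elements lie within edges[i] of the pivot, so by the
    triangle inequality they lie within d(q, pivot) + edges[i] of q.
    """
    if k > cumulative[-1]:
        raise ValueError("k exceeds the number of indexed elements")
    i = bisect_left(cumulative, k)  # first bucket holding at least k elements
    return dist(q, pivot) + edges[i]


def knn_within(points, q, k, radius):
    """Sequential k-NN restricted to the estimated radius (stand-in for an index search)."""
    scored = ((dist(q, p), p) for p in points)
    candidates = sorted(item for item in scored if item[0] <= radius)
    return [p for _, p in candidates[:k]]


points = [(0, 0), (1, 0), (2, 0), (3, 0), (8, 0), (9, 0)]
pivot = (0, 0)
edges, cumulative = build_histogram(points, pivot, n_buckets=3)
radius = initial_radius(edges, cumulative, pivot, q=(2.5, 0), k=2)
neighbors = knn_within(points, q=(2.5, 0), k=2, radius=radius)
print(radius, neighbors)
```

The bound is safe but loose. A tighter estimate, which is what the paper targets, lets an index discard more regions before any distance to their elements is computed.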

    Bibliotecas digitais: a experiência da USP [Digital libraries: the USP experience]

    Combining diversity queries and visual mining to improve content-based image retrieval systems: the DiVI method

    This paper proposes a new approach to improve similarity queries with diversity, the Diversity and Visually-Interactive method (DiVI), which employs Visual Data Mining techniques in Content-Based Image Retrieval (CBIR) systems. DiVI empowers users to understand how the measures of similarity and diversity affect their queries, and increases the relevance of CBIR results according to the user's judgment. An overview of the image distribution in the database is shown to the user through multidimensional projection. The user interacts with the visual representation, changing the projected space or the query parameters according to their needs and previous knowledge. DiVI takes advantage of the users’ activity to transparently reduce the semantic gap faced by CBIR systems. Empirical evaluation shows that DiVI increases the precision of querying by content and also increases the applicability and acceptance of similarity with diversity in CBIR systems.
    Funding: FAPESP, CNPq, CAPES, Rescuer Project (European Commission Grant 614154 and CNPq/MCTI Grant 490084/2013-3)

    Diversity in similarity joins

    With the increasing ability of current applications to produce and consume more complex data, such as images and geographic information, the similarity join has attracted considerable attention. However, this operator does not consider the relationship among the elements in the answer, generating results with many pairs that are similar among themselves, which adds no value to the final answer. Result diversification methods are intended to retrieve elements similar enough to satisfy the similarity conditions while also considering the diversity among the elements in the answer, producing a more heterogeneous result with smaller cardinality, which improves the meaning of the answer. Still, diversity has been studied only when applied to unary operations. In this paper, we introduce the concept of diverse similarity joins: a similarity join operator that ensures smaller, more diversified, and more useful answers. The experiments performed on real and synthetic datasets show that our proposal allows exploiting diversity in similarity joins without diminishing their performance, while providing elements that cover the same data space distribution as the non-diverse answers.
    Funding: FAPESP, CNPq, CAPES, Rescuer (EU Commission Grant 614154 and CNPq/MCTI Grant 490084/2013-3). Venue: International Conference on Similarity Search and Applications - SISAP (8. 2015 Glasgow)

    SHRuB: searching through heuristics for the better query-execution plan

    An important aspect to be considered for systems aiming at integrating similarity queries into RDBMS is how to represent and optimize query plans that involve traditional and complex predicates. Toward developing facilities for such integration, we developed a technique to extract a canonical query-plan command tree from a similarity-extended SQL expression. The SHRuB tool, presented in this paper, is able to interactively represent a query parse tree. We developed a catalog model that allows estimating the execution cost and provides hints for optimizing the query plan by adopting a three-stage heuristic. Through a case study and initial experiments, we have demonstrated that the tool is able to find a local-minimum query-execution plan. Moreover, SHRuB can be plugged into existing frameworks that support similarity queries or employed as a courseware aid for database teaching.
    Funding: FAPESP, CNPq, CAPE

    Have you met VikS? A novel framework for visual diversity search analysis

    Searching images based on their pictorial content, or content-based image retrieval (CBIR), instead of using traditional tags and labels attached to them, has attracted considerable attention. However, retrieval by content may often return images that are too similar among themselves. Considering a diversity factor has been a way of improving the quality of the results retrieved by user queries. There are still questions about how this factor is used in the searches. In this paper, we present VikS, a CBIR system that answers queries based on the similarity and diversity paradigms and supports visual data mining techniques, making the user an active agent in the query process and enhancing the understanding of the impact of diversity on k-nearest neighbor queries. This framework provides implementations of a wide suite of algorithms to compute and compare diverse results. Users can tune diversification parameters, combine similarity with diversity, and see how diverse the results are in a projection space that highlights the distance distribution of the elements.
    Funding: FAPESP, CNPq, Capes, RESCUER, SticAMSU