32 research outputs found

Preface

Author: Cordeiro Robson Leonardo Ferreira
Traina Junior Caetano
Traina Agma Juci Machado
Publication venue: Cham

7th International Conference on Similarity Search and Applications (SISAP). Los Cabos, México. 29-31 October 2014

Universidade de São Paulo

Improving metric access methods with bucket files

Author: Kaster Daniel S.
Pola Ives R. V.
Traina Junior Caetano
Traina Agma Juci Machado
Publication venue: Cham
Publication date: 01/01/2015

Modern applications deal with complex data, and retrieval by similarity plays an important role in most of them. Complex data whose primary comparison mechanisms are similarity predicates are usually immersed in metric spaces. Metric Access Methods (MAMs) exploit the properties of the metric space to divide it into regions and gain efficiency when processing similarity queries, such as range and k-nearest neighbor queries. Existing MAMs use homogeneous data structures to improve query execution, pursuing the same techniques employed by traditional methods developed to retrieve scalar and multidimensional data. In this paper, we combine hashing and hierarchical ball partitioning approaches to achieve a hybrid index that is tuned to improve similarity queries targeting complex data sets, with search algorithms that reduce total execution time by aggressively reducing the number of distance calculations. We applied our technique to the Slim-tree and performed experiments over real data sets, showing that the proposed technique is able to reduce the execution time of both range and k-nearest neighbor queries to half that of the Slim-tree or less. Moreover, the technique is general enough to be applied to many existing MAMs.

Funding: CAPES, CNPq, FAPESP. International Conference on Similarity Search and Applications - SISAP (8. 2015 Glasgow
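As a rough illustration of how a metric index avoids distance calculations, the sketch below runs a range query over a flat ball partition and prunes with the triangle inequality. It is not the hybrid hashing/Slim-tree index described in the abstract; the function names (`build_balls`, `range_query`), the Euclidean metric, and the flat, single-level layout are all assumptions made for the example.

```python
from math import dist  # Euclidean distance (Python 3.8+), an assumed metric


def build_balls(points, pivots):
    """Assign each point to its nearest pivot and track each ball's covering radius."""
    balls = [{"pivot": p, "radius": 0.0, "members": []} for p in pivots]
    for x in points:
        ball = min(balls, key=lambda b: dist(b["pivot"], x))
        d = dist(ball["pivot"], x)
        ball["members"].append((x, d))  # keep d(pivot, x) for later pruning
        ball["radius"] = max(ball["radius"], d)
    return balls


def range_query(balls, q, r):
    """Return (elements within distance r of q, number of distance calculations)."""
    hits, n_dist = [], 0
    for ball in balls:
        dq = dist(q, ball["pivot"])
        n_dist += 1
        if dq > ball["radius"] + r:
            continue  # the ball cannot intersect the query region
        for x, dx in ball["members"]:
            # Triangle inequality: d(q, x) >= |d(q, p) - d(p, x)|
            if abs(dq - dx) > r:
                continue  # discarded without computing d(q, x)
            n_dist += 1
            if dist(q, x) <= r:
                hits.append(x)
    return hits, n_dist
```

On a small 10 x 10 grid with four pivots, a call such as `range_query(balls, (3.0, 3.0), 1.5)` returns the same elements as a linear scan while computing fewer distances than there are points.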

Compact distance histogram: a novel structure to boost k-nearest neighbor queries

Author: Bêdo Marcos Vinícius Naves
Kaster Daniel S.
Traina Junior Caetano
Traina Agma Juci Machado
Publication venue: La Jolla

The k-Nearest Neighbor query (k-NNq) is one of the most useful similarity queries. Elaborate k-NNq algorithms depend on an initial radius to prune regions of the search space that cannot contribute to the answer. Therefore, estimating a suitable starting radius is of major importance to accelerate k-NNq execution. This paper presents a new technique to estimate a tight initial radius. Our approach, named CDH-kNN, relies on Compact Distance Histograms (CDHs), which are pivot-based histograms defined as piecewise linear functions. Such structures approximate the distance distribution and are compressed according to a given constraint, which can be a desired number of buckets and/or a maximum allowed error. The covering radius of a k-NNq is estimated based on the relationship between the query element and the CDHs' joint frequencies. The paper presents a complete specification of CDH-kNN, including CDH construction and radius estimation. Extensive experiments on both real and synthetic datasets highlighted the efficiency of our approach, showing that it was up to 72% faster than existing algorithms, outperforming every competitor in all the setups evaluated. In fact, the experiments showed that our proposal was just 20% slower than the theoretical lower bound.

Funding: FAPESP, CNPq, Capes, SticAMSU
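The role of the initial radius can be sketched with a much simpler estimator than CDH-kNN. The example below is an assumption-laden stand-in: it uses a sorted sample of pairwise distances as a global distance distribution (not pivot-based, not compressed, and not piecewise linear), picks the radius whose estimated selectivity reaches k, and enlarges it if too few candidates are found. The names `distance_sample`, `estimate_radius`, and `knn_with_initial_radius` are hypothetical.

```python
import random
from math import dist  # Euclidean distance, an assumed metric


def distance_sample(points, n_pairs, rng):
    """Sorted sample of pairwise distances, a crude stand-in for a distance histogram."""
    return sorted(dist(*rng.sample(points, 2)) for _ in range(n_pairs))


def estimate_radius(sorted_dists, n_points, k):
    """Pick the sampled distance r whose estimated selectivity n_points * F(r) is about k."""
    frac = min(1.0, k / n_points)
    idx = min(len(sorted_dists) - 1, int(frac * len(sorted_dists)))
    return sorted_dists[idx]


def knn_with_initial_radius(points, q, k, r0):
    """Filter with radius r0, doubling it until at least k candidates remain."""
    if k > len(points):
        raise ValueError("k exceeds the number of points")
    r = r0 if r0 > 0 else 1e-9
    while True:
        # A real index would use r to prune whole regions; here we scan linearly.
        candidates = [x for x in points if dist(q, x) <= r]
        if len(candidates) >= k:
            return sorted(candidates, key=lambda x: dist(q, x))[:k], r
        r *= 2
```

A tight estimate keeps the candidate set small on the first pass; an underestimate costs extra passes, which is why the quality of the radius estimate matters.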

Bibliotecas digitais: a experiência da USP

Author: Kondo Rogerio Toshiaki
Lirani Maria de Lourdes Rebucci
Traina Junior Caetano
Publication venue: Universidade de São Paulo. Superintendência de Comunicação Social
Publication date: 01/02/2009

Cadernos Espinosanos (E-Journal)

Combining diversity queries and visual mining to improve content-based image retrieval systems: the DiVI method

Author: Dias Rafael L.
Traina Junior Caetano
Ribeiro Marcela X.
Santos Lúcio Fernandes Dutra
Traina Agma Juci Machado
Publication venue: Miami
Publication date: 01/12/2015

This paper proposes a new approach to improve similarity queries with diversity, the Diversity and Visually-Interactive method (DiVI), which employs Visual Data Mining techniques in Content-Based Image Retrieval (CBIR) systems. DiVI empowers users to understand how the measures of similarity and diversity affect their queries, and increases the relevance of CBIR results according to the user's judgment. An overview of the image distribution in the database is shown to the user through multidimensional projection. The user interacts with the visual representation, changing the projected space or the query parameters according to his/her needs and previous knowledge. DiVI takes advantage of the users' activity to transparently reduce the semantic gap faced by CBIR systems. Empirical evaluation shows that DiVI increases the precision of querying by content and also increases the applicability and acceptance of similarity with diversity in CBIR systems.

Funding: FAPESP, CNPq, CAPES, Rescuer Project (European Commission Grant 614154 and CNPq/MCTI Grant 490084/2013-3

Universidade de São Paulo

Proceedings of the IEEE 28th International Symposium on Computer-Based Medical Systems

Author: Kane Bridget
Marques Paulo Mazzoncini de Azevedo
Rodrigues Pedro Pereira
Traina Junior Caetano
Traina Agma Juci Machado
Publication venue: Los Alamitos

Preface

Author: Kane Bridget
Marques Paulo Mazzoncini de Azevedo
Rodrigues Pedro Pereira
Traina Junior Caetano
Traina Agma Juci Machado
Publication venue: Los Alamitos

Diversity in similarity joins

Author: Carvalho Luiz Olmes
Oliveira Willian Dener de
Santos Lúcio Fernandes Dutra
Traina Junior Caetano
Traina Agma Juci Machado
Publication venue: Cham

With the increasing ability of current applications to produce and consume more complex data, such as images and geographic information, the similarity join has attracted considerable attention. However, this operator does not consider the relationship among the elements in the answer, generating results with many pairs similar among themselves, which does not add value to the final answer. Result diversification methods are intended to retrieve elements similar enough to satisfy the similarity conditions while also considering the diversity among the elements in the answer, producing a more heterogeneous result with smaller cardinality, which improves the meaning of the answer. Still, diversity has been studied only when applied to unary operations. In this paper, we introduce the concept of diverse similarity joins: a similarity join operator that ensures smaller, more diversified, and more useful answers. The experiments performed on real and synthetic datasets show that our proposal allows exploiting diversity in similarity joins without diminishing their performance, while providing elements that cover the same data space distribution as the non-diverse answers.

Funding: FAPESP, CNPQ, CAPES, Rescuer (EU Commission Grant 614154 and CNPQ/MCTI Grant 490084/2013-3). International Conference on Similarity Search and Applications - SISAP (8. 2015 Glasgow

SHRuB: searching through heuristics for the better query-execution plan

Author: Bêdo Marcos Vinícius Naves
Olmes-Carvalho Luiz
Pierro Gabriel Vicente de
Traina Junior Caetano
Publication venue: Curitiba

An important aspect to be considered for systems aiming at integrating similarity queries into RDBMS is how to represent and optimize query plans that involve traditional and complex predicates. Toward developing facilities for such integration, we developed a technique to extract a canonical query-plan command tree from a similarity-extended SQL expression. The SHRuB tool, presented in this paper, is able to interactively represent a query parse tree. We developed a catalog model that allows estimating the execution cost and provides hints for optimizing the query plan by adopting a three-stage heuristic. Through a case study and initial experiments, we have demonstrated that the tool is able to find a local-minimum query-execution plan. Moreover, SHRuB can be plugged into existing frameworks that support similarity queries or employed as a courseware aid for database teaching.

Funding: FAPESP, CNPq, CAPE

Universidade de São Paulo

Have you met VikS? A novel framework for visual diversity search analysis

Author: Dias Rafael L.
Ferreira Mônica Ribeiro Porto
Ribeiro Marcela X.
Santos Lúcio Fernandes Dutra
Traina Junior Caetano
Traina Agma Juci Machado
Publication venue: Curitiba

Searching images based on their pictorial content, or content-based image retrieval (CBIR), instead of using traditional tags and labels attached to them, has attracted considerable attention. However, this retrieval by content may often retrieve images too similar among themselves. Considering a diversity factor has been a way of improving the quality of the results retrieved by user queries. There are still questions about how this factor is used in the searches. In this paper, we present VikS, a CBIR system that answers queries based on the similarity and diversity paradigms and supports visual data mining techniques, making the user an active agent in the query process and enhancing the understanding of the impact of diversity on k-nearest neighbor queries. This framework provides implementations of a wide suite of algorithms to compute and compare diverse results. Users can tune diversification parameters, combine similarity with diversity, and see how diverse the results are in a projection space that highlights the distance distribution of the elements.

Funding: FAPESP, CNPq, Capes, RESCUER, SticAMSU
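To make the idea of combining similarity with diversity concrete, the sketch below shows one simple greedy strategy: visit elements by increasing distance to the query and keep an element only if it lies at least a minimum separation from everything already kept. This is only one of many possible diversification schemes and is not claimed to be one of the algorithms implemented in VikS; the function name `diverse_knn`, the `min_sep` parameter, and the Euclidean metric over feature vectors are assumptions.

```python
from math import dist  # Euclidean distance over feature vectors, an assumed metric


def diverse_knn(points, q, k, min_sep):
    """Greedy diversified k-NN: nearest first, skipping near-duplicates of kept elements."""
    selected = []
    for x in sorted(points, key=lambda p: dist(q, p)):
        # Keep x only if it is far enough from every element already in the answer.
        if all(dist(x, s) >= min_sep for s in selected):
            selected.append(x)
            if len(selected) == k:
                break
    return selected
```

With `min_sep = 0` the function degenerates to a plain k-nearest neighbor query; increasing `min_sep` trades closeness to the query for a more heterogeneous answer, which is the kind of parameter a user might tune interactively.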