9 research outputs found

    Efficient similarity search on multimedia databases

    Manipulating and retrieving multimedia data has received increasing attention with the advent of cloud storage facilities. The ability to query by similarity over large data collections is essential for improving storage and user interfaces. However, all of these are expensive operations to solve on the CPU alone, so it is worth applying High Performance Computing (HPC) techniques to them. The Graphics Processing Unit (GPU) has increasingly been used as an alternative HPC device to speed up certain computations. This work introduces a pure GPU architecture to build the Permutation Index and to solve approximate similarity queries on multimedia databases. The implementations achieved different levels of speedup, which depend on the characteristics of the GPU and of the particular database used. Track: Workshop Bases de datos y minería de datos (WBDDM) (Databases and Data Mining Workshop). Red de Universidades con Carreras en Informática (RedUNCI).
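
The Permutation Index mentioned above represents each object by the order in which it sees a small set of reference objects (permutants); objects whose orderings resemble the query's are treated as likely neighbours and checked first. The following is a CPU-only sketch of that idea, not the GPU architecture of this work. The function names `build_index` and `knn_query`, the use of the Spearman footrule to compare orderings, Euclidean distance, and the `fraction` of the database verified with real distances are all assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    return math.dist(a, b)


def permutation(obj: Point, permutants: Sequence[Point], dist: Metric) -> List[int]:
    """Permutant ids ordered from closest to farthest from obj (ties broken by id)."""
    return sorted(range(len(permutants)), key=lambda i: (dist(obj, permutants[i]), i))


def inverse(perm: List[int]) -> List[int]:
    """pos[i] = rank of permutant i; this is what the index stores per object."""
    pos = [0] * len(perm)
    for rank, i in enumerate(perm):
        pos[i] = rank
    return pos


def footrule(pos_a: List[int], pos_b: List[int]) -> int:
    """Spearman footrule: sum of rank displacements between two orderings."""
    return sum(abs(x - y) for x, y in zip(pos_a, pos_b))


def build_index(db: Sequence[Point], permutants: Sequence[Point], dist: Metric) -> List[List[int]]:
    return [inverse(permutation(o, permutants, dist)) for o in db]


def knn_query(q, k, db, permutants, index, dist, fraction=0.1):
    """Approximate k-NN: rank all objects by footrule, then compute real
    distances only for the most promising `fraction` of the database."""
    q_pos = inverse(permutation(q, permutants, dist))
    order = sorted(range(len(db)), key=lambda j: (footrule(q_pos, index[j]), j))
    budget = max(k, math.ceil(fraction * len(db)))
    candidates = order[:budget]
    return sorted(candidates, key=lambda j: (dist(q, db[j]), j))[:k]
```

With `fraction=1.0` every object is verified and the answer is exact; smaller values trade accuracy for fewer distance computations, which is what makes the method approximate.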

    Indexing Metric Spaces for Exact Similarity Search

    With the continued digitalization of societal processes, we are seeing an explosion in available data, referred to as big data. In a research setting, three aspects of the data are often viewed as the main sources of challenges when attempting to create value from big data: volume, velocity and variety. Many studies address volume or velocity, while far fewer address variety. Metric spaces are well suited to addressing variety because they can accommodate any type of data, as long as its associated distance notion satisfies the triangle inequality. To accelerate search in metric spaces, a collection of indexing techniques for metric data has been proposed. However, each existing survey offers only narrow coverage, and no comprehensive empirical study of these techniques exists. We offer a survey of all existing metric indexes that can support exact similarity search by i) summarizing all existing partitioning, pruning and validation techniques used for metric indexes, ii) providing a time and storage complexity analysis of index construction, and iii) reporting on a comprehensive empirical comparison of their similarity query processing performance. Empirical comparisons are used to evaluate index performance during search because differences in the complexity of similarity query processing are hard to see analytically, and query performance depends on pruning and validation abilities that are related to the data distribution. This article aims to reveal the strengths and weaknesses of different indexing techniques in order to offer guidance on selecting an appropriate technique for a given setting and to direct future research on metric indexes.
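
Pivot-based pruning, one of the technique families such a survey covers, rests on the triangle inequality: for any pivot p, |d(q,p) - d(o,p)| <= d(q,o). The sketch below uses a LAESA-style pivot table, chosen here only as one representative exact index; the function names are hypothetical. It shows how a range query can discard objects without computing their distance to the query while still returning exactly the objects within radius r.

```python
from typing import List, Tuple


def build_pivot_table(db, pivots, dist) -> List[List[float]]:
    """Precompute d(o, p) for every object o and every pivot p."""
    return [[dist(o, p) for p in pivots] for o in db]


def range_search(q, r, db, pivots, table, dist) -> Tuple[List[int], int]:
    """Return (ids of all o with dist(q, o) <= r, number of dist(q, o) calls).

    By the triangle inequality |d(q,p) - d(o,p)| <= d(q,o), so if that lower
    bound exceeds r for any pivot p, o cannot be in the answer."""
    dq = [dist(q, p) for p in pivots]  # |P| distance evaluations per query
    result, computed = [], 0
    for j, o in enumerate(db):
        if any(abs(dqp - dop) > r for dqp, dop in zip(dq, table[j])):
            continue  # discarded without computing dist(q, o)
        computed += 1
        if dist(q, o) <= r:
            result.append(j)
    return result, computed
```

The savings come entirely from the objects discarded by the lower bound; the remaining candidates are validated with the real distance, so the answer is exact.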

    Consultas sobre espacios métricos en paralelo (Parallel queries over metric spaces)

    The work developed in this thesis aimed at the design, implementation and evaluation of a distributed index for objects in metric spaces, together with a corresponding parallel query-processing strategy for search engines. Doctoral thesis, Facultad de Ciencias Físicomatemáticas y Naturales (Universidad Nacional de San Luis). Degree awarded: Doctor en Ciencias de la Computación (Doctor of Computer Science). Thesis director: Martín Mauricio; co-director: Marcela Printista. Red de Universidades con Carreras en Informática (RedUNCI).
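
The abstract does not describe the index or the query strategy, so the following is only a generic baseline under stated assumptions, not the thesis's method. Objects are spread round-robin over hypothetical partitions (one per server), every partition answers a k-nearest-neighbour query locally, and the partial top-k lists are merged. The merge is exact because each global nearest neighbour must appear in the local top-k of its own partition. Threads stand in for servers; in CPython they illustrate the control flow rather than providing real CPU parallelism.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def partition(db, num_parts):
    """Round-robin assignment of (id, object) pairs to partitions (hypothetical policy)."""
    parts = [[] for _ in range(num_parts)]
    for i, o in enumerate(db):
        parts[i % num_parts].append((i, o))
    return parts


def local_knn(part, q, k, dist):
    """Each 'server' answers the query over its own partition only: [(distance, id), ...]."""
    return heapq.nsmallest(k, ((dist(q, o), i) for i, o in part))


def parallel_knn(parts, q, k, dist):
    """Broadcast the query, run local searches concurrently, merge the partial top-k lists."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(lambda p: local_knn(p, q, k, dist), parts))
    merged = heapq.nsmallest(k, (pair for partial in partials for pair in partial))
    return [i for _, i in merged]
```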

    Métodos de mejora del rendimiento en búsquedas por similitud sobre espacios métricos (Methods for improving the performance of similarity search in metric spaces)

    In this thesis, performance problems of similarity search in metric spaces are considered. The aim of similarity search is to find the objects most similar, or closest, to a given one. Metric spaces allow this search to be formalized and have given rise to methods whose main objective is to reduce the number of evaluations of the distance function by discarding as many objects, or the regions they represent, as possible. The existing solutions are pivot-based methods, which need a reduced number of evaluations but require significant amounts of space, and clustering-based methods, which need little space but increase the number of evaluations. The contributions of this thesis are: i) a new pivot-based method that reduces the size of the index by storing, for each object, the distance to the pivot most promising for discarding it, while keeping the number of distance evaluations competitive with clustering-based methods; and ii) a new strategy for clustering-based methods that, by progressively reducing the size of the cluster, significantly decreases the number of distance-function evaluations when exploring the clusters that have not been discarded.
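
Contribution i) replaces a full table of object-to-pivot distances with a single stored distance per object. The sketch below shows that storage layout and its pruning test under one loudly stated assumption: the abstract does not define the "most promising" pivot, so here each object simply keeps its nearest pivot. Function names are hypothetical. The lower bound |d(q,p) - d(o,p)| remains valid for that single pivot, so the range query stays exact; it can prune no more than a full pivot table would.

```python
def build_single_pivot_index(db, pivots, dist):
    """Store one (pivot id, distance) pair per object instead of a full row.
    ASSUMPTION: 'most promising pivot' is approximated by the nearest pivot."""
    index = []
    for o in db:
        dists = [dist(o, p) for p in pivots]
        best = min(range(len(pivots)), key=lambda k: dists[k])
        index.append((best, dists[best]))
    return index


def range_search(q, r, db, pivots, index, dist):
    """Return (ids of all o with dist(q, o) <= r, number of dist(q, o) calls)."""
    dq = [dist(q, p) for p in pivots]  # computed once per query
    result, computed = [], 0
    for j, (k, d_ok) in enumerate(index):
        if abs(dq[k] - d_ok) > r:
            continue  # lower bound from the stored pivot already exceeds r
        computed += 1
        if dist(q, db[j]) <= r:
            result.append(j)
    return result, computed
```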

    Resource Description and Selection for Similarity Search in Metric Spaces: Problems and Problem-Solving Approaches

    In times of an ever-increasing amount of data and a growing diversity of data types across application contexts, there is a strong need for large-scale and flexible indexing and search techniques. Metric access methods (MAMs) provide this flexibility because they only assume that the dissimilarity between two data objects is modeled by a distance metric. Furthermore, scalable solutions can be built with the help of distributed MAMs. IF4MI and RS4MI, both presented in this thesis, are metric access methods. IF4MI belongs to the group of centralized MAMs. It is based on an inverted file and thus offers a hybrid access method that provides text retrieval capabilities in addition to content-based search in arbitrary metric spaces. In contrast to IF4MI, RS4MI is a distributed MAM based on resource description and selection techniques, in which data objects are physically distributed. RS4MI is, however, by no means restricted to a particular type of distributed information retrieval system. The resource description and selection techniques have various possible application fields, for example visual analytics. Because only a metric space is assumed, possible application fields go far beyond content-based image retrieval, which serves as the example scenario here.
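
The abstract states only that IF4MI is based on an inverted file. One plausible way an inverted file can serve metric range queries, sketched here as an assumption rather than as IF4MI's actual design, is to keep one posting list per reference object, sorted by distance. The triangle inequality restricts the candidates in each list to a contiguous distance interval, found by binary search, and intersecting the lists before verification yields an exact answer. The function names and the choice of reference objects are hypothetical.

```python
import bisect
import math


def build_inverted_file(db, refs, dist):
    """One posting list per reference object r: [(d(o, r), object id), ...] sorted by distance."""
    return [sorted((dist(o, r), i) for i, o in enumerate(db)) for r in refs]


def range_query(q, eps, db, refs, lists, dist):
    """Exact range query: sorted ids of all objects within eps of q."""
    candidates = set(range(len(db)))
    for r, postings in zip(refs, lists):
        d_qr = dist(q, r)
        # Triangle inequality: o can only qualify if |d(o, r) - d(q, r)| <= eps.
        lo = bisect.bisect_left(postings, (d_qr - eps, -1))
        hi = bisect.bisect_right(postings, (d_qr + eps, math.inf))
        candidates &= {i for _, i in postings[lo:hi]}
    # Verify the surviving candidates with the real distance.
    return sorted(i for i in candidates if dist(q, db[i]) <= eps)
```

The sentinels `-1` and `math.inf` in the search keys make the interval inclusive at both ends, since real object ids are non-negative integers.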