781 research outputs found

    Multimodal non-linear latent semantic method for information retrieval

    Multimodal information retrieval is an information retrieval sub-task in which queries and target database elements are composed of several modalities, or views. A modality is a representation of a complex phenomenon as captured by a particular sensor or information source, and each one encodes some information about that phenomenon. The modalities contain both complementary and shared information about the phenomenon of interest, and this additional information can be used to improve retrieval. Several methods have been developed to take advantage of it: some exploit statistical properties of multimodal data to find correlations and implicit relationships, others learn heterogeneous distance functions, and others learn linear and non-linear projections that map data from the original input space to a common latent semantic space where different modalities are comparable. Despite the attention dedicated to this issue, multimodal information retrieval is still an open problem.
This thesis presents a multimodal information retrieval system that learns several non-linear mapping functions to transform multimodal data into latent semantic spaces, where the modalities are represented jointly and can be compared to build a multimodal ranking. Additionally, a kernelized multimodal latent semantic embedding method is proposed that uses label information as supervision to construct a multimodal index integrating the multimodal data and the labels. This method can map data to three different semantic spaces, in which several retrieval configurations can be applied. The proposed system and method were evaluated on a multimodal medical case-based retrieval task in which each case consists of a whole-slide image of a prostate tissue sample, the pathologist's text report, and a Gleason score as the supervision label. The resulting multimodal index was used for content-based retrieval and achieved outstanding results compared with previous work on this topic. Non-linear mappings give the model more flexibility and representation capacity. However, computing them with kernel methods on a very large dataset is computationally costly. To reduce this cost and enable large-scale applications, the budget technique was used, showing a good trade-off between effectiveness and speed.
COLCIENCIAS, Jóvenes investigadores 761/2016. Research line: Computer science. Maestrí
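The abstract above does not spell out the budget technique. As a rough illustration only, the sketch below assumes it means restricting kernel evaluations to a small subset of reference samples (the "budget") instead of the full dataset. The RBF kernel, the random selection, and the names `rbf` and `budget_features` are all assumptions made for this example, not the thesis's method.

```python
import math
import random

def rbf(x, y, gamma=1.0):
    """RBF kernel between two equal-length vectors (assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def budget_features(data, budget_size, gamma=1.0, seed=0):
    """Represent every sample by its kernel values against a small budget.

    Only len(data) * budget_size kernel evaluations are needed, rather
    than the len(data) ** 2 of a full kernel matrix.
    """
    rng = random.Random(seed)
    budget = rng.sample(data, budget_size)  # hypothetical: uniform random budget
    features = [[rbf(x, b, gamma) for b in budget] for x in data]
    return features, budget

data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
features, budget = budget_features(data, budget_size=2)
```

With a budget of size b and n samples, the kernel cost grows as n times b rather than n squared, which is the kind of speed versus effectiveness trade-off the abstract describes.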

    Learning compact hashing codes with complex objectives from multiple sources for large scale similarity search

    Similarity search is a key problem in many real-world applications, including image and text retrieval, content reuse detection, and collaborative filtering. Its purpose is to identify data examples similar to a given query example. With the explosive growth of the Internet, a huge amount of data such as texts, images, and videos has been generated, making efficient large-scale similarity search increasingly important. Hashing methods have become popular for large-scale similarity search because of their computational and memory efficiency. These methods design compact binary codes to represent data examples so that similar examples are mapped to similar codes. This dissertation addresses five major problems in utilizing supervised information from multiple sources in hashing with respect to different objectives. Firstly, we address the problem of incorporating semantic tags by modeling the latent correlations between tags and data examples. More precisely, the hashing codes are learned in a unified semi-supervised framework by simultaneously preserving the similarities between data examples and ensuring tag consistency via a latent factor model. Secondly, we solve the missing data problem by latent subspace learning from multiple sources. The hashing codes are learned by enforcing data consistency among the different sources. Thirdly, we address the problem of hashing on structured data by graph learning. A weighted graph is constructed from the structured knowledge in the data, and the hashing codes are then learned by preserving the graph similarities. Fourthly, we address the problem of learning hashing codes with high ranking quality by utilizing relevance judgments from users. The hashing code/function is learned by optimizing a commonly used non-smooth, non-convex ranking measure, NDCG. Finally, we deal with the problem of insufficient supervision by active learning. 
We propose to actively select the most informative data examples and tags jointly, based on the selection criterion that both the data examples and the tags should be the most uncertain and the most dissimilar from each other. Extensive experiments on several large-scale datasets demonstrate the superior performance of the proposed approaches over several state-of-the-art hashing methods from different perspectives.
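To make the ranking objective mentioned above concrete, here is a minimal sketch of how NDCG could be evaluated on a ranking produced by Hamming distance between binary codes. It uses one common NDCG formulation (gain 2^rel - 1, log2 discount). The toy codes, the relevance grades, and the function names are assumptions for illustration and are not taken from the dissertation.

```python
import math

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")

def rank_by_hamming(query_code, db_codes):
    """Database indices sorted by increasing Hamming distance to the query."""
    return sorted(range(len(db_codes)),
                  key=lambda i: hamming(query_code, db_codes[i]))

def dcg(relevances):
    """Discounted cumulative gain of relevance grades listed in ranked order."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Toy data (hypothetical): three 4-bit codes with graded relevance to the query.
db_codes = [0b1111, 0b0001, 0b0011]
relevance = [0, 3, 1]

order = rank_by_hamming(0b0000, db_codes)
score = ndcg([relevance[i] for i in order])
```

The score depends on the codes only through the sorted order, so flipping a single bit either leaves it unchanged or makes it jump. This is one way to see why the measure is non-smooth and hard to optimize directly.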

    Online hashing for fast similarity search

    In this thesis, the problem of online adaptive hashing for fast similarity search is studied. Similarity search is a central problem in many computer vision applications. The ever-growing size of available data collections and the increasing usage of high-dimensional representations in describing data have increased the computational cost of performing similarity search, requiring search strategies that can explore such collections in an efficient and effective manner. One promising family of approaches is based on hashing, in which the goal is to map the data into the Hamming space where fast search mechanisms exist, while preserving the original neighborhood structure of the data. We first present a novel online hashing algorithm in which the hash mapping is updated in an iterative manner with streaming data. Being online, our method is amenable to variations of the data. Moreover, our formulation is orders of magnitude faster to train than state-of-the-art hashing solutions. Secondly, we propose an online supervised hashing framework in which the goal is to map data associated with similar labels to nearby binary representations. For this purpose, we utilize Error Correcting Output Codes (ECOCs) and consider an online boosting formulation in learning the hash mapping. Our formulation does not require any prior assumptions on the label space and is well-suited for expanding datasets that have new label inclusions. We also introduce a flexible framework that allows us to reduce hash table entry updates. This is critical, especially when frequent updates may occur as the hash table grows larger and larger. Thirdly, we propose a novel mutual information measure to efficiently infer the quality of a hash mapping and retrieval performance. This measure has lower complexity than standard retrieval metrics. 
With this measure, we first address a key challenge in online hashing that has often been ignored: the binary representations of the data must be recomputed to keep pace with updates to the hash mapping. Based on our novel mutual information measure, we propose an efficient quality measure for hash functions and use it to determine when to update the hash table. Next, we show that this mutual information criterion can be used as an objective in learning hash functions, using gradient-based optimization. Experiments on image retrieval benchmarks confirm the effectiveness of our formulation, both in reducing hash table recomputations and in learning high-quality hash functions.
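The idea of scoring a hash mapping with mutual information can be sketched as follows. This is a simplified illustration, not the thesis's formulation: it assumes the quality is the mutual information between the Hamming distance of a pair of codes and whether the pair shares a label, estimated from counts over all pairs. The function names and the update threshold are hypothetical.

```python
import math
from collections import Counter

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")

def mutual_information(pairs):
    """Mutual information in bits between the two components of the pairs."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(d for d, _ in pairs)
    right = Counter(c for _, c in pairs)
    return sum((cnt / n) * math.log2((cnt / n) / ((left[d] / n) * (right[c] / n)))
               for (d, c), cnt in joint.items())

def mapping_quality(codes, labels):
    """MI between pairwise Hamming distance and the same-label indicator."""
    pairs = [(hamming(codes[i], codes[j]), labels[i] == labels[j])
             for i in range(len(codes)) for j in range(i + 1, len(codes))]
    return mutual_information(pairs)

def should_update_table(old_quality, new_quality, threshold=0.05):
    """Hypothetical rule: recompute stored codes only for a clear improvement."""
    return new_quality - old_quality > threshold

labels = [0, 0, 1, 1]
good = mapping_quality([0b00, 0b00, 0b11, 0b11], labels)  # distance separates labels
poor = mapping_quality([0b00, 0b11, 0b00, 0b11], labels)  # distance mixes labels
```

A mapping whose Hamming distances separate same-label pairs from different-label pairs scores higher, so comparing the scores of the current and candidate mappings gives a cheap signal for deciding whether recomputing the hash table is worthwhile.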

    Exploiting multimedia in creating and analysing multimedia Web archives

    The data on the web and the social web are inherently multimedia, consisting of a mixture of textual, visual, and audio modalities. Community memories embodied on the web and social web contain a rich mixture of data from these modalities. In many ways, the web is the greatest resource ever created by humankind. However, because the web is dynamic and distributed, its content changes, appears, and disappears on a daily basis. Web archiving provides a way of capturing snapshots of (parts of) the web for preservation and future analysis. This paper provides an overview of techniques we have developed within the EU-funded ARCOMEM (ARchiving COmmunity MEMories) project to allow multimedia web content to be leveraged during the archival process and for post-archival analysis. Through a set of use cases, we explore several practical applications of multimedia analytics in web archiving, web archive analysis, and multimedia data on the web in general.