5 research outputs found

    Unsupervised Graph-based Rank Aggregation for Improved Retrieval

    This paper presents a robust and comprehensive graph-based rank aggregation approach, used to combine the results of isolated ranker models in retrieval tasks. The method follows an unsupervised scheme, which is independent of how the isolated ranks are formulated. Our approach is able to combine arbitrary models, defined in terms of different ranking criteria, such as those based on textual, image, or hybrid content representations. We reformulate the ad-hoc retrieval problem as a document retrieval task based on fusion graphs, which we propose as a new unified representation model capable of merging multiple ranks and automatically expressing the inter-relationships of retrieval results. By doing so, we claim that the retrieval system can benefit from learning the manifold structure of datasets, thus leading to more effective results. Another contribution is that our graph-based aggregation formulation, unlike existing approaches, allows for encapsulating contextual information encoded from multiple ranks, which can be used directly for ranking, without further computations or post-processing steps over the graphs. Based on the graphs, a novel similarity retrieval score is formulated using an efficient computation of minimum common subgraphs. Finally, another benefit over existing approaches is the absence of hyperparameters. A comprehensive experimental evaluation was conducted considering diverse well-known public datasets composed of textual, image, and multimodal documents. The experiments demonstrate that our method reaches top performance, yielding better effectiveness scores than state-of-the-art baseline methods and promoting large gains over the rankers being fused, thus demonstrating the capability of the proposal to represent queries with a unified graph-based model of rank fusions.
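The pipeline the abstract describes (merge several ranked lists into one weighted graph, then score by a minimum common subgraph) can be illustrated with a toy sketch. The edge-weighting scheme and function names below are illustrative assumptions for exposition, not the paper's actual formulation.

```python
from itertools import combinations

def fusion_graph(ranks):
    """Merge several ranked lists into one weighted graph (toy sketch).

    ranks: list of ranked lists of document ids, best first.
    Nodes are documents; an edge (u, v) accumulates weight whenever u and v
    co-occur in some rank, with highly ranked pairs counting more
    (reciprocal-rank weighting is an assumption made here for simplicity).
    """
    graph = {}
    for rank in ranks:
        for (i, u), (j, v) in combinations(enumerate(rank), 2):
            w = 1.0 / ((i + 1) * (j + 1))
            key = (min(u, v), max(u, v))
            graph[key] = graph.get(key, 0.0) + w
    return graph

def graph_similarity(g1, g2):
    """Similarity via a (minimum) common subgraph: sum, over every edge the
    two fusion graphs share, the smaller of the two edge weights."""
    common = g1.keys() & g2.keys()
    return sum(min(g1[e], g2[e]) for e in common)
```

A query's fusion graph can then be compared against pre-computed fusion graphs of collection objects, so the contextual information of all fused ranks is used directly for ranking, with no post-processing over the graphs.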

    Multimodal Prediction based on Graph Representations

    This paper proposes a learning model, based on rank-fusion graphs, for general applicability in multimodal prediction tasks, such as multimodal regression and image classification. Rank-fusion graphs encode information from multiple descriptors and retrieval models, and are thus able to capture underlying relationships between modalities, samples, and the collection itself. The solution is based on encoding multiple ranks for a query (or test sample), defined according to different criteria, into a graph. We then project the generated graph into an induced vector space, creating fusion vectors, targeting broader generality and efficiency. A fusion vector estimator is then built to infer whether a multimodal input object refers to a class or not. Our method yields a fusion model better than both early-fusion and late-fusion alternatives. Experiments performed in the context of multiple multimodal and visual datasets, as well as several descriptors and retrieval models, demonstrate that our learning model is highly effective for different prediction scenarios involving visual, textual, and multimodal features, yielding better effectiveness than state-of-the-art methods.
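The projection step (graph of fused ranks → vector in an induced space, usable by any off-the-shelf estimator) can be sketched as below. The co-occurrence edges and the fixed edge vocabulary are simplifying assumptions; the paper's actual embedding is more elaborate.

```python
from itertools import combinations

def rank_edges(ranks):
    """Edges of a simplified rank-fusion graph: every pair of documents
    co-occurring in a rank, weighted by reciprocal positions (an assumption
    for illustration)."""
    edges = {}
    for rank in ranks:
        for (i, u), (j, v) in combinations(enumerate(rank), 2):
            key = (min(u, v), max(u, v))
            edges[key] = edges.get(key, 0.0) + 1.0 / ((i + 1) * (j + 1))
    return edges

def fusion_vector(ranks, edge_vocab):
    """Project the fusion graph onto a fixed edge vocabulary, producing a
    plain dense vector; a standard classifier can then act as the
    'fusion vector estimator' the abstract mentions."""
    edges = rank_edges(ranks)
    return [edges.get(e, 0.0) for e in edge_vocab]
```

Because the output is an ordinary vector, the late-fusion information from all ranks becomes compatible with conventional learning pipelines.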

    Agregação de ranks baseada em grafos (Graph-based rank aggregation)

    Advisor: Ricardo da Silva Torres
    Doctoral thesis - Universidade Estadual de Campinas, Instituto de Computação
    Abstract: In this work, we introduce a robust graph-based rank aggregation approach, capable of combining the results of isolated ranker models in retrieval tasks. The method follows an unsupervised scheme, which is independent of how the isolated ranks are formulated. Our approach is able to incorporate heterogeneous models, defined in terms of different ranking criteria, such as those based on textual, image, or hybrid content representations. We reformulate the ad-hoc retrieval problem as a graph-based retrieval built on fusion graphs, which we propose as a new unified representation model capable of merging multiple ranks and automatically expressing the inter-relationships of retrieval results. By doing so, we show that the retrieval system can benefit from learning the manifold structure of datasets, thus leading to more effective results. Our graph-based aggregation formulation, unlike existing approaches, allows for encapsulating contextual information encoded from multiple ranks, which can be used directly for ranking. The experiments performed demonstrate that our method reaches top performance, yielding better effectiveness scores than state-of-the-art baseline methods and promoting large gains over the rankers being fused. Another contribution is the extension of the fusion graph solution for efficient rank aggregation. Although previous works are promising with respect to effectiveness, they usually overlook efficiency aspects. We propose an innovative rank aggregation function that is unsupervised, intrinsically multimodal, and targeted at fast retrieval and top effectiveness performance. We introduce the concepts of embedding and indexing graph-based rank-aggregation representation models, and their application to search tasks. Embedding formulations are also proposed for graph-based rank representations. We introduce the concept of fusion vectors, a late-fusion representation of objects based on ranks, from which an intrinsically rank-aggregation retrieval model is defined. Next, we present an approach for fast retrieval based on fusion vectors, thus promoting an efficient rank aggregation system. Our method presents top effectiveness performance among state-of-the-art related work, while offering an efficiency perspective not yet covered. Consistent speedups are achieved against the recent baselines in all datasets considered. Derived from the fusion graphs and fusion vectors, we propose rank-based representation models for general prediction problems. The concepts of fusion graphs and fusion vectors are extended to prediction scenarios, where they can be used to build an estimator model to determine whether an input (even multimodal) object refers to a class or not. Experiments performed in the context of multimodal classification tasks, such as flood detection, show that the proposed solution is highly effective for different detection scenarios involving textual, visual, and multimodal features, yielding better detection results than several state-of-the-art methods. Finally, we investigate the adoption of learning approaches to help optimize the creation of rank-based representation models, in order to maximize their discriminative power and efficiency in prediction and search tasks.
    Doctorate in Computer Science (Doutor em Ciência da Computação)
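The efficiency claim rests on searching over pre-computed fusion vectors instead of graphs. A minimal sketch of that retrieval step, assuming plain cosine similarity and a brute-force scan (the thesis replaces the scan with an index to obtain its speedups; all names here are made up):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def search(query_vec, index, top=5):
    """Rank stored fusion vectors against the query's fusion vector.

    index: dict mapping object id -> pre-computed fusion vector.
    Brute force here; an approximate nearest-neighbor index would make
    this sub-linear, which is the efficiency angle the thesis pursues.
    """
    scored = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc for doc, _ in scored[:top]]
```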

    Towards Efficient Visual Analysis Without Extra Supervision

    Visual analysis has received increasing attention in the fields of computer vision and multimedia. With enough labeled training data, existing deep-learning-based methods can achieve promising performance. However, visual analysis faces a severe data-scarcity challenge: for some categories of interest, only very few, perhaps even no, positive examples are available, and performance drops dramatically when the number of positive samples falls short. In some real-world applications, people are also interested in recognizing concepts that do not appear in the training stage at all. Zero-shot learning and few-shot learning have been widely explored to tackle the problem of data scarcity. Although some promising results have been achieved, existing models still have inherent limitations: 1) they lack the ability to simultaneously detect and recognize unseen objects by exploring only natural-language descriptions; 2) they fail to consider that different concepts have different degrees of relevance to a certain category, and cannot mine these differences statistically for a more accurate event-concept association; 3) they remain very limited in their ability to deal with semantically unrepresentative event names, and lack coherence between visual and textual concepts; 4) they lack the ability to improve model performance by recycling the given limited annotation. To address these challenges, this thesis develops a series of robust statistical learning models to improve the performance of visual analysis without extra supervision.
    In Chapter 2, we focus on how to simultaneously recognize and locate novel object instances using purely unstructured textual descriptions, with no training samples. The goal is to concurrently link visual image features with the semantic label information, where descriptions of novel concepts are presented in the form of natural language. In Chapter 3, we propose a new zero-shot event detection approach, which exploits the semantic correlation between an event and concepts. Our method learns the semantic correlation from the concept vocabulary and emphasizes the most related concepts. In Chapter 4, we propose a method of grounding visual concepts for large-scale Multimedia Event Detection and Multimedia Event Captioning in the zero-shot setting. In Chapter 5, we present a novel improved temporal action localization model that is better able to take advantage of the limited labeled data available.
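The Chapter 3 idea (score an unseen event by weighting concept detectors according to event-concept semantic correlation) can be sketched roughly as follows. The embeddings, weighting rule, and function names are assumptions for illustration, not the thesis's actual model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def event_score(event_emb, concept_embs, detector_scores):
    """Zero-shot event score: weight each concept detector's output by the
    semantic correlation (here, truncated cosine) between the event name's
    embedding and that concept's embedding, so the most related concepts
    are emphasized.
    """
    weights = {c: max(cosine(event_emb, e), 0.0)
               for c, e in concept_embs.items()}
    total = sum(weights.values()) or 1.0
    return sum(weights[c] * detector_scores.get(c, 0.0)
               for c in weights) / total
```

No positive training examples of the event are needed: only the concept vocabulary, its detectors, and a textual embedding of the event name.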

    Predição de relevância em sistemas de recuperação de informação (Relevance prediction in information retrieval systems)

    Advisor: Anderson de Rezende Rocha
    Doctoral thesis - Universidade Estadual de Campinas, Instituto de Computação
    Abstract: In today's connected world, Information Retrieval (IR) has become one of the most ubiquitous problems, being part of many modern applications. Among all the challenges in designing IR systems, how to evaluate their performance is ever-present. Offline evaluation, however, is mostly limited to benchmarking and comparison of different systems, which has pushed a growing interest in predicting, at query time, the performance of an IR system. Query Performance Prediction (QPP) is the name given to the problem of estimating the quality of results retrieved by an IR system in response to a query. In the past few years, this problem has received much attention, especially from the text retrieval community. Yet, QPP is still limited, as it is only an indirect way of estimating the performance of IR systems. In this thesis, we investigate formulating the QPP problem as a relevance prediction one: the task of predicting, for a specific top-k, which results of a query are relevant to it, according to some existing relevance reference. Though remarkably challenging, relevance prediction is not only a more natural way of predicting performance but also one with significantly more applications. We present three families of relevance prediction approaches: statistical, learning, and sequence labeling. All methods within those families are evaluated concerning their effectiveness in several content-based image retrieval experiments, covering several large-scale datasets and retrieval settings. The experiments in this thesis show that it is feasible to perform relevance prediction for k values as large as 30, with minimal information about the underlying IR system, and efficiently enough to be performed at query time. The thesis concludes by offering some potential paths for improving the current results, as well as future research in this field.
    Doctorate in Computer Science. Grant 168326/2017-5, CAPES/CNPq.
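To make the task concrete, a toy stand-in from the "statistical" family of approaches might flag each top-k result as relevant based on a cutoff derived from the score distribution. The mean-based cutoff below is purely an illustrative assumption, not one of the thesis's methods.

```python
def predict_relevance(scores, k=10):
    """Toy statistical relevance predictor.

    scores: retrieval scores of a query's results, best first.
    Returns one boolean per top-k result: True when the score clears a
    cutoff computed from the top-k score distribution (here, the mean;
    the real methods use richer statistics of the ranked list).
    """
    top = scores[:k]
    if not top:
        return []
    cutoff = sum(top) / len(top)
    return [s >= cutoff for s in top]
```

Because it needs only the scores of the returned list, such a predictor can run at query time with no knowledge of the underlying IR system's internals, which is the setting the thesis targets.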