
    Unsupervised Visual and Textual Information Fusion in Multimedia Retrieval - A Graph-based Point of View

    Multimedia collections are growing in size and diversity more than ever. Effective multimedia retrieval systems are thus critical for end-users to access these datasets in a scalable way. We are interested in repositories of image/text multimedia objects, and we study multimodal information fusion techniques in the context of content-based multimedia information retrieval. We focus on graph-based methods, which have been shown to provide state-of-the-art performance, and we particularly examine two such methods: cross-media similarities and random-walk-based scores. From a theoretical viewpoint, we propose a unifying graph-based framework that encompasses the two aforementioned approaches. Our proposal allows us to highlight the core features one should consider when using a graph-based technique to combine visual and textual information. We compare cross-media and random-walk-based results on three different real-world datasets. From a practical standpoint, our extended empirical analysis allows us to provide insights and guidelines about the use of graph-based methods for multimodal information fusion in content-based multimedia information retrieval.
    Comment: an extended version of the paper "Visual and Textual Information Fusion in Multimedia Retrieval using Semantic Filtering and Graph based Methods", by J. Ah-Pine, G. Csurka and S. Clinchant, submitted to ACM Transactions on Information Systems.
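    To make the two graph-based schemes concrete, here is a minimal sketch of how each might look in code. The weighting scheme, neighbourhood size and toy score arrays are illustrative assumptions, not the paper's actual formulation.

```python
# Sketch of two graph-based fusion schemes: cross-media similarity
# (propagating textual scores through visual neighbours) and a random
# walk with restart over a graph mixing both modalities.
# All parameter choices here are assumptions for illustration.
import numpy as np

def cross_media_scores(text_scores, visual_sim, k=5):
    """Propagate each object's textual score to its k nearest visual neighbours."""
    n = len(text_scores)
    fused = np.zeros(n)
    for i in range(n):
        neighbours = np.argsort(visual_sim[i])[::-1][:k]   # top-k visual neighbours
        fused[i] = sum(visual_sim[i, j] * text_scores[j] for j in neighbours)
    return fused

def random_walk_scores(text_sim, visual_sim, alpha=0.5, restart=0.15, iters=50):
    """Random walk with restart on a convex combination of the two modality graphs."""
    W = alpha * text_sim + (1 - alpha) * visual_sim        # combined affinities
    P = W / (W.sum(axis=1, keepdims=True) + 1e-12)         # row-stochastic transitions
    n = P.shape[0]
    p = np.full(n, 1.0 / n)                                # uniform start
    u = np.full(n, 1.0 / n)                                # restart distribution
    for _ in range(iters):
        p = (1 - restart) * p @ P + restart * u
    return p
```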

    Multimedia Retrieval: Survey Of Methods And Approaches

    Multimedia retrieval is used in many applications and draws on many different sources, so accuracy is a major issue in the retrieval process. A number of techniques and datasets are available for retrieving information: some techniques use only text-based image retrieval (TBIR), some use content-based image retrieval (CBIR), and some use a combination of both. In this paper we focus on both TBIR and CBIR results and then fuse the two; for this we use late fusion, as sketched below. TBIR captures conceptual meaning while CBIR helps avoid false results, so the final results are more accurate. The main goal of this paper is to review the different methods and approaches used for multimedia retrieval.
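    The abstract does not spell out which late-fusion rule is used, so the following is a generic sketch: min-max normalise each run's scores and combine them with a weighted sum (a CombSUM-style fusion). The weights and document ids are made up for the example.

```python
# Generic late fusion of text-based (TBIR) and content-based (CBIR)
# retrieval scores: normalise each run, then take a weighted sum.

def minmax(scores):
    """Min-max normalise a dict of {doc_id: score} to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def late_fusion(tbir, cbir, w_text=0.6, w_visual=0.4):
    tbir, cbir = minmax(tbir), minmax(cbir)
    docs = set(tbir) | set(cbir)                 # union of both result lists
    fused = {d: w_text * tbir.get(d, 0.0) + w_visual * cbir.get(d, 0.0)
             for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage with invented scores:
ranking = late_fusion({"img1": 2.1, "img2": 0.7}, {"img1": 0.4, "img3": 0.9})
```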

    Late Semantic Fusion Approach for the Retrieval of Multimedia Data

    In multimedia information retrieval, late semantic fusion is used to combine textual pre-filtering with image re-ranking; the retrieval process consists of three steps. Visual and textual techniques are combined to help the developed multimedia information retrieval system minimise the semantic gap for a given query. In this paper, different late semantic fusion approaches, namely Product, Enrich, MaxMerge and FilterN, are applied, and the publicly available ImageCLEF Wikipedia Collection is used for the experiments.
    DOI: 10.17762/ijritcc2321-8169.150610
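    The abstract names the fusion strategies but does not define them, so the following sketch reflects one plausible reading of two of them: Product multiplies the modality scores, and FilterN keeps only items ranked in the text top-N and reorders them by visual score. The exact definitions should be taken from the paper, not from this sketch.

```python
# Hedged sketch of two late semantic fusion strategies, under an
# assumed interpretation of their names.

def product_fusion(text_scores, visual_scores):
    """Multiply the two scores for items present in both runs."""
    common = set(text_scores) & set(visual_scores)
    fused = {d: text_scores[d] * visual_scores[d] for d in common}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

def filter_n(text_ranking, visual_scores, n=1000):
    """Textual pre-filter (top-N of the text run) followed by image re-ranking."""
    top_n = set(text_ranking[:n])
    kept = [d for d in visual_scores if d in top_n]
    return sorted(kept, key=lambda d: visual_scores[d], reverse=True)
```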

    MLM: A Benchmark Dataset for Multitask Learning with Multiple Languages and Modalities

    In this paper, we introduce the MLM (Multiple Languages and Modalities) dataset, a new resource to train and evaluate multitask systems on samples in multiple modalities and three languages. The generation process and the inclusion of semantic data provide a resource that further tests the ability of multitask systems to learn relationships between entities. The dataset is designed for researchers and developers who build applications that perform multiple tasks on data encountered on the web and in digital archives. A second version of MLM provides a geo-representative subset of the data with weighted samples for countries of the European Union. We demonstrate the value of the resource in developing novel applications in the digital humanities with a motivating use case, and specify a benchmark set of tasks to retrieve modalities and locate entities in the dataset. Evaluation of baseline multitask and single-task systems on the full and geo-representative versions of MLM demonstrates the challenges of generalising on diverse data. In addition to the digital humanities, we expect the resource to contribute to research in multimodal representation learning, location estimation, and scene understanding.

    Experiences from the ImageCLEF Medical Retrieval and Annotation Tasks

    The medical tasks in ImageCLEF have been run every year from 2004 to 2018, and many different tasks and datasets have been used over these years. The resources created are used by many researchers well beyond the actual evaluation campaigns, and they make it possible to compare the performance of many techniques on the same grounds and in a reproducible way. Many of the larger datasets come from the medical literature, as such images are easier to obtain and to share than clinical data, which was used in a few smaller ImageCLEF challenges that are specifically marked with the disease type and anatomic region. This chapter describes the main results of the various tasks over the years, including the data, the participants and the types of tasks evaluated, as well as the lessons learned in organising such tasks for the scientific community.

    Reachability Analysis of Graph Modelled Collections

    This paper is concerned with potential recall in graph-based models for multimodal information retrieval. We provide a framework to leverage both the individuality and the combination of features from different modalities through our formulation of faceted search. We employ a potential recall analysis on a test collection to gain insight into the corpus, and further highlight the role of multiple facets, relations between objects, and semantic links in recall improvement. We conduct experiments on a multimodal dataset containing approximately 400,000 documents and images, and demonstrate that leveraging multiple facets increases recall most notably for very hard topics, by up to 316%.
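    A potential recall analysis of this kind can be illustrated as a bounded reachability computation: starting from the objects an initial query retrieves, how many relevant objects become reachable by following cross-modal links? The graph representation, hop bound and function names below are invented for the sketch.

```python
# Toy potential-recall (reachability) analysis over a multimodal graph
# represented as a dict of adjacency lists.
from collections import deque

def reachable(graph, seeds, max_hops=2):
    """Hop-bounded BFS from the initially retrieved seed objects."""
    seen, frontier = set(seeds), deque((s, 0) for s in seeds)
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return seen

def potential_recall(graph, seeds, relevant, max_hops=2):
    """Fraction of relevant objects reachable within max_hops of the seeds."""
    found = reachable(graph, seeds, max_hops) & set(relevant)
    return len(found) / len(relevant)
```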

    A Combined Approach of Structured and Non-structured IR in Multimodal Domain

    We present a generic model for multimodal information retrieval that leverages different information sources to improve the effectiveness of a retrieval system. The proposed method is able to take into account both the explicit and the latent semantics present in the data, and can be used to answer complex queries that are not currently answerable by either document retrieval systems or semantic web systems. By providing a hybrid approach combining IR and structured search techniques, we prepare a framework applicable to multimodal data collections. To test its effectiveness, we instantiate the model for an image retrieval task.
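    One simple way to picture the hybrid idea is a structured predicate narrowing the candidate set, with a free-text score ranking what remains. The records, fields and the word-overlap scorer below are placeholders, not the paper's actual model.

```python
# Minimal sketch of hybrid structured + unstructured retrieval:
# filter by typed metadata, then rank the survivors by a text score.

def hybrid_search(records, predicate, text_score, query):
    candidates = [r for r in records if predicate(r)]      # structured part
    return sorted(candidates,
                  key=lambda r: text_score(r["text"], query),
                  reverse=True)                            # unstructured part

# Toy scorer: count of query words appearing in the record's caption.
overlap = lambda text, q: len(set(text.lower().split()) & set(q.lower().split()))

results = hybrid_search(
    [{"type": "image", "year": 2010, "text": "red double-decker bus in London"}],
    predicate=lambda r: r["type"] == "image" and r["year"] >= 2005,
    text_score=overlap,
    query="London bus",
)
```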