
    Unsupervised Visual and Textual Information Fusion in Multimedia Retrieval - A Graph-based Point of View

    Multimedia collections are growing in size and diversity more than ever. Effective multimedia retrieval systems are thus critical for accessing these datasets from the end-user perspective and in a scalable way. We are interested in repositories of image/text multimedia objects and we study multimodal information fusion techniques in the context of content-based multimedia information retrieval. We focus on graph-based methods, which have been shown to provide state-of-the-art performance, and we examine two such methods in particular: cross-media similarities and random-walk-based scores. From a theoretical viewpoint, we propose a unifying graph-based framework which encompasses the two aforementioned approaches. Our proposal allows us to highlight the core features one should consider when using a graph-based technique for the combination of visual and textual information. We compare cross-media and random-walk-based results using three different real-world datasets. From a practical standpoint, our extended empirical analysis allows us to provide insights and guidelines about the use of graph-based methods for multimodal information fusion in content-based multimedia information retrieval.
    Comment: An extended version of the paper "Visual and Textual Information Fusion in Multimedia Retrieval using Semantic Filtering and Graph based Methods", by J. Ah-Pine, G. Csurka and S. Clinchant, submitted to ACM Transactions on Information Systems
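
    The random-walk-based scores mentioned in the abstract can be pictured with a small propagation loop. The sketch below is a hedged, simplified illustration, not the authors' framework: textual retrieval scores are used as the restart distribution of a random walk over a visual-similarity graph, so visually similar items reinforce each other's scores. The matrix and parameter names are assumptions for the example.

```python
# Minimal sketch of random-walk-based fusion of textual scores and visual similarity.
import numpy as np

def random_walk_rerank(visual_sim, text_scores, alpha=0.8, iters=50):
    """visual_sim: (n, n) pairwise visual similarities; text_scores: (n,) initial textual scores."""
    # Row-normalise the similarity matrix into a transition matrix.
    P = visual_sim / np.maximum(visual_sim.sum(axis=1, keepdims=True), 1e-12)
    r = text_scores / max(text_scores.sum(), 1e-12)  # restart distribution from text retrieval
    s = r.copy()
    for _ in range(iters):
        s = alpha * P.T @ s + (1 - alpha) * r        # random walk with restart
    return s                                          # fused relevance scores

# Toy usage on random data.
rng = np.random.default_rng(0)
V = rng.random((5, 5)); V = (V + V.T) / 2
print(random_walk_rerank(V, rng.random(5)))
```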

    Sparse Transfer Learning for Interactive Video Search Reranking

    Visual reranking is effective for improving the performance of text-based video search. However, existing reranking algorithms can only achieve limited improvement because of the well-known semantic gap between low-level visual features and high-level semantic concepts. In this paper, we adopt interactive video search reranking to bridge the semantic gap by introducing the user's labeling effort. We propose a novel dimension reduction tool, termed sparse transfer learning (STL), to effectively and efficiently encode the user's labeling information. STL is particularly designed for interactive video search reranking. Technically, it a) considers the pair-wise discriminative information to maximally separate labeled query-relevant samples from labeled query-irrelevant ones, b) achieves a sparse representation of the subspace that encodes the user's intention by applying the elastic net penalty, and c) propagates the user's labeling information from labeled samples to unlabeled samples by using knowledge of the data distribution. We conducted extensive experiments on the TRECVID 2005, 2006 and 2007 benchmark datasets and compared STL with popular dimension reduction algorithms. We report superior performance using the proposed STL-based interactive video search reranking.
    Comment: 17 pages
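
    STL itself is not reproduced here; the following is a rough, hedged illustration of the elastic-net idea it builds on: learn a sparse discriminative direction from the user's relevant/irrelevant labels, then rerank unlabeled keyframes by their projection onto it. The feature dimensions, sample counts and the use of scikit-learn's SGDClassifier are assumptions for the example.

```python
# Rough illustration (not STL itself) of an elastic-net penalty producing a sparse
# discriminative direction from interactive relevance labels.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(40, 128))          # labeled keyframe features (toy data)
y_labeled = rng.integers(0, 2, size=40)         # 1 = query-relevant, 0 = irrelevant
X_unlabeled = rng.normal(size=(200, 128))       # remaining search results to rerank

# Elastic net = mix of L1 (sparsity) and L2 (stability) regularisation.
clf = SGDClassifier(loss="hinge", penalty="elasticnet",
                    alpha=1e-3, l1_ratio=0.5, max_iter=2000, random_state=0)
clf.fit(X_labeled, y_labeled)

print("nonzero feature weights:", np.count_nonzero(clf.coef_))
reranked = np.argsort(-clf.decision_function(X_unlabeled))  # new ranking of unlabeled items
```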

    Review on QA Performance Improvement using Multimedia Techniques

    Community Question Answering (CQA) sites archive millions of user-created questions and answers. CQA gives users a rich resource of information that is often missing from web search engines, and it automates and improves the process of locating high-quality answers: new questions are matched by the CQA system against archived QA pairs, which significantly reduces the time and effort users spend searching for an answer. However, CQA forums usually provide only textual answers, which are not sufficient for many questions. In this paper, we propose a scheme that enriches textual answers in CQA with appropriate media data. The scheme consists of three components: answer medium selection, query generation for multimedia search, and multimedia data selection and presentation. The approach automatically determines which type of media information should be added to enrich a textual answer. By processing large sets of QA pairs and adding them to a pool, users can obtain multimedia question answering (MMQA) results by matching their questions against those in the pool. Unlike many multimedia QA efforts that attempt to answer questions directly with image and video data, this approach is built on community-contributed textual answers and can therefore deal with more complex questions. We also conducted extensive experiments on a multi-source QA dataset, and the results demonstrate the effectiveness of our approach.
    DOI: 10.17762/ijritcc2321-8169.15081
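
    The three components listed above can be pictured as a small pipeline. The sketch below is only an illustrative skeleton under assumed inputs, not the paper's system: the keyword rules, the query-term heuristic and the tag-overlap scoring are all placeholders.

```python
# Illustrative skeleton of the three components: answer-medium selection,
# multimedia query generation, and media selection. All rules are placeholders.
def select_answer_medium(question: str) -> str:
    q = question.lower()
    if q.startswith(("how to", "how do", "how can")):
        return "text+video"              # procedural questions often benefit from video
    if "look like" in q:
        return "text+image"              # appearance questions often benefit from images
    return "text"

def generate_media_query(question: str, textual_answer: str, max_terms: int = 5) -> str:
    # Naive keyword extraction: keep the longest distinct words as query terms.
    words = {w.strip(".,?!").lower() for w in (question + " " + textual_answer).split()}
    words.discard("")
    return " ".join(sorted(words, key=len, reverse=True)[:max_terms])

def select_media(candidates, query_terms: str, top_k: int = 3):
    # Score candidate media items by keyword overlap with their metadata tags.
    terms = set(query_terms.split())
    return sorted(candidates, key=lambda c: len(terms & set(c["tags"])), reverse=True)[:top_k]

question = "How to tie a bowline knot?"
answer = "Form a small loop, pass the free end through it and around the standing part."
medium = select_answer_medium(question)
query = generate_media_query(question, answer)
candidates = [{"id": "vid1", "tags": ["bowline", "knot", "tutorial"]},
              {"id": "img7", "tags": ["sailing", "rope"]}]
print(medium, "|", query, "|", select_media(candidates, query))
```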

    A Review on Video Search Engine Ranking

    Search reranking is regarded as a simple and effective approach to improving retrieval accuracy. Videos are retrieved using associated textual information, for example the surrounding text from the web page, and the performance of such systems essentially depends on the relevance between that text and the videos. However, the two may not always match well enough, which leads to noisy ranking results: for instance, visually similar videos may receive very different ranks. Reranking has been proposed to tackle this problem. Video reranking is an effective way to improve the results of web-based video search, but the problem is not trivial, especially when multiple factors or modalities are considered for video search and retrieval. This paper proposes a new kind of reranking algorithm, circular reranking, which supports the mutual exchange of information across multiple modalities to improve search performance, following the philosophy that a strongly performing modality can learn from weaker ones
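
    The mutual-exchange idea can be sketched as scores passed back and forth between per-modality similarity graphs. This is a hedged simplification under assumed inputs, not the reviewed algorithm itself: relevance scores are alternately propagated over a visual-similarity graph and a textual-similarity graph so each modality can correct the other.

```python
# Toy sketch of mutual reinforcement across two modality graphs.
import numpy as np

def normalise(S):
    return S / np.maximum(S.sum(axis=1, keepdims=True), 1e-12)

def circular_rerank(text_sim, visual_sim, init_scores, alpha=0.7, rounds=5):
    Pt, Pv = normalise(text_sim), normalise(visual_sim)
    r = init_scores / max(init_scores.sum(), 1e-12)
    s = r.copy()
    for _ in range(rounds):
        s = alpha * Pv.T @ s + (1 - alpha) * r   # propagate scores over the visual graph
        s = alpha * Pt.T @ s + (1 - alpha) * r   # then over the textual graph
    return s

rng = np.random.default_rng(1)
T = rng.random((6, 6)); V = rng.random((6, 6))
print(circular_rerank((T + T.T) / 2, (V + V.T) / 2, rng.random(6)))
```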

    Reranking and clustering of images resulting from a video search

    Video retrieval through textual queries is a very common practice in broadcast archives. Query keywords are compared against the metadata that archivists manually annotate on video assets. Moreover, basic textual searches generate flat result lists in which all results carry the same importance, since the search is limited to a binary evaluation of whether the query word appears among the metadata associated with the content. In addition, the results tend to show very similar content, giving the user an ordered list with little visual diversity. This redundancy wastes space in the graphical user interface (GUI) and often forces the user to interact heavily with it before locating the results that are relevant to the query. The contribution of this project is a reranking and clustering strategy that surfaces the most relevant keyframes among the top results while maintaining diversity across assets. These techniques thus improve the visualisation of images resulting from a video search. The overall tool is designed to be integrated into Digition, the audiovisual content manager of the Corporació Catalana de Mitjans Audiovisuals
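
    A hedged sketch of the rerank-and-cluster idea described above, under assumed feature inputs (it is not the project's actual tool): keyframes returned by a text query are grouped by visual similarity, and the final list interleaves one keyframe per cluster so the top of the list stays relevant yet diverse.

```python
# Diversity-aware reordering: cluster keyframes, then round-robin over clusters.
import numpy as np
from sklearn.cluster import KMeans

def diversify(keyframe_feats, relevance, n_clusters=4):
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(keyframe_feats)
    order = np.argsort(-relevance)                     # most relevant first
    buckets = {c: [i for i in order if labels[i] == c] for c in range(n_clusters)}
    result = []
    while any(buckets.values()):                       # interleave one keyframe per cluster
        for c in range(n_clusters):
            if buckets[c]:
                result.append(buckets[c].pop(0))
    return result                                      # reordered keyframe indices

rng = np.random.default_rng(2)
print(diversify(rng.normal(size=(20, 64)), rng.random(20)))
```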

    Rescue Tail Queries: Learning to Image Search Re-rank via Click-wise Multimodal Fusion

    Image search engines have achieved good performance for head (popular) queries by leveraging text information and user click data. However, there still remain a large number of tail (rare) queries with relatively unsatisfying search results, which are often overlooked in existing research. Image search for these tail queries therefore presents a grand challenge for research communities. Most existing re-ranking approaches, though effective for head queries, cannot be extended to tail queries: their assumption that the re-ranked list should not deviate far from the initial ranked list does not hold for tail queries. The challenge thus lies in how to leverage the possibly unsatisfying initial ranked results and the very limited click data to close the search intent gap of tail queries. To deal with this challenge, we propose to mine relevant information from the very few click data by leveraging click-wise-based image pairs and query-dependent multimodal fusion. Specifically, we hypothesize that images with more clicks are more relevant to the given query than those with no or relatively fewer clicks, and that the effects of different visual modalities on re-ranking are query-dependent. We therefore propose a novel query-dependent learning-to-re-rank approach for tail queries, called "click-wise multimodal fusion." The approach can not only effectively expand the training data by learning relevant information from the constructed click-wise-based image pairs, but also fully explore the effects of multiple visual modalities by adaptively predicting the query-dependent fusion weights. Experiments conducted on a real-world dataset with 100 tail queries show that the proposed approach significantly improves initial search results by 10.88% and 9.12% in terms of NDCG@5 and NDCG@10, respectively, and outperforms several existing re-ranking approaches.
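
    The core hypothesis, that images with more clicks are more relevant than images with fewer clicks, can be turned into training pairs, and per-modality fusion weights can then be fit with a pairwise loss. The sketch below is a hedged toy version of that idea, not the paper's learning algorithm; the modality scores, click counts and the pairwise logistic loss are assumptions for the example.

```python
# Toy sketch: build click-wise image pairs and fit per-modality fusion weights
# with a pairwise logistic loss.
import numpy as np

def learn_fusion_weights(modality_scores, clicks, lr=0.1, epochs=200):
    """modality_scores: (n_images, n_modalities) per-modality relevance scores."""
    n, m = modality_scores.shape
    # Pair (i, j) whenever image i received more clicks than image j.
    pairs = [(i, j) for i in range(n) for j in range(n) if clicks[i] > clicks[j]]
    w = np.ones(m) / m
    for _ in range(epochs):
        grad = np.zeros(m)
        for i, j in pairs:
            diff = modality_scores[i] - modality_scores[j]
            p = 1.0 / (1.0 + np.exp(w @ diff))     # probability the pair is mis-ordered
            grad -= p * diff                        # push w to rank i above j
        w -= lr * grad / max(len(pairs), 1)
    return w

rng = np.random.default_rng(3)
scores = rng.random((30, 3))                        # hypothetical modalities, e.g. colour / SIFT / deep
clicks = rng.integers(0, 5, size=30)                # toy click counts
w = learn_fusion_weights(scores, clicks)
fused = scores @ w                                  # fused ranking scores for this query
print(w, fused[:5])
```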

    Multimedia question answering

    Doctor of Philosophy (Ph.D.) thesis

    Semantics-based selection of everyday concepts in visual lifelogging

    Concept-based indexing, based on identifying various semantic concepts appearing in multimedia, is an attractive option for multimedia retrieval, and much research tries to bridge the semantic gap between the media's low-level features and high-level semantics. Research into concept-based multimedia retrieval has generally focused on detecting concepts in high-quality media such as broadcast TV or movies, but concept detection is not well addressed in other domains like lifelogging, where the original data is captured with poorer quality. We argue that in noisy domains such as lifelogging, the management of data needs to include semantic reasoning in order to deduce a set of concepts to represent lifelog content for applications like searching, browsing or summarisation. Using semantic concepts to manage lifelog data relies on the fusion of automatically-detected concepts to provide a better understanding of the lifelog data. In this paper, we investigate the selection of semantic concepts for lifelogging, which includes reasoning on semantic networks using a density-based approach. In a series of experiments we compare different semantic reasoning approaches, and the experimental evaluations we report on lifelog data show the efficacy of our approach
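
    The density-based selection over a semantic network can be illustrated with a small rescoring step. This is a hedged sketch, not the paper's algorithm: each automatically detected concept is rescored by how strongly its semantically related neighbours were also detected, so mutually supporting concepts are retained and isolated, likely-noisy detections are dropped. The concept list, similarity values and threshold are toy assumptions.

```python
# Toy density-based rescoring of detected concepts over a semantic relatedness graph.
import numpy as np

concepts = ["indoor", "screen", "office", "beach", "food"]
detector_conf = np.array([0.9, 0.8, 0.7, 0.3, 0.2])     # raw detector confidences (toy)
semantic_sim = np.array([                                # pairwise semantic relatedness (toy)
    [1.0, 0.6, 0.7, 0.1, 0.2],
    [0.6, 1.0, 0.6, 0.0, 0.1],
    [0.7, 0.6, 1.0, 0.1, 0.2],
    [0.1, 0.0, 0.1, 1.0, 0.3],
    [0.2, 0.1, 0.2, 0.3, 1.0],
])

# Density score: a concept's confidence reinforced by its related concepts' confidences.
density = detector_conf * (semantic_sim @ detector_conf)
selected = [c for c, d in zip(concepts, density) if d > np.median(density)]
print(dict(zip(concepts, np.round(density, 2))), selected)
```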

    Real-Time Near-Duplicate Elimination for Web Video Search With Content and Context
