44 research outputs found

    Caption-guided patent image segmentation

    Full text link

    Fusion of Compound Queries with Multiple Modalities for Known Item Video Search

    No full text
    Multimedia collections are ubiquitous and often contain hundreds of hours of video. Retrieving a particular scene of a video (Known Item Search) in a large collection is a difficult problem, given the multimodal character of video shots and the complexity of the query, whether visual or textual. We tackle these challenges by first fusing multiple modalities in a non-linear, graph-based way for each subtopic of the query. We then fuse the top retrieved video shots per sub-query to produce the final list of retrieved shots, which is re-ranked using temporal information. The framework is evaluated on popular Known Item Search tasks in the context of video shot retrieval and achieves the highest Mean Reciprocal Rank scores.
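
    A minimal illustrative sketch of the two outer steps (fusing the per-sub-query ranked shot lists, then re-ranking by temporal proximity) is given below in Python. It is an assumption-laden stand-in: plain reciprocal-rank fusion replaces the paper's non-linear graph-based fusion, and all function names and parameters (k, window, bonus) are hypothetical.

    # Sketch only, not the authors' implementation.
    from collections import defaultdict

    def fuse_subquery_rankings(rankings, k=60):
        """Reciprocal-rank fusion of several ranked lists of shot ids."""
        scores = defaultdict(float)
        for ranked in rankings:
            for rank, shot_id in enumerate(ranked, start=1):
                scores[shot_id] += 1.0 / (k + rank)   # low ranks dominate
        return sorted(scores, key=scores.get, reverse=True)

    def temporal_rerank(shots, shot_times, window=30.0, bonus=0.1):
        """Boost shots that lie close in time to other retrieved shots,
        assuming relevant shots of a known item cluster temporally."""
        scores = {s: 1.0 / (i + 1) for i, s in enumerate(shots)}
        for s in shots:
            near = sum(1 for t in shots
                       if t != s and abs(shot_times[t] - shot_times[s]) <= window)
            scores[s] += bonus * near
        return sorted(scores, key=scores.get, reverse=True)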

    A Hybrid graph-based and non-linear late fusion approach for multimedia retrieval

    No full text
    Communication presented at: 14th International Workshop on Content-Based Multimedia Indexing (CBMI 2016), held June 15–17, 2016 in Bucharest, Romania. Nowadays, multimedia retrieval has become a task of high importance, due to the need for efficient and fast access to very large and heterogeneous multimedia collections. An interesting challenge within this task is the efficient combination of the different modalities in a multimedia object, and especially the fusion of textual and visual information. Unsupervised fusion of multiple modalities for retrieval has mostly been based on early, weighted linear, graph-based and diffusion-based techniques. In contrast, we present a strategy for fusing textual and visual modalities through the combination of a non-linear fusion model and a graph-based late fusion approach. The fusion strategy is based on the construction of a uniform multimodal contextual similarity matrix and the non-linear combination of relevance scores from query-based similarity vectors. The proposed late fusion approach is evaluated on the multimedia retrieval task, by applying it to two multimedia collections, namely WIKI11 and IAPR-TC12. The experimental results indicate its superiority over the baseline method in terms of Mean Average Precision for both datasets. This work was supported by the projects MULTISENSOR (FP7-610411) and KRISTINA (H2020-645012), funded by the European Commission.
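
    The two ingredients named in the abstract, a non-linear combination of relevance scores and a graph built from a uniform multimodal similarity matrix, can be sketched roughly as follows. The cross-product term and the averaged matrix are assumptions about the general shape of such a method, not the paper's actual model.

    # Sketch only; numpy arrays, one query against n collection items.
    import numpy as np

    def nonlinear_fusion(s_text, s_visual, alpha=0.5):
        """Fuse two query-to-item similarity vectors; the elementwise
        product term makes the combination non-linear in the inputs."""
        return alpha * s_text + (1 - alpha) * s_visual + s_text * s_visual

    def graph_refine(scores, sim_text, sim_visual, beta=0.3):
        """One propagation step over a combined similarity matrix,
        here (as an assumption) the average of the two matrices."""
        W = 0.5 * (sim_text + sim_visual)
        W = W / (W.sum(axis=1, keepdims=True) + 1e-12)  # row-stochastic
        return (1 - beta) * scores + beta * W @ scores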

    Retrieval of multimedia objects by fusing multiple modalities

    No full text
    Communication presented at: ICMR'16, International Conference on Multimedia Retrieval 2016, held in New York, June 6–9, 2016. Searching for multimedia objects with heterogeneous modalities is critical for the construction of effective multimedia retrieval systems. Towards this direction, we propose a framework for the multimodal fusion of visual and textual similarities, based on visual features, visual concepts and textual concepts. Our method is compared to a baseline that fuses only two modalities but integrates early, late, linearly weighted, diffusion and graph-based models in one unifying framework. Our framework integrates more than two modalities and high-level information, so as to retrieve multimedia objects enriched with high-level textual and visual concepts in response to a multimodal query. The experimental comparison is done under the same memory complexity, on two multimedia collections in the multimedia retrieval task. The results show that we outperform the baseline method in terms of Mean Average Precision. This work was partially supported by the European Commission through the projects MULTISENSOR (FP7-610411) and KRISTINA (H2020-645012).
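
    As an illustration of late fusion over more than two modalities, the sketch below scores each object from three query-to-object similarity vectors: low-level visual features, visual concepts and textual concepts. The min-max normalisation and the weights are assumptions for the example, not the framework evaluated in the paper.

    # Sketch only, not the proposed framework.
    import numpy as np

    def normalise(s):
        """Min-max normalisation so the modalities are comparable."""
        s = np.asarray(s, dtype=float)
        rng = s.max() - s.min()
        return (s - s.min()) / rng if rng > 0 else np.zeros_like(s)

    def fuse_three(s_features, s_vconcepts, s_tconcepts, w=(0.4, 0.3, 0.3)):
        """Weighted late fusion of three normalised similarity vectors."""
        parts = map(normalise, (s_features, s_vconcepts, s_tconcepts))
        return sum(wi * si for wi, si in zip(w, parts))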

    A Multimedia interactive search engine based on graph-based and non-linear multimodal fusion

    No full text
    Communication presented at: 14th International Workshop on Content-Based Multimedia Indexing (CBMI 2016), held June 15–17, 2016 in Bucharest, Romania. This paper presents an interactive multimedia search engine, which is capable of searching multimedia collections by fusing textual and visual information. Apart from multimedia search, the engine can perform text search and image retrieval independently, using both high-level and low-level information. The images of the multimedia collection are organized by color, offering fast browsing of the image collection. This work was partially supported by the European Commission through the projects MULTISENSOR (FP7-610411), HOMER (FP7-312883) and KRISTINA (H2020-645012).
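
    Purely as an assumption about how colour-based browsing could be implemented (this is not the engine's code), one simple approach is to sort images by the mean hue of a small thumbnail:

    # Sketch only; requires Pillow (pip install Pillow).
    import colorsys
    from PIL import Image

    def mean_hue(path):
        """Average the pixels of a 32x32 thumbnail, return hue in [0, 1)."""
        img = Image.open(path).convert("RGB").resize((32, 32))
        pixels = list(img.getdata())
        n = 255.0 * len(pixels)
        r = sum(p[0] for p in pixels) / n
        g = sum(p[1] for p in pixels) / n
        b = sum(p[2] for p in pixels) / n
        return colorsys.rgb_to_hsv(r, g, b)[0]

    def order_by_colour(paths):
        """Sort image paths by hue, giving a colour-graded browsing strip."""
        return sorted(paths, key=mean_hue)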