40 research outputs found

    Query-Adaptive Fusion for Multimodal Search

    High-level feature detection from video in TRECVid: a 5-year retrospective of achievements

    Successful and effective content-based access to digital video requires fast, accurate and scalable methods to determine the video content automatically. A variety of contemporary approaches to this rely on text taken from speech within the video, or on matching one video frame against others using low-level characteristics like colour, texture, or shapes, or on determining and matching objects appearing within the video. Possibly the most important technique, however, is one which determines the presence or absence of a high-level or semantic feature within a video clip or shot. By utilizing dozens, hundreds or even thousands of such semantic features we can support many kinds of content-based video navigation. Critically, however, this depends on being able to determine whether each feature is or is not present in a video clip. The last 5 years have seen much progress in the development of techniques to determine the presence of semantic features within video. This progress can be tracked in the annual TRECVid benchmarking activity, where dozens of research groups measure the effectiveness of their techniques on common data using an open, metrics-based approach. In this chapter we summarise the work done on the TRECVid high-level feature task, showing the progress made year-on-year. This provides a fairly comprehensive statement on where the state-of-the-art is regarding this important task, not just for one research group or for one approach, but across the spectrum. We then use this past and ongoing work as a basis for highlighting the trends that are emerging in this area, and the questions which remain to be addressed before we can achieve large-scale, fast and reliable high-level feature detection on video.
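
    A minimal illustrative sketch, not taken from the chapter itself: assuming each shot has already been scored by per-feature semantic detectors, a concept-based query can be answered by ranking shots on a weighted combination of those confidence scores. The feature names, scores, and weights below are hypothetical.

    # Illustrative sketch: ranking video shots by semantic feature confidences.
    from typing import Dict, List, Tuple

    # Hypothetical detector output: one confidence per semantic feature, per shot.
    shot_features: Dict[str, Dict[str, float]] = {
        "shot_001": {"outdoor": 0.92, "person": 0.80, "car": 0.10},
        "shot_002": {"outdoor": 0.15, "person": 0.95, "car": 0.05},
        "shot_003": {"outdoor": 0.88, "person": 0.20, "car": 0.75},
    }

    def rank_shots(query_weights: Dict[str, float],
                   shots: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
        """Score each shot as a weighted sum of its detector confidences for the
        query's features, then sort shots by descending score."""
        scored = []
        for shot_id, features in shots.items():
            score = sum(w * features.get(name, 0.0)
                        for name, w in query_weights.items())
            scored.append((shot_id, score))
        return sorted(scored, key=lambda pair: pair[1], reverse=True)

    # Example query "cars outdoors", expressed as weights over two features.
    print(rank_shots({"outdoor": 1.0, "car": 1.0}, shot_features))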

    Preliminaries. Medieval Worlds, Volume 2016.3

    Text-based search using video speech transcripts is a popular approach for granular video retrieval at the shot or story level. However, misalignment of the speech and visual tracks, speech transcription errors, and other characteristics of video content pose unique challenges for this video retrieval approach. In this paper, we explore several automatic query refinement methods to address these issues. We consider two query expansion methods based on pseudo-relevance feedback and one query refinement method based on semantic text annotation. We evaluate these approaches in the context of the TRECVID 2005 video retrieval benchmark, comparing against a baseline approach without any refinement. To improve robustness, we also consider a query-independent fusion approach. We show that this combined approach can outperform the baseline for most query topics, with improvements of up to 40%. We also show that query-dependent fusion approaches can potentially improve the results further, leading to 18-75% gains when tuned with optimal fusion parameters.
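
    A minimal sketch of pseudo-relevance feedback query expansion under stated assumptions, not the paper's own implementation: run the original text query over the speech transcripts, treat the top-ranked transcripts as pseudo-relevant, and add their most frequent non-query terms to the query. The transcripts, feedback depth, and function name below are hypothetical.

    # Illustrative sketch of pseudo-relevance feedback query expansion.
    from collections import Counter
    from typing import List

    def expand_query(query: List[str],
                     ranked_transcripts: List[str],
                     feedback_depth: int = 2,
                     num_expansion_terms: int = 3) -> List[str]:
        """Expand the query with the most frequent non-query terms found in the
        top-ranked (pseudo-relevant) transcripts."""
        counts = Counter()
        for transcript in ranked_transcripts[:feedback_depth]:
            counts.update(t for t in transcript.lower().split() if t not in query)
        expansion = [term for term, _ in counts.most_common(num_expansion_terms)]
        return query + expansion

    # Hypothetical speech transcripts, already ranked by an initial text search.
    transcripts = [
        "the hurricane made landfall near the coast flooding low lying streets",
        "rescue crews responded to widespread flooding after the hurricane",
        "weekend football scores and league standings",
    ]
    print(expand_query(["hurricane"], transcripts))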

    Processing top-k join queries

    Combining fuzzy information

    WALRUS
