10 research outputs found

    Segment Based Indexing Technique for Video Data File

    A video is an effective way to exchange information that would otherwise require lengthy text, thanks to advances in technology. Capturing video is an effortless process, but retrieving the related video is difficult, and to support retrieval the videos must be indexed. Retrieval is the method that returns a video in response to a user query; the query may be an image or text, and the system returns a particular video or image based on it. In this project we create an index for video files using a segment-based indexing technique: the video is divided into a hierarchy analogous to the storyboards of film making. A hierarchical video search is thus composed of multiple stages of abstraction that help users locate specific video segments/frames logically. This reduces the bandwidth and delay of searching and reviewing video over the network, and experimental results verify it.
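    A minimal sketch of how such a hierarchical segment index might be organised; the class and field names below are illustrative assumptions, not taken from the paper:

    from dataclasses import dataclass, field
    from typing import List, Optional, Tuple

    @dataclass
    class Shot:
        start_frame: int    # first frame of the shot
        end_frame: int      # last frame of the shot
        keyframes: List[int] = field(default_factory=list)  # representative frames

    @dataclass
    class Scene:
        shots: List[Shot] = field(default_factory=list)

    @dataclass
    class VideoIndex:
        video_id: str
        scenes: List[Scene] = field(default_factory=list)

        def locate(self, frame: int) -> Optional[Tuple[int, int]]:
            # Drill down the hierarchy to the (scene, shot) containing a
            # frame, so a query can return a short segment instead of
            # forcing the whole file across the network.
            for s_i, scene in enumerate(self.scenes):
                for sh_i, shot in enumerate(scene.shots):
                    if shot.start_frame <= frame <= shot.end_frame:
                        return s_i, sh_i
            return None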

    Multimedia Retrieval


    Automatic Quality Assessment of Lecture Videos Using Multimodal Features

    Multimedia retrieval, a methodology that developed out of information retrieval, is widely used in the digitalised society. When searching for videos online, they need to be ranked by relevance; however, most approaches compute relevance only from basic content information. This thesis aims to analyse relevance across multiple modalities. For the specific case of lecture videos, features from the following modalities are extracted from the corresponding course materials: the acoustic, linguistic, and visual modalities. Furthermore, cross-modal features are first proposed in this thesis and computed by processing audio, images, transcripts, and texts. A user evaluation was conducted to collect users' opinions on the generated features. The results show that most features can reflect a video in multiple aspects. How the learning effect is influenced by these features is considered as well. For further research, this study builds a solid base for feature extraction and contributes a better understanding of learning.
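    As an illustration, one plausible cross-modal feature is the alignment between the spoken transcript and the slide text, scored here with TF-IDF cosine similarity; this exact formulation is an assumption made for the example, not a feature specified by the thesis:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def transcript_slide_alignment(transcript: str, slide_text: str) -> float:
        # Score in [0, 1] for how well the spoken content matches the
        # slides; higher values suggest a better-aligned lecture video.
        vectorizer = TfidfVectorizer()
        matrix = vectorizer.fit_transform([transcript, slide_text])
        return float(cosine_similarity(matrix[0], matrix[1])[0, 0])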

    Semantic multimedia analysis using knowledge and context

    The difficulty of semantic multimedia analysis can be attributed to the extended diversity in form and appearance exhibited by the majority of semantic concepts and the difficulty of expressing them using a finite number of patterns. In meeting this challenge there has been a scientific debate on whether the problem should be addressed from the perspective of using overwhelming amounts of training data to capture all possible instantiations of a concept, or from the perspective of using explicit knowledge about the concepts' relations to infer their presence. In this thesis we address three problems of pattern recognition and propose solutions that combine the knowledge extracted implicitly from training data with the knowledge provided explicitly in structured form. First, we propose a Bayesian network (BN) modeling approach that defines a conceptual space where both domain-related evidence and evidence derived from content analysis can be jointly considered to support or disprove a hypothesis. The use of this space leads to significant gains in performance compared to analysis methods that cannot handle combined knowledge. Then, we present an unsupervised method that exploits the collective nature of social media to automatically obtain large amounts of annotated image regions. By proving that the quality of the obtained samples can be almost as good as manually annotated images when working with large datasets, we significantly contribute towards scalable object detection. Finally, we introduce a method that treats images, visual features and tags as the three observable variables of an aspect model and extracts a set of latent topics that incorporates the semantics of both the visual and tag information space. By showing that the cross-modal dependencies of tagged images can be exploited to increase the semantic capacity of the resulting space, we advocate the use of all existing information facets in the semantic analysis of social media.
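    A toy sketch of the core idea of combining explicit domain knowledge (a prior) with implicit evidence from content analysis (detector likelihoods) via Bayes' rule; this stands in for the thesis's Bayesian network machinery, and the detectors and numbers are invented for the example:

    def posterior(prior, likelihoods):
        # prior: P(concept); each likelihood pair is
        # (P(evidence | concept), P(evidence | not concept)).
        p_c, p_not = prior, 1.0 - prior
        for l_c, l_not in likelihoods:
            p_c *= l_c       # evidence given the concept holds
            p_not *= l_not   # evidence given it does not
        return p_c / (p_c + p_not)

    # A hypothetical "beach" concept: domain prior 0.2, supported by a
    # sand detector (0.8 vs 0.3) and a water detector (0.7 vs 0.4).
    print(posterior(0.2, [(0.8, 0.3), (0.7, 0.4)]))  # ~0.54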

    Multi-modal surrogates for retrieving and making sense of videos: is synchronization between the multiple modalities optimal?

    Video surrogates can help people quickly make sense of the content of a video before downloading or seeking more detailed information. Visual and audio features of a video are primary information carriers and might become important components of video retrieval and video sense-making. In the past decades, most research and development efforts on video surrogates have focused on visual features of the video, and comparatively little work has been done on audio surrogates and on examining their pros and cons in aiding users' retrieval and sense-making of digital videos. Even less work has been done on multi-modal surrogates, where more than one modality is employed for consuming the surrogates, for example, the audio and visual modalities. This research examined the effectiveness of a number of multi-modal surrogates and investigated whether synchronization between the audio and visual channels is optimal. A user study was conducted to evaluate six different surrogates on a set of six recognition and inference tasks, answering two main research questions: (1) How do automatically-generated multi-modal surrogates compare to manually-generated ones in video retrieval and video sense-making? and (2) Does synchronization between multiple surrogate channels enhance or inhibit video retrieval and video sense-making? Forty-eight participants took part in the study, in which the surrogates were measured on the time participants spent experiencing the surrogates, the time participants spent doing the tasks, participants' performance accuracy on the tasks, participants' confidence in their task responses, and participants' subjective ratings of the surrogates. On average, the uncoordinated surrogates were more helpful than the coordinated ones, but the manually-generated surrogates were only more helpful than the automatically-generated ones in terms of task completion time. Participants' subjective ratings were more favorable for the coordinated surrogate C2 (Magic A + V) and the uncoordinated surrogate U1 (Magic A + Storyboard V) with respect to usefulness, usability, enjoyment, and engagement. The post-session questionnaire comments demonstrated participants' preference for the coordinated surrogates, but the comments also revealed the value of having uncoordinated sensory channels.

    Multimodal content-based video retrieval

    This chapter is a case study showing how important events (highlights) can be automatically detected in video recordings of Formula 1 car racing. Numerous approaches presented in the literature have shown that it is becoming possible to extract interesting events from video. However, the majority of the approaches use individual visual or audio cues. According to the current understanding of human perception, using evidence obtained from different modalities should result in a more robust and accurate perception of video. On the other hand, fusion of multimodal evidence is quite challenging, since it has to deal with indications that may contradict each other. In this chapter we deal with three topics, one being fusion of evidence from different modalities.
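    A hedged illustration of late fusion across modalities: per-modality highlight scores (say, from audio excitement, commentator speech, and visual replay cues) are combined with weights, and strong disagreement between cues is penalised. The weights, threshold, and penalty are invented for the example and are not the chapter's method:

    def fuse(scores, weights, conflict_penalty=0.5):
        # Weighted average of per-modality highlight scores in [0, 1].
        total_w = sum(weights[m] for m in scores)
        fused = sum(weights[m] * s for m, s in scores.items()) / total_w
        # If the modalities strongly contradict each other, be cautious
        # and damp the fused score.
        if max(scores.values()) - min(scores.values()) > 0.6:
            fused *= conflict_penalty
        return fused

    print(fuse({"audio": 0.9, "visual": 0.8, "text": 0.7},
               {"audio": 0.4, "visual": 0.4, "text": 0.2}))  # 0.82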
