5 research outputs found

    Modelling of content-aware indicators for effective determination of shot boundaries in compressed MPEG videos

    In this paper, a content-aware approach is proposed to design multiple test conditions for shot cut detection, organized into a multi-phase decision tree for abrupt cut detection and a finite state machine for dissolve detection. In comparison with existing approaches, the algorithm is characterized by two categories of content-difference indicators: the first indicates the content changes that are directly used for shot cut detection, while the second indicates the context in which a content change occurs. As a result, frame-difference indicators are tested with context awareness, making shot cut detection adaptive to both content and context changes. Evaluation results announced by TRECVID 2007 indicate that the proposed algorithm achieved performance comparable to machine learning approaches while using a simpler feature set and straightforward design strategies. This validates the effectiveness of modelling content-aware indicators for decision making and provides a good alternative to conventional approaches in this area.
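    The content/context idea described in this abstract can be illustrated in a few lines. The following is a minimal sketch assuming grayscale frames as NumPy arrays; the function names (frame_histogram, detect_cuts), the L1 histogram distance, and all thresholds are illustrative assumptions, not the paper's actual features or values.

```python
# Hedged sketch: context-aware abrupt-cut detection with two indicators.
# All names and thresholds are illustrative, not the paper's actual values.
import numpy as np

def frame_histogram(frame, bins=64):
    """Grayscale intensity histogram, normalised to sum to 1."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 255))
    return hist / max(hist.sum(), 1)

def content_change(h1, h2):
    """First-category indicator: how much the content differs between frames
    (histogram L1 distance, scaled into [0, 1])."""
    return 0.5 * np.abs(h1 - h2).sum()

def context_activity(history):
    """Second-category indicator: how active the surrounding context is,
    estimated here as the mean of recent frame differences."""
    return float(np.mean(history)) if history else 0.0

def detect_cuts(frames, base_threshold=0.35, context_weight=0.5, window=10):
    """Flag an abrupt cut when the content change clearly exceeds what the
    recent context (e.g. camera motion, flashes) would explain."""
    cuts, history = [], []
    prev = frame_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = frame_histogram(frames[i])
        d = content_change(prev, cur)
        # Raise the threshold in highly active contexts to avoid false cuts.
        threshold = base_threshold + context_weight * context_activity(history)
        if d > threshold:
            cuts.append(i)
        history = (history + [d])[-window:]
        prev = cur
    return cuts
```

    Making the threshold a function of recent activity is one simple way to realise "testing with context awareness": the same frame difference is treated as a cut in static footage but tolerated in fast-moving footage.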

    Content-based video classification and comparison

    Automatic video analysis tools have grown dramatically in importance as Internet video has blossomed. This thesis presents an approach for automatically comparing videos based on their inherent content, along with an approach for creating groups (or clusters) of similar videos from a large video database. First, methods for simplifying and summarizing the content of videos are presented, including shot boundary detection and key frame feature extraction. Next, different distance measures between videos are compared; these distance measures are used to construct video clusters, and the results are compared.
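    As a rough illustration of the pipeline this abstract describes, the sketch below summarises each video by the mean of its key-frame histograms and clusters the pairwise distances. The signature, the L1 distance measure, and agglomerative clustering are assumptions made for illustration, not necessarily the measures evaluated in the thesis.

```python
# Hedged sketch: compare videos by key-frame features, then cluster them.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def video_signature(key_frame_histograms):
    """Summarise a video as the mean of its key-frame histograms."""
    return np.mean(key_frame_histograms, axis=0)

def video_distance(sig_a, sig_b):
    """One possible distance measure: L1 distance between signatures."""
    return np.abs(sig_a - sig_b).sum()

def cluster_videos(signatures, max_distance=0.4):
    """Group similar videos by agglomerative clustering on the pairwise
    distance matrix; videos closer than max_distance share a cluster."""
    n = len(signatures)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = video_distance(signatures[i],
                                                     signatures[j])
    tree = linkage(squareform(dist), method="average")
    return fcluster(tree, t=max_distance, criterion="distance")
```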

    Classificador para língua natural (Classifier for natural language)

    This dissertation presents a training-free classifier for unannotated texts written in English. To relate words, it draws on the WordNet database: each word in the text is compared with each concept defining the classification topics, taking the hierarchical structure of WordNet's relations into account. In this way, the affinity between more general and more specific terms is preserved, as is the affinity between terms of the same domain. The program was developed to be integrated into a system competing in TRECVID, an annual evaluation that aims to encourage progress in digital video search and indexing. Despite this specific initial scope, the application shows great potential for use with any English text.
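    The word-to-concept comparison over WordNet's hierarchy might look like the sketch below, which uses NLTK's WordNet interface and path similarity; the scoring scheme and the example topics are illustrative assumptions, not the dissertation's exact method.

```python
# Hedged sketch: score words against topic concepts via WordNet's hierarchy.
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def similarity(word, concept):
    """Best path similarity between any synset of `word` and the concept
    synset, exploiting WordNet's hierarchical (hypernym) structure."""
    scores = [s.path_similarity(concept) for s in wn.synsets(word)]
    return max((s for s in scores if s is not None), default=0.0)

def classify(text, topics):
    """Assign `text` to the topic whose concept its words are closest to.
    Assumes each topic name has at least one WordNet synset."""
    words = [w.lower() for w in text.split() if w.isalpha()]
    concepts = {name: wn.synsets(name)[0] for name in topics}
    totals = {name: sum(similarity(w, c) for w in words)
              for name, c in concepts.items()}
    return max(totals, key=totals.get)

# e.g. classify("the goalkeeper saved the penalty", ["sport", "politics"])
```

    Because path similarity decays with distance in the hypernym tree, both very specific terms (e.g. "goalkeeper") and more general ones (e.g. "game") contribute to the same topic, which is the affinity-preserving behaviour the abstract describes.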

    Semantics of video shots for content-based retrieval

    Content-based video retrieval research combines expertise from many different areas, such as signal processing, machine learning, pattern recognition, and computer vision. As video extends into both the spatial and the temporal domain, we require techniques for the temporal decomposition of footage so that specific content can be accessed. This content may then be semantically classified - ideally in an automated process - to enable filtering, browsing, and searching. An important aspect to consider is that a pictorial representation of information may be interpreted differently by individual users because it is less specific than its textual representation. In this thesis, we address several fundamental issues of content-based video retrieval for effective handling of digital footage.

    Temporal segmentation, the common first step in handling digital video, is the decomposition of video streams into smaller, semantically coherent entities. This is usually performed by detecting the transitions that separate single camera takes. While abrupt transitions - cuts - can be detected relatively well with existing techniques, effective detection of gradual transitions remains difficult. We present our approach to temporal video segmentation, proposing a novel algorithm that evaluates sets of frames using a relatively simple histogram feature. Our technique has been shown to rank among the best existing shot segmentation algorithms in large-scale evaluations.

    The next step is semantic classification of each video segment to generate an index for content-based retrieval in video databases. Machine learning techniques can be applied effectively to classify video content, but they require manually classified examples for training before unseen content can be classified automatically. Manually classifying training examples is not trivial because of the inherent ambiguity of visual content. We propose an unsupervised learning approach based on latent class modelling, in which we obtain multiple judgements per video shot and model the users' response behaviour over a large collection of shots. This technique yields a more generic classification of the visual content, enables quality assessment of the classification, and maximises the number of training examples by resolving disagreement. We apply this approach to data from a large-scale, collaborative annotation effort and present ways to make manual annotation of visual content more effective through better design and specification of the process.

    Automatic speech recognition, together with semantic classification of video content, can be used to implement video search with textual queries. This requires applying text search techniques to video and combining different information sources. We explore several text-based query expansion techniques for speech-based video retrieval and propose a fusion method to improve overall effectiveness. To combine text and visual search, we explore a fusion technique that merges spoken information with semantic keywords automatically assigned to the footage based on its visual content.

    The techniques we propose help to facilitate effective content-based video retrieval and highlight the importance of considering different user interpretations of visual content. This allows a better understanding of video content and a more holistic approach to multimedia retrieval in the future.
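    As an illustration of the "evaluate sets of frames with a simple histogram feature" idea for gradual transitions, the sketch below flags windows whose overall change is large while every frame-to-frame step stays small. The window width, feature, and thresholds are assumptions made for illustration, not the thesis's actual algorithm.

```python
# Hedged sketch: window-based detection of gradual shot transitions.
import numpy as np

def histogram(frame, bins=64):
    """Grayscale intensity histogram, normalised to sum to 1."""
    h, _ = np.histogram(frame, bins=bins, range=(0, 255))
    return h / max(h.sum(), 1)

def l1(h1, h2):
    """Histogram L1 distance, scaled into [0, 1]."""
    return 0.5 * np.abs(h1 - h2).sum()

def detect_gradual(frames, width=20, across_min=0.4, step_max=0.15):
    """Flag frame windows whose overall change is large (first vs. last
    histogram) while each individual step stays small, i.e. the change is
    spread across the window rather than concentrated in a single cut."""
    hists = [histogram(f) for f in frames]
    boundaries = []
    for s in range(len(hists) - width + 1):
        across = l1(hists[s], hists[s + width - 1])
        step = max(l1(hists[i], hists[i + 1])
                   for i in range(s, s + width - 1))
        if across > across_min and step < step_max:
            boundaries.append((s, s + width - 1))
    return boundaries
```

    Evaluating whole windows rather than adjacent frame pairs is what distinguishes gradual transitions (dissolves, fades) from cuts: a dissolve accumulates a large difference over the window even though no single step exceeds a cut threshold.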