Modelling of content-aware indicators for effective determination of shot boundaries in compressed MPEG videos
In this paper, a content-aware approach is proposed to design multiple test conditions for shot cut detection, which are organized into a multiple-phase decision tree for abrupt cut detection and a finite state machine for dissolve detection. In comparison with existing approaches, our algorithm is characterized by two categories of content-difference indicators and tests. While the first category indicates the content changes that are directly used for shot cut detection, the second category indicates the contexts under which the content change occurs. As a result, indications of frame differences are tested with context awareness to make the detection of shot cuts adaptive to both content and context changes. Evaluations announced by TRECVID 2007 indicate that our proposed algorithm achieved performance comparable to those using machine learning approaches, yet with a simpler feature set and straightforward design strategies. This has validated the effectiveness of modelling content-aware indicators for decision making, which also provides a good alternative to conventional approaches in this area.
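The core idea of testing a frame-difference indicator against its local context can be sketched as follows. This is a minimal, hypothetical simplification, not the paper's actual algorithm: a cut is flagged when the histogram difference between adjacent frames greatly exceeds the mean difference over a preceding context window (the thresholds `k` and the absolute floor are illustrative).

```python
def hist_diff(h1, h2):
    """L1 distance between two normalised frame histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def detect_cuts(histograms, k=3.0, window=5):
    """Flag a cut between frames i and i+1 when their difference exceeds
    k times the mean difference in the preceding window (the 'context'
    indicator) and an absolute floor."""
    diffs = [hist_diff(a, b) for a, b in zip(histograms, histograms[1:])]
    cuts = []
    for i, d in enumerate(diffs):
        ctx = diffs[max(0, i - window):i] or [d]
        if d > k * (sum(ctx) / len(ctx)) and d > 0.2:
            cuts.append(i + 1)  # index of the first frame after the cut
    return cuts
```

Because the threshold scales with recent activity, the same spike that signals a cut in static footage is ignored in fast-moving footage, which is the essence of making detection context-adaptive.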
Content-based video classification and comparison
Automatic video analysis tools have dramatically increased in importance as the Internet video revolution has blossomed. This thesis presents an approach for automatic comparison of videos based on their inherent content. Also, an approach for creating groups (or clusters) of similar videos from a large video database is given. First, methods for simplifying and summarizing the content of videos will be presented. Such methods include shot boundary detection and key frame feature extraction. Next, a comparison of different distance measures between videos will be given. These distance measures will be used to construct video clusters, and results will be compared.
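One simple way to realise the pipeline this abstract outlines is to collapse each video's key-frame histograms into a single signature and cluster signatures by a distance threshold. The following is a hedged sketch under those assumptions; the thesis's actual features, distance measures, and clustering method may differ.

```python
def video_signature(keyframe_hists):
    """Average the key-frame histograms into one per-video signature."""
    n = len(keyframe_hists)
    return [sum(col) / n for col in zip(*keyframe_hists)]

def l1(a, b):
    """L1 distance between two signatures."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cluster_videos(signatures, threshold=0.5):
    """Greedy leader clustering: join the first cluster whose leader is
    within `threshold`, otherwise start a new cluster."""
    leaders, labels = [], []
    for sig in signatures:
        for ci, leader in enumerate(leaders):
            if l1(sig, leader) <= threshold:
                labels.append(ci)
                break
        else:
            leaders.append(sig)
            labels.append(len(leaders) - 1)
    return labels
```

Leader clustering is a deliberately crude stand-in: it is single-pass and order-dependent, but it shows how a video-to-video distance measure directly induces groups of similar videos.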
Classifier for natural language
This dissertation presents a training-free classifier for unannotated texts written in English. WordNet's database is used as a foundation to relate words: each word in the text is compared with each concept that defines the classification topics, taking the hierarchical structure of WordNet's relations into account. In this way, the affinity between more general and more specific terms is preserved, as is the affinity between terms of the same domain. The program was developed to be integrated into a system competing at TRECVID, an annual evaluation that aims to encourage progress in digital video indexing and retrieval. Despite its specific initial scope, the application shows great potential for use with any English text.
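The word-to-concept comparison over a hierarchy can be illustrated without WordNet itself. The sketch below uses a tiny hand-built taxonomy (the `PARENT` map is entirely hypothetical) and the standard path-based similarity, 1 / (1 + path length through the lowest common ancestor), to pick the best-matching topic; the dissertation's actual measure over WordNet may differ.

```python
PARENT = {  # child -> parent: a toy stand-in for WordNet's hypernym tree
    "dog": "animal", "cat": "animal", "animal": "entity",
    "car": "vehicle", "bus": "vehicle", "vehicle": "entity",
}

def ancestors(word):
    """The word followed by its chain of hypernyms up to the root."""
    path = [word]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def path_similarity(w1, w2):
    """1 / (1 + edges between w1 and w2 via the lowest common ancestor)."""
    p1, p2 = ancestors(w1), ancestors(w2)
    for d1, node in enumerate(p1):
        if node in p2:
            return 1.0 / (1 + d1 + p2.index(node))
    return 0.0

def classify(words, topics):
    """Assign the topic concept with the highest summed similarity."""
    return max(topics, key=lambda t: sum(path_similarity(w, t) for w in words))
```

Because similarity decays with hierarchical distance, general and specific terms still contribute to a shared topic, which is the affinity-preserving behaviour the abstract describes.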
Content-based Digital Video Processing: Digital Video Segmentation, Retrieval and Interpretation
Recent research approaches in semantics based video content analysis require shot boundary detection as the first step to divide video sequences into sections. Furthermore, with the advances in networking and computing capability, efficient retrieval of multimedia data has become an important issue. Content-based retrieval technologies have been widely implemented to protect intellectual property rights (IPR). In addition, automatic recognition of highlights from videos is a fundamental and challenging problem for content-based indexing and retrieval applications.
In this thesis, a paradigm is proposed to segment, retrieve and interpret digital videos. Five algorithms are presented to solve the video segmentation task. Firstly, a simple shot cut detection algorithm is designed for real-time implementation. Secondly, a systematic method is proposed for shot detection using content-based rules and a finite state machine (FSM). Thirdly, shot detection is implemented using local and global indicators. Fourthly, a context-awareness approach is proposed to detect shot boundaries. Fifthly, a fuzzy logic method is implemented for shot detection. Furthermore, a novel analysis approach is presented for the detection of video copies. It is robust to complicated distortions and capable of locating copied segments inside the original videos. Then, objects and events are extracted from MPEG sequences for video highlights indexing and retrieval. Finally, a human fighting detection algorithm is proposed for movie annotation.
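The FSM-based detection mentioned above can be sketched for gradual transitions, where a dissolve shows a sustained run of moderate inter-frame differences rather than the single spike of an abrupt cut. The state names, thresholds, and minimum length below are illustrative assumptions, not the thesis's actual parameters.

```python
def detect_dissolves(diffs, low=0.05, high=0.5, min_len=3):
    """Two-state FSM over a sequence of inter-frame differences.
    Returns (start, end) frame-index pairs of candidate dissolves."""
    state, start, found = "idle", 0, []
    for i, d in enumerate(diffs):
        if state == "idle":
            if low < d < high:           # moderate change: maybe dissolving
                state, start = "maybe", i
        else:  # state == "maybe"
            if not (low < d < high):     # the run of moderate changes ended
                if i - start >= min_len:
                    found.append((start, i))
                state = "idle"
    if state == "maybe" and len(diffs) - start >= min_len:
        found.append((start, len(diffs)))
    return found
```

Note how a lone large spike never leaves the idle state, so abrupt cuts and dissolves are kept on separate detection paths, mirroring the decision-tree/FSM split described in the first abstract above.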
Semantics of video shots for content-based retrieval
Content-based video retrieval research combines expertise from many different areas, such as signal processing, machine learning, pattern recognition, and computer vision. As video extends into both the spatial and the temporal domain, we require techniques for the temporal decomposition of footage so that specific content can be accessed. This content may then be semantically classified - ideally in an automated process - to enable filtering, browsing, and searching. An important aspect that must be considered is that pictorial representation of information may be interpreted differently by individual users because it is less specific than its textual representation. In this thesis, we address several fundamental issues of content-based video retrieval for effective handling of digital footage. Temporal segmentation, the common first step in handling digital video, is the decomposition of video streams into smaller, semantically coherent entities. This is usually performed by detecting the transitions that separate single camera takes. While abrupt transitions - cuts - can be detected relatively well with existing techniques, effective detection of gradual transitions remains difficult. We present our approach to temporal video segmentation, proposing a novel algorithm that evaluates sets of frames using a relatively simple histogram feature. Our technique has been shown to rank among the best existing shot segmentation algorithms in large-scale evaluations. The next step is semantic classification of each video segment to generate an index for content-based retrieval in video databases. Machine learning techniques can be applied effectively to classify video content. However, these techniques require manually classified examples for training before automatic classification of unseen content can be carried out. Manually classifying training examples is not trivial because of the implied ambiguity of visual content.
We propose an unsupervised learning approach based on latent class modelling in which we obtain multiple judgements per video shot and model the users' response behaviour over a large collection of shots. This technique yields a more generic classification of the visual content. Moreover, it enables the quality assessment of the classification, and maximises the number of training examples by resolving disagreement. We apply this approach to data from a large-scale, collaborative annotation effort and present ways to improve the effectiveness of manual annotation of visual content through better design and specification of the process. Automatic speech recognition techniques, along with semantic classification of video content, can be used to implement video search using textual queries. This requires the application of text search techniques to video and the combination of different information sources. We explore several text-based query expansion techniques for speech-based video retrieval, and propose a fusion method to improve overall effectiveness. To combine both text and visual search approaches, we explore a fusion technique that combines spoken information and visual information using semantic keywords automatically assigned to the footage based on the visual content. The techniques that we propose help to facilitate effective content-based video retrieval and highlight the importance of considering different user interpretations of visual content. This allows better understanding of video content and a more holistic approach to multimedia retrieval in the future.
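The text-plus-visual fusion step described above is commonly realised as late fusion of per-modality ranked lists. The sketch below is a generic weighted CombSUM with min-max normalisation, offered as an illustration of the idea; the thesis's actual fusion scheme and weights are not specified here and may differ.

```python
def minmax(scores):
    """Normalise a {doc: score} map into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def fuse(text_scores, visual_scores, w_text=0.7):
    """Rank documents by a weighted sum of normalised per-modality
    scores; documents missing from a modality contribute 0 there."""
    t, v = minmax(text_scores), minmax(visual_scores)
    docs = set(t) | set(v)
    return sorted(docs,
                  key=lambda d: w_text * t.get(d, 0.0)
                              + (1 - w_text) * v.get(d, 0.0),
                  reverse=True)
```

Normalising before summing matters because speech-based and visual-concept scores live on incomparable scales; without it, one modality silently dominates the fused ranking.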