12 research outputs found

    AVSST: an Automatic Video Stream Structuring Tool

    The aim of this paper is to present the tool that we have developed to automatically structure TV streams. The objective is to determine precisely the start and the end of broadcast TV programs (P). Usually, TV channels separate programs with breaks (B). These breaks can be commercials, trailers, station identification breaks (monochrome frames, for example), or bumpers. They may be broadcast several times in the stream. The detection of these repetitions is the key to our method for structuring the TV stream. After the detection step, a classification method is applied to separate repeated program content from repeated break content. The latter is used to segment the stream into a sequence of programs and breaks. Finally, the segmented stream is aligned with the metadata provided with the stream, such as the Electronic Program Guide (EPG), in order to provide labeled programs. Experiments on a 22-day-long TV stream show the effectiveness of our method.

    Information theory-based shot cut/fade detection and video summarization

    A new audio-visual analysis approach and tools for parsing colonoscopy videos

    Colonoscopy is an important screening tool for colorectal cancer. During a colonoscopic procedure, a tiny video camera at the tip of the endoscope generates a video signal of the internal mucosa of the colon. The video data are displayed on a monitor for real-time analysis by the endoscopist. We call videos captured from colonoscopic procedures colonoscopy videos. Because these videos possess unique characteristics, new types of semantic units and parsing techniques are required. In this paper, we introduce a new analysis approach that includes (a) a new definition of semantic unit - scene (a segment of visual and audio data that correspond to an endoscopic segment of the colon); (b) a novel scene segmentation algorithm using audio and visual analysis to recognize scene boundaries. We design a prototype system to implement the proposed approach. This system also provides the tools for video/image browsing. The tools enable the users to quickly locate and browse scenes of interest. Experiments on real colonoscopy videos show the effectiveness of our algorithms. The proposed techniques and software are useful (1) for post-procedure reviews, (2) for developing an effective content-based retrieval system for colonoscopy videos to facilitate endoscopic research and education, and (3) for development of a systematic approach to assess endoscopists' procedural skills.

    New enhancements to cut, fade, and dissolve detection processes in video segmentation

    We present improved algorithms for cut, fade, and dissolve detection which are fundamental steps in digital video analysis. In particular, we propose a new adaptive threshold determination method that is shown to reduce artifacts created by noise and motion in scene cut detection. We also describe new two-step algorithms for fade and dissolve detection, and introduce a method for eliminating false positives from a list of detected candidate transitions. In our detailed study of these gradual shot transitions, our objective has been to accurately classify the type of transitions (fade-in, fade-out, and dissolve) and to precisely locate the boundary of the transitions. This distinguishes our work from other early work in scene change detection which tends to focus primarily on identifying the existence of a transition rather than its precise temporal extent. We evaluate our improved algorithms against two other commonly used shot detection techniques on a comprehensive data set, and demonstrate the improved performance due to our enhancements
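
The abstract above mentions an adaptive threshold for scene cut detection but does not give its formula. As a rough illustration only, the sketch below flags a cut when the histogram difference between consecutive frames exceeds the mean plus `k` standard deviations of the neighboring differences. The function names, the `window`, `k`, and `floor` parameters, and the mean-plus-deviation rule are all assumptions made for this example and are not taken from the paper.

```python
from statistics import mean, stdev


def histogram(frame, bins=8, max_value=256):
    """Normalized gray-level histogram of a frame given as a flat list of pixel values."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return [c / len(frame) for c in counts]


def frame_differences(frames, bins=8):
    """L1 distance between the histograms of each pair of consecutive frames."""
    hists = [histogram(f, bins) for f in frames]
    return [sum(abs(a - b) for a, b in zip(h1, h2))
            for h1, h2 in zip(hists, hists[1:])]


def detect_cuts(diffs, window=5, k=3.0, floor=0.1):
    """Return indices of the first frame of each new shot.

    Assumed rule (illustrative): diffs[i] marks a cut when it exceeds
    max(floor, mean + k * stdev) of the differences in a local window
    around i, excluding diffs[i] itself.
    """
    cuts = []
    for i, d in enumerate(diffs):
        lo, hi = max(0, i - window), min(len(diffs), i + window + 1)
        neighbors = diffs[lo:i] + diffs[i + 1:hi]
        if len(neighbors) < 2:
            continue  # not enough context to estimate a local threshold
        threshold = max(floor, mean(neighbors) + k * stdev(neighbors))
        if d > threshold:
            cuts.append(i + 1)  # diffs[i] compares frame i with frame i + 1
    return cuts
```

Because the threshold follows local statistics, a stretch with heavy motion or noise raises the bar for declaring a cut, which is the general motivation for adaptive thresholds; the `floor` value only guards against a zero threshold in perfectly static stretches.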

    Content-based video copy detection using multimodal analysis

    Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2009. Thesis (Master's), Bilkent University, 2009. Includes bibliographical references (leaves 67-76).
    The huge and increasing amount of video broadcast through networks has raised the need for automatic video copy detection for copyright protection. Recent developments in multimedia technology introduced content-based copy detection (CBCD) as a new research field and an alternative to the watermarking approach for identifying video sequences. This thesis presents a multimodal framework for matching video sequences using a three-step approach. First, a high-level face detector identifies facial frames/shots in a video clip. Matching faces with extended body regions gives the flexibility to discriminate the same person (e.g., an anchorman or a political leader) in different events or scenes. In the second step, a spatiotemporal sequence matching technique is employed to match video clips/segments that are similar in terms of activity. Finally, the non-facial shots are matched using low-level visual features. In addition, we use a fuzzy logic approach to extract color histograms for detecting shot boundaries in heavily manipulated video clips. Methods for detecting noise, frame dropping, and picture-in-picture transformation windows, and for extracting masks for still regions, are also proposed and evaluated. The proposed method was tested on the query and reference dataset of the CBCD task of TRECVID 2008. Our results were compared with those of the eight most successful techniques submitted to this task. Experimental results show that the proposed method performs better than most of the state-of-the-art techniques in terms of both effectiveness and efficiency.
    Küçüktunç, Onur. M.S.

    AVIDENSE: Advanced Video Analysis System for Colonoscopy Semantics

    Colonoscopy is an important screening tool for colorectal cancer. During a colonoscopic procedure, a tiny video camera at the tip of the endoscope generates a video signal of the internal mucosa of the colon. The video data are displayed on a monitor for real-time analysis by the endoscopist. We call videos captured from colonoscopic procedures colonoscopy videos. To the best of our knowledge, they are not captured for post-procedural review or analysis in current practice. Because of the unique characteristics of colonoscopy videos, new types of semantic units and new image/video analysis techniques are required. In this dissertation, we aim to develop new image/video analysis techniques for these videos to extract important semantic units, such as colonoscopic scenes, operation shots, and appendix images. Our contributions include two parts: (a) new definitions of semantic units (colonoscopic scene, operation shot, and appendix image); and (b) novel image/video analysis algorithms, including novel scene segmentation algorithms using audio and visual information to recognize scene boundaries, new computer-aided detection approaches for operation shot detection, and new image analysis methods for appendix image classification. The new image processing and content-based video analysis algorithms can be extended to videos from other endoscopic procedures, such as upper gastrointestinal endoscopy, EGD, enteroscopy, bronchoscopy, cystoscopy, and laparoscopy. Our research is useful for the following platforms and resources: (a) platforms for new methods to discover unknown patterns of diseases and cancers; (b) platforms for improving and assessing endoscopists' procedural skills; and (c) education resources for endoscopic research.

    Un outil pour l'indexation des vidéos personnelles par le contenu (A tool for content-based indexing of personal videos)

    Unsupervised video indexing on audiovisual characterization of persons

    This thesis proposes a method for the unsupervised characterization of persons in audiovisual documents, exploiting data related to their physical appearance and their voice. In general, automatic identification methods, whether in video or audio, need a large amount of a priori knowledge about the content. In this work, the goal is to study the two modalities in a correlated way and to exploit their respective properties in a collaborative and robust way, in order to produce a reliable result that is as independent as possible of any a priori knowledge. More particularly, we studied the characteristics of the audio stream and proposed several methods for speaker segmentation and clustering, which we evaluated in a French evaluation campaign. Then, we carried out an in-depth study of visual descriptors (face, clothing), which helped us to propose novel approaches for detecting, tracking, and clustering people within the document. Finally, the work focused on audiovisual fusion, proposing a method based on computing a co-occurrence matrix that allowed us to establish an association between the audio and video indexes and to correct them. This enables us to produce a dynamic audiovisual model of each speaker.
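
The fusion step in the abstract above relies on a co-occurrence matrix between the audio index and the video index, without further detail. The sketch below is one plausible reading, given as an assumption: the matrix accumulates the time during which each speaker label overlaps each face-cluster label, and each speaker is then associated with the face cluster it overlaps most. The segment format `(label, start, end)`, the function names, and the maximum-overlap rule are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def cooccurrence(audio_segments, video_segments):
    """Accumulate overlap duration for every (speaker, face cluster) pair.

    Each segment is a (label, start, end) tuple with times in seconds
    (an assumed format for this illustration).
    """
    matrix = defaultdict(float)
    for a_label, a_start, a_end in audio_segments:
        for v_label, v_start, v_end in video_segments:
            overlap = min(a_end, v_end) - max(a_start, v_start)
            if overlap > 0:
                matrix[(a_label, v_label)] += overlap
    return dict(matrix)


def associate(matrix):
    """Map each speaker label to the face cluster with the largest overlap."""
    best = {}
    for (a_label, v_label), overlap in matrix.items():
        if a_label not in best or overlap > best[a_label][1]:
            best[a_label] = (v_label, overlap)
    return {a_label: v_label for a_label, (v_label, _) in best.items()}
```

A per-speaker maximum is the simplest possible association rule; it does not prevent two speakers from mapping to the same face cluster, and it does not perform the index correction that the abstract also mentions.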

    Using Web Archives to Enrich the Live Web Experience Through Storytelling

    Much of our cultural discourse occurs primarily on the Web. Thus, Web preservation is a fundamental precondition for multiple disciplines. Archiving Web pages into themed collections is a method for ensuring these resources are available for posterity. Services such as Archive-It exist to allow institutions to develop, curate, and preserve collections of Web resources. Understanding the contents and boundaries of these archived collections is a challenge for most people, resulting in a paradox: the larger the collection, the harder it is to understand. Meanwhile, as the sheer volume of data grows on the Web, storytelling is becoming a popular technique in social media for selecting Web resources to support a particular narrative or story. In this dissertation, we address the problem of understanding the archived collections by proposing the Dark and Stormy Archive (DSA) framework, in which we integrate storytelling social media and Web archives. In the DSA framework, we identify, evaluate, and select candidate Web pages from archived collections that summarize the holdings of these collections, arrange them in chronological order, and then visualize these pages using tools that users are already familiar with, such as Storify. To inform our work of generating stories from archived collections, we start by building a baseline for the structural characteristics of popular (i.e., receiving the most views) human-generated stories by investigating stories from Storify. Furthermore, we examined the entire population of Archive-It collections to better understand the characteristics of the collections we intend to summarize. We then filter off-topic pages from the collections using different methods to detect when an archived page in a collection has gone off-topic. We created a gold standard dataset from three Archive-It collections to evaluate the proposed methods at different thresholds.
    From the gold standard dataset, we identified five behaviors for the TimeMaps (a list of archived copies of a page) based on the page’s aboutness. Based on a dynamic slicing algorithm, we divide the collection and cluster the pages in each slice. We then select the best representative page from each cluster based on different quality metrics (e.g., the replay quality and the quality of the snippet generated from the page). Finally, we put the selected pages in chronological order and visualize them using Storify. To evaluate the DSA framework, we obtained a ground truth dataset of hand-crafted stories from Archive-It collections generated by expert archivists. We used Amazon’s Mechanical Turk to evaluate the automatically generated stories against the stories that were created by domain experts. The results show that the stories automatically generated by the DSA are indistinguishable from those created by human subject domain experts, while at the same time both kinds of stories (automatic and human) are easily distinguished from randomly generated stories.