
    An Overview of Video Shot Clustering and Summarization Techniques for Mobile Applications

    The problem of content characterization of video programmes is of great interest because video appeals to large audiences and its efficient distribution over various networks should contribute to the widespread usage of multimedia services. In this paper we analyze several techniques proposed in the literature for content characterization of video programmes, including movies and sports, that could be helpful for mobile media consumption. In particular, we focus our analysis on shot clustering methods and effective video summarization techniques since, in the current video analysis scenario, they facilitate access to the content and help in quickly understanding the associated semantics. First we consider shot clustering techniques based on low-level features, using visual, audio and motion information, even combined in a multi-modal fashion. Then we concentrate on summarization techniques, such as static storyboards, dynamic video skimming and the extraction of sport highlights. The discussed summarization methods can be employed in the development of tools that would be greatly useful to most mobile users: these algorithms automatically shorten the original video while preserving most events by highlighting only the important content. The effectiveness of each approach has been analyzed, showing that it mainly depends on the kind of video programme it relates to and the type of summary or highlights we are focusing on.
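The static-storyboard style of summary described in this abstract can be sketched as a greedy keyframe selector over colour histograms: a frame joins the storyboard only when it looks sufficiently different from every keyframe chosen so far. This is an illustrative toy, not any of the surveyed algorithms; the bin count, the L1 distance, and the 0.4 threshold are assumptions.

```python
import numpy as np

def color_histogram(frame, bins=8):
    """Concatenated per-channel colour histogram, normalised to sum to 1."""
    hist = np.concatenate([
        np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
        for c in range(frame.shape[-1])
    ]).astype(float)
    return hist / hist.sum()

def select_keyframes(frames, threshold=0.4):
    """Greedy storyboard: keep a frame when its histogram differs enough
    (L1 distance) from every keyframe selected so far."""
    keyframe_ids, keyframe_hists = [], []
    for i, frame in enumerate(frames):
        h = color_histogram(frame)
        # all() is True for the first frame (empty list), so it is always kept.
        if all(np.abs(h - kh).sum() > threshold for kh in keyframe_hists):
            keyframe_ids.append(i)
            keyframe_hists.append(h)
    return keyframe_ids
```

Real systems replace the histogram with richer low-level features (motion, audio) and cluster shots rather than raw frames, but the selection principle is the same.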

    Automatic non-linear video editing for home video collections

    The video editing process consists of deciding what elements to retain, delete, or combine from various video sources so that they come together in an organized, logical, and visually pleasing manner. Before the digital era, non-linear editing involved the arduous process of physically cutting and splicing video tapes, and was restricted to the movie industry and a few video enthusiasts. Today, when digital cameras and camcorders have made large personal video collections commonplace, non-linear video editing has gained renewed importance and relevance. Almost all video editing systems available today depend on considerable user interaction to produce coherent edited videos. In this work, we describe an automatic non-linear video editing system for generating coherent movies from a collection of unedited personal videos. Our thesis is that computing image-level visual similarity in an appropriate manner forms a good basis for automatic non-linear video editing. To our knowledge, this is a novel approach to solving this problem. The generation of output video is guided by one or more input keyframes from the user, which determine the content of the output video. The output video is generated in a manner such that it is non-repetitive and follows the dynamics of the input videos. When no input keyframes are provided, our system generates "video textures" with the content of the output chosen at random. Our system demonstrates promising results on large video collections and is a first step towards increased automation in non-linear video editing.
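The image-level visual-similarity idea at the heart of this system can be illustrated with a coarse joint colour histogram as the per-frame signature and cosine similarity for matching a user keyframe against candidate frames. The signature choice and bin count here are assumptions for illustration, not the authors' actual features.

```python
import numpy as np

def frame_signature(frame, bins=8):
    """Coarse joint colour histogram used as an image-level signature."""
    hist = np.histogramdd(
        frame.reshape(-1, 3).astype(float),
        bins=(bins,) * 3, range=[(0, 256)] * 3)[0].ravel()
    return hist / hist.sum()

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_match(keyframe, candidates):
    """Index of the candidate frame most visually similar to the keyframe."""
    key = frame_signature(keyframe)
    sims = [cosine_similarity(key, frame_signature(c)) for c in candidates]
    return int(np.argmax(sims))
```

An editing system along these lines would chain such matches to pick which unedited clips flow naturally from the user-chosen keyframes.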

    The role of terminology and local grammar in video annotation

    The linguistic annotation of video sequences is an intellectually challenging task involving the investigation of how images and words are linked together, a task that is ultimately financially rewarding in that the eventual automatic retrieval of video (sequences) can be much less time-consuming, subjective and expensive than manual retrieval. Much effort has been focused on automatic or semi-automatic annotation. Computational linguistic methods of video annotation rely on collections of collateral text in the form of keywords and proper nouns. Keywords are often used in a particular order indicating an identifiable pattern, which is often limited and can subsequently be used to annotate the portion of a video where such a pattern occurred. Once the relevant keywords and patterns have been stored, they can then be used to annotate the remainder of the video, excluding all collateral text which does not match the keywords or patterns. A new method of video annotation is presented in this thesis. The method facilitates a) the extraction of specialist terms within a corpus of collateral text; and b) the identification of frequently used linguistic patterns that describe recurring key events within the data-set. The use of the method has led to the development of a system that can automatically assign key words and key patterns to a number of frames that are found in the commentary text approximately contemporaneous to the selected number of frames. The system does not perform video analysis; it only analyses the collateral text. The method is based on corpus linguistics and is mainly frequency based - the frequency of occurrence of a key word or key pattern is taken as the basis of its representation. No assumptions are made about the grammatical structure of the language used in the collateral text, nor is a lexicon of key words predefined.
Our system has been designed to annotate videos of football matches in English and Arabic, and also cricket videos in English. The system has also been designed to retrieve annotated clips. It not only provides a simple search method for annotated clip retrieval, but also complex, more advanced search methods. (EThOS - Electronic Theses Online Service, United Kingdom)
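The frequency-based term extraction this thesis describes can be illustrated with a classic "weirdness" ratio from terminology extraction: a word scores highly when it is relatively more frequent in the specialist (collateral-text) corpus than in a general reference corpus. This is a generic sketch of the corpus-linguistic idea, not the thesis's actual scoring; the add-one smoothing and whitespace tokenisation are assumptions.

```python
from collections import Counter

def term_weirdness(specialist_text, general_text):
    """Rank words by relative frequency in the specialist corpus versus
    a general-language corpus (the 'weirdness' ratio), highest first."""
    spec = Counter(specialist_text.lower().split())
    gen = Counter(general_text.lower().split())
    n_spec = sum(spec.values())
    n_gen = sum(gen.values())
    scores = {}
    for word, count in spec.items():
        rel_spec = count / n_spec
        # Add-one smoothing so unseen general-corpus words do not divide by zero.
        rel_gen = (gen.get(word, 0) + 1) / (n_gen + len(gen))
        scores[word] = rel_spec / rel_gen
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

On football commentary, domain terms like "goal" or "corner" would outrank function words such as "the", which are equally frequent everywhere.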

    Blur perception: An evaluation of focus measures

    Since the middle of the 20th century, the technological development of conventional photographic cameras has taken advantage of the advances in electronics and signal processing. One specific area that has benefited from these developments is that of auto-focus, the ability for a camera's optical arrangement to be altered so as to ensure the subject of the scene is in focus. However, whilst the precise focus point can be known for a single point in a scene, the method for selecting a best focus for the entire scene is an unsolved problem. Many focus algorithms have been proposed and compared, though no overall comparison between all algorithms has been made, nor have the results been compared with human observers. This work describes a methodology that was developed to benchmark focus algorithms against human results. Experiments that capture quantitative metrics about human observers were developed and conducted with a large set of observers on a diverse range of equipment. From these experiments, it was found that humans were highly consensual in their experimental responses. The human results were then used as a benchmark, against which equivalent experiments were performed by each of the candidate focus algorithms. A second set of experiments, conducted in a controlled environment, captured the underlying human psychophysical blur discrimination thresholds in natural scenes. The resultant thresholds were then characterised and compared against equivalent discrimination thresholds obtained by using the candidate focus algorithms as automated observers. The results of this comparison and how they should guide the selection of an auto-focus algorithm are discussed, with comment being passed on how focus algorithms may need to change to cope with future imaging techniques.
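One widely used focus measure of the kind such benchmarks evaluate is the variance of the Laplacian: sharp images have strong edges, so the second-derivative response has high variance, while defocus suppresses it. This is a generic example rather than one of the thesis's specific candidate algorithms; the box blur below only simulates defocus for demonstration.

```python
import numpy as np

def laplacian_variance(gray):
    """Focus measure: variance of a 4-neighbour Laplacian response.
    Higher values indicate a sharper (better-focused) image."""
    g = np.asarray(gray, dtype=float)
    lap = (-4 * g[1:-1, 1:-1]
           + g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:])
    return float(lap.var())

def box_blur(gray, k=3):
    """Crude k x k box blur, used here only to simulate defocus."""
    g = np.asarray(gray, dtype=float)
    out = np.zeros_like(g)
    pad = np.pad(g, k // 2, mode="edge")
    for dy in range(k):
        for dx in range(k):
            out += pad[dy:dy + g.shape[0], dx:dx + g.shape[1]]
    return out / (k * k)
```

An auto-focus loop would sweep the lens position and keep the setting that maximises the measure; benchmarking against human observers asks whether that maximum matches where people judge the scene to be in focus.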

    Computergestützte Inhaltsanalyse von digitalen Videoarchiven (Computer-aided content analysis of digital video archives)

    The transition from analogue to digital video has brought major changes to film archives in recent years. Digitisation in particular opens up new possibilities for the archives: wear and ageing of the film reels are ruled out, so the quality is preserved unchanged, and network-based, and thus far simpler, access to the videos in the archives becomes possible. Additional services become available to archivists and users, providing extended search capabilities and easing navigation during playback. Search within video archives relies on metadata that provide further information about the videos. A large share of this metadata is entered manually by archivists, which is very time-consuming and expensive. Computer-aided analysis of digital videos makes it possible to reduce the effort of generating metadata for video archives. The first part of this dissertation presents new methods for recognising important semantic content in videos; in particular, newly developed algorithms for shot-boundary detection, camera-motion analysis, object segmentation and classification, text recognition, and face recognition are introduced. The automatically derived semantic information is very valuable, as it eases working with digital video archives. It not only supports search in the archives but also leads to new applications, which are presented in the second part of the dissertation: for example, computer-generated video summaries can be produced, or videos can be automatically adapted to the characteristics of a playback device. A further focus of this dissertation is the analysis of historical films.
Four European film archives provided a large number of historical video documentaries, shot in the early to mid twentieth century and digitised in recent years. Owing to decades of storage and wear of the film reels, many of the videos are heavily noisy and contain clearly visible image defects. The image quality of the historical black-and-white films differs significantly from that of current videos, so reliable analysis with existing methods is often impossible. This dissertation presents new algorithms that enable reliable recognition of semantic content in historical videos as well.
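Shot-boundary (cut) detection of the kind this dissertation develops has to tolerate the constant noise floor of aged footage. A minimal sketch uses grey-level histogram differences between consecutive frames (more noise-tolerant than raw pixel differences) and an adaptive threshold; the bin count and the mean + k·std rule are illustrative assumptions, not the dissertation's algorithm.

```python
import numpy as np

def cut_scores(frames, bins=16):
    """Per-transition dissimilarity: L1 distance between normalised
    grey-level histograms of consecutive frames."""
    hists = []
    for f in frames:
        h = np.histogram(f, bins=bins, range=(0, 256))[0].astype(float)
        hists.append(h / h.sum())
    return [float(np.abs(hists[i + 1] - hists[i]).sum())
            for i in range(len(hists) - 1)]

def detect_cuts(frames, k=3.0):
    """Flag transitions whose score exceeds mean + k * std: an adaptive
    threshold that rides above a shot-internal noise floor."""
    scores = np.array(cut_scores(frames))
    thresh = scores.mean() + k * scores.std()
    # A flagged index i means the cut occurs between frames i-1 and i.
    return [i + 1 for i, s in enumerate(scores) if s > thresh]
```

Because the threshold adapts to the score statistics of the sequence itself, moderately noisy footage raises the baseline without producing spurious detections, which is the behaviour degraded historical material demands.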