
    Automatic video segmentation employing object/camera modeling techniques

    Practically established video compression and storage techniques still process video sequences as rectangular images without further semantic structure. However, humans watching a video sequence immediately recognize acting objects as semantic units. This semantic object separation is currently not reflected in the technical system, making it difficult to manipulate the video at the object level. The realization of object-based manipulation will introduce many new possibilities for working with videos, such as composing new scenes from pre-existing video objects or enabling user interaction with the scene. Moreover, object-based video compression, as defined in the MPEG-4 standard, can provide high compression ratios because the foreground objects can be sent independently from the background. In the case that the scene background is static, the background views can even be combined into a large panoramic sprite image, from which the current camera view is extracted. This results in a higher compression ratio since the sprite image for each scene only has to be sent once. A prerequisite for employing object-based video processing is automatic (or at least user-assisted semi-automatic) segmentation of the input video into semantic units, the video objects. This segmentation is a difficult problem because the computer does not have the vast amount of pre-knowledge that humans subconsciously use for object detection. Thus, even the simple definition of the desired output of a segmentation system is difficult. The subject of this thesis is to provide algorithms for segmentation that are applicable to common video material and that are computationally efficient. The thesis is conceptually separated into three parts. In Part I, an automatic segmentation system for general video content is described in detail. Part II introduces object models as a tool to incorporate user-defined knowledge about the objects to be extracted into the segmentation process.
Part III concentrates on the modeling of camera motion in order to relate the observed camera motion to real-world camera parameters. The segmentation system that is described in Part I is based on a background-subtraction technique. The pure background image that is required for this technique is synthesized from the input video itself. Sequences that contain rotational camera motion can also be processed since the camera motion is estimated and the input images are aligned into a panoramic scene-background. This approach is fully compatible with the MPEG-4 video-encoding framework, such that the segmentation system can be easily combined with an object-based MPEG-4 video codec. After an introduction to the theory of projective geometry in Chapter 2, which is required for the derivation of camera-motion models, the estimation of camera motion is discussed in Chapters 3 and 4. It is important that the camera-motion estimation is not influenced by foreground object motion. At the same time, the estimation should provide accurate motion parameters such that all input frames can be combined seamlessly into a background image. The core motion estimation is based on a feature-based approach where the motion parameters are determined with a robust-estimation algorithm (RANSAC) in order to distinguish the camera motion from simultaneously visible object motion. Our experiments showed that the robustness of the original RANSAC algorithm in practice does not reach the theoretically predicted performance. An analysis of the problem has revealed that this is caused by numerical instabilities that can be significantly reduced by a modification that we describe in Chapter 4. The synthesis of static-background images is discussed in Chapter 5. In particular, we present a new algorithm for the removal of the foreground objects from the background image such that a pure scene background remains.
The proposed algorithm is optimized to synthesize the background even for difficult scenes in which the background is only visible for short periods of time. The problem is solved by clustering the image content for each region over time, such that each cluster comprises static content. Furthermore, it is exploited that the times at which foreground objects appear in an image region are similar to the corresponding times of neighboring image areas. The reconstructed background could be used directly as the sprite image in an MPEG-4 video coder. However, we have discovered that the counterintuitive approach of splitting the background into several independent parts can reduce the overall amount of data. In the case of general camera motion, the construction of a single sprite image is not even possible. In Chapter 6, a multi-sprite partitioning algorithm is presented, which separates the video sequence into a number of segments, for which independent sprites are synthesized. The partitioning is computed in such a way that the total area of the resulting sprites is minimized, while simultaneously satisfying additional constraints. These include a limited sprite-buffer size at the decoder, and the restriction that the image resolution in the sprite should never fall below the input-image resolution. The described multi-sprite approach is fully compatible with the MPEG-4 standard, but provides three advantages. First, any arbitrary rotational camera motion can be processed. Second, the coding cost for transmitting the sprite images is lower, and finally, the quality of the decoded sprite images is better than in previously proposed sprite-generation algorithms. Segmentation masks for the foreground objects are computed with a change-detection algorithm that compares the pure background image with the input images. A special effect that occurs in the change detection is the problem of image misregistration.
Since the change detection compares co-located image pixels in the camera-motion compensated images, a small error in the motion estimation can introduce segmentation errors because non-corresponding pixels are compared. We approach this problem in Chapter 7 by integrating risk-maps into the segmentation algorithm that identify pixels for which misregistration would probably result in errors. For these image areas, the change-detection algorithm is modified to disregard the difference values for the pixels marked in the risk-map. This modification significantly reduces the number of false object detections in fine-textured image areas. The algorithmic building-blocks described above can be combined into a segmentation system in various ways, depending on whether camera motion has to be considered or whether real-time execution is required. These different systems and example applications are discussed in Chapter 8. Part II of the thesis extends the described segmentation system to consider object models in the analysis. Object models allow the user to specify which objects should be extracted from the video. In Chapters 9 and 10, a graph-based object model is presented in which the features of the main object regions are summarized in the graph nodes, and the spatial relations between these regions are expressed with the graph edges. The segmentation algorithm is extended by an object-detection algorithm that searches the input image for the user-defined object model. We provide two object-detection algorithms. The first one is specific to cartoon sequences and uses an efficient sub-graph matching algorithm, whereas the second processes natural video sequences. With the object-model extension, the segmentation system can be controlled to extract individual objects, even if the input sequence comprises many objects. Chapter 11 proposes an alternative approach to incorporate object models into a segmentation algorithm.
The chapter describes a semi-automatic segmentation algorithm, in which the user coarsely marks the object and the computer refines this to the exact object boundary. Afterwards, the object is tracked automatically through the sequence. In this algorithm, the object model is defined as the texture along the object contour. This texture is extracted in the first frame and then used during the object tracking to localize the original object. The core of the algorithm uses a graph representation of the image and a newly developed algorithm for computing shortest circular-paths in planar graphs. The proposed algorithm is faster than the currently known algorithms for this problem, and it can also be applied to many alternative problems like shape matching. Part III of the thesis elaborates on different techniques to derive information about the physical 3-D world from the camera motion. In the segmentation system, we employ camera-motion estimation, but the obtained parameters have no direct physical meaning. Chapter 12 discusses an extension to the camera-motion estimation to factorize the motion parameters into physically meaningful parameters (rotation angles, focal-length) using camera autocalibration techniques. A distinctive feature of the algorithm is that it can process camera motion that spans several sprites by employing the above multi-sprite technique. Consequently, the algorithm can be applied to arbitrary rotational camera motion. For the analysis of video sequences, it is often required to determine and follow the position of the objects. Clearly, the object position in image coordinates provides little information if the viewing direction of the camera is not known. Chapter 13 provides a new algorithm to deduce the transformation between the image coordinates and the real-world coordinates for the special application of sport-video analysis. In sport videos, the camera view can be derived from markings on the playing field.
For this reason, we employ a model of the playing field that describes the arrangement of lines. After detecting significant lines in the input image, a combinatorial search is carried out to establish correspondences between lines in the input image and lines in the model. The algorithm requires no information about the specific color of the playing field and it is very robust to occlusions or poor lighting conditions. Moreover, the algorithm is generic in the sense that it can be applied to any type of sport by simply exchanging the model of the playing field. In Chapter 14, we again consider panoramic background images and particularly focus on their visualization. Apart from the planar background sprites discussed previously, a frequently used visualization technique for panoramic images is the projection onto a cylinder surface which is unwrapped into a rectangular image. However, the disadvantage of this approach is that the viewer has no good orientation in the panoramic image because all directions are seen at the same time. In order to provide a more intuitive presentation of wide-angle views, we have developed a visualization technique specialized for the case of indoor environments. We present an algorithm to determine the 3-D shape of the room in which the image was captured, or, more generally, to compute a complete floor plan if several panoramic images captured in each of the rooms are provided. Based on the obtained 3-D geometry, a graphical model of the rooms is constructed, where the walls are displayed with textures that are extracted from the panoramic images. This representation makes it possible to conduct virtual walk-throughs in the reconstructed room and therefore provides a better orientation for the user. Summarizing, we can conclude that all segmentation techniques employ some definition of foreground objects.
These definitions are either explicit, using object models as in Part II of this thesis, or they are implicitly defined as in the background synthesis in Part I. The results of this thesis show that implicit descriptions, which extract their definition from video content, work well when the sequence is long enough to extract this information reliably. However, high-level semantics are difficult to integrate into the segmentation approaches that are based on implicit models. Instead, those semantics should be added as postprocessing steps. On the other hand, explicit object models apply semantic pre-knowledge at early stages of the segmentation. Moreover, they can be applied to short video sequences or even still pictures since no background model has to be extracted from the video. The definition of a general object-modeling technique that is widely applicable and that also enables an accurate segmentation remains an important yet challenging problem for further research.
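
The background-subtraction idea summarized in this abstract can be illustrated with a small Python sketch. It is not the thesis's algorithm: it assumes a static camera (no motion estimation or sprite alignment), substitutes a per-pixel temporal median for the clustering-based background synthesis, and uses a plain threshold for change detection. The function names, the toy frames, and the threshold of 30 are all hypothetical.

```python
from statistics import median


def synthesize_background(frames):
    """Estimate the background as the per-pixel temporal median.

    Assumption: a stand-in for the clustering-based synthesis described in
    the abstract; it only works if each pixel shows background in more than
    half of the frames.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(width)]
            for y in range(height)]


def change_mask(frame, background, threshold=30):
    """Mark pixels whose absolute difference to the background exceeds the threshold."""
    return [[abs(p - b) > threshold for p, b in zip(row, bg_row)]
            for row, bg_row in zip(frame, background)]


# Toy sequence: gray level 10 everywhere, one bright pixel (200) that moves
# one column to the right in each frame.
frames = []
for t in range(5):
    frame = [[10] * 5 for _ in range(3)]
    frame[1][t] = 200
    frames.append(frame)

background = synthesize_background(frames)
mask = change_mask(frames[2], background)
```

In this toy case the moving pixel covers each location in only one of five frames, so the median recovers the constant background and the mask for the third frame flags only the object pixel. Scenes in which the background is rarely visible, which the abstract names as the hard case, would defeat a plain median.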

    Deliverable D1.1 State of the art and requirements analysis for hypervideo

    This deliverable presents a state-of-the-art and requirements analysis report for hypervideo, authored as part of WP1 of the LinkedTV project. Initially, we present some use-case (viewer) scenarios in the LinkedTV project, and through the analysis of the distinctive needs and demands of each scenario we point out the technical requirements from a user-side perspective. Subsequently, we study methods for the automatic and semi-automatic decomposition of the audiovisual content in order to effectively support the annotation process. Considering that the multimedia content comprises different types of information, i.e., visual, textual and audio, we report various methods for the analysis of these three different streams. Finally, we present various annotation tools which could integrate the developed analysis results so as to effectively support users (video producers) in the semi-automatic linking of hypervideo content, and based on them we report on the initial progress in building the LinkedTV annotation tool. For each of the different classes of techniques discussed in the deliverable, we present the evaluation results from the application of one such method from the literature to a dataset well suited to the needs of the LinkedTV project, and we indicate the future technical requirements that should be addressed in order to achieve higher levels of performance (e.g., in terms of accuracy and time-efficiency), as necessary.

    Towards visualization and searching: a dual-purpose video coding approach

    In modern video applications, the role of the decoded video is much more than filling a screen for visualization. To offer powerful video-enabled applications, it is increasingly critical not only to visualize the decoded video but also to provide efficient searching capabilities for similar content. Video surveillance and personal communication applications are critical examples of these dual visualization and searching requirements. However, current video coding solutions are strongly biased towards the visualization needs. In this context, the goal of this work is to propose a dual-purpose video coding solution targeting both visualization and searching needs by adopting a hybrid coding framework where the usual pixel-based coding approach is combined with a novel feature-based coding approach. In this novel dual-purpose video coding solution, some frames are coded using a set of keypoint matches, which not only allow decoding for visualization, but also provide the decoder with valuable feature-related information, extracted at the encoder from the original frames, instrumental for efficient searching. The proposed solution is based on a flexible joint Lagrangian optimization framework where pixel-based and feature-based processing are combined to find the most appropriate trade-off between the visualization and searching performances. Extensive experimental results for the assessment of the proposed dual-purpose video coding solution under meaningful test conditions are presented. The results show the flexibility of the proposed coding solution to achieve different optimization trade-offs, notably competitive performance relative to the state-of-the-art HEVC standard both in terms of visualization and searching performance.
    Abstract (translated from Portuguese): In modern video applications, the role of the decoded video is much more than simply filling a screen for visualization. To offer more powerful applications by means of video signals, it is increasingly critical not only to consider the quality of the content with a view to its visualization, but also to provide means of searching for similar content. Visualization and searching requirements are considered, for example, in modern video surveillance and personal communication applications. However, current video coding solutions are strongly oriented towards the visualization requirements. In this context, the goal of this work is to propose a dual-purpose video coding solution targeting both visualization and searching requirements. To this end, a coding framework is proposed in which the usual pixel coding approach is combined with a new coding approach based on visual features. In this solution, some frames are coded using a set of matched keypoint pairs, which not only enables visualization but also provides the decoder with valuable visual-feature information, extracted at the encoder from the original content, that is instrumental in searching applications. The proposed solution employs a flexible Lagrangian optimization scheme where pixel-based processing is combined with visual-feature-based processing in order to find an appropriate trade-off between the visualization and searching performances. The experimental results show the flexibility of the proposed solution in achieving different optimization trade-offs, notably competitive performance relative to the HEVC standard both in terms of visualization and searching.
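
The abstract does not give the cost function behind its "joint Lagrangian optimization", so the following Python sketch only illustrates the general shape of such a decision. It assumes a cost J = D + lambda * R in which the distortion D is a weighted mix of a visualization term and a searching term. The `Candidate` class, the weight `alpha`, and all numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One hypothetical way of coding a frame, with made-up measurements."""
    mode: str
    rate_bits: float
    visual_distortion: float   # lower means better for viewing
    search_distortion: float   # lower means better for retrieval


def lagrangian_cost(c, lam, alpha):
    """J = D + lam * R, where D mixes the two distortions with weight alpha."""
    distortion = alpha * c.visual_distortion + (1 - alpha) * c.search_distortion
    return distortion + lam * c.rate_bits


def choose_mode(candidates, lam, alpha):
    """Pick the candidate with the smallest Lagrangian cost."""
    return min(candidates, key=lambda c: lagrangian_cost(c, lam, alpha))


candidates = [
    Candidate("pixel", rate_bits=1000, visual_distortion=2.0, search_distortion=8.0),
    Candidate("feature", rate_bits=600, visual_distortion=5.0, search_distortion=1.0),
]

# alpha = 1.0 weights only visualization, alpha = 0.0 only searching.
best_for_viewing = choose_mode(candidates, lam=0.001, alpha=1.0)
best_for_search = choose_mode(candidates, lam=0.001, alpha=0.0)
```

With these made-up numbers the pixel-based candidate wins when only visualization counts and the feature-based candidate wins when only searching counts. Intermediate values of `alpha` would trade one off against the other, which is the kind of flexibility the abstract claims for its framework.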

    Multiresolution image models and estimation techniques


    Image Mosaicing and Super-resolution


    From Image-based Motion Analysis to Free-Viewpoint Video

    The problems of capturing real-world scenes with cameras and automatically analyzing the visible motion have traditionally been the focus of computer vision research. The photo-realistic rendition of dynamic real-world scenes, on the other hand, is a problem that has been investigated in the field of computer graphics. In this thesis, we demonstrate that the joint solution to all three of these problems enables the creation of powerful new tools that are beneficial for both research disciplines. Analysis and rendition of real-world scenes with human actors are amongst the most challenging problems. In this thesis we present new algorithmic recipes to attack them. The dissertation consists of three parts: In part I, we present novel solutions to two fundamental problems of human motion analysis. Firstly, we demonstrate a novel hybrid approach for marker-free human motion capture from multiple video streams. Thereafter, a new algorithm for automatic non-intrusive estimation of kinematic body models of arbitrary moving subjects from video is detailed. In part II of the thesis, we demonstrate that a marker-free motion capture approach makes possible the model-based reconstruction of free-viewpoint videos of human actors from only a handful of video streams. The estimated 3D videos enable the photo-realistic real-time rendition of a dynamic scene from arbitrary novel viewpoints. Texture information from video is not only applied to generate a realistic surface appearance, but also to improve the precision of the motion estimation scheme. The commitment to a generic body model also allows us to reconstruct a time-varying reflectance description of an actor's body surface, which allows us to realistically render the free-viewpoint videos under arbitrary lighting conditions. A novel method to capture high-speed large-scale motion using regular still cameras and the principle of multi-exposure photography is described in part III.
The fundamental principles underlying the methods in this thesis are not only applicable to humans but to a much larger class of subjects. It is demonstrated that, in conjunction, our proposed algorithmic recipes serve as building blocks for the next generation of immersive 3D visual media.
    Abstract (translated from German): The development of new algorithms for the optical capture and analysis of motion in dynamic scenes is one of the main research topics in computer-based image processing. While machine image understanding focuses on the extraction of information, computer graphics concentrates on the inverse problem, the photo-realistic rendition of moving scenes. In the recent past, the two disciplines have steadily converged, since there are many challenging scientific questions that require a joint solution to the image acquisition, image analysis, and image synthesis problems. Two of the most difficult problems, which are highly relevant to researchers from both disciplines, are the analysis and the synthesis of dynamic scenes centered on humans. This dissertation presents methods that allow the optical capture of this kind of scene, the automatic analysis of the motion, and its realistic new rendition in the computer. It will become clear that integrating algorithms for solving these three problems into one overall system enables the creation of entirely new three-dimensional representations of humans in motion. The dissertation is divided into three parts: Part I begins with a description of the design and construction of a studio for the time-synchronized capture of multiple video streams. The multi-video sequences recorded in the studio serve as input data for the video-based motion analysis methods developed in this dissertation and for the algorithms for generating three-dimensional videos.
Subsequently, two newly developed methods are presented that answer two fundamental questions in the optical capture of human motion: the measurement of motion parameters and the generation of kinematic skeleton models. The first method is a hybrid algorithm for the marker-free optical measurement of motion parameters from multi-video data. Dispensing with optical markers is made possible by using both volume models reconstructed from the image data and easily detectable body features for the motion analysis. The second method automatically reconstructs a kinematic skeleton model from multi-video data. The algorithm requires neither optical markers in the scene nor a priori information about the body structure, and it is applicable in the same form to humans, animals, and objects. The topic of the second part of this work is a model-based method for reconstructing three-dimensional videos of humans in motion from only a few time-synchronized video streams. The viewer can play back the computed 3D videos on a computer in real time and interactively adopt an arbitrary virtual viewpoint on the events. At the center of our approach is a silhouette-based analysis-by-synthesis algorithm that makes it possible to capture both the shape and the motion of a person without optical markers. Computing time-varying surface textures from the video data ensures that a person has a photo-realistic appearance from any viewing angle. In a first algorithmic extension, it is shown that the texture information can also be used to improve the accuracy of the motion estimation.
Moreover, the use of a generic body model makes it possible to measure not only dynamic textures but even dynamic reflectance properties of the body surface. Our reflectance model consists of a parametric BRDF for each texel and a dynamic normal map for the entire body surface. In this way, 3D videos can also be rendered realistically under completely new simulated lighting conditions. Part III of this work describes a novel method for the optical measurement of very fast motion. Until now, optical recordings of high-speed motion required very expensive special cameras with high frame rates. In contrast, the method described here uses simple digital still cameras and the principle of multi-flash photography. It is shown that this method can measure both the very fast articulated hand motion of the pitcher and the flight parameters of the ball during a baseball pitch. The highly accurate captured parameters make it possible to visualize the measured motion in the computer in a completely new way. Although the methods presented in this dissertation primarily serve the analysis and rendition of human motion, the underlying principles are also applicable to many other scenes. Each of the described algorithms primarily solves one specific sub-problem, but taken together the methods can be understood as building blocks that will enable the next generation of interactive three-dimensional media.

    Biological image analysis

    In biological research, images are extensively used to monitor growth, dynamics and changes in biological specimens, such as cells or plants. Many of these images are used solely for observation or are manually annotated by an expert. In this dissertation we discuss several methods to automate the annotation and analysis of bio-images. Two large clusters of methods have been investigated and developed. A first set of methods focuses on the automatic delineation of relevant objects in bio-images, such as individual cells in microscopic images. Since these methods should be useful for many different applications, e.g. to detect and delineate different objects (cells, plants, leaves, ...) in different types of images (different types of microscopes, regular colour photographs, ...), the methods should be easy to adjust. Therefore we developed a methodology relying on probability theory, where all required parameters can easily be estimated by a biologist, without requiring any knowledge of the techniques used in the actual software. A second cluster of investigated techniques focuses on the analysis of shapes. By defining new features that describe shapes, we are able to automatically classify shapes, retrieve similar shapes from a database and even analyse how an object deforms through time.

    Visualization Techniques in Space and Atmospheric Sciences

    Unprecedented volumes of data will be generated by research programs that investigate the Earth as a system and the origin of the universe, which will in turn require analysis and interpretation that will lead to meaningful scientific insight. Providing a widely distributed research community with the ability to access, manipulate, analyze, and visualize these complex, multidimensional data sets depends on a wide range of computer science and technology topics. Data storage and compression, database management, computational methods and algorithms, artificial intelligence, telecommunications, and high-resolution display are just a few of the topics addressed. A unifying theme throughout the papers with regard to advanced data handling and visualization is the need for interactivity, speed, user-friendliness, and extensibility.

    Transform domain texture synthesis on surfaces

    In the recent past, application areas such as virtual reality experiences, digital cinema and computer gaming have resulted in a renewed interest in advanced research topics in computer graphics. Although many research challenges in computer graphics have been met due to worldwide efforts, many more are yet to be met. Two key challenges which still remain open research problems are the lack of perfect realism in animated or virtually created objects when represented in graphical format, and the need for the transmission, storage and exchange of a massive amount of information between remote locations when 3D computer-generated objects are used in remote visualisations. These challenges call for further research to be focused in the above directions. Though a significant number of ideas have been proposed by the international research community in their effort to meet the above challenges, the ideas still suffer from excessive complexity, resulting in high processing times and making them impractical when bandwidth-constrained transmission media are used or when the storage space or computational power of the display device is limited. In the proposed work we investigate the appropriate use of geometric representations of 3D structure (e.g. Bezier surface, NURBS, polygons) and multi-resolution, progressive representation of texture on such surfaces. This joint approach to texture synthesis has not been considered before and has significant potential in resolving current challenges in virtual realism, digital cinema and the computer gaming industry. The main focus of the novel approaches that are proposed in this thesis is performing photo-realistic texture synthesis on surfaces. We have provided experimental results and detailed analysis to prove that the proposed algorithms allow fast, progressive building of texture on arbitrarily shaped 3D surfaces.
In particular, we investigate the above ideas in association with the Bezier patch representation of 3D objects, an approach which has not been considered so far by any published research effort worldwide, yet offers flexibility of the utmost practical importance. Further, we have discussed the novel application domains that can be served by the inclusion of additional functionality within the proposed algorithms. EThOS - Electronic Theses Online Service, GB, United Kingdom

    Cinema Server = s/t (story over time) : an interface for interactive motion picture design

    Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Architecture, 1993. Includes bibliographical references (leaves 146-148). By Stephan J. Fitch. M.S.