104 research outputs found

    A novel illumination compensation scheme for sprite coding

    Author name used in this publication: Dagan Feng. Centre for Multimedia Signal Processing, Department of Electronic and Information Engineering. Refereed conference paper (2004-2005 > Academic research: refereed > Refereed conference paper). Version of Record, published.

    Automatic video segmentation employing object/camera modeling techniques

    Established video compression and storage techniques still process video sequences as rectangular images without further semantic structure. However, humans watching a video sequence immediately recognize acting objects as semantic units. This semantic object separation is currently not reflected in the technical system, making it difficult to manipulate the video at the object level. The realization of object-based manipulation would introduce many new possibilities for working with videos, like composing new scenes from pre-existing video objects or enabling user interaction with the scene. Moreover, object-based video compression, as defined in the MPEG-4 standard, can provide high compression ratios because the foreground objects can be sent independently from the background. If the scene background is static, the background views can even be combined into a large panoramic sprite image, from which the current camera view is extracted. This results in a higher compression ratio, since the sprite image for each scene only has to be sent once.

    A prerequisite for object-based video processing is automatic (or at least user-assisted semi-automatic) segmentation of the input video into semantic units, the video objects. This segmentation is a difficult problem because the computer does not have the vast amount of pre-knowledge that humans subconsciously use for object detection. Thus, even the simple definition of the desired output of a segmentation system is difficult. The subject of this thesis is to provide segmentation algorithms that are applicable to common video material and computationally efficient. The thesis is conceptually separated into three parts. Part I describes an automatic segmentation system for general video content. Part II introduces object models as a tool to incorporate user-defined knowledge about the objects to be extracted into the segmentation process. Part III concentrates on the modeling of camera motion in order to relate the observed camera motion to real-world camera parameters.

    The segmentation system described in Part I is based on a background-subtraction technique. The pure background image required for this technique is synthesized from the input video itself. Sequences that contain rotational camera motion can also be processed, since the camera motion is estimated and the input images are aligned into a panoramic scene background. This approach is fully compatible with the MPEG-4 video-encoding framework, so that the segmentation system can easily be combined with an object-based MPEG-4 video codec.

    After an introduction to the theory of projective geometry in Chapter 2, which is required for the derivation of camera-motion models, the estimation of camera motion is discussed in Chapters 3 and 4. It is important that the camera-motion estimation is not influenced by foreground object motion. At the same time, the estimation should provide motion parameters accurate enough that all input frames can be combined seamlessly into a background image. The core motion estimation follows a feature-based approach, in which the motion parameters are determined with a robust-estimation algorithm (RANSAC) in order to distinguish camera motion from simultaneously visible object motion.
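    As a point of reference for the discussion that follows, the sketch below shows the standard feature-based RANSAC loop with a normalized DLT homography fit — not the thesis's modified variant. It assumes point correspondences from some feature matcher; all names and parameter values are illustrative. The normalization step is exactly where numerical conditioning matters.

```python
import numpy as np

def fit_homography(src, dst):
    """Normalized DLT: 3x3 homography from >= 4 point correspondences."""
    def normalization(pts):
        # Hartley normalization: centroid to origin, mean distance sqrt(2)
        mean = pts.mean(axis=0)
        scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
        return np.array([[scale, 0.0, -scale * mean[0]],
                         [0.0, scale, -scale * mean[1]],
                         [0.0, 0.0, 1.0]])
    T1, T2 = normalization(src), normalization(dst)
    s = (T1 @ np.c_[src, np.ones(len(src))].T).T
    d = (T2 @ np.c_[dst, np.ones(len(dst))].T).T
    A = []
    for (x, y, _), (u, v, _) in zip(s, d):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)          # null-space solution of A h = 0
    return np.linalg.inv(T2) @ H @ T1  # undo the normalization

def ransac_homography(src, dst, iters=500, thresh=2.0, seed=0):
    """Plain RANSAC: repeatedly fit on 4 random matches, keep the
    hypothesis with the most inliers (illustrative thresholds)."""
    rng = np.random.default_rng(seed)
    best_H, best_count = None, 0
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        proj = (H @ np.c_[src, np.ones(len(src))].T).T
        proj = proj[:, :2] / proj[:, 2:3]
        count = int((np.linalg.norm(proj - dst, axis=1) < thresh).sum())
        if count > best_count:
            best_H, best_count = H, count
    return best_H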
    Our experiments showed that the robustness of the original RANSAC algorithm in practice does not reach the theoretically predicted performance. An analysis of the problem revealed that this is caused by numerical instabilities, which can be significantly reduced by a modification that we describe in Chapter 4.

    The synthesis of static background images is discussed in Chapter 5. In particular, we present a new algorithm for removing the foreground objects from the background image, such that a pure scene background remains. The proposed algorithm is optimized to synthesize the background even for difficult scenes in which the background is visible only for short periods of time. The problem is solved by clustering the image content of each region over time, such that each cluster comprises static content. Furthermore, we exploit the fact that the times during which foreground objects appear in an image region are similar to those of neighboring image areas.

    The reconstructed background could be used directly as the sprite image in an MPEG-4 video coder. However, we have discovered that the counterintuitive approach of splitting the background into several independent parts can reduce the overall amount of data. In the case of general camera motion, the construction of a single sprite image is even impossible. In Chapter 6, a multi-sprite partitioning algorithm is presented, which separates the video sequence into a number of segments for which independent sprites are synthesized. The partitioning is computed such that the total area of the resulting sprites is minimized while additional constraints are satisfied. These include a limited sprite-buffer size at the decoder and the restriction that the image resolution in the sprite should never fall below the input-image resolution. The described multi-sprite approach is fully compatible with the MPEG-4 standard but provides three advantages. First, arbitrary rotational camera motion can be processed. Second, the coding cost for transmitting the sprite images is lower. Finally, the quality of the decoded sprite images is better than with previously proposed sprite-generation algorithms.

    Segmentation masks for the foreground objects are computed with a change-detection algorithm that compares the pure background image with the input images. A special problem that occurs in change detection is image misregistration. Since the change detection compares co-located image pixels in the camera-motion-compensated images, a small error in the motion estimation can introduce segmentation errors, because non-corresponding pixels are compared. We approach this problem in Chapter 7 by integrating risk maps into the segmentation algorithm, which identify pixels for which misregistration would probably result in errors. For these image areas, the change-detection algorithm is modified to disregard the difference values of the pixels marked in the risk map. This modification significantly reduces the number of false object detections in fine-textured image areas.
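    A schematic of the risk-map idea follows, under the assumption that the risk map marks fine-textured (high-gradient) pixels, where a one-pixel alignment error already produces large difference values. Function names and thresholds are illustrative, not taken from the thesis.

```python
import numpy as np

def build_risk_map(background, grad_thresh=20.0):
    """Illustrative risk map: mark pixels whose local gradient is large,
    so that a small misregistration would change intensity strongly."""
    gy, gx = np.gradient(background)
    return np.hypot(gx, gy) > grad_thresh

def change_mask(frame, background, risk_map, diff_thresh=25.0):
    """Change detection that disregards difference values at risky pixels.

    frame, background: float grayscale images, already motion-compensated.
    risk_map: boolean array, True where misregistration is likely to
    produce spurious differences (e.g., fine-textured areas).
    """
    diff = np.abs(frame - background)
    return (diff > diff_thresh) & ~risk_map
```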
    The algorithmic building blocks described above can be combined into a segmentation system in various ways, depending on whether camera motion has to be considered or whether real-time execution is required. These different systems and example applications are discussed in Chapter 8.

    Part II of the thesis extends the described segmentation system to consider object models in the analysis. Object models allow the user to specify which objects should be extracted from the video. In Chapters 9 and 10, a graph-based object model is presented, in which the features of the main object regions are summarized in the graph nodes and the spatial relations between these regions are expressed by the graph edges. The segmentation algorithm is extended by an object-detection algorithm that searches the input image for the user-defined object model. We provide two object-detection algorithms: the first is specific to cartoon sequences and uses an efficient subgraph-matching algorithm, whereas the second processes natural video sequences. With the object-model extension, the segmentation system can be directed to extract individual objects, even if the input sequence contains many objects.

    Chapter 11 proposes an alternative approach to incorporating object models into a segmentation algorithm. The chapter describes a semi-automatic segmentation algorithm in which the user coarsely marks the object and the computer refines this to the exact object boundary. Afterwards, the object is tracked automatically through the sequence. In this algorithm, the object model is defined as the texture along the object contour. This texture is extracted in the first frame and then used during tracking to localize the original object. The core of the algorithm uses a graph representation of the image and a newly developed algorithm for computing shortest circular paths in planar graphs. The proposed algorithm is faster than the currently known algorithms for this problem, and it can also be applied to many related problems like shape matching.
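    The thesis's faster circular-path algorithm is not reproduced here; the sketch below shows the standard baseline it improves on: cut the circular domain open along one column and run one shortest-path search per row of the cut, keeping the cheapest wrap-around path. The grid band and its costs are illustrative; uses networkx.

```python
import networkx as nx
import numpy as np

def shortest_circular_path(cost):
    """Baseline circular shortest path in an H x W cost band that wraps
    around in the column direction (e.g., an unwrapped object contour).

    Cuts the band open by duplicating column 0 as a virtual column W,
    then runs one Dijkstra per row of the cut; O(H) searches total.
    """
    H, W = cost.shape
    G = nx.DiGraph()
    for r in range(H):
        for c in range(W):
            for dr in (-1, 0, 1):       # step right, optionally up/down
                r2 = r + dr
                if 0 <= r2 < H:
                    c2 = c + 1          # column W re-enters column 0
                    G.add_edge((r, c), (r2, c2), weight=cost[r2, c2 % W])
    best, best_path = np.inf, None
    for r in range(H):                  # one search per vertex on the cut
        try:
            length, path = nx.single_source_dijkstra(G, (r, 0), (r, W))
        except nx.NetworkXNoPath:
            continue
        if length < best:
            best, best_path = length, path
    return best, best_path
```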
    Part III of the thesis elaborates on different techniques for deriving information about the physical 3-D world from the camera motion. In the segmentation system, we employ camera-motion estimation, but the obtained parameters have no direct physical meaning. Chapter 12 discusses an extension of the camera-motion estimation that factorizes the motion parameters into physically meaningful parameters (rotation angles, focal length) using camera autocalibration techniques. The speciality of the algorithm is that it can process camera motion spanning several sprites by employing the multi-sprite technique described above. Consequently, the algorithm can be applied to arbitrary rotational camera motion.

    For the analysis of video sequences, it is often necessary to determine and follow the position of objects. Clearly, the object position in image coordinates provides little information if the viewing direction of the camera is not known. Chapter 13 provides a new algorithm to deduce the transformation between image coordinates and real-world coordinates for the special application of sport-video analysis. In sport videos, the camera view can be derived from markings on the playing field. For this reason, we employ a model of the playing field that describes the arrangement of lines. After detecting significant lines in the input image, a combinatorial search is carried out to establish correspondences between lines in the input image and lines in the model. The algorithm requires no information about the specific color of the playing field, and it is very robust to occlusions and poor lighting conditions. Moreover, the algorithm is generic in the sense that it can be applied to any type of sport by simply exchanging the model of the playing field.

    In Chapter 14, we again consider panoramic background images and particularly focus on their visualization. Apart from the planar background sprites discussed previously, a frequently used visualization technique for panoramic images is the projection onto a cylinder surface, which is unwrapped into a rectangular image. However, the disadvantage of this approach is that the viewer has no good orientation in the panoramic image, because the view covers all directions at the same time. In order to provide a more intuitive presentation of wide-angle views, we have developed a visualization technique specialized to indoor environments. We present an algorithm that determines the 3-D shape of the room in which the image was captured or, more generally, computes a complete floor plan if several panoramic images captured in each of the rooms are provided. Based on the obtained 3-D geometry, a graphical model of the rooms is constructed, in which the walls are displayed with textures extracted from the panoramic images. This representation enables virtual walk-throughs in the reconstructed rooms and therefore provides a better orientation for the user.

    Summarizing, we can conclude that all segmentation techniques employ some definition of foreground objects. These definitions are either explicit, using object models as in Part II of this thesis, or implicit, as in the background synthesis of Part I. The results of this thesis show that implicit descriptions, which extract their definition from the video content, work well when the sequence is long enough to extract this information reliably. However, high-level semantics are difficult to integrate into segmentation approaches that are based on implicit models. Instead, those semantics should be added as post-processing steps. Explicit object models, on the other hand, apply semantic pre-knowledge at early stages of the segmentation. Moreover, they can be applied to short video sequences or even still pictures, since no background model has to be extracted from the video. The definition of a general object-modeling technique that is widely applicable and also enables accurate segmentation remains an important yet challenging problem for further research.
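    Returning briefly to the cylindrical panoramas of Chapter 14: the following is a minimal sketch of warping a perspective image onto an unwrapped cylinder, assuming a known focal length f in pixels and a horizontal field of view below 180 degrees. Function names are illustrative; uses OpenCV's remap.

```python
import cv2
import numpy as np

def to_cylindrical(img, f):
    """Warp a perspective image onto an unwrapped cylinder of radius f.

    For each output pixel (theta, h), intersect the cylinder ray with
    the image plane: x = f * tan(theta), y = h * sqrt(x^2 + f^2).
    Assumes |theta| < pi/2, i.e., f is not much smaller than W/2.
    """
    H, W = img.shape[:2]
    cx, cy = W / 2.0, H / 2.0
    theta = (np.arange(W) - cx) / f           # cylinder angle per column
    h = (np.arange(H) - cy) / f               # cylinder height per row
    x = f * np.tan(theta)                     # back-project to image plane
    map_x = np.tile(x + cx, (H, 1)).astype(np.float32)
    map_y = (np.outer(h, np.hypot(x, f)) + cy).astype(np.float32)
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)
```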

    Advances in Spacecraft Systems and Orbit Determination

    "Advances in Spacecraft Systems and Orbit Determinations", discusses the development of new technologies and the limitations of the present technology, used for interplanetary missions. Various experts have contributed to develop the bridge between present limitations and technology growth to overcome the limitations. Key features of this book inform us about the orbit determination techniques based on a smooth research based on astrophysics. The book also provides a detailed overview on Spacecraft Systems including reliability of low-cost AOCS, sliding mode controlling and a new view on attitude controller design based on sliding mode, with thrusters. It also provides a technological roadmap for HVAC optimization. The book also gives an excellent overview of resolving the difficulties for interplanetary missions with the comparison of present technologies and new advancements. Overall, this will be very much interesting book to explore the roadmap of technological growth in spacecraft systems

    Analyzing Structured Scenarios by Tracking People and Their Limbs

    The analysis of human activities is a fundamental problem in computer vision. Though complex, interactions between people and their environment often exhibit a spatio-temporal structure that can be exploited during analysis. This structure can be leveraged to mitigate the effects of missing or noisy visual observations caused, for example, by sensor noise, inaccurate models, or occlusion. Trajectories of people and their hands and feet, often sufficient for recognition of human activities, lead to a natural qualitative spatio-temporal description of these interactions. This work introduces the following contributions to the task of human activity understanding: 1) a framework that efficiently detects and tracks multiple interacting people and their limbs, 2) an event recognition approach that integrates both logical and probabilistic reasoning in analyzing the spatio-temporal structure of multi-agent scenarios, and 3) an effective computational model of the visibility constraints imposed on humans as they navigate through their environment. The tracking framework mixes probabilistic models with deterministic constraints and uses AND/OR search and lazy evaluation to efficiently obtain the globally optimal solution in each frame. Our high-level reasoning framework efficiently and robustly interprets noisy visual observations to deduce the events comprising structured scenarios. This is accomplished by combining First-Order Logic, Allen's Interval Logic, and Markov Logic Networks with an event hypothesis generation process that reduces the size of the ground Markov network. When applied to outdoor one-on-one basketball videos, our framework tracks the players and, guided by the game rules, analyzes their interactions with each other and the ball, annotating the videos with the relevant basketball events that occurred. Finally, motivated by studies of spatial behavior, we use a set of features from visibility analysis to represent spatial context in the interpretation of human spatial activities. We demonstrate the effectiveness of our representation on trajectories generated by humans in a virtual environment.
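    The abstract combines First-Order Logic, Allen's Interval Logic, and Markov Logic Networks; that integration is beyond a snippet, but the temporal backbone is small. The following helper classifies the Allen relation between two event intervals; names are illustrative, not from the dissertation.

```python
def allen_relation(a, b):
    """Return the Allen interval relation between intervals a and b,
    each given as (start, end) with start < end. Covers all 13 cases."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:  return "before"
    if e2 < s1:  return "after"
    if e1 == s2: return "meets"
    if e2 == s1: return "met-by"
    if s1 == s2 and e1 == e2: return "equal"
    if s1 == s2: return "starts" if e1 < e2 else "started-by"
    if e1 == e2: return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2: return "during"
    if s1 < s2 and e2 < e1: return "contains"
    return "overlaps" if s1 < s2 else "overlapped-by"

# e.g. allen_relation((0, 5), (3, 9)) -> "overlaps"
# (a "dribble" interval overlapping a "move toward basket" interval)
```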

    New Change Detection Models for Object-Based Encoding of Patient Monitoring Video

    The goal of this thesis is to find a highly efficient algorithm to compress patient monitoring video. This type of video mainly contains local motions and a large percentage of idle periods. To specifically exploit these features, we present an object-based approach, which decomposes the input video into three objects representing the background, slow-motion foreground, and fast-motion foreground. Encoding these three video objects with different temporal scalabilities significantly improves the coding efficiency in terms of bitrate versus visual quality. The video decomposition is built upon change detection, which identifies content changes between video frames. To improve the robustness of capturing small changes, we contribute two new change detection models. The first model, built upon Markov random field theory, discriminates the foreground containing the patient being monitored. The other model, called the covariance test method, identifies constantly changing content by exploiting temporal correlation across multiple video frames. Both models show great effectiveness in constructing the defined video objects. We present detailed algorithms for video object construction, as well as experimental results on the object-based coding of patient monitoring video.
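    A schematic of the three-way decomposition by temporal activity follows, standing in for the two models in the abstract. It classifies pixels by their variability over a frame window; the thresholds and the use of a plain standard deviation (rather than the thesis's covariance test) are illustrative.

```python
import numpy as np

def decompose(frames, slow_thresh=4.0, fast_thresh=25.0):
    """Classify each pixel of a clip as background / slow / fast
    foreground from its temporal variability over a frame window.

    frames: (T, H, W) float grayscale stack.
    Returns an (H, W) label map: 0 = background, 1 = slow, 2 = fast.
    """
    std = frames.std(axis=0)            # temporal variation per pixel
    labels = np.zeros(std.shape, dtype=np.uint8)
    labels[std > slow_thresh] = 1       # slowly changing content
    labels[std > fast_thresh] = 2       # constantly changing content
    return labels
```

    Each label map can then feed a separate video object, so the idle background is refreshed rarely while the fast-motion object is encoded at full temporal resolution.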

    Super-resolution: A comprehensive survey


    Sprite morphology and the conditions for producing sprites and jets in mesoscale storm systems

    This dissertation describes the conditions of production of transient luminous events (sprites, jets, elves) in the mesosphere, which occur in response to energetic lightning discharges in the thunderstorms underneath. During the EuroSprite observation campaigns, a few hundred images of sprites were obtained, providing information about event morphology, location, and timing. Precipitation data from weather radar, cloud-top altitudes from Meteosat, and data from two lightning detection networks and a wide-band radio receiver were analyzed. The methodology includes case studies and a statistical study over a large number of sprites produced by 7 different storms. The work focuses on the role of the intracloud lightning component associated with the positive cloud-to-ground flashes at the origin of sprites, the link with sprite morphology, and the relation to the life cycle of the thunderstorm systems. Additionally, a storm that produced a rare gigantic jet observed in the United States is analyzed in detail. The observed sprites were produced by mesoscale convective systems (MCS) while the stratiform region was in its expanding phase. The cloud-to-ground flash sequences and the intracloud lightning activity observed at the time of the sprites confirm a large horizontal convective-to-stratiform propagation as the mechanism of charge collection, explaining displaced sprites. Column-type sprites are produced with shorter delays than carrot-type sprites; the shorter the delay, the greater the number of elements and the more their luminosity concentrates at higher altitudes. The gigantic jet appears to have been promoted by a particular charge configuration and lightning activity pattern rather than by a high cloud-top altitude.

    Video object extraction in distributed surveillance systems

    Recently, automated video surveillance and related video processing algorithms have received considerable attention from the research community. Challenges in video surveillance arise from noise, illumination changes, camera motion, splits and occlusions, complex human behavior, and the management of extracted surveillance information for delivery, archiving, and retrieval. Many video surveillance systems focus on video object extraction, while few focus on both the system architecture and video object extraction. We focus on both, integrate them into an end-to-end system, and study the challenges associated with building this system. We propose a scalable, distributed, real-time video-surveillance system with a novel architecture, indexing, and retrieval. The system consists of three modules: video workstations for processing, control workstations for monitoring, and a server for management and archiving. The proposed system models object features as temporal Gaussians and achieves an 18 frames/second frame rate for SIF video with static cameras, reduced network and storage usage, and precise retrieval results. It is more scalable and delivers more balanced distributed performance than recent architectures.

    The first stage of video processing is noise estimation. We propose a method for localizing homogeneity and estimating the additive white Gaussian noise variance, which uses spatially scattered initial seeds and particle filtering techniques to guide their spatial movement towards homogeneous locations, from which the estimation is performed. The noise estimation method reduces the number of measurements required by block-based methods while achieving higher accuracy.

    Next, we segment video objects using a background-subtraction technique. For static cameras, we generate the background model online using a mixture-of-Gaussians background maintenance approach. For moving cameras, we use a global motion estimation method offline to bring neighboring frames into the coordinate system of the current frame, and we merge them to produce the background model.

    We track detected objects using a feature-based object-tracking method with improved detection and correction of occlusions and splits. We detect occlusions and splits through the identification of sudden variations in the spatio-temporal features of objects. To detect splits, we analyze the temporal behavior of split objects to discriminate between segmentation errors and real separation of objects. Both objective and subjective experimental results show the ability of the proposed algorithm to detect and correct both splits and occlusions of objects.

    For the last stage of video processing, we propose a novel method for the detection of vandalism events, based on a proposed definition of vandal behaviors recorded in surveillance video sequences. We monitor changes inside a restricted site containing vandalism-prone objects and declare vandalism when an object is detected leaving the site while there are temporally consistent and significant static changes representing damage, given that the site is normally unchanged after use. The proposed method is tested on sequences showing real and simulated vandal behaviors, and it achieves a detection rate of 96%. It detects different forms of vandalism such as graffiti and theft.
    The proposed end-to-end video surveillance system aims at realizing the potential of video object extraction in automated surveillance and retrieval by focusing on both video object extraction and the management, delivery, and utilization of the extracted information.
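    The thesis describes its own mixture-of-Gaussians background maintenance; as a minimal stand-in for the static-camera case, OpenCV's built-in MOG2 subtractor produces a comparable foreground mask. Parameter values and the input file name are illustrative.

```python
import cv2

# Mixture-of-Gaussians background model for a static camera (MOG2);
# parameters are illustrative, not those of the thesis.
subtractor = cv2.createBackgroundSubtractorMOG2(
    history=500, varThreshold=16, detectShadows=True)

cap = cv2.VideoCapture("surveillance.avi")  # hypothetical input clip
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)       # 255 = foreground, 127 = shadow
    # ... feed fg_mask to object tracking / occlusion-split handling ...
cap.release()
```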

    Segmentation of motion picture images and image sequences


    Analysis and Synthesis of Interactive Video Sprites

    In this thesis, we explore how video, an extremely compelling medium that is traditionally consumed passively, can be transformed into interactive experiences, and what is preventing content creators from using it for this purpose. Film captures extremely rich and dynamic information but, due to the sheer amount of data and the drastic change in content appearance over time, it is problematic to work with. Content creators are willing to invest time and effort to design and capture video, so why not to manipulate and interact with it? We hypothesize that people can help and be helped by automatic video processing and synthesis algorithms when they are given the right tools.

    Computer games are a very popular interactive medium in which players engage with dynamic content in compelling and intuitive ways. The first contribution of this thesis is an in-depth exploration of the modes of interaction that enable game-like video experiences. Through active discussions with game developers, we identify both how to assist content creators and how their creations can be dynamically interacted with by players. We present concepts, explore algorithms, and design tools that together enable interactive video experiences.

    Our findings concerning processing videos and interacting with filmed content come together in this thesis' second major contribution. We present a new medium of expression where video elements can be looped, merged, and triggered interactively. Static-camera videos are converted into loopable sequences that can be controlled in real time in response to simple end-user requests. We present novel algorithms and interactive tools that enable our new medium of expression. Our human-in-the-loop system gives the user progressively more creative control over the video content as they invest more effort, and artists help us evaluate it.

    Monocular, static-camera videos are a good fit for looping algorithms, but they have been limited to two-dimensional applications, as pixels are reshuffled in space and time on the image plane. The final contribution of this thesis breaks through this barrier by allowing users to interact with filmed objects in a three-dimensional manner. Our novel object tracking algorithm extends existing 2D bounding-box trackers with 3D information, such as a well-fitting bounding volume, which in turn enables a new breed of interactive video experiences. The filmed content becomes a three-dimensional playground, as users are free to move the virtual camera or the tracked objects and see them from novel viewpoints.
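    As a minimal sketch of how loopable segments can be found in a static-camera clip, in the spirit of video-textures-style looping rather than this thesis's specific algorithms: pick the pair of frames that match best at least a minimum loop length apart. The distance metric and parameters are illustrative.

```python
import numpy as np

def best_loop(frames, min_len=30):
    """Find the (start, end) frame pair whose endpoints match best,
    so that frames[start:end] plays back as a near-seamless loop.

    frames: (T, H, W) or (T, H, W, C) array from a static camera.
    """
    T = len(frames)
    flat = frames.reshape(T, -1).astype(np.float64)
    best, best_pair = np.inf, None
    for i in range(T - min_len):
        # mean squared difference between frame i and all later candidates
        d = np.mean((flat[i + min_len:] - flat[i]) ** 2, axis=1)
        j = int(np.argmin(d))
        if d[j] < best:
            best, best_pair = d[j], (i, i + min_len + j)
    return best_pair
```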