
    Cubic-panorama image dataset analysis for storage and transmission


    Multiperspective mosaics and layered representation for scene visualization

    This thesis documents the efforts made to implement multiperspective mosaicking of undervehicle and roadside sequences. For the undervehicle sequences, the goal is to create a large, high-resolution mosaic that may be used to quickly inspect the entire scene shot by a camera making a single pass underneath the vehicle. Several constraints are placed on the video data in order to justify the assumption that the entire scene in the sequence lies on a single plane; therefore, a single mosaic is used to represent a single video sequence. Phase correlation is used to perform the motion analysis in this case. For roadside video sequences, the scene is assumed to be composed of several planar layers rather than a single plane, and layer extraction techniques are implemented to perform this decomposition. Instead of phase correlation, the Lucas-Kanade motion tracking algorithm is used to create dense motion maps. Using these motion maps, spatial support for each layer is determined based on a pre-initialized layer model. By separating the pixels in the scene into motion-specific layers, it is possible to sample each element of the scene correctly while performing multiperspective mosaicking. It is also possible to fill in many gaps in the mosaics caused by occlusions, creating more complete representations of the objects of interest. The results are several mosaics, each representing a single planar layer of the scene.
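    As a rough illustration of the phase-correlation motion analysis mentioned above, here is a minimal sketch (our own, assuming grayscale numpy frames; it is not the thesis's actual implementation):

```python
import numpy as np

def phase_correlation(a, b):
    """Estimate the integer (dy, dx) translation between two equally
    sized grayscale frames from the peak of the cross-power spectrum."""
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cross = Fa * np.conj(Fb)
    cross /= np.abs(cross) + 1e-12          # keep phase information only
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # shifts beyond half the frame size wrap around to negative offsets;
    # the sign convention (a relative to b) should be validated on a known shift
    if dy > a.shape[0] // 2: dy -= a.shape[0]
    if dx > a.shape[1] // 2: dx -= a.shape[1]
    return dy, dx
```

    Under the single-plane assumption stated in the abstract, accumulating these per-frame shifts is enough to place each frame into the mosaic.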

    Editable View Optimized Tone Mapping For Viewing High Dynamic Range Panoramas On Head Mounted Display

    Head mounted displays are characterized by relatively low resolution and low dynamic range, limitations that significantly reduce the visual quality of photo-realistic captures on such displays. This thesis presents an interactive, view-optimized tone mapping technique, called ToneTexture, for viewing large high dynamic range panoramas (up to 16384 by 8192 pixels) on head mounted displays. The technique generates a separate file storing pre-computed, view-adjusted mapping function parameters. The use of a view-adjusted tone mapping expands the perceived color space available to the end user, which improves the visual appearance of both high dynamic range and low dynamic range panoramas on such displays. Moreover, through an interface for manipulating the ToneTexture, users can adjust the mapping function to change the color emphasis. The authors compare the results produced by the ToneTexture technique against the widely used Reinhard and Filmic tone mapping operators, both objectively via mathematical quality assessment metrics and subjectively through a user study. Demonstration systems are available for desktops and for head mounted displays such as the Oculus Rift and GearVR.
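    For reference, the Reinhard global operator that the thesis compares against can be sketched as follows (a standard textbook form, not the ToneTexture method itself, whose pre-computed view-adjusted parameters are not described here in enough detail to reproduce):

```python
import numpy as np

def reinhard_tonemap(hdr, a=0.18, eps=1e-6):
    """Global Reinhard operator on a float RGB image: scale luminance
    by the key value, then compress it with L / (1 + L)."""
    lum = 0.2126 * hdr[..., 0] + 0.7152 * hdr[..., 1] + 0.0722 * hdr[..., 2]
    log_avg = np.exp(np.mean(np.log(lum + eps)))   # log-average luminance
    L = a * lum / log_avg                          # map key of scene to 'a'
    Ld = L / (1.0 + L)                             # compress to [0, 1)
    ldr = hdr * (Ld / (lum + eps))[..., None]      # rescale RGB by luminance ratio
    return np.clip(ldr, 0.0, 1.0)
```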

    Doctor of Philosophy

    Interactive editing and manipulation of digital media is a fundamental component of digital content creation. One medium in particular, digital imagery, has recently seen large and even massive image formats grow in popularity. Unfortunately, current systems and techniques are rarely concerned with scalability or usability for these large images. Moreover, processing massive (or even large) imagery is assumed to be an off-line, automatic process, although many problems associated with these datasets require human intervention for high-quality results. This dissertation details how to design interactive image techniques that scale. In particular, massive imagery is typically constructed as a seamless mosaic of many smaller images, and the focus of this work is the creation of new technologies to enable user interaction in the formation of these large mosaics. While an interactive system for all stages of the mosaic creation pipeline is a long-term research goal, this dissertation concentrates on the last phase of the pipeline: the composition of registered images into a seamless composite. The work detailed in this dissertation provides the technologies to fully realize interactive editing in mosaic composition on image collections ranging from the very small to the massive in scale.
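    The abstract does not spell out the compositing algorithm, so as a baseline illustration of composing registered images without visible seams, here is a simple distance-map feather blend (scipy/numpy; an interactive, out-of-core system as described in the dissertation would need far more machinery):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def feather_composite(images, masks):
    """Blend registered color images with weights taken from each
    mask's distance transform, so contributions fade out toward
    image borders instead of producing hard seams."""
    acc = np.zeros(images[0].shape, float)
    wsum = np.zeros(images[0].shape[:2], float)
    for img, mask in zip(images, masks):
        w = distance_transform_edt(mask)    # 0 at mask border, large inside
        acc += img * w[..., None]
        wsum += w
    return acc / np.maximum(wsum, 1e-12)[..., None]
```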

    A vision system for mobile maritime surveillance platforms

    Mobile surveillance systems play an important role in minimising security and safety threats in high-risk or hazardous environments. Providing a mobile marine surveillance platform with situational awareness of its environment is important for mission success, and an essential part of situational awareness is the ability to detect and subsequently track potential target objects.

    Typically, the exact type of target object is unknown, hence detection is addressed as the problem of finding parts of an image that stand out in relation to their surrounding regions or are atypical for the domain. Contrary to existing saliency methods, this thesis proposes a domain-specific visual attention approach for detecting potential regions of interest in maritime imagery. For this, low-level features that are indicative of maritime targets are identified and then evaluated with respect to their local, regional, and global significance. Together with a domain-specific background segmentation technique, the features are combined in a Bayesian classifier to direct visual attention to potential target objects.

    The maritime environment introduces challenges for the camera system: gusts, wind, swell, or waves can cause the platform to move drastically and unpredictably. Pan-tilt-zoom cameras, often used for surveillance tasks, can adjust their orientation to provide a stable view onto the target; however, in rough maritime environments this requires high-speed and precise inputs. In contrast, omnidirectional cameras provide a full spherical view, which allows the acquisition and tracking of multiple targets at the same time, although each target occupies only a small fraction of the overall view. This thesis proposes a novel, target-centric approach to image stabilisation: a virtual camera is extracted from the omnidirectional view for each target and is adjusted based on the measurements of an inertial measurement unit and an image feature tracker. The combination of these two techniques in a probabilistic framework allows for stabilisation of rotational and translational ego-motion. Furthermore, it has the specific advantage of being robust to loosely calibrated and synchronised hardware, since the fusion of tracking and stabilisation means that tracking uncertainty can compensate for errors in calibration and synchronisation. This eliminates the need for tedious calibration phases and the adverse effects of assembly slippage over time.

    Finally, this thesis combines the visual attention and omnidirectional stabilisation frameworks into a multi-view tracking system that is capable of detecting potential target objects in the maritime domain. Although the visual attention framework performed well on benchmark datasets, the evaluation on real-world maritime imagery produced a high number of false positives; an investigation revealed that benchmark datasets are unconsciously influenced by human shot selection, which greatly simplifies the problem of visual attention. Despite the number of false positives, the tracking approach itself remains robust even when many false positives are tracked.
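    The target-centric stabilisation relies on extracting a virtual perspective camera from the omnidirectional view. A minimal sketch of that resampling step for an equirectangular panorama (our own geometry conventions; the IMU/feature-tracker fusion is not shown):

```python
import numpy as np

def virtual_view(pano, pan, tilt, fov, out_size):
    """Sample a square pinhole 'virtual camera' view from an
    equirectangular panorama; pan/tilt in radians select the
    (stabilised) viewing direction."""
    H, W = pano.shape[:2]
    n = out_size
    f = 0.5 * n / np.tan(0.5 * fov)                 # focal length in pixels
    ys, xs = np.mgrid[0:n, 0:n]
    # camera-frame ray for every output pixel
    rays = np.stack([xs - n / 2.0, ys - n / 2.0,
                     np.full_like(xs, f, dtype=float)], -1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    # rotate rays: tilt (pitch) first, then pan (yaw)
    cp, sp = np.cos(pan), np.sin(pan)
    ct, st = np.cos(tilt), np.sin(tilt)
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, ct, -st], [0, st, ct]])
    d = rays @ (Ry @ Rx).T
    lon = np.arctan2(d[..., 0], d[..., 2])          # [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 1], -1, 1))      # [-pi/2, pi/2]
    u = ((lon / (2 * np.pi) + 0.5) * (W - 1)).astype(int)
    v = ((lat / np.pi + 0.5) * (H - 1)).astype(int)
    return pano[v, u]                               # nearest-neighbour sampling
```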

    An active vision system for tracking and mosaicking on UAV.

    Lin, Kai Wun. Thesis (M.Phil.)--Chinese University of Hong Kong, 2011. Includes bibliographical references (leaves 120-127). Abstracts in English and Chinese. Table of contents:

    Chapter 1, Introduction: Overview of the UAV Project; Challenges on Vision System for UAV; Contributions of this Work; Organization of Thesis.
    Chapter 2, Image Sensor Selection and Evaluation: Image Sensor Overview (Comparing Sensor Features and Performance; Rolling Shutter vs. Global Shutter); Sensor Evaluation through USB Peripheral (Interfacing Image Sensor and USB Controller; Image Sensor Configuration); Image Data Transmitting and Processing (Data Transfer Mode and Buffering on USB Controller; Demosaicking of Bayer Image Data); Splitting Images and Exposure Problem (Buffer Overflow on USB Controller; Image Luminance and Exposure Adjustment).
    Chapter 3, Embedded System for Vision Processing: Overview of the Embedded System (TI OMAP3530 Processor; Gumstix Overo Fire Computer-on-Module); Interfacing Camera Module to the Embedded System (Image Signal Processing Subsystem; Camera Module Adapting Board; Image Sensor Driver and Program Development); View-stabilizing Biaxial Camera Platform (The New Camera System; View-stabilizing Pan-tilt Platform); Overall System Architecture and UAV Integration.
    Chapter 4, Target Tracking and Geo-locating: Camera Calibration (The Perspective Camera Model; Camera Lens Distortions; Calibration Toolbox and Results); Selection of Object Features and Trackers (Harris Corner Detection; Color Histogram; KLT and Mean-shift Tracker); Target Auto-centering (Formulation of the PID Controller; Control Gain Settings and Tuning); Geo-locating of Tracked Target (Coordinate Frame Transformation; Depth Estimation and Target Locating); Results and Discussion.
    Chapter 5, Real-time Aerial Mosaic Building: Motion Model Selection (Planar Perspective Motion Model); Feature-based Image Alignment (Image Preprocessing; Feature Extraction and Matching; Image Alignment using RANSAC Algorithm); Image Composition (Image Blending with Distance Map; Overall Stitching Process); Mosaic Simulation using Google Earth; Results and Discussion.
    Chapter 6, Conclusion and Further Work.
    Appendix A, System Schematics; Appendix B, Image Sensor Sensitivity; Bibliography.
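    Chapter 4 formulates target auto-centering as a PID control problem on the pixel error between the tracked target and the image centre. A generic sketch of that idea (textbook PID; the thesis's actual gains and tuning live in the thesis itself, so the values below are placeholders):

```python
class PID:
    """Textbook PID controller; gains here are illustrative placeholders."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_err = None

    def step(self, err, dt):
        self.integral += err * dt
        deriv = 0.0 if self.prev_err is None else (err - self.prev_err) / dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

# drive pan and tilt so the tracked target stays at the image centre
pan_pid, tilt_pid = PID(0.8, 0.05, 0.1), PID(0.8, 0.05, 0.1)

def centering_command(target_xy, frame_wh, dt):
    ex = target_xy[0] - frame_wh[0] / 2.0   # horizontal pixel error
    ey = target_xy[1] - frame_wh[1] / 2.0   # vertical pixel error
    return pan_pid.step(ex, dt), tilt_pid.step(ey, dt)
```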

    Interactive Remote Collaboration Using Augmented Reality

    With the widespread deployment of fast data connections and the availability of a variety of sensors for different modalities, the potential of remote collaboration has greatly increased. While the now ubiquitous video conferencing applications take advantage of some of these capabilities, the use of video between remote users is limited to passively watching disjoint video feeds and provides no means for interaction with the remote environment. However, collaboration often involves sharing, exploring, referencing, or even manipulating the physical world, and thus tools should provide support for these interactions. We suggest that augmented reality is an intuitive and user-friendly paradigm for communicating information about the physical environment, and that the integration of computer vision and augmented reality facilitates more immersive and more direct interaction with the remote environment than is possible with today's tools.

    In this dissertation, we present contributions to realizing this vision on several levels. First, we describe a conceptual framework for unobtrusive mobile video-mediated communication in which the remote user can explore the live scene independently of the local user's current camera movement, and can communicate information by creating spatial annotations that are immediately visible to the local user in augmented reality. Second, we describe the design and implementation of several increasingly flexible and immersive user interfaces and system prototypes that implement this concept. Our systems do not require any preparation or instrumentation of the environment; instead, the physical scene is tracked and modeled incrementally using monocular computer vision. The emerging model then supports anchoring of annotations, virtual navigation, and synthesis of novel views of the scene. Third, we describe the design, execution, and analysis of three user studies comparing our prototype implementations with more conventional interfaces and/or evaluating specific design elements. Study participants overwhelmingly preferred our technology, and their task performance was significantly better compared with a video-only interface, though no task-performance difference was observed compared with a "static marker" interface. Last, we address a particular technical limitation of current monocular tracking and mapping systems which was found to be an impediment, and we present a conceptual solution: a concept and proof-of-concept implementation for automatic model selection, which allows tracking and modeling to cope with both parallax-inducing and rotation-only camera movements.

    We suggest that our results demonstrate the maturity and usability of our systems and, more importantly, the potential of our approach to improve video-mediated communication and broaden its applicability.
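    The automatic model selection mentioned at the end can be sketched with OpenCV primitives: fit both a homography (adequate for rotation-only motion) and a fundamental matrix (needed for parallax-inducing motion) and keep the better-supported model. The inlier-count criterion below is a simplification we assume for illustration; GRIC-style scores are the more principled choice, and the dissertation's actual criterion is not given in the abstract:

```python
import numpy as np
import cv2

def select_motion_model(pts1, pts2, thresh=2.0):
    """pts1, pts2: matched keypoints as (N, 2) float32 arrays.
    Fit both motion models robustly and keep the one that explains
    more correspondences."""
    H, h_mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, thresh)
    F, f_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC,
                                       thresh, 0.999)
    h_score = 0 if h_mask is None else int(h_mask.sum())
    f_score = 0 if f_mask is None else int(f_mask.sum())
    if h_score >= f_score:
        return "rotation-only", H   # panorama-style tracking/mapping
    return "parallax", F            # full structure-from-motion mapping
```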

    Spatial displays for visual awareness of remote locations

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. [113]-116).

    uCom enables remote users to be visually aware of each other using "spatial displays": live views of a remote space assembled according to an estimate of the remote space's layout. The main elements of the system design are a 3D representation of each space and a multi-display physical setup. The 3D image-based representation of a space is composed of an aggregate of live video feeds acquired from multiple viewpoints and rendered in a graphical visualization resembling a 3D collage. Its navigation controls allow users to transition among the remote views while maintaining a sense of how the images relate in 3D space. Additionally, the system uses a configurable set of displays to portray always-on visual connections with a remote site, integrated into the local physical environment. The evaluation investigates to what extent the system improves users' understanding of the layout of a remote space.

    by Ana Luisa de Araujo Santos. S.M.

    Automatic video segmentation employing object/camera modeling techniques

    Video compression and storage techniques established in practice still process video sequences as rectangular images without further semantic structure. However, humans watching a video sequence immediately recognize acting objects as semantic units. This semantic object separation is currently not reflected in the technical system, making it difficult to manipulate the video at the object level. Realizing object-based manipulation would introduce many new possibilities for working with videos, such as composing new scenes from pre-existing video objects or enabling user interaction with the scene. Moreover, object-based video compression, as defined in the MPEG-4 standard, can provide high compression ratios because the foreground objects can be sent independently from the background. If the scene background is static, the background views can even be combined into a large panoramic sprite image, from which the current camera view is extracted. This results in a higher compression ratio, since the sprite image for each scene only has to be sent once.

    A prerequisite for object-based video processing is automatic (or at least user-assisted semi-automatic) segmentation of the input video into semantic units, the video objects. This segmentation is a difficult problem because the computer does not have the vast amount of prior knowledge that humans subconsciously use for object detection. Thus, even the simple definition of the desired output of a segmentation system is difficult. The subject of this thesis is to provide segmentation algorithms that are applicable to common video material and that are computationally efficient. The thesis is conceptually separated into three parts. Part I describes an automatic segmentation system for general video content in detail. Part II introduces object models as a tool to incorporate user-defined knowledge about the objects to be extracted into the segmentation process. Part III concentrates on the modeling of camera motion in order to relate the observed camera motion to real-world camera parameters.

    The segmentation system described in Part I is based on a background-subtraction technique. The pure background image required for this technique is synthesized from the input video itself. Sequences that contain rotational camera motion can also be processed, since the camera motion is estimated and the input images are aligned into a panoramic scene background. This approach is fully compatible with the MPEG-4 video-encoding framework, so the segmentation system can easily be combined with an object-based MPEG-4 video codec. After an introduction to the theory of projective geometry in Chapter 2, which is required for the derivation of camera-motion models, the estimation of camera motion is discussed in Chapters 3 and 4. It is important that the camera-motion estimation is not influenced by foreground object motion; at the same time, the estimation should provide motion parameters accurate enough that all input frames can be combined seamlessly into a background image. The core motion estimation follows a feature-based approach in which the motion parameters are determined with a robust-estimation algorithm (RANSAC), in order to distinguish the camera motion from simultaneously visible object motion. Our experiments showed that the robustness of the original RANSAC algorithm in practice does not reach the theoretically predicted performance.
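    As a rough illustration of this estimation step, here is a minimal baseline sketch of RANSAC homography fitting on point correspondences (plain numpy; function names are ours, and the thesis's numerical modification discussed next is not included):

```python
import numpy as np

def dlt_homography(src, dst):
    """Direct linear transform: 3x3 homography from >= 4 point pairs."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def ransac_homography(src, dst, iters=500, thresh=2.0, seed=None):
    """Keep the camera-motion hypothesis with the largest consensus set;
    points on independently moving foreground objects end up as outliers."""
    rng = np.random.default_rng(seed)
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    best = np.zeros(len(src), bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = dlt_homography(src[idx], dst[idx])
        proj = np.c_[src, np.ones(len(src))] @ H.T
        err = np.linalg.norm(proj[:, :2] / proj[:, 2:3] - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        raise RuntimeError("no consensus set found")
    return dlt_homography(src[best], dst[best]), best
```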
    An analysis of the problem revealed that this is caused by numerical instabilities, which can be significantly reduced by a modification that we describe in Chapter 4.

    The synthesis of static-background images is discussed in Chapter 5. In particular, we present a new algorithm for removing the foreground objects from the background image, so that a pure scene background remains. The proposed algorithm is optimized to synthesize the background even for difficult scenes in which the background is visible only for short periods of time. The problem is solved by clustering the image content of each region over time, so that each cluster comprises static content. Furthermore, we exploit the fact that the periods during which foreground objects occupy an image region are similar to those of neighboring image areas.

    The reconstructed background could be used directly as the sprite image in an MPEG-4 video coder. However, we have discovered that the counterintuitive approach of splitting the background into several independent parts can reduce the overall amount of data; for general camera motion, the construction of a single sprite image is in fact impossible. In Chapter 6, a multi-sprite partitioning algorithm is presented which separates the video sequence into a number of segments, for which independent sprites are synthesized. The partitioning is computed in such a way that the total area of the resulting sprites is minimized while additional constraints are satisfied, including a limited sprite-buffer size at the decoder and the restriction that the image resolution in the sprite should never fall below the input-image resolution. The described multi-sprite approach is fully compatible with the MPEG-4 standard, but provides three advantages: first, arbitrary rotational camera motion can be processed; second, the coding cost for transmitting the sprite images is lower; and finally, the quality of the decoded sprite images is better than in previously proposed sprite-generation algorithms.

    Segmentation masks for the foreground objects are computed with a change-detection algorithm that compares the pure background image with the input images. A particular problem in the change detection is image misregistration: since the change detection compares co-located image pixels in the camera-motion-compensated images, a small error in the motion estimation can introduce segmentation errors because non-corresponding pixels are compared. We approach this problem in Chapter 7 by integrating risk maps into the segmentation algorithm; these identify pixels for which misregistration would probably result in errors. For these image areas, the change-detection algorithm is modified to disregard the difference values for the pixels marked in the risk map. This modification significantly reduces the number of false object detections in fine-textured image areas. The algorithmic building blocks described above can be combined into a segmentation system in various ways, depending on whether camera motion has to be considered or whether real-time execution is required. These different systems and example applications are discussed in Chapter 8.
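    To make the Part I background-subtraction pipeline concrete, here is a simplified sketch (our own simplification: a per-pixel temporal median stands in for the clustering-based synthesis of Chapter 5, and the optional risk map mimics the masking of Chapter 7):

```python
import numpy as np

def synthesize_background(frames):
    """Per-pixel temporal median of motion-compensated frames; a crude
    stand-in for the clustering-based synthesis, valid when each pixel
    shows the static background most of the time."""
    return np.median(np.stack(frames).astype(float), axis=0)

def change_mask(frame, background, tau=20.0, risk_map=None):
    """Change detection against the pure background image; pixels
    flagged in the risk map are ignored to suppress false detections
    caused by misregistration."""
    diff = np.abs(frame.astype(float) - background)
    if diff.ndim == 3:              # reduce color channels to one score
        diff = diff.max(axis=-1)
    mask = diff > tau
    if risk_map is not None:
        mask &= ~risk_map           # disregard misregistration-prone pixels
    return mask
```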
    Part II of the thesis extends the described segmentation system to take object models into account in the analysis. Object models allow the user to specify which objects should be extracted from the video. In Chapters 9 and 10, a graph-based object model is presented in which the features of the main object regions are summarized in the graph nodes and the spatial relations between these regions are expressed with the graph edges. The segmentation algorithm is extended by an object-detection algorithm that searches the input image for the user-defined object model. We provide two object-detection algorithms: the first is specific to cartoon sequences and uses an efficient sub-graph matching algorithm, whereas the second processes natural video sequences. With the object-model extension, the segmentation system can be controlled to extract individual objects, even if the input sequence comprises many objects.

    Chapter 11 proposes an alternative approach to incorporating object models into a segmentation algorithm. The chapter describes a semi-automatic segmentation algorithm in which the user coarsely marks the object and the computer refines this to the exact object boundary; afterwards, the object is tracked automatically through the sequence. In this algorithm, the object model is defined as the texture along the object contour. This texture is extracted in the first frame and then used during the object tracking to localize the original object. The core of the algorithm uses a graph representation of the image and a newly developed algorithm for computing shortest circular paths in planar graphs. The proposed algorithm is faster than the currently known algorithms for this problem, and it can also be applied to many related problems such as shape matching.

    Part III of the thesis elaborates on different techniques for deriving information about the physical 3-D world from the camera motion. The segmentation system employs camera-motion estimation, but the obtained parameters have no direct physical meaning. Chapter 12 discusses an extension to the camera-motion estimation that factorizes the motion parameters into physically meaningful quantities (rotation angles, focal length) using camera autocalibration techniques. A special feature of the algorithm is that it can process camera motion spanning several sprites by employing the multi-sprite technique described above; consequently, the algorithm can be applied to arbitrary rotational camera motion.

    For the analysis of video sequences, it is often required to determine and follow the position of objects. Clearly, the object position in image coordinates provides little information if the viewing direction of the camera is not known. Chapter 13 provides a new algorithm for deducing the transformation between image coordinates and real-world coordinates for the specific application of sport-video analysis. In sport videos, the camera view can be derived from markings on the playing field. For this reason, we employ a model of the playing field that describes the arrangement of lines. After detecting significant lines in the input image, a combinatorial search is carried out to establish correspondences between lines in the input image and lines in the model. The algorithm requires no information about the specific color of the playing field, and it is very robust to occlusions and poor lighting conditions. Moreover, the algorithm is generic in the sense that it can be applied to any type of sport by simply exchanging the model of the playing field.
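    A toy sketch of such a combinatorial model-to-image registration (a hypothetical interface of our own: it matches line intersections rather than the lines themselves, brute-forces the correspondence search, and uses OpenCV only for the homography arithmetic; the thesis's actual search is more sophisticated):

```python
import itertools
import numpy as np
import cv2

def register_field(img_corners, model_corners, model_pts, detected_pts, tol=5.0):
    """Try every ordered 4-subset of detected field-line intersections
    as a match for four model intersections (brute force is fine for
    the handful of intersections on a playing field), and keep the
    homography under which the remaining model points land closest
    to detected ones."""
    best_H, best_err = None, np.inf
    for quad in itertools.permutations(range(len(img_corners)), 4):
        src = np.float32([img_corners[i] for i in quad])
        try:
            H = cv2.getPerspectiveTransform(np.float32(model_corners), src)
        except cv2.error:
            continue                # degenerate (e.g. collinear) sample
        proj = cv2.perspectiveTransform(
            np.float32(model_pts).reshape(-1, 1, 2), H).reshape(-1, 2)
        # distance from each projected model point to its nearest detection
        d = np.linalg.norm(proj[:, None] - np.float32(detected_pts)[None], axis=-1)
        err = d.min(axis=1).mean()
        if err < best_err:
            best_H, best_err = H, err
    return best_H if best_err < tol else None
```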
    In Chapter 14, we again consider panoramic background images, with a particular focus on their visualization. Apart from the planar background sprites discussed previously, a frequently used visualization technique for panoramic images is projection onto a cylinder surface, which is unwrapped into a rectangular image. The disadvantage of this approach is that the viewer has no good sense of orientation in the panoramic image, because he looks in all directions at the same time. In order to provide a more intuitive presentation of wide-angle views, we have developed a visualization technique specialized to indoor environments. We present an algorithm to determine the 3-D shape of the room in which the image was captured or, more generally, to compute a complete floor plan if several panoramic images captured in each of the rooms are provided. Based on the obtained 3-D geometry, a graphical model of the rooms is constructed in which the walls are displayed with textures extracted from the panoramic images. This representation enables virtual walk-throughs of the reconstructed rooms and therefore provides a better orientation for the user.

    Summarizing, we can conclude that all segmentation techniques employ some definition of foreground objects. These definitions are either explicit, using object models as in Part II of this thesis, or implicit, as in the background synthesis of Part I. The results of this thesis show that implicit descriptions, which extract their definition from the video content, work well when the sequence is long enough to extract this information reliably. However, high-level semantics are difficult to integrate into segmentation approaches that are based on implicit models; instead, those semantics should be added as post-processing steps. Explicit object models, on the other hand, apply semantic pre-knowledge at early stages of the segmentation. Moreover, they can be applied to short video sequences or even still pictures, since no background model has to be extracted from the video. The definition of a general object-modeling technique that is widely applicable and also enables accurate segmentation remains an important yet challenging problem for further research.