390 research outputs found

    Video anatomy: spatial-temporal video profile

    Indiana University-Purdue University Indianapolis (IUPUI). A massive number of videos are uploaded to video websites, so smooth video browsing, editing, retrieval, and summarization are in demand. Most videos employ several types of camera operations for expanding the field of view, emphasizing events, and expressing cinematic effects. To digest the heterogeneous videos in video websites and databases, video clips are profiled into a 2D image scroll containing both spatial and temporal information for video preview. The video profile is visually continuous, compact, scalable, and indexed to each frame. This work analyzes camera kinematics including zoom, translation, and rotation, and categorizes camera actions as their combinations. An automatic video summarization framework is proposed and developed. After conventional video clip segmentation and further segmentation into smooth camera operations, the global flow field under all camera actions is investigated for profiling various types of video. A new algorithm is designed to extract the major flow direction and a convergence factor using condensed images. This work then proposes a uniform scheme to segment video clips and sections, sample the video volume across the major flow, and compute the flow convergence factor, in order to obtain an intrinsic scene space less influenced by camera ego-motion. A motion blur technique is also used to render dynamic targets in the profile. The resulting video profile can be displayed in a video track to guide access to video frames, help video editing, and facilitate applications such as surveillance, visual archiving of environments, video retrieval, and online video preview.
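The two quantities named above can be illustrated with a minimal sketch. Assuming a dense flow field is available as per-pixel (dx, dy) arrays (the thesis actually works on condensed images, and `flow_summary` is a hypothetical name, not its algorithm), the major flow direction is the angle of the mean flow vector, and the convergence factor can be approximated by the mean divergence: positive for expanding, zoom-in-like flow, negative for converging flow.

```python
import numpy as np

def flow_summary(dx, dy):
    """Summarize a dense flow field: the major flow direction in degrees
    (angle of the mean flow vector) and a convergence factor approximated
    by the mean divergence (positive = expanding/zoom-in-like flow,
    negative = converging flow)."""
    major_dir = np.degrees(np.arctan2(dy.mean(), dx.mean()))
    divergence = np.gradient(dx, axis=1) + np.gradient(dy, axis=0)
    return major_dir, divergence.mean()

# pure rightward translation: direction 0 degrees, no convergence
print(flow_summary(np.ones((16, 16)), np.zeros((16, 16))))

# radial expansion around the image center, as under a camera zoom-in
y, x = np.mgrid[-8:8, -8:8].astype(float)
print(flow_summary(0.1 * x, 0.1 * y))
```

For the linear radial field the divergence is constant (0.1 + 0.1), so the convergence factor comes out positive, distinguishing zoom from pure translation.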

    Multiperspective mosaics and layered representation for scene visualization

    This thesis documents the efforts made to implement multiperspective mosaicking for the purpose of mosaicking undervehicle and roadside sequences. For the undervehicle sequences, it is desired to create a large, high-resolution mosaic that may be used to quickly inspect the entire scene shot by a camera making a single pass underneath the vehicle. Several constraints are placed on the video data in order to facilitate the assumption that the entire scene in the sequence exists on a single plane. Therefore, a single mosaic is used to represent a single video sequence. Phase correlation is used to perform motion analysis in this case. For roadside video sequences, it is assumed that the scene is composed of several planar layers, as opposed to a single plane. Layer extraction techniques are implemented in order to perform this decomposition. Instead of using phase correlation to perform motion analysis, the Lucas-Kanade motion tracking algorithm is used to create dense motion maps. Using these motion maps, spatial support for each layer is determined based on a pre-initialized layer model. By separating the pixels in the scene into motion-specific layers, it is possible to sample each element in the scene correctly while performing multiperspective mosaicking. It is also possible to fill in many gaps in the mosaics caused by occlusions, hence creating more complete representations of the objects of interest. The results are several mosaics, with each mosaic representing a single planar layer of the scene.
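Phase correlation, the motion-analysis technique used for the undervehicle sequences, recovers a translation between two images from the normalized cross-power spectrum of their Fourier transforms: the inverse FFT of the phase-only spectrum peaks at the shift. A minimal sketch of the idea (illustrative code, not the thesis's implementation; `phase_correlation` is a hypothetical name):

```python
import numpy as np

def phase_correlation(ref, moved):
    """Estimate the integer (row, col) shift such that `moved` is roughly
    `ref` translated by that amount. The normalized cross-power spectrum
    keeps only phase, and its inverse FFT peaks at the translation."""
    R = np.fft.fft2(ref)
    M = np.fft.fft2(moved)
    cross = M * np.conj(R)
    cross /= np.abs(cross) + 1e-12          # whiten: keep phase only
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # shifts beyond half the image size wrap around to negative offsets
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))

# recover a known circular shift of a random image
rng = np.random.default_rng(0)
img = rng.random((64, 64))
print(phase_correlation(img, np.roll(img, (3, -5), axis=(0, 1))))  # → (3, -5)
```

Because the correlation peak is sharp even under noise, this estimate is robust for the planar-scene assumption made above, where a single global translation per frame pair suffices.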

    Widening the view angle of auto-multiscopic display, denoising low brightness light field data and 3D reconstruction with delicate details

    This doctoral thesis presents the results of my work on widening the viewing angle of the auto-multiscopic display, denoising light field data captured in low-light circumstances, and reconstructing subject surfaces with delicate details from microscopy image sets. Automultiscopic displays carefully control the distribution of emitted light over space, direction (angle), and time, so that even a static displayed image can encode parallax across viewing directions (a light field). This allows simultaneous observation by multiple viewers, each perceiving 3D from their own (correct) perspective. Currently, the illusion can only be effectively maintained over a narrow range of viewing angles. We propose and analyze a simple solution to widen the range of viewing angles for automultiscopic displays that use parallax barriers. We insert a refractive medium, with a high refractive index, between the display and the parallax barriers. The inserted medium warps the exitant light field in a way that increases the potential viewing angle. We analyze the consequences of this warp and build a prototype with a 93% increase in the effective viewing angle. Additionally, we developed an integral-image synthesis method that can address the refraction introduced by the inserted medium efficiently without the use of ray tracing. Capturing a light field image with a short exposure time is preferable for eliminating motion blur, but it also leads to low brightness in a low-light environment, which results in a low signal-to-noise ratio. Most light field denoising methods apply regular 2D image denoising methods directly to the sub-aperture images of a 4D light field, but this is not suitable for focused light field data, whose sub-aperture image resolution is too low for regular denoising methods.
Therefore, we propose a deep learning denoising method based on the micro-lens images of a focused light field to denoise the depth map and the original micro-lens image set simultaneously, and achieve high-quality totally focused images from low-light focused light field data. In areas like digital museums and remote research, 3D reconstruction with delicate details of subjects is desired, and technologies like 3D reconstruction based on macro photography have been used successfully for various purposes. We intend to push this further by using a microscope rather than a macro lens, which should be able to capture microscopy-level details of the subject. We design and implement a scanning method, based on a robotic arm, that is able to capture microscopy image sets from a curved surface, and a 3D reconstruction method suitable for the microscopy image set.
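The widening effect of the inserted medium follows from Snell's law: the barrier geometry fixes the steepest ray angle inside the medium, and refraction at the exit surface magnifies that angle into air. A rough sketch under idealized geometry (the function name, parameters, and example numbers are assumptions for illustration, not the thesis's model or its 93% figure):

```python
import math

def half_viewing_angle_deg(slit_offset, gap, n):
    """Half viewing angle in air for an idealized parallax-barrier display.
    slit_offset/gap fix the steepest ray angle inside the medium of
    refractive index n; Snell's law (n * sin(theta_in) = sin(theta_out))
    magnifies that angle as the ray exits into air (index ~1)."""
    theta_inside = math.atan2(slit_offset, gap)
    s = n * math.sin(theta_inside)
    return 90.0 if s >= 1.0 else math.degrees(math.asin(s))

# same barrier geometry, without (n = 1.0) and with (n = 1.7) a medium
print(half_viewing_angle_deg(1.0, 4.0, 1.0))
print(half_viewing_angle_deg(1.0, 4.0, 1.7))
```

With n = 1 the exit angle equals the internal angle; a higher-index medium steepens the exit ray for the same geometry, which is the source of the wider effective viewing zone.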

    Temporal Mapping of Surveillance Video for Indexing and Summarization

    This work converts a surveillance video into a temporal-domain image called a temporal profile that is scrollable and scalable for quick searching of long surveillance videos by human operators. The profile is sampled with linear pixel lines located at critical locations in the video frames. It carries precise time stamps for targets passing through those locations in the field of view, shows target shapes for identification, and facilitates target search in long videos. In this paper, we first study the projection and shape properties of dynamic scenes in the temporal profile so as to set the sampling lines. Then, we design methods to capture target motion and preserve target shapes for target recognition in the temporal profile. The profile also provides uniform resolution for large crowds passing through, making it powerful for target counting and flow measurement. We also align multiple sampling lines to visualize the spatial information missed in a single-line temporal profile. Finally, we achieve real-time adaptive background removal and robust target extraction to ensure long-term surveillance. Compared to the original video or a shortened video, the temporal profile reduces the data by one dimension while keeping the majority of the information for further video investigation. As an intermediate indexing image, the profile image can be transmitted via a network much faster than video for online video-searching tasks by multiple operators. Because the temporal profile can abstract passing targets with efficient computation, an even more compact digest of the surveillance video can be created.
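The core sampling idea can be sketched in a few lines: take the pixel line at a fixed position from every frame and stack the lines over time, so a target crossing that line leaves its silhouette in the profile at its passing time. (Illustrative code only; `temporal_profile` is a hypothetical name, and the paper's system adds background removal and shape preservation on top.)

```python
import numpy as np

def temporal_profile(frames, x):
    """Stack the pixel column at horizontal position x from each frame
    into one 2D image whose horizontal axis is the frame (time) index."""
    return np.stack([f[:, x] for f in frames], axis=1)

# a bright vertical bar moves one pixel per frame; it crosses the
# sampling line x=5 exactly at frame 5 and appears there in the profile
frames = []
for t in range(10):
    f = np.zeros((8, 10))
    f[:, t] = 1.0
    frames.append(f)
profile = temporal_profile(frames, x=5)
print(profile.shape)   # (8, 10): image height by time
```

The profile grows by one column per frame regardless of frame width, which is why it reduces the data by one dimension while preserving the passing events.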

    What does the honeybee see? And how do we know?

    This book is the only account of what the bee, as an example of an insect, actually detects with its eyes. Bees detect some visual features such as edges and colours, but there is no sign that they reconstruct patterns or put together features to form objects. Bees detect motion but have no perception of what it is that moves, and certainly they do not recognize “things” by their shapes. Yet they clearly see well enough to fly and find food with a minute brain. Bee vision is therefore relevant to the construction of simple artificial visual systems, for example for mobile robots. The surprising conclusion is that bee vision is adapted to the recognition of places, not things. In this volume, Adrian Horridge also sets out the curious and contentious history of how bee vision came to be understood, with an account of a century of neglect of old experimental results, errors of interpretation, sharp disagreements, and failures of the scientific method. The design of the experiments and the methods of making inferences from observations are also critically examined, with the conclusion that scientists are often hesitant, imperfect, and misleading, ignore the work of others, and fail to consider alternative explanations. The erratic path to understanding makes interesting reading for anyone with an analytical mind who thinks about the methods of science or the engineering of seeing machines.

    Location Distribution Optimization of Photographing Sites for Indoor Panorama Modeling


    Follow the Sound: Design of mobile spatial audio applications for pedestrian navigation

    Auditory displays are slower than graphical user interfaces. We believe spatial audio can change that. Human perception can localize the position of sound sources thanks to psychoacoustical cues. Spatial audio reproduces these cues to produce virtual sound source positions over headphones. The spatial attribute of sound can be used to produce richer and more effective auditory displays. This work proposes a set of interaction design guidelines for the use of spatial audio displays in a mobile context. These guidelines are inferred from psychoacoustical theory, design theory, and experience with prototype development. The horizontal front arc is presented as the optimal area for sound localization, and the use of head- or body-tracking is found to be highly beneficial. Blind and visually impaired pedestrians may use auditory displays on mobile devices as navigation aids. Such aids have the potential to give visually impaired people access to the environment and independence of movement. Custom-made hardware is not always needed, as today’s smartphones offer a powerful platform for specialized applications. The Sound Guide prototype application was developed for the Apple iPhone and offered route guidance through the spatial position of audio icons. Real-time directional guidance was achieved through the use of GPS, a compass sensor, and a gyroscope sensor. Spatial audio was accomplished through the use of prefiltered audio tracks that represented a 360° horizontal circle around the user. The source code of this prototype is made available to the community. Field tests of the prototype were done with three participants and one pilot tester who were visually impaired. One route was navigated with the help of the prototype. Interviews were conducted to gather background information on navigation for visually impaired pedestrians, to see how the prototype was received by visually impaired test users, and to learn what could be done to improve the concept in later development.
Even though the prototype suffered from technical instabilities during the field tests, the general responses were positive. The blind participants saw potential in this technology and in how it could be used to provide directional information. A range of improvements to the concept have been proposed.
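The prefiltered-track scheme described above can be sketched as a simple direction-to-track lookup: given the bearing to the next waypoint (from GPS) and the user's heading (from the compass), pick the track whose filtered direction is closest. (A minimal sketch with assumed names and an assumed track count; the actual Sound Guide behavior is in its published source code.)

```python
def nearest_track_index(target_bearing, user_heading, n_tracks=12):
    """Map the target's direction relative to where the user faces onto
    one of n_tracks prefiltered audio tracks spaced evenly around a
    360-degree horizontal circle (track 0 = straight ahead)."""
    relative = (target_bearing - user_heading) % 360.0
    step = 360.0 / n_tracks
    return int(round(relative / step)) % n_tracks

# target 90 degrees to the user's right -> a quarter of the way around
print(nearest_track_index(95.0, 5.0))  # → 3
```

Recomputing this index as the compass heading changes is what lets the audio icon appear to stay fixed in the world while the user turns.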

    Glance vs. Gaze

    This research investigates the phenomenology of vision in response to the following question: what is a way of looking through architecture that can cultivate a positive connection with the landscape? Two modes of vision, the glance and the gaze, are explored. This research argues that the glance allows one to see more of the landscape than the gaze. The predominance and negative implications of the gaze are highlighted, and the position of the glance as an overlooked act of vision is established. This research proposes that the visual act of glancing, through strategically placed and sized window frames, is capable of creating an image that can connect the tourist with the landscape. The glance can then be used to promote landscape regeneration and tourist wellbeing. These ideas are tested in the design of a tourist retreat. The design of the tourist retreat provides the conditions necessary for seeing in particular ways. The visual performance of the tourist is carefully considered in the design. The tourist is treated as the subject and the landscape as the object. This research proposes that the tourist’s relationship to the landscape can be manipulated through a variety of frames. A comparison between horizontal and vertical frames demonstrates that the vertical frame connects better with the landscape. The proportions of the frames are altered to suit the programme of the tourist retreat. In doing so, the tourist retreat transforms the visual performance of tourism, the tourist, and the landscape.

    Appearance Modelling and Reconstruction for Navigation in Minimally Invasive Surgery

    Minimally invasive surgery is playing an increasingly important role in patient care. Whilst its direct patient benefits in terms of reduced trauma, improved recovery, and shortened hospitalisation have been well established, there is a sustained need for improved training in the existing procedures and for the development of new smart instruments to tackle the issues of visualisation, ergonomic control, and haptic and tactile feedback. For endoscopic intervention, the small field of view in the presence of a complex anatomy can easily disorient the operator, as the tortuous access pathway is not always easy to predict and control with standard endoscopes. Effective training through simulation devices, based on either virtual-reality or mixed-reality simulators, can help to improve the spatial awareness, consistency, and safety of these procedures. This thesis examines the use of endoscopic videos for both simulation and navigation purposes. More specifically, it addresses the challenging problem of how to build high-fidelity, subject-specific simulation environments for improved training and skills assessment. Issues related to mesh parameterisation and texture blending are investigated. With the maturity of computer vision in terms of both 3D shape reconstruction and localisation and mapping, vision-based techniques have enjoyed significant interest in recent years for surgical navigation. The thesis also tackles the problem of how to use vision-based techniques to provide a detailed 3D map and a dynamically expanded field of view, improving spatial awareness and avoiding operator disorientation. The key advantage of this approach is that it does not require additional hardware and thus introduces minimal interference to the existing surgical workflow. The derived 3D map can be effectively integrated with pre-operative data, allowing both global and local 3D navigation by taking into account tissue structural and appearance changes.
Both simulation and laboratory-based experiments are conducted throughout this research to assess the practical value of the proposed method.