Video anatomy: spatial-temporal video profile
Indiana University-Purdue University Indianapolis (IUPUI)
A massive number of videos are uploaded to video websites, so smooth video browsing, editing, retrieval, and summarization are in demand. Most videos employ several types of camera operations to expand the field of view, emphasize events, and express cinematic effects. To digest the heterogeneous videos found on video websites and in databases, video clips are profiled into a 2D image scroll containing both spatial and temporal information for video preview. The video profile is visually continuous, compact, scalable, and indexed to each frame. This work analyzes camera kinematics, including zoom, translation, and rotation, and categorizes camera actions as their combinations. An automatic video summarization framework is proposed and developed. After conventional video clip segmentation and further segmentation on smooth camera operations, the global flow field under all camera actions is investigated for profiling various types of video. A new algorithm is designed to extract the major flow direction and a convergence factor using condensed images. This work then proposes a uniform scheme to segment video clips and sections, sample the video volume across the major flow, and compute the flow convergence factor, in order to obtain an intrinsic scene space less influenced by camera ego-motion. A motion blur technique is also used to render dynamic targets in the profile. The resulting video profile can be displayed in a video track to guide access to video frames, help video editing, and facilitate applications such as surveillance, visual archiving of environments, video retrieval, and online video preview.
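The camera-kinematics analysis above can be illustrated with a toy flow-field computation: the dominant translation shows up as the mean flow vector, while zoom shows up as the flow's divergence. The function names and the mean-vector/mean-divergence summaries below are illustrative simplifications, not the thesis's condensed-image algorithm:

```python
import numpy as np

def major_flow(u, v):
    """Dominant translation direction (radians) of a dense flow field,
    taken here as the angle of the mean flow vector."""
    return float(np.arctan2(v.mean(), u.mean()))

def convergence_factor(u, v):
    """Mean divergence of the flow field. Positive values suggest
    zoom-in-like expansion, negative values convergence; pure
    translation gives (approximately) zero."""
    du_dx = np.gradient(u, axis=1)
    dv_dy = np.gradient(v, axis=0)
    return float((du_dx + dv_dy).mean())
```

For a rightward pan (`u` constant, `v` zero) the major direction is 0 radians and the divergence vanishes; for a radial zoom field centered in the frame the divergence is positive.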
Multiperspective mosaics and layered representation for scene visualization
This thesis documents the efforts made to implement multiperspective mosaicking for undervehicle and roadside sequences. For the undervehicle sequences, it is desired to create a large, high-resolution mosaic that may be used to quickly inspect the entire scene shot by a camera making a single pass underneath the vehicle. Several constraints are placed on the video data in order to justify the assumption that the entire scene in the sequence lies on a single plane. Therefore, a single mosaic is used to represent a single video sequence. Phase correlation is used to perform motion analysis in this case. For roadside video sequences, it is instead assumed that the scene is composed of several planar layers. Layer extraction techniques are implemented to perform this decomposition. Instead of phase correlation, the Lucas-Kanade motion tracking algorithm is used to create dense motion maps. Using these motion maps, spatial support for each layer is determined from a pre-initialized layer model. By separating the pixels in the scene into motion-specific layers, it is possible to sample each element in the scene correctly while performing multiperspective mosaicking. It is also possible to fill in many gaps in the mosaics caused by occlusions, thereby creating more complete representations of the objects of interest. The results are several mosaics, each representing a single planar layer of the scene.
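Phase correlation, mentioned above for motion analysis, recovers a translation between two frames from the phase of their cross-power spectrum. A minimal NumPy sketch (integer, circular shifts only; the thesis implementation may differ in the details):

```python
import numpy as np

def phase_correlation(a, b):
    """Estimate the integer translation (dy, dx) such that b is
    (circularly) shifted from a, via the normalized cross-power
    spectrum. A standard technique, sketched for illustration."""
    A = np.fft.fft2(a)
    B = np.fft.fft2(b)
    R = np.conj(A) * B
    R /= np.maximum(np.abs(R), 1e-12)          # keep phase only
    corr = np.fft.ifft2(R).real                # impulse at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peaks in the upper half-range to negative shifts.
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return int(dy), int(dx)
```

With noise-free circular shifts the correlation surface is an exact impulse at the displacement, which is what makes the method attractive for planar-scene mosaicking.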
Widening the view angle of auto-multiscopic display, denoising low brightness light field data and 3D reconstruction with delicate details
This doctoral thesis presents the results of my work on widening the viewing angle
of the auto-multiscopic display, denoising light field data captured in low-light
circumstances, and reconstructing the subject surface with delicate details from
microscopy image sets.
Automultiscopic displays carefully control the distribution of emitted light over
space, direction (angle), and time, so that even a static image can encode parallax
across viewing directions (a light field). This allows simultaneous observation by
multiple viewers, each perceiving 3D from their own (correct) perspective. Currently,
the illusion can only be effectively maintained over a narrow range of viewing angles.
We propose and analyze a simple solution to widen the range of viewing angles for
automultiscopic displays that use parallax barriers. We insert a refractive medium,
with a high refractive index, between the display and the parallax barriers. The
inserted medium warps the exitant light field in a way that increases the potential
viewing angle. We analyze the consequences of this warp and build a prototype with
a 93% increase in the effective viewing angle. Additionally, we developed an
integral image synthesis method that efficiently addresses the refraction introduced
by the inserted medium without the use of ray tracing.
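The widening effect of the inserted medium follows from Snell's law at the medium/air interface: a ray travelling at angle θ inside a medium of index n exits to air at a larger angle. A minimal sketch of that geometric principle only (the thesis's full analysis of the display/barrier layout, and the 93% figure, involve more than this single refraction):

```python
import math

def exit_half_angle(theta_medium_deg, n):
    """Half viewing angle in air for a ray travelling at
    theta_medium_deg inside a flat slab of refractive index n,
    from Snell's law: sin(theta_air) = n * sin(theta_medium).
    Illustrative only; clamps at the total-internal-reflection limit."""
    s = n * math.sin(math.radians(theta_medium_deg))
    if s >= 1.0:
        return 90.0  # grazing exit; beyond this the ray is trapped
    return math.degrees(math.asin(s))
```

For n = 1 the angle is unchanged; for n = 1.5 a ray at 20° inside the slab exits at roughly 31°, which is the sense in which the inserted medium widens the usable viewing cone.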
Capturing a light field image with a short exposure time is preferable for
eliminating motion blur, but in a low-light environment it also leads to low
brightness and hence a low signal-to-noise ratio. Most light field denoising
methods apply regular 2D image denoising directly to the sub-aperture images of
a 4D light field, but this is not suitable for focused light field data, whose
sub-aperture image resolution is too low for regular denoising methods to apply.
We therefore propose a deep learning denoising method based on the microlens
images of a focused light field that denoises the depth map and the original
microlens image set simultaneously, and achieves high-quality totally focused
images from low-light focused light field data.
In areas such as digital museums and remote research, 3D reconstruction that
captures the delicate details of subjects is desired, and techniques such as 3D
reconstruction based on macro photography have been used successfully for various
purposes. We push this further by using a microscope rather than a macro lens,
which can capture microscopy-level details of the subject. We design and implement
a scanning method, based on a robotic arm, that captures microscopy image sets
from a curved surface, and a 3D reconstruction method suitable for such
microscopy image sets.
Temporal Mapping of Surveillance Video for Indexing and Summarization
This work converts surveillance video into a temporal-domain image called a temporal profile that is scrollable and scalable, enabling human operators to search long surveillance videos quickly. The profile is sampled along linear pixel lines located at critical positions in the video frames. It carries a precise time stamp for each target passing through those locations in the field of view, shows target shapes for identification, and facilitates target search in long videos. In this paper, we first study the projection and shape properties of dynamic scenes in the temporal profile so as to set the sampling lines. Then, we design methods to capture target motion and preserve target shapes for target recognition in the temporal profile. The profile also provides uniform resolution for large passing crowds, which makes it powerful for target counting and flow measurement. We further align multiple sampling lines to visualize the spatial information missed by a single-line temporal profile. Finally, we achieve real-time adaptive background removal and robust target extraction to ensure long-term surveillance. Compared to the original video or a shortened video, the temporal profile reduces the data by one dimension while keeping the majority of the information for further video investigation. As an intermediate indexing image, the profile can be transmitted over a network much faster than video for online video searching by multiple operators. Because the temporal profile abstracts passing targets with efficient computation, an even more compact digest of the surveillance video can be created.
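The sampling-line idea can be sketched in a few lines: each frame contributes one pixel column at a fixed position, and the columns stacked over time form the temporal profile image. This is a toy version of the basic sampling; the paper adds motion capture, shape preservation, and adaptive background removal on top of it:

```python
import numpy as np

def temporal_profile(frames, x):
    """Build a (height x time) profile image from a vertical sampling
    line at column `x`: one column per frame, stacked over time. A
    minimal sketch of the sampling step only."""
    return np.stack([f[:, x] for f in frames], axis=1)
```

A target crossing the sampling line leaves its silhouette in the profile at the exact frame (column) of the crossing, which is what gives the profile its per-frame time stamps.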
What does the honeybee see? And how do we know?
This book is the only account of what the bee, as an example of an insect, actually detects with its eyes. Bees detect some visual features such as edges and colours, but there is no sign that they reconstruct patterns or put together features to form objects. Bees detect motion but have no perception of what it is that moves, and certainly they do not recognize “things” by their shapes. Yet they clearly see well enough to fly and find food with a minute brain. Bee vision is therefore relevant to the construction of simple artificial visual systems, for example for mobile robots. The surprising conclusion is that bee vision is adapted to the recognition of places, not things. In this volume, Adrian Horridge also sets out the curious and contentious history of how bee vision came to be understood, with an account of a century of neglect of old experimental results, errors of interpretation, sharp disagreements, and failures of the scientific method. The design of the experiments and the methods of making inferences from observations are also critically examined, with the conclusion that scientists are often hesitant, imperfect and misleading, ignore the work of others, and fail to consider alternative explanations. The erratic path to understanding makes interesting reading for anyone with an analytical mind who thinks about the methods of science or the engineering of seeing machines.
Follow the Sound: Design of mobile spatial audio applications for pedestrian navigation
Auditory displays are slower than graphical user interfaces. We believe spatial audio can change that. Human listeners can localize the position of sound sources thanks to psychoacoustic cues. Spatial audio reproduces these cues to produce virtual sound source positions over headphones. This spatial attribute of sound can be used to build richer and more effective auditory displays.
This work proposes a set of interaction design guidelines for the use of spatial audio displays in a mobile context. These guidelines are inferred from psychoacoustic theory, design theory, and experience with prototype development. The horizontal frontal arc is identified as the optimal area for sound localization, and the use of head or body tracking is found to be highly beneficial.
Blind and visually impaired pedestrians may use auditory displays on mobile devices as navigation aids. Such aids have the potential to give visually impaired people access to the environment and independence of movement. Custom-made hardware is not always needed, as today’s smartphones offer a powerful platform for specialized applications.
The Sound Guide prototype application was developed for the Apple iPhone and offered route guidance through the spatial positioning of audio icons. Real-time directional guidance was achieved using GPS, the compass sensor, and the gyroscope sensor. Spatial audio was accomplished using prefiltered audio tracks that represented a 360° horizontal circle around the user. The source code of this prototype is made available to the community.
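One simple way to realize directional playback from a discrete set of prefiltered tracks is to crossfade the two tracks whose azimuths bracket the target bearing. The linear fade and the dictionary-of-tracks interface below are assumptions for illustration, not necessarily how Sound Guide mixes its audio:

```python
import numpy as np

def mix_direction(tracks, azimuths_deg, bearing_deg):
    """Approximate a source at `bearing_deg` by linearly crossfading
    the two prefiltered tracks whose azimuths bracket it. `tracks`
    maps azimuth (degrees) -> signal array. Hypothetical interface;
    a constant-power fade would be a common refinement."""
    az = sorted(azimuths_deg)
    b = bearing_deg % 360
    # Bracketing azimuths, wrapping around 360 degrees.
    lo = max([a for a in az if a <= b], default=az[-1])
    hi = min([a for a in az if a > b], default=az[0])
    span = (hi - lo) % 360 or 360
    w = ((b - lo) % 360) / span
    return (1 - w) * tracks[lo] + w * tracks[hi]
```

With tracks every 90°, a bearing of 45° mixes the 0° and 90° tracks at equal weight; a bearing that matches a stored azimuth plays that track alone.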
Field tests of the prototype were conducted with three participants and one pilot tester, all of whom were visually impaired. One route was navigated with the help of the prototype. Interviews were conducted to gather background information on navigation for visually impaired pedestrians, to see how the prototype was received by visually impaired test users, and to determine what could be done to improve the concept in later development.
Even though the prototype suffered from technical instabilities during the field tests, the general response was positive. The blind participants saw potential in this technology and in how it could be used to provide directional information. A range of improvements to the concept has been proposed.
Glance vs. Gaze
This research investigates the phenomenology of vision in response to the following
question: What is a way of looking through architecture that can cultivate a
positive connection with the landscape? Two modes of vision, the glance and the
gaze, are explored. This research argues that the glance allows one to see more of
the landscape than the gaze. The predominance and negative implications of the
gaze are highlighted, and the position of the glance as an overlooked act of vision
is established.
This research proposes that the visual act of glancing, through strategically placed
and sized window frames, is capable of creating an image that can connect the
tourist with the landscape. The glance can then be used to promote landscape
regeneration and tourist wellbeing. These ideas are tested in the design of a tourist
retreat. The design of the tourist retreat provides the conditions necessary for seeing
in particular ways.
The visual performance of the tourist is carefully considered in the design. The
tourist is treated as the subject and the landscape as the object. This research
proposes that the tourist’s relationship to the landscape can be manipulated
through a variety of frames. A comparison between horizontal and vertical frames
demonstrates that the vertical frame connects better with the landscape. The
proportions of the frames are altered to suit the programme of the tourist retreat.
In doing so, the tourist retreat transforms the visual performance of tourism, the
tourist, and the landscape.
Appearance Modelling and Reconstruction for Navigation in Minimally Invasive Surgery
Minimally invasive surgery is playing an increasingly important role for patient
care. Whilst its direct patient benefit in terms of reduced trauma,
improved recovery and shortened hospitalisation has been well established,
there is a sustained need for improved training in existing procedures
and for the development of new smart instruments to tackle the issues of
visualisation, ergonomic control, and haptic and tactile feedback. For endoscopic
intervention, the small field of view in the presence of a complex anatomy
can easily introduce disorientation to the operator as the tortuous access
pathway is not always easy to predict and control with standard endoscopes.
Effective training through simulation devices, based on either virtual reality
or mixed-reality simulators, can help to improve the spatial awareness,
consistency and safety of these procedures.
This thesis examines the use of endoscopic videos for both simulation
and navigation purposes. More specifically, it addresses the challenging
problem of how to build high-fidelity subject-specific simulation environments
for improved training and skills assessment. Issues related to mesh
parameterisation and texture blending are investigated. With the maturity
of computer vision in terms of both 3D shape reconstruction and localisation
and mapping, vision-based techniques have enjoyed significant interest
in recent years for surgical navigation. The thesis also tackles the problem
of how to use vision-based techniques for providing a detailed 3D map and
dynamically expanded field of view to improve spatial awareness and avoid
operator disorientation. The key advantage of this approach is that it does
not require additional hardware, and thus introduces minimal interference
to the existing surgical workflow. The derived 3D map can be effectively
integrated with pre-operative data, allowing both global and local 3D navigation
by taking into account tissue structural and appearance changes.
Both simulation and laboratory-based experiments are conducted throughout
this research to assess the practical value of the methods proposed.