AUTOMATIC IMAGE TO MODEL ALIGNMENT FOR PHOTO-REALISTIC URBAN MODEL RECONSTRUCTION
We introduce a hybrid approach in which images of an urban scene are automatically aligned with a base geometry of the scene to determine model-relative external camera parameters. The algorithm takes as input a model of the scene and images with approximate external camera parameters, and aligns the images to the model by extracting the facades from the images and matching them to the model through minimization of a multivariate objective function. The resulting image-pose pairs can be used to render photo-realistic views of the model via texture mapping.

Several natural extensions to the base hybrid reconstruction technique are also introduced. These extensions, which include vanishing-point-based calibration refinement and video-stream-based reconstruction, increase the accuracy of the base algorithm, reduce the amount of data that must be provided by the user as input, and provide a mechanism for automatically calibrating a large set of images for post-processing steps such as automatic model enhancement and fly-through model visualization.

Traditionally, photo-realistic urban reconstruction has been approached through purely image-based or model-based methods. Recently, research has been conducted on hybrid approaches, which combine the use of images and models. Such approaches typically require user assistance for camera calibration. Our approach improves on these methods because it does not require user assistance for camera calibration.
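The alignment step the abstract describes, refining approximate external camera parameters by minimizing a multivariate objective, can be illustrated with a toy sketch. Everything here (a pinhole camera, a translation-only pose, synthetic points, Nelder-Mead via `scipy.optimize.minimize`) is an assumption for illustration, not the paper's actual formulation:

```python
# Hypothetical sketch: refine an approximate camera pose by minimizing a
# reprojection-style objective. The pinhole model, translation-only pose,
# and synthetic data below are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

def project(points, pose, f=500.0):
    """Pinhole projection of 3-D points under a translation pose (tx, ty, tz)."""
    tx, ty, tz = pose
    p = points + np.array([tx, ty, tz])
    return f * p[:, :2] / p[:, 2:3]

def alignment_error(pose, model_points, observed_px):
    """Sum of squared pixel residuals between projected model and image features."""
    return np.sum((project(model_points, pose) - observed_px) ** 2)

# Synthetic scene: points roughly 5 units in front of the camera.
rng = np.random.default_rng(0)
model_points = rng.uniform(-1, 1, size=(20, 3)) + np.array([0.0, 0.0, 5.0])
true_pose = np.array([0.1, -0.2, 0.5])
observed_px = project(model_points, true_pose)   # "image" observations

# Start from the approximate (here: zero) pose and refine it.
res = minimize(alignment_error, x0=np.zeros(3),
               args=(model_points, observed_px), method="Nelder-Mead")
refined_pose = res.x
```

With noise-free synthetic observations the minimizer recovers the true pose; in practice the objective would be built from facade features extracted from real images.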
Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations
Can conversational videos captured from multiple egocentric viewpoints reveal
the map of a scene in a cost-efficient way? We seek to answer this question by
proposing a new problem: efficiently building the map of a previously unseen 3D
environment by exploiting shared information in the egocentric audio-visual
observations of participants in a natural conversation. Our hypothesis is that
as multiple people ("egos") move in a scene and talk among themselves, they
receive rich audio-visual cues that can help uncover the unseen areas of the
scene. Given the high cost of continuously processing egocentric visual
streams, we further explore how to actively coordinate the sampling of visual
information, so as to minimize redundancy and reduce power use. To that end, we
present an audio-visual deep reinforcement learning approach that works with
our shared scene mapper to selectively turn on the camera to efficiently chart
out the space. We evaluate the approach using a state-of-the-art audio-visual
simulator for 3D scenes as well as real-world video. Our model outperforms
previous state-of-the-art mapping methods, and achieves an excellent
cost-accuracy tradeoff. Project: http://vision.cs.utexas.edu/projects/chat2map
Comment: Accepted to CVPR 202
Mouse retinal specializations reflect knowledge of natural environment statistics
Pressures for survival drive sensory circuit adaptation to a species’ habitat, making it essential to statistically characterise natural scenes. Mice, a prominent visual system model, are dichromatic with enhanced sensitivity to green and UV. Their visual environment, however, is rarely considered. Here, we built a UV-green camera to record footage from mouse habitats. We found chromatic contrast to diverge greatly in the upper but not the lower visual field, an environmental difference that may underlie the species’ superior colour discrimination in the upper visual field. Moreover, training an autoencoder on upper but not lower visual field scenes was sufficient for the emergence of colour-opponent filters. Furthermore, the upper visual field was biased towards dark UV contrasts, paralleled by more light-offset-sensitive cells in the ventral retina. Finally, footage recorded at twilight suggests that UV promotes aerial predator detection. Our findings support the view that natural scene statistics shaped early visual processing in evolution.
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
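The event-generation principle the survey describes (a pixel asynchronously fires an event when its brightness change since the last event exceeds a contrast threshold, encoding time, location, and sign) can be sketched by simulating events from conventional frames. The function name, threshold value, and frame layout below are illustrative assumptions, not part of the survey:

```python
# Illustrative sketch of the standard event-generation model: a pixel emits
# an event (t, x, y, polarity) when its log-intensity changes by more than
# a contrast threshold C since that pixel's last event.
import numpy as np

def frames_to_events(frames, timestamps, C=0.2, eps=1e-6):
    """Simulate an event stream from a sequence of intensity frames."""
    log_ref = np.log(frames[0] + eps)          # per-pixel reference level
    events = []
    for t, frame in zip(timestamps[1:], frames[1:]):
        log_i = np.log(frame + eps)
        diff = log_i - log_ref
        ys, xs = np.nonzero(np.abs(diff) >= C)
        for y, x in zip(ys, xs):
            polarity = 1 if diff[y, x] > 0 else -1
            events.append((t, x, y, polarity))
            log_ref[y, x] = log_i[y, x]        # reset reference at firing pixel
    return events

# A single brightening pixel produces one positive-polarity event.
frames = np.array([[[0.5]], [[0.8]]])          # two 1x1 frames
events = frames_to_events(frames, timestamps=[0.0, 0.001])
```

Real sensors fire these events asynchronously per pixel with microsecond timestamps; this frame-based simulation only illustrates the thresholded log-brightness rule.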
Autonomous monitoring of cliff nesting seabirds using computer vision
In this paper we describe a proposed system for automatic visual monitoring of seabird populations. Image sequences of cliff face nesting sites are captured using time-lapse digital photography. We are developing image processing software which is designed to automatically interpret these images, determine the number of birds present, and monitor activity. We focus primarily on the development of low-level image processing techniques to support this goal. We first describe our existing work in video processing, and show how it is suitable for this problem domain. Image samples from a particular nest site are presented, and used to describe the associated challenges. We conclude by showing how we intend to develop our work to construct a distributed system capable of simultaneously monitoring a number of sites in the same locality.
Texture-Detail Preservation Measurement in Camera Phones: An Updated Approach
Recent advances in mobile phone cameras have poised them to take over compact
hand-held cameras as the consumer’s preferred camera option. Along with
advances in the number of pixels, motion blur removal, face-tracking, and noise reduction
algorithms have significant roles in the internal processing of the devices.
An undesired effect of severe noise reduction is the loss of texture (i.e. low-contrast
fine details) of the original scene. Current established methods for resolution measurement
fail to accurately portray the texture loss incurred in a camera system.
The development of an accurate objective method to identify the texture preservation
or texture reproduction capability of a camera device is important in this
regard.
The ‘Dead Leaves’ target has been used extensively as a method to measure
the modulation transfer function (MTF) of cameras that employ highly non-linear
noise-reduction methods. This stochastic model consists of a series of overlapping
circles with radii r distributed as r^-3, each having a uniformly distributed gray
level, which gives an accurate model of occlusion in a natural setting and hence mimics
a natural scene. This target can be used to model the texture transfer through a
camera system when a natural scene is captured.
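A minimal sketch of such a dead-leaves target, assuming the common construction (inverse-CDF sampling of the r^-3 radius law on a bounded interval, newest disc drawn on top so it occludes older ones); the function name, parameter defaults, and gray-level range are illustrative:

```python
# Hedged sketch of a dead-leaves image: overlapping discs with an r^-3
# radius distribution and uniform gray levels, later discs occluding
# earlier ones. Parameters are illustrative, not from the paper.
import numpy as np

def dead_leaves(size=256, n_discs=2000, r_min=2.0, r_max=64.0, seed=0):
    rng = np.random.default_rng(seed)
    img = np.full((size, size), 0.5)
    yy, xx = np.mgrid[0:size, 0:size]
    # Inverse-CDF sampling from p(r) proportional to r^-3 on [r_min, r_max].
    u = rng.uniform(size=n_discs)
    r = 1.0 / np.sqrt((1 - u) / r_min**2 + u / r_max**2)
    cx = rng.uniform(0, size, n_discs)
    cy = rng.uniform(0, size, n_discs)
    gray = rng.uniform(size=n_discs)
    for i in range(n_discs):
        mask = (xx - cx[i]) ** 2 + (yy - cy[i]) ** 2 <= r[i] ** 2
        img[mask] = gray[i]       # newest disc occludes older ones
    return img

img = dead_leaves(size=64, n_discs=300)
```

The power-law radius distribution is what gives the target its approximate scale invariance, which is why it mimics occlusion statistics of natural scenes.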
In the first part of our study we identify various factors that affect the MTF
measured using the ‘Dead Leaves’ chart. These include variations in illumination,
distance, exposure time and ISO sensitivity among others. We discuss the main
differences of this method with the existing resolution measurement techniques and
identify the advantages.
In the second part of this study, we propose an improvement to the current texture
MTF measurement algorithm. High-frequency residual noise in the processed
image contains the same frequency content as fine texture detail, and is sometimes
reported as such, thereby leading to inaccurate results. A wavelet thresholding based
denoising technique is utilized for modeling the noise present in the final
captured image. This updated noise model is then used for calculating an accurate
texture MTF. We present comparative results for both algorithms under various
image capture conditions.
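The wavelet-thresholding idea behind the proposed noise model can be sketched with a single-level 1-D Haar transform; the paper's actual wavelet basis, decomposition depth, and threshold rule are not specified here, so this only illustrates soft thresholding of detail coefficients:

```python
# Hedged sketch of wavelet-based denoising: decompose, soft-threshold the
# high-frequency (detail) coefficients, reconstruct. A one-level 1-D Haar
# transform stands in for the paper's unspecified wavelet scheme.
import numpy as np

def haar_denoise(signal, threshold):
    """One-level Haar decomposition, soft-threshold details, reconstruct."""
    x = np.asarray(signal, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass coefficients
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass coefficients
    # Soft thresholding shrinks small, noise-like detail coefficients to zero.
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0)
    out = np.empty_like(x)                       # inverse Haar transform
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

# Small pairwise jitter is removed; the large step between levels survives.
noisy = np.array([1.0, 1.1, 1.0, 0.9, 5.0, 5.1, 5.0, 4.9])
clean = haar_denoise(noisy, threshold=0.2)
```

The residual (noisy minus clean) serves as an estimate of the high-frequency noise, which can then be excluded when computing the texture MTF.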