10 research outputs found
Minimal Solvers for Monocular Rolling Shutter Compensation under Ackermann Motion
Modern automotive vehicles are often equipped with a budget commercial
rolling shutter camera. These devices often produce distorted images due to the
inter-row delay of the camera while capturing the image. Recent methods for
monocular rolling shutter motion compensation utilize blur kernel and the
straightness property of line segments. However, these methods are limited to
handling rotational motion and also are not fast enough to operate in real
time. In this paper, we propose a minimal solver for the rolling shutter motion
compensation which assumes known vertical direction of the camera. Thanks to
the Ackermann motion model of vehicles which consists of only two motion
parameters, and two parameters for the simplified depth assumption that lead to
a 4-line algorithm. The proposed minimal solver estimates the rolling shutter
camera motion efficiently and accurately. The extensive experiments on real and
simulated datasets demonstrate the benefits of our approach in terms of
qualitative and quantitative results.Comment: Submitted to WACV 201
Towards High-Frequency Tracking and Fast Edge-Aware Optimization
This dissertation advances the state of the art for AR/VR tracking systems by
increasing the tracking frequency by orders of magnitude and proposes an
efficient algorithm for the problem of edge-aware optimization.
AR/VR is a natural way of interacting with computers, where the physical and
digital worlds coexist. We are on the cusp of a radical change in how humans
perform and interact with computing. Humans are sensitive to small
misalignments between the real and the virtual world, and tracking at
kilo-Hertz frequencies becomes essential. Current vision-based systems fall
short, as their tracking frequency is implicitly limited by the frame-rate of
the camera. This thesis presents a prototype system which can track at orders
of magnitude higher than the state-of-the-art methods using multiple commodity
cameras. The proposed system exploits characteristics of the camera
traditionally considered as flaws, namely rolling shutter and radial
distortion. The experimental evaluation shows the effectiveness of the method
for various degrees of motion.
Furthermore, edge-aware optimization is an indispensable tool in the computer
vision arsenal for accurate filtering of depth-data and image-based rendering,
which is increasingly being used for content creation and geometry processing
for AR/VR. As applications increasingly demand higher resolution and speed,
there exists a need to develop methods that scale accordingly. This
dissertation proposes such an edge-aware optimization framework which is
efficient, accurate, and algorithmically scales well, all of which are much
desirable traits not found jointly in the state of the art. The experiments
show the effectiveness of the framework in a multitude of computer vision tasks
such as computational photography and stereo.Comment: PhD thesi
Trajectory Optimization of a Mobile Camera System for Maximizing Optical Character Recognition
Camera systems in motion are subject to significant blurring effects that lead to a loss of information during the image capture. This is especially damaging for optical character recognition for which edge preservation is critical to achieving a high recognition rate. Using non-blind motion deblurring, a trajectory and point spread function can be designed to maximize the recognition rate while meeting endpoint constraints. Optimization through the use of radial basis function networks can therefore be used as a way to find ideal trajectories to reduce blurring effects and preserve text sharpness. This work investigates this problem using simulation of a blurred image capture process. The simulation is automated using radial basis function network optimization and a genetic algorithm to determine trajectories with the best recognition rate. Optimized trajectories yielded recognition scores with up to 57.3% improvement in simulation compared to an analogous linear profile. These results were then verified through physical experimentation with a real-world, controlled-blur image capture process that yielded up to 29.4% improvement across the same comparison. Results were then analyzed using spectral analysis to understand why the chosen trajectories preserve text edges. These findings can be applied to a wide variety of controlled mobile camera platforms, such as autonomous automobiles or unmanned aerial vehicles, to improve their ability to gather information from their environment.M.S
Evolution of A Common Vector Space Approach to Multi-Modal Problems
A set of methods to address computer vision problems has been developed. Video un- derstanding is an activate area of research in recent years. If one can accurately identify salient objects in a video sequence, these components can be used in information retrieval and scene analysis. This research started with the development of a course-to-fine frame- work to extract salient objects in video sequences. Previous work on image and video frame background modeling involved methods that ranged from simple and efficient to accurate but computationally complex. It will be shown in this research that the novel approach to implement object extraction is efficient and effective that outperforms the existing state-of-the-art methods. However, the drawback to this method is the inability to deal with non-rigid motion.
With the rapid development of artificial neural networks, deep learning approaches are explored as a solution to computer vision problems in general. Focusing on image and text, the image (or video frame) understanding can be achieved using CVS. With this concept, modality generation and other relevant applications such as automatic im- age description, text paraphrasing, can be explored. Specifically, video sequences can be modeled by Recurrent Neural Networks (RNN), the greater depth of the RNN leads to smaller error, but that makes the gradient in the network unstable during training.To overcome this problem, a Batch-Normalized Recurrent Highway Network (BNRHN) was developed and tested on the image captioning (image-to-text) task. In BNRHN, the highway layers are incorporated with batch normalization which diminish the gradient vanishing and exploding problem. In addition, a sentence to vector encoding framework that is suitable for advanced natural language processing is developed. This semantic text embedding makes use of the encoder-decoder model which is trained on sentence paraphrase pairs (text-to-text). With this scheme, the latent representation of the text is shown to encode sentences with common semantic information with similar vector rep- resentations. In addition to image-to-text and text-to-text, an image generation model is developed to generate image from text (text-to-image) or another image (image-to- image) based on the semantics of the content. The developed model, which refers to the Multi-Modal Vector Representation (MMVR), builds and encodes different modalities into a common vector space that achieve the goal of keeping semantics and conversion between text and image bidirectional. The concept of CVS is introduced in this research to deal with multi-modal conversion problems. In theory, this method works not only on text and image, but also can be generalized to other modalities, such as video and audio. The characteristics and performance are supported by both theoretical analysis and experimental results. Interestingly, the MMVR model is one of the many possible ways to build CVS. In the final stages of this research, a simple and straightforward framework to build CVS, which is considered as an alternative to the MMVR model, is presented
Model-based Optical Flow: Layers, Learning, and Geometry
The estimation of motion in video sequences establishes temporal correspondences between pixels and surfaces and allows reasoning about a scene using multiple frames. Despite being a focus of research for over three decades, computing motion, or optical flow, remains challenging due to a number of difficulties, including the treatment of motion discontinuities and occluded regions, and the integration of information from more than two frames. One reason for these issues is that most optical flow algorithms only reason about the motion of pixels on the image plane, while not taking the image formation pipeline or the 3D structure of the world into account. One approach to address this uses layered models, which represent the occlusion structure of a scene and provide an approximation to the geometry. The goal of this dissertation is to show ways to inject additional knowledge about the scene into layered methods, making them more robust, faster, and more accurate.
First, this thesis demonstrates the modeling power of layers using the example of motion blur in videos, which is caused by fast motion relative to the exposure time of the camera. Layers segment the scene into regions that move coherently while preserving their occlusion relationships. The motion of each layer therefore directly determines its motion blur. At the same time, the layered model captures complex blur overlap effects at motion discontinuities. Using layers, we can thus formulate a generative model for blurred video sequences, and use this model to simultaneously deblur a video and compute accurate optical flow for highly dynamic scenes containing motion blur.
Next, we consider the representation of the motion within layers. Since, in a layered model, important motion discontinuities are captured by the segmentation into layers, the flow within each layer varies smoothly and can be approximated using a low dimensional subspace. We show how this subspace can be learned from training data using principal component analysis (PCA), and that flow estimation using this subspace is computationally efficient. The combination of the layered model and the low-dimensional subspace gives the best of both worlds, sharp motion discontinuities from the layers and computational efficiency from the subspace.
Lastly, we show how layered methods can be dramatically improved using simple semantics. Instead of treating all layers equally, a semantic segmentation divides the scene into its static parts and moving objects. Static parts of the scene constitute a large majority of what is shown in typical video sequences; yet, in such regions optical flow is fully constrained by the depth structure of the scene and the camera motion. After segmenting out moving objects, we consider only static regions, and explicitly reason about the structure of the scene and the camera motion, yielding much better optical flow estimates. Furthermore, computing the structure of the scene allows to better combine information from multiple frames, resulting in high accuracies even in occluded regions. For moving regions, we compute the flow using a generic optical flow method, and combine it with the flow computed for the static regions to obtain a full optical flow field.
By combining layered models of the scene with reasoning about the dynamic behavior of the real, three-dimensional world, the methods presented herein push the envelope of optical flow computation in terms of robustness, speed, and accuracy, giving state-of-the-art results on benchmarks and pointing to important future research directions for the estimation of motion in natural scenes