10 research outputs found

    Minimal Solvers for Monocular Rolling Shutter Compensation under Ackermann Motion

    Full text link
    Modern automotive vehicles are often equipped with a budget commercial rolling shutter camera. These devices often produce distorted images due to the inter-row delay of the camera while capturing the image. Recent methods for monocular rolling shutter motion compensation utilize blur kernel and the straightness property of line segments. However, these methods are limited to handling rotational motion and also are not fast enough to operate in real time. In this paper, we propose a minimal solver for the rolling shutter motion compensation which assumes known vertical direction of the camera. Thanks to the Ackermann motion model of vehicles which consists of only two motion parameters, and two parameters for the simplified depth assumption that lead to a 4-line algorithm. The proposed minimal solver estimates the rolling shutter camera motion efficiently and accurately. The extensive experiments on real and simulated datasets demonstrate the benefits of our approach in terms of qualitative and quantitative results.Comment: Submitted to WACV 201

    A Back-illuminated Voltage-domain Global Shutter Pixel with Dual In-pixel Storage

    Get PDF

    Towards High-Frequency Tracking and Fast Edge-Aware Optimization

    Full text link
    This dissertation advances the state of the art for AR/VR tracking systems by increasing the tracking frequency by orders of magnitude and proposes an efficient algorithm for the problem of edge-aware optimization. AR/VR is a natural way of interacting with computers, where the physical and digital worlds coexist. We are on the cusp of a radical change in how humans perform and interact with computing. Humans are sensitive to small misalignments between the real and the virtual world, and tracking at kilo-Hertz frequencies becomes essential. Current vision-based systems fall short, as their tracking frequency is implicitly limited by the frame-rate of the camera. This thesis presents a prototype system which can track at orders of magnitude higher than the state-of-the-art methods using multiple commodity cameras. The proposed system exploits characteristics of the camera traditionally considered as flaws, namely rolling shutter and radial distortion. The experimental evaluation shows the effectiveness of the method for various degrees of motion. Furthermore, edge-aware optimization is an indispensable tool in the computer vision arsenal for accurate filtering of depth-data and image-based rendering, which is increasingly being used for content creation and geometry processing for AR/VR. As applications increasingly demand higher resolution and speed, there exists a need to develop methods that scale accordingly. This dissertation proposes such an edge-aware optimization framework which is efficient, accurate, and algorithmically scales well, all of which are much desirable traits not found jointly in the state of the art. The experiments show the effectiveness of the framework in a multitude of computer vision tasks such as computational photography and stereo.Comment: PhD thesi

    Trajectory Optimization of a Mobile Camera System for Maximizing Optical Character Recognition

    Get PDF
    Camera systems in motion are subject to significant blurring effects that lead to a loss of information during the image capture. This is especially damaging for optical character recognition for which edge preservation is critical to achieving a high recognition rate. Using non-blind motion deblurring, a trajectory and point spread function can be designed to maximize the recognition rate while meeting endpoint constraints. Optimization through the use of radial basis function networks can therefore be used as a way to find ideal trajectories to reduce blurring effects and preserve text sharpness. This work investigates this problem using simulation of a blurred image capture process. The simulation is automated using radial basis function network optimization and a genetic algorithm to determine trajectories with the best recognition rate. Optimized trajectories yielded recognition scores with up to 57.3% improvement in simulation compared to an analogous linear profile. These results were then verified through physical experimentation with a real-world, controlled-blur image capture process that yielded up to 29.4% improvement across the same comparison. Results were then analyzed using spectral analysis to understand why the chosen trajectories preserve text edges. These findings can be applied to a wide variety of controlled mobile camera platforms, such as autonomous automobiles or unmanned aerial vehicles, to improve their ability to gather information from their environment.M.S

    Evolution of A Common Vector Space Approach to Multi-Modal Problems

    Get PDF
    A set of methods to address computer vision problems has been developed. Video un- derstanding is an activate area of research in recent years. If one can accurately identify salient objects in a video sequence, these components can be used in information retrieval and scene analysis. This research started with the development of a course-to-fine frame- work to extract salient objects in video sequences. Previous work on image and video frame background modeling involved methods that ranged from simple and efficient to accurate but computationally complex. It will be shown in this research that the novel approach to implement object extraction is efficient and effective that outperforms the existing state-of-the-art methods. However, the drawback to this method is the inability to deal with non-rigid motion. With the rapid development of artificial neural networks, deep learning approaches are explored as a solution to computer vision problems in general. Focusing on image and text, the image (or video frame) understanding can be achieved using CVS. With this concept, modality generation and other relevant applications such as automatic im- age description, text paraphrasing, can be explored. Specifically, video sequences can be modeled by Recurrent Neural Networks (RNN), the greater depth of the RNN leads to smaller error, but that makes the gradient in the network unstable during training.To overcome this problem, a Batch-Normalized Recurrent Highway Network (BNRHN) was developed and tested on the image captioning (image-to-text) task. In BNRHN, the highway layers are incorporated with batch normalization which diminish the gradient vanishing and exploding problem. In addition, a sentence to vector encoding framework that is suitable for advanced natural language processing is developed. This semantic text embedding makes use of the encoder-decoder model which is trained on sentence paraphrase pairs (text-to-text). With this scheme, the latent representation of the text is shown to encode sentences with common semantic information with similar vector rep- resentations. In addition to image-to-text and text-to-text, an image generation model is developed to generate image from text (text-to-image) or another image (image-to- image) based on the semantics of the content. The developed model, which refers to the Multi-Modal Vector Representation (MMVR), builds and encodes different modalities into a common vector space that achieve the goal of keeping semantics and conversion between text and image bidirectional. The concept of CVS is introduced in this research to deal with multi-modal conversion problems. In theory, this method works not only on text and image, but also can be generalized to other modalities, such as video and audio. The characteristics and performance are supported by both theoretical analysis and experimental results. Interestingly, the MMVR model is one of the many possible ways to build CVS. In the final stages of this research, a simple and straightforward framework to build CVS, which is considered as an alternative to the MMVR model, is presented

    Model-based Optical Flow: Layers, Learning, and Geometry

    Get PDF
    The estimation of motion in video sequences establishes temporal correspondences between pixels and surfaces and allows reasoning about a scene using multiple frames. Despite being a focus of research for over three decades, computing motion, or optical flow, remains challenging due to a number of difficulties, including the treatment of motion discontinuities and occluded regions, and the integration of information from more than two frames. One reason for these issues is that most optical flow algorithms only reason about the motion of pixels on the image plane, while not taking the image formation pipeline or the 3D structure of the world into account. One approach to address this uses layered models, which represent the occlusion structure of a scene and provide an approximation to the geometry. The goal of this dissertation is to show ways to inject additional knowledge about the scene into layered methods, making them more robust, faster, and more accurate. First, this thesis demonstrates the modeling power of layers using the example of motion blur in videos, which is caused by fast motion relative to the exposure time of the camera. Layers segment the scene into regions that move coherently while preserving their occlusion relationships. The motion of each layer therefore directly determines its motion blur. At the same time, the layered model captures complex blur overlap effects at motion discontinuities. Using layers, we can thus formulate a generative model for blurred video sequences, and use this model to simultaneously deblur a video and compute accurate optical flow for highly dynamic scenes containing motion blur. Next, we consider the representation of the motion within layers. Since, in a layered model, important motion discontinuities are captured by the segmentation into layers, the flow within each layer varies smoothly and can be approximated using a low dimensional subspace. We show how this subspace can be learned from training data using principal component analysis (PCA), and that flow estimation using this subspace is computationally efficient. The combination of the layered model and the low-dimensional subspace gives the best of both worlds, sharp motion discontinuities from the layers and computational efficiency from the subspace. Lastly, we show how layered methods can be dramatically improved using simple semantics. Instead of treating all layers equally, a semantic segmentation divides the scene into its static parts and moving objects. Static parts of the scene constitute a large majority of what is shown in typical video sequences; yet, in such regions optical flow is fully constrained by the depth structure of the scene and the camera motion. After segmenting out moving objects, we consider only static regions, and explicitly reason about the structure of the scene and the camera motion, yielding much better optical flow estimates. Furthermore, computing the structure of the scene allows to better combine information from multiple frames, resulting in high accuracies even in occluded regions. For moving regions, we compute the flow using a generic optical flow method, and combine it with the flow computed for the static regions to obtain a full optical flow field. By combining layered models of the scene with reasoning about the dynamic behavior of the real, three-dimensional world, the methods presented herein push the envelope of optical flow computation in terms of robustness, speed, and accuracy, giving state-of-the-art results on benchmarks and pointing to important future research directions for the estimation of motion in natural scenes

    Rolling shutter motion deblurring

    No full text
    corecore