
    Simultaneous human segmentation, depth and pose estimation via dual decomposition

    The tasks of stereo matching, segmentation, and human pose estimation have been popular in computer vision in recent years, but attempts to combine the three tasks have so far resulted in compromises: using either infra-red cameras or a greatly simplified body model. We propose a framework for estimating a detailed human skeleton in 3D from a stereo pair of images. Within this framework, we define an energy function that incorporates the relationship between the segmentation results, the pose estimation results, and the disparity space image. Specifically, we codify the assertions that foreground pixels should relate to some body part, should correspond to a continuous surface in the disparity space image, and should be closer to the camera than the surrounding background pixels. Minimizing our energy function is NP-hard; however, we show how to efficiently optimize a relaxation of it using dual decomposition. We show that applying this approach leads to improved results in all three tasks, and also introduce an extensive and challenging new dataset, which we use as a benchmark for evaluating 3D human pose estimation.
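
As a generic illustration of the dual decomposition idea mentioned in this abstract (not the paper's actual energy, which couples three tasks), consider a problem that splits into two tractable parts. The symbols f_1, f_2, \lambda, and the step size \alpha_t are assumptions introduced only for this sketch.

```latex
\begin{align*}
\min_{x}\; f_1(x) + f_2(x)
  &= \min_{x_1, x_2}\; f_1(x_1) + f_2(x_2) \quad \text{s.t. } x_1 = x_2, \\
g(\lambda)
  &= \min_{x_1}\bigl[f_1(x_1) + \lambda^\top x_1\bigr]
   + \min_{x_2}\bigl[f_2(x_2) - \lambda^\top x_2\bigr], \\
\lambda^{(t+1)}
  &= \lambda^{(t)} + \alpha_t \bigl(x_1^{(t)} - x_2^{(t)}\bigr),
\end{align*}
```

Here x_1^{(t)} and x_2^{(t)} are the minimizers of the two subproblems at \lambda^{(t)}. Each g(\lambda) is a lower bound on the original minimum, and the subgradient step pushes the two copies toward agreement while each subproblem is solved independently.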

    Towards Optimal Image Stitching for Virtual Microscopy

    In this paper we present an image stitching method based on dynamic programming and describe its application to automated slide acquisition for Virtual Microscopy (VM). Given a large number of fields of view (FOVs) acquired from a single microscope slide, we composite these images into a single large 'virtual slide' image. The location of each FOV is determined using a new algorithm based on dynamic programming. We compare the performance of the proposed algorithm to that of an existing greedy algorithm. A visual trial shows that the new algorithm provides a significant improvement in perceived image quality at image boundaries compared to the existing algorithm.

    Total variation on a tree

    We consider the problem of minimizing the continuous valued total variation subject to different unary terms on trees and propose fast direct algorithms based on dynamic programming to solve these problems. We treat both the convex and the non-convex case and derive worst case complexities that are equal to or better than those of existing methods. We show applications to total variation based 2D image processing and computer vision problems based on a Lagrangian decomposition approach. The resulting algorithms are very efficient, offer a high degree of parallelism and come with memory requirements that are only on the order of the number of image pixels. Comment: accepted to SIAM Journal on Imaging Sciences (SIIMS)
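
To make the dynamic programming idea concrete, here is a minimal sketch under simplifying assumptions: it handles only a chain (the simplest tree) and a finite set of candidate labels, whereas the paper addresses continuous-valued variables on general trees. It is a plain Viterbi-style recursion, not the authors' algorithm, and the names `tv_chain_dp`, `unary`, `labels`, and `lam` are hypothetical.

```python
def tv_chain_dp(unary, labels, lam):
    """Minimize sum_i unary[i][x_i] + lam * sum_i |labels[x_i] - labels[x_{i+1}]|
    over a chain, where x_i indexes into the finite candidate list `labels`.

    Assumption: discrete labels on a chain, costing O(n * k^2) time.
    Returns (list of chosen label values, minimal energy).
    """
    n, k = len(unary), len(labels)
    cost = list(unary[0])   # cost[b]: best energy of the prefix ending in label b
    back = []               # back[i-1][b]: best predecessor label index of b at node i
    for i in range(1, n):
        new_cost, arg = [], []
        for b in range(k):
            # Best predecessor a for label b, including the TV penalty on edge (i-1, i).
            best_a = min(range(k),
                         key=lambda a: cost[a] + lam * abs(labels[a] - labels[b]))
            pairwise = lam * abs(labels[best_a] - labels[b])
            new_cost.append(cost[best_a] + pairwise + unary[i][b])
            arg.append(best_a)
        cost = new_cost
        back.append(arg)
    # Backtrack from the best final label to recover the full assignment.
    b = min(range(k), key=lambda j: cost[j])
    energy = cost[b]
    path = [b]
    for arg in reversed(back):
        b = arg[b]
        path.append(b)
    path.reverse()
    return [labels[j] for j in path], energy


# Usage: denoise a 1D signal with quadratic data terms and two candidate labels.
labels = [0.0, 1.0]
signal = [0.1, 0.0, 0.9, 0.1, 0.0]
unary = [[(s - l) ** 2 for l in labels] for s in signal]
smoothed, energy = tv_chain_dp(unary, labels, lam=1.0)
```

With `lam=1.0` the isolated spike in the signal is smoothed away, while `lam=0.0` reduces to rounding each sample to its nearest label.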

    ROAM: a Rich Object Appearance Model with Application to Rotoscoping

    Rotoscoping, the detailed delineation of scene elements through a video shot, is a painstaking task of tremendous importance in professional post-production pipelines. While pixel-wise segmentation techniques can help with this task, professional rotoscoping tools rely on parametric curves that offer artists much better interactive control over the definition, editing and manipulation of the segments of interest. Sticking to this prevalent rotoscoping paradigm, we propose a novel framework to capture and track the visual aspect of an arbitrary object in a scene, given a first closed outline of this object. This model combines a collection of local foreground/background appearance models spread along the outline, a global appearance model of the enclosed object and a set of distinctive foreground landmarks. The structure of this rich appearance model allows simple initialization, efficient iterative optimization with exact minimization at each step, and on-line adaptation in videos. We demonstrate qualitatively and quantitatively the merit of this framework through comparisons with tools based on either dynamic segmentation with a closed curve or pixel-wise binary labelling.

    Learning Random Field Models For Computer Vision

    Random fields are among the most popular models in computer vision due to their ability to model statistical interdependence between individual variables. Three key issues in the application of random fields to a given problem are (i) defining appropriate graph structures that represent the underlying task, (ii) finding suitable functions over the graph that encode certain preferences, and (iii) performing inference efficiently on the resulting model to obtain a solution. While a large body of recent research has been devoted to the last issue, this thesis focuses on the first two. We first study them in the context of three well-known low-level vision problems, namely image denoising, stereo vision, and optical flow, and demonstrate the benefit of using more appropriate graph structures and learning more suitable potential functions. Moreover, we extend our study to landmark classification, a problem in the high-level vision domain where random field models have rarely been used. We show that higher classification accuracy can be achieved by considering multiple images jointly as a random field instead of regarding them as separate entities.

    Proceedings of the 2009 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory

    The joint workshop of the Fraunhofer Institute of Optronics, System Technologies and Image Exploitation IOSB, Karlsruhe, and the Vision and Fusion Laboratory (Institute for Anthropomatics, Karlsruhe Institute of Technology (KIT)) has been organized annually since 2005 with the aim of reporting on the latest research and development findings of the doctoral students of both institutions. This book provides a collection of 16 technical reports on the research results presented at the 2009 workshop.

    High performance and error resilient probabilistic inference system for machine learning

    Many real-world machine learning applications can be viewed as maximum a posteriori probability (MAP) problems of inferring the best label assignment. Since these MAP problems are NP-hard in general, they are often dealt with using approximate inference algorithms on Markov random fields (MRFs) such as belief propagation (BP). However, this approximate inference is still computationally demanding, and thus custom hardware accelerators have been attractive for high performance and energy efficiency. There are various custom hardware implementations that employ BP to achieve reasonable performance for real-world applications such as stereo matching. Due to the lack of convergence guarantees, however, BP often fails to provide the right answer, thus degrading the performance of the hardware. Therefore, we consider sequential tree-reweighted message passing (TRW-S), which avoids many of these convergence problems with BP through sequential execution of its computations, but is consequently difficult to parallelize for high throughput. In this work, therefore, we propose a novel streaming hardware architecture that parallelizes the sequential computations of TRW-S. Experimental results on stereo matching benchmarks show promising performance of our hardware implementation compared to the software implementation as well as other BP-based custom hardware or GPU implementations. From this result, we further demonstrate video-rate speed and high-quality stereo matching using a hybrid CPU+FPGA platform. We propose three frame-level optimization techniques to fully exploit the computational resources of a hybrid CPU+FPGA platform and achieve significant speed-up. We first propose a message reuse scheme which is guided by simple scene change detection. This scheme allows a current inference to be made based on a determination of whether the current result is expected to be similar to the inference result of the previous frame.
We also consider frame-level parallelization to process multiple frames in parallel using multiple FPGAs available in the platform. This parallelized hardware procedure is further pipelined with data management on the CPU to overlap the execution time of the two and thereby reduce the entire processing time of the stereo video sequence. From experimental results with real-world stereo video sequences, we observe video-rate speed for our stereo matching system on QVGA stereo videos. Next, we consider error resilience of the message passing hardware for energy-efficient hardware implementation. Modern nanoscale CMOS process technologies suffer from reduced reliability caused by process, temperature and voltage variations. Conventional approaches to deal with such unreliability (e.g., design for the worst-case scenario) are complex and inefficient in terms of hardware resources and energy consumption. As machine learning applications are inherently probabilistic and robust to errors, statistical error compensation (SEC) techniques can play a significant role in achieving robust and energy-efficient implementation. SEC embraces the statistical nature of errors and utilizes statistical and probabilistic techniques to build robust systems. Energy efficiency is obtained by trading the enhanced robustness for energy savings. In this work, we analyze the error resilience of our message passing inference hardware subject to hardware errors (e.g., errors caused by timing violations in circuits) and explore application of a popular SEC technique, algorithmic noise tolerance (ANT), to this hardware. Analysis and simulations show that the TRW-S message passing hardware is tolerant to small-magnitude arithmetic errors, but large-magnitude errors cause significantly inaccurate inference results which need to be corrected using SEC.
Experimental results show that the proposed ANT-based hardware can tolerate an error rate of 21.3% with a performance degradation of only 3.5% and an energy savings of 39.7% compared to error-free hardware. Lastly, we extend our TRW-S hardware toward a general purpose machine learning framework. We propose an advanced streaming architecture with a flexible choice of MRF settings to achieve 10-40x speedup across a variety of computer vision applications. Furthermore, we provide a better theoretical understanding of the error resiliency of TRW-S, and of the implication of ANT for TRW-S, under more general MRF settings, along with strong empirical support.
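
The ANT principle described in this abstract can be sketched in a few lines. This is a software illustration under stated assumptions, not the thesis's hardware design: an error-prone main block is paired with a cheap reduced-precision estimator, and the estimator's output is substituted whenever the two disagree by more than a threshold. The function names, the dot-product workload, the 4-bit truncation, and the threshold value are all hypothetical choices made for this example.

```python
def dot(a, b):
    """Full-precision main computation (assumed to be the error-prone block)."""
    return sum(x * y for x, y in zip(a, b))


def reduced_precision_dot(a, b, shift=4):
    """Cheap estimator: drop the `shift` low-order bits of each element of `a`.

    Assumption: inputs are non-negative integers. The estimate is slightly
    off, but it is assumed to be free of large hardware errors.
    """
    return sum(((x >> shift) << shift) * y for x, y in zip(a, b))


def ant_correct(main_output, estimate, threshold):
    """ANT decision rule: keep the main output unless it deviates from the
    estimate by more than `threshold`, in which case fall back to the estimate."""
    if abs(main_output - estimate) > threshold:
        return estimate
    return main_output


# Usage: a small error passes through, a large error is replaced by the estimate.
a, b = [100, 200, 50], [3, 5, 7]
exact = dot(a, b)                      # 1650
estimate = reduced_precision_dot(a, b) # 1584
small_fault = ant_correct(exact + 4, estimate, threshold=128)      # kept: 1654
large_fault = ant_correct(exact + 4096, estimate, threshold=128)   # corrected: 1584
```

The threshold has to exceed the estimator's own approximation error (66 in this example), otherwise correct main outputs would be replaced needlessly.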