
    Foreground Segmentation of Live Videos Using Boundary Matting Technology

    This paper proposes an interactive method for extracting foreground objects from live videos using Boundary Matting Technology. An initial segmentation is obtained from the first and last frames of the video sequence; the main objective is to segment the images of live videos in a continuous manner. Video frames are first divided into pixels so that a Competing Support Vector Machine (CSVM) algorithm can be used to classify them as foreground or background. Accordingly, the extraction of foreground and background image sequences is done without human intervention. Finally, the initially segmented frames can be refined to obtain accurate object boundaries, which are then used for matting the videos. The result is an effective algorithm for segmenting and then matting live videos, even in difficult scenarios such as fuzzy object boundaries. In the paper we construct Competing Support Vector Machines (CSVMs) and algorithms in which the local color distributions of both foreground and background video frames are used.
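    The abstract does not detail the CSVM formulation, but the core step of classifying pixels as foreground or background from local color distributions can be sketched with an ordinary SVM. The sketch below is illustrative only: `fg_mask`/`bg_mask` are assumed user scribbles on an initial frame, and scikit-learn's `SVC` stands in for the paper's competing SVMs.

```python
# Illustrative sketch, not the paper's CSVM: classify pixels as foreground
# or background with an RBF SVM trained on sampled colors.
import numpy as np
from sklearn.svm import SVC

def train_color_svm(frame, fg_mask, bg_mask):
    """Train on RGB colors drawn from user-labeled regions (assumed given)."""
    fg = frame[fg_mask].astype(np.float64) / 255.0   # (n_fg, 3) colors
    bg = frame[bg_mask].astype(np.float64) / 255.0   # (n_bg, 3) colors
    X = np.vstack([fg, bg])
    y = np.concatenate([np.ones(len(fg)), np.zeros(len(bg))])
    return SVC(kernel="rbf", gamma="scale").fit(X, y)

def segment(frame, clf):
    """Label every pixel of a frame: True = foreground."""
    h, w, _ = frame.shape
    X = frame.reshape(-1, 3).astype(np.float64) / 255.0
    return clf.predict(X).reshape(h, w).astype(bool)
```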

    Algorithmic issues in visual object recognition

    This thesis is divided into two parts covering two aspects of research in the area of visual object recognition. Part I is about human detection in still images. Human detection is a challenging computer vision task due to the wide variability in human visual appearances and body poses. In this part, we present several enhancements to human detection algorithms. First, we present an extension to the integral images framework that allows constant-time computation of non-uniformly weighted summations over rectangular regions using a bundle of integral images. Such a computational element is commonly used in constructing gradient-based feature descriptors, which are the most successful in shape-based human detection. Second, we introduce deformable features as an alternative to the conventional static features used in classifiers based on boosted ensembles. Deformable features can enhance the accuracy of human detection by adapting to pose changes that can be described as translations of body features. Third, we present a comprehensive evaluation framework for cascade-based human detectors. The presented framework facilitates comparison between cascade-based detection algorithms, provides a confidence measure for the results, and deploys a practical evaluation scenario. Part II explores the possibilities of enhancing the speed of core algorithms used in visual object recognition using the computing capabilities of Graphics Processing Units (GPUs). First, we present an implementation of Graph Cut on GPUs, which achieves up to 4x speedup compared to a CPU implementation. The Graph Cut algorithm has many applications related to visual object recognition, such as segmentation and 3D point matching. Second, we present an efficient sparse approximation of kernel matrices for GPUs that can significantly speed up kernel-based learning algorithms, which are widely used in object detection and recognition. We present an implementation of the Affinity Propagation clustering algorithm based on this representation, which is about 6 times faster than another GPU implementation based on a conventional sparse matrix representation.
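    The bundle-of-integral-images extension is specific to the thesis, but it rests on the standard integral-image identity, which a short sketch makes concrete: one cumulative-sum pass lets any axis-aligned rectangular sum be answered with four lookups; non-uniform weights are then handled by combining several such images, one per weighting component.

```python
# Minimal sketch of the classic integral-image trick the thesis extends.
import numpy as np

def integral_image(img):
    """Zero-padded 2-D cumulative sum, so queries need no bounds checks."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom, left:right] in O(1) via inclusion-exclusion."""
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]
```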

    Fast Multi-frame Stereo Scene Flow with Motion Segmentation

    We propose a new multi-frame method for efficiently computing scene flow (dense depth and optical flow) and camera ego-motion for a dynamic scene observed from a moving stereo camera rig. Our technique also segments out moving objects from the rigid scene. In our method, we first estimate the disparity map and the 6-DOF camera motion using stereo matching and visual odometry. We then identify regions inconsistent with the estimated camera motion and compute per-pixel optical flow only at these regions. This flow proposal is fused with the camera motion-based flow proposal using fusion moves to obtain the final optical flow and motion segmentation. This unified framework benefits all four tasks - stereo, optical flow, visual odometry and motion segmentation - leading to overall higher accuracy and efficiency. Our method is currently ranked third on the KITTI 2015 scene flow benchmark. Furthermore, our CPU implementation runs in 2-3 seconds per frame, which is 1-3 orders of magnitude faster than the top six methods. We also report a thorough evaluation on challenging Sintel sequences with fast camera and object motion, where our method consistently outperforms OSF [Menze and Geiger, 2015], which is currently ranked second on the KITTI benchmark.
    Comment: 15 pages. To appear at IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017). Our results were submitted to the KITTI 2015 Stereo Scene Flow Benchmark in November 201
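    The key consistency test - flagging pixels whose observed flow cannot be explained by camera motion alone - can be sketched as follows. The sketch assumes the intrinsics `K`, ego-motion `(R, t)`, a depth map recovered from disparity, and a measured flow field are already available; it is a simplified stand-in, not the paper's fusion-move optimization.

```python
# Hedged sketch: predict the flow induced purely by camera motion and flag
# pixels where the measured flow disagrees (candidate moving objects).
import numpy as np

def rigid_flow(depth, K, R, t):
    """Flow of a static scene point under camera motion (R, t)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)   # back-project
    proj = K @ (R @ pts + t.reshape(3, 1))                  # move + re-project
    uv2 = (proj[:2] / proj[2]).T.reshape(h, w, 2)
    return uv2 - np.stack([u, v], axis=-1)

def motion_mask(measured_flow, depth, K, R, t, thresh_px=3.0):
    """Pixels whose flow deviates from the rigid prediction by > thresh_px."""
    err = np.linalg.norm(measured_flow - rigid_flow(depth, K, R, t), axis=-1)
    return err > thresh_px
```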

    Optimized Block-Based Algorithms to Label Connected Components on GPUs

    Connected Components Labeling (CCL) is a crucial step of several image processing and computer vision pipelines. Many efficient sequential strategies exist, among which one of the most effective is the use of a block-based mask to drastically cut the number of memory accesses. In the last decade, aided by the fast development of Graphics Processing Units (GPUs), many data-parallel CCL algorithms have been proposed alongside sequential ones. Applications that run entirely on the GPU can benefit from parallel implementations of CCL, which make it possible to avoid expensive memory transfers between host and device. In this paper, two new eight-connectivity CCL algorithms are proposed, namely Block-based Union Find (BUF) and Block-based Komura Equivalence (BKE). These algorithms optimize existing GPU solutions by introducing a block-based approach. Extensions for three-dimensional datasets are also discussed. In order to produce a fair comparison with previously proposed alternatives, YACCLAB, a public CCL benchmarking framework, has been extended and made suitable for evaluating GPU algorithms as well. Moreover, three-dimensional datasets have been added to its collection. Experimental results on real cases and synthetically generated datasets demonstrate the superiority of the new proposals with respect to the state of the art in both 2D and 3D scenarios.
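    The block-based GPU kernels themselves are beyond the scope of an abstract, but the union-find machinery that BUF builds on is compact enough to sketch sequentially; BUF's contribution is, roughly, performing this style of merging over 2x2 pixel blocks in parallel on the GPU. The following is an illustrative CPU version, not the paper's implementation.

```python
# Sequential sketch of union-find CCL (8-connectivity); BUF parallelizes
# this kind of merging over 2x2 pixel blocks on the GPU.
import numpy as np

def find(parent, x):
    while parent[x] != x:                 # path halving keeps trees shallow
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def label_8conn(img):
    """Label a boolean image; returns int labels, 0 = background."""
    h, w = img.shape
    parent = np.arange(h * w)
    for y in range(h):
        for x in range(w):
            if not img[y, x]:
                continue
            # Merge with already-visited 8-neighbors (row above + left).
            for dy, dx in ((-1, -1), (-1, 0), (-1, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny and 0 <= nx < w and img[ny, nx]:
                    a, b = find(parent, y * w + x), find(parent, ny * w + nx)
                    if a != b:
                        parent[max(a, b)] = min(a, b)
    labels = np.zeros((h, w), dtype=np.int64)
    for y in range(h):
        for x in range(w):
            if img[y, x]:
                labels[y, x] = find(parent, y * w + x) + 1
    return labels
```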

    Object Tracking

    Object tracking consists in estimating the trajectories of moving objects in a sequence of images. Automating object tracking is a difficult task: the dynamics of the many changing parameters that represent the features and motion of the objects, as well as temporary partial or full occlusion of the tracked objects, have to be considered. This monograph presents the development of object tracking algorithms, methods and systems. Both the state of the art of object tracking methods and the new trends in research are described in this book. Fourteen chapters are split into two sections: Section 1 presents new theoretical ideas, whereas Section 2 presents real-life applications. Despite the variety of topics contained in this monograph, it constitutes a consistent body of knowledge in the field of computer object tracking. The editor's intention was to keep pace with the very quick progress in the development of methods as well as the extension of their applications.

    Dedicated Hardware for Embedded Vision Systems

    The continuous evolution of Information and Communication Technologies (ICT) has not only allowed more than half of the global population to be interconnected through the Internet, but has also been the breeding ground for new paradigms such as the Internet of Things (IoT) and Ambient Intelligence (AmI). These paradigms expose the need to interconnect elements with different functionalities in order to achieve a digital, sensitive, adaptive and responsive environment that provides services of very different kinds to its users. Achieving this environment requires the development of low-cost electronic devices that, with small size and light weight, are able to interact with their surroundings, operate with maximum autonomy and provide a high level of intelligence. The functionality of many of these devices will include the capacity to acquire, process and transmit images, extracting, interpreting or modifying the visual information of interest for a given application. This PhD Thesis arises in the context of this challenge; its central axis is the development of dedicated hardware for the implementation of image and video processing algorithms used in embedded vision systems. The work has a two-fold purpose. On the one hand, the search for solutions whose performance allows them to be incorporated into systems that satisfy the strict requirements of functionality, size, power consumption and speed of operation demanded by new applications. On the other hand, the design of a set of functional blocks, implemented as intellectual property modules, that alleviate the computational load of the processing units of the systems into which they are integrated. The Thesis proposes specific solutions for the implementation of two operations commonly present in many computer vision systems: background subtraction and connected component labeling. The different alternatives arise from applying a suitable trade-off between functionality and cost, understanding the latter criterion in terms of computational resources, operating speed and power consumption, which makes it possible to cover a wide range of applications. In some of the proposed solutions, inference techniques based on Fuzzy Logic have also been used to improve the quality of the resulting vision systems. The functional blocks were developed following a model-based design methodology that allowed the whole development cycle to be carried out in a single work environment. This environment combines software tools that facilitate the stages of algorithm coding, circuit design, physical implementation and functional and timing verification of the different alternatives, thereby accelerating all phases of the design flow and enabling a more efficient exploration of the space of possible solutions. Finally, to demonstrate the functionality of the contributions of this PhD Thesis, some of the proposed solutions have been integrated into real video systems that employ common standard buses. The devices selected for these demonstrators were Xilinx FPGAs and SoPCs, since their excellent properties for prototyping and building systems that combine software and hardware components make them ideal candidates to support the implementation of this kind of system.
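    The abstract does not name the exact background-subtraction algorithm implemented in hardware, but sigma-delta estimation is a common choice for FPGA pipelines because it needs only integer increments and comparisons per pixel. The following is an illustrative software sketch of that technique, not the thesis's design.

```python
# Illustrative sigma-delta background subtraction: integer-only per-pixel
# updates, which is why this scheme maps well to FPGA hardware.
import numpy as np

def sigma_delta_step(frame, bg, var, n=2, vmin=2, vmax=255):
    """One frame update; bg and var are int16 state arrays kept by the caller."""
    frame = frame.astype(np.int16)
    bg += np.sign(frame - bg)             # background creeps toward the frame
    diff = np.abs(frame - bg)
    var += np.sign(n * diff - var)        # noise estimate tracks n * |diff|
    np.clip(var, vmin, vmax, out=var)
    return diff > var                     # True where motion exceeds noise

# Typical use: bg = first_frame.astype(np.int16); var = np.full_like(bg, vmin).
```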

    Convolutional Nets and Watershed Cuts for Real-Time Semantic Labeling of RGBD Videos

    This work addresses multi-class segmentation of indoor scenes with RGB-D inputs. While this area of research has gained much attention recently, most works still rely on handcrafted features. In contrast, we apply a multiscale convolutional network to learn features directly from the images and the depth information. Using frame-by-frame labeling, we obtain nearly state-of-the-art performance on the NYU-v2 depth dataset with an accuracy of 64.5%. We then show that the labeling can be further improved by exploiting the temporal consistency in the video sequence of the scene. To that end, we present a method producing temporally consistent superpixels from a streaming video. Among the different methods producing superpixel segmentations of an image, the graph-based approach of Felzenszwalb and Huttenlocher is broadly employed. One of its interesting properties is that the regions are computed in a greedy manner in quasi-linear time by using a minimum spanning tree. In a framework exploiting minimum spanning trees all along, we propose an efficient video segmentation approach that computes temporally consistent superpixels in a causal manner, filling the need for causal and real-time applications. We illustrate the labeling of indoor scenes in video sequences that could be processed in real-time using appropriate hardware such as an FPGA.
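    The per-frame Felzenszwalb-Huttenlocher segmentation the paper starts from is available in scikit-image, which makes the building block easy to try. The paper's actual contribution - propagating the minimum spanning trees across frames for temporal consistency - is not reproduced by this snippet.

```python
# Per-frame graph-based superpixels (Felzenszwalb-Huttenlocher) via
# scikit-image; the temporal propagation is the paper's own extension.
import numpy as np
from skimage.segmentation import felzenszwalb

frame = np.random.rand(240, 320, 3)            # stand-in for a video frame
labels = felzenszwalb(frame, scale=100, sigma=0.8, min_size=50)
print(labels.max() + 1, "superpixels")         # one integer id per pixel
```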