
    Top-down segmentation of non-rigid visual objects using derivative-based search on sparse manifolds

    The solution for the top-down segmentation of non-rigid visual objects using machine learning techniques is generally regarded as too complex to be solved in its full generality, given the large dimensionality of the search space of the explicit representation of the segmentation contour. In order to reduce this complexity, the problem is usually divided into two stages: rigid detection and non-rigid segmentation. The rationale is that the rigid detection can be run in a lower-dimensional space (i.e., less complex and faster) than the original contour space, and its result is then used to constrain the non-rigid segmentation. In this paper, we propose the use of sparse manifolds to reduce the dimensionality of the rigid detection search space of current state-of-the-art top-down segmentation methodologies. The main goals of this lower-dimensional search space are to decrease the running-time complexity of the search and to reduce the training complexity of the rigid detector. These goals are attainable given that both the search and training complexities are functions of the dimensionality of the rigid search space. We test our approach on the segmentation of the left ventricle from ultrasound images and of lips from frontal face images. Compared to the performance of state-of-the-art non-rigid segmentation systems, our experiments show that the use of sparse manifolds for the rigid detection achieves the two goals mentioned above. © 2013 IEEE. Jacinto C. Nascimento, Gustavo Carneiro. http://www.pamitc.org/cvpr13
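    As an illustration of the general idea (not the authors' exact formulation), the hypothetical sketch below embeds the rigid-detection parameters into a low-dimensional space and searches over that space, mapping each candidate back to a full rigid transform before scoring. The scoring function `detector_score`, the use of PCA in place of a learned sparse manifold, and the gradient-free optimiser are all simplifying assumptions.

```python
# Hypothetical sketch: rigid detection searched in a low-dimensional embedding
# instead of the full (x, y, scale, angle, ...) parameter space.
import numpy as np
from sklearn.decomposition import PCA
from scipy.optimize import minimize

def fit_low_dim_space(train_rigid_params, n_dims=2):
    """Embed training rigid-detection parameters into a lower-dimensional space.
    PCA stands in here for the learned sparse manifold."""
    model = PCA(n_components=n_dims)
    model.fit(train_rigid_params)
    return model

def rigid_detect(image, embed, detector_score, z0=None):
    """Search over the low-dimensional coordinates z; each candidate z is
    mapped back to a full rigid transform before being scored."""
    z0 = np.zeros(embed.n_components_) if z0 is None else z0
    objective = lambda z: -detector_score(image, embed.inverse_transform(z.reshape(1, -1))[0])
    # gradient-free optimiser used here for brevity, standing in for the
    # derivative-based search described in the paper
    result = minimize(objective, z0, method="Nelder-Mead")
    return embed.inverse_transform(result.x.reshape(1, -1))[0]
```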

    Blending Learning and Inference in Structured Prediction

    In this paper we derive an efficient algorithm to learn the parameters of structured predictors in general graphical models. This algorithm blends the learning and inference tasks, which results in a significant speedup over traditional approaches such as conditional random fields and structured support vector machines. For this purpose we utilize the structures of the predictors to describe a low-dimensional structured prediction task which encourages local consistencies within the different structures while learning the parameters of the model. Convexity of the learning task provides the means to enforce the consistencies between the different parts. The inference-learning blending algorithm that we propose is guaranteed to converge to the optimum of the low-dimensional primal and dual programs. Unlike many existing approaches, the inference-learning blending allows us to efficiently learn high-order graphical models over regions of any size and with a very large number of parameters. We demonstrate the effectiveness of our approach while presenting state-of-the-art results in stereo estimation, semantic segmentation, shape reconstruction, and indoor scene understanding.
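    A minimal sketch of the blending idea in general terms (not the paper's algorithm): interleave a few warm-started inference updates with each parameter update instead of running inference to convergence at every step. The callables `infer_step` and `grad_from_beliefs` are hypothetical placeholders for the problem-specific inference and gradient computations.

```python
# Hypothetical sketch of interleaving inference updates with learning updates.
import numpy as np

def blended_training(examples, params, infer_step, grad_from_beliefs,
                     n_epochs=10, lr=0.01, inner_iters=1):
    # keep per-example beliefs (approximate marginals) alive across epochs
    beliefs = [None] * len(examples)
    for _ in range(n_epochs):
        for i, (x, y) in enumerate(examples):
            # a handful of inference updates, warm-started from the last pass
            for _ in range(inner_iters):
                beliefs[i] = infer_step(x, params, beliefs[i])
            # parameter step computed from the partially converged beliefs
            grad = grad_from_beliefs(x, y, beliefs[i], params)
            params = params - lr * grad
    return params
```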

    Object Search Strategy in Tracking Algorithms

    The demand for real-time video surveillance systems is increasing rapidly. The purpose of these systems includes surveillance as well as monitoring and controlling events. Today there are several real-time computer vision applications based on image understanding which emulate human vision and intelligence. These systems include object tracking as their primary task. Object tracking refers to estimating the trajectory of an object of interest in a video. A tracking system works on the principle of video processing algorithms. Video processing involves a huge amount of data, and this constrains how the algorithms can be implemented on any hardware. The problem becomes challenging due to unexpected motion of the object, scene appearance change, object appearance change, and non-rigid object structures. Besides this, full and partial occlusions and camera motion also pose challenges. Current tracking algorithms treat this problem as a classification task and use online learning algorithms to update the object model. Here, we exploit the data redundancy in dense sampling techniques and develop a highly structured kernel. This kernel acquires a circulant structure which is extremely easy to manipulate. We take this further by using the mean-shift density algorithm and Lucas-Kanade optical flow, which yields a substantial improvement in the results.
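    The circulant structure mentioned above is the one exploited by correlation-filter trackers: cyclic shifts of an image patch form a circulant matrix that the DFT diagonalises, so ridge regression over all shifts reduces to element-wise operations in the Fourier domain. The sketch below shows that standard trick as a MOSSE-style linear filter, not this work's full pipeline with mean shift and optical flow.

```python
# Sketch of a MOSSE-style correlation filter: training over all cyclic shifts
# of a patch costs one FFT, thanks to the circulant structure.
import numpy as np

def train_filter(patch, target_response, lam=1e-3):
    """Learn a linear correlation filter over all cyclic shifts of `patch`.
    `target_response` is typically a Gaussian peaked at the object centre."""
    X = np.fft.fft2(patch)
    Y = np.fft.fft2(target_response)
    # ridge regression solved element-wise in the Fourier domain
    return np.conj(X) * Y / (np.conj(X) * X + lam)

def detect(filt, new_patch):
    """Response map over the next frame's patch; its argmax gives the new position."""
    response = np.real(np.fft.ifft2(filt * np.fft.fft2(new_patch)))
    return np.unravel_index(np.argmax(response), response.shape)
```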

    Going Deeper into Action Recognition: A Survey

    Understanding human actions in visual data is tied to advances in complementary research areas including object recognition, human dynamics, domain adaptation, and semantic segmentation. Over the last decade, human action analysis has evolved from earlier schemes that were often limited to controlled environments to today's advanced solutions that can learn from millions of videos and apply to almost all daily activities. Given the broad range of applications, from video surveillance to human-computer interaction, scientific milestones in action recognition are achieved more rapidly, quickly rendering once-dominant methods obsolete. This motivated us to provide a comprehensive review of the notable steps taken towards recognizing human actions. To this end, we start our discussion with the pioneering methods that use handcrafted representations, and then navigate into the realm of deep-learning-based approaches. We aim to remain objective throughout this survey, touching upon encouraging improvements as well as inevitable fallbacks, in the hope of raising fresh questions and motivating new research directions for the reader.

    Manifold Learning for Natural Image Sets, Doctoral Dissertation August 2006

    The field of manifold learning provides powerful tools for parameterizing high-dimensional data points with a small number of parameters when the data lies on or near some manifold. Images can be thought of as points in a high-dimensional image space where each coordinate represents the intensity value of a single pixel. These manifold learning techniques have been successfully applied to simple image sets, such as handwriting data and a statue imaged in a tightly controlled environment. However, they fail in the case of natural image sets, even those that vary due to only a single degree of freedom, such as a person walking or a heart beating. Parameterizing such data sets will allow for additional constraints on traditional computer vision problems such as segmentation and tracking. This dissertation explores the reasons why classical manifold learning algorithms fail on natural image sets and proposes new algorithms for parameterizing this type of data.
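    For context, the sketch below shows how a classical manifold learner such as Isomap is typically applied to an image set, treating each image as one high-dimensional point; this is the kind of standard pipeline the dissertation reports breaking down on natural image sets. The `images` array and neighbourhood size are assumptions for illustration.

```python
# Sketch: parameterise an image set with a classical manifold learner,
# assuming `images` is an (n_images, height, width) NumPy array whose only
# real degree of freedom is, say, the phase of a repeating motion.
import numpy as np
from sklearn.manifold import Isomap

def parameterise(images, n_neighbors=8, n_components=1):
    # each image becomes one point in a high-dimensional pixel space
    X = images.reshape(len(images), -1).astype(float)
    embedding = Isomap(n_neighbors=n_neighbors, n_components=n_components)
    coords = embedding.fit_transform(X)  # one low-dimensional coordinate per image
    return coords
```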

    Algorithmic issues in visual object recognition

    This thesis is divided into two parts covering two aspects of research in the area of visual object recognition. Part I is about human detection in still images. Human detection is a challenging computer vision task due to the wide variability in human visual appearances and body poses. In this part, we present several enhancements to human detection algorithms. First, we present an extension to the integral images framework that allows constant-time computation of non-uniformly weighted summations over rectangular regions using a bundle of integral images. Such a computational element is commonly used in constructing gradient-based feature descriptors, which are the most successful in shape-based human detection. Second, we introduce deformable features as an alternative to the conventional static features used in classifiers based on boosted ensembles. Deformable features can enhance the accuracy of human detection by adapting to pose changes that can be described as translations of body features. Third, we present a comprehensive evaluation framework for cascade-based human detectors. The presented framework facilitates comparison between cascade-based detection algorithms, provides a confidence measure for the results, and deploys a practical evaluation scenario. Part II explores the possibilities of enhancing the speed of core algorithms used in visual object recognition using the computing capabilities of Graphics Processing Units (GPUs). First, we present an implementation of Graph Cut on GPUs, which achieves up to a 4x speedup compared to a CPU implementation. The Graph Cut algorithm has many applications related to visual object recognition, such as segmentation and 3D point matching. Second, we present an efficient sparse approximation of kernel matrices for GPUs that can significantly speed up kernel-based learning algorithms, which are widely used in object detection and recognition. We present an implementation of the Affinity Propagation clustering algorithm based on this representation, which is about six times faster than another GPU implementation based on a conventional sparse matrix representation.
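    For reference, the sketch below shows the standard integral-image (summed-area table) trick that the bundle-of-integral-images extension builds on: after one cumulative-sum pass, any axis-aligned rectangular sum costs four lookups regardless of the rectangle's size. It does not reproduce the thesis's non-uniform weighting scheme.

```python
# Sketch of the standard integral-image trick: O(1) rectangular sums.
import numpy as np

def integral_image(img):
    """Summed-area table with a zero-padded first row/column for easy indexing."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom, left:right] using four table entries."""
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]
```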

    Translating AI to digital pathology workflow: Dealing with scarce data and high variation by minimising complexities in data and models

    The recent conversion from conventional pathology to digital pathology using Whole Slide Images (WSIs) has opened the doors for Artificial Intelligence (AI) in the pathology workflow. Machine learning and deep learning have recently attracted high interest in medical image processing. However, WSIs differ from generic medical images. WSIs are complex images which can reveal various information to support different diagnoses, ranging from cancer to unknown underlying conditions that were not discovered in other medical investigations. These investigations require expert knowledge, long investigation times, applying different stains to the WSIs, and comparing the WSIs. These differences set WSIs apart from the images that general machine learning methods for medical image processing are designed for. Co-analysing multi-stained WSIs, high variation of WSIs from different sites, and the lack of labelled data are the key areas that directly influence the development of machine learning models that support pathologists in their investigations. However, most state-of-the-art machine learning approaches cannot be applied in the general clinical workflow without high compute power, expert knowledge, and time. Therefore, this thesis explores avenues to translate highly computational and time-intensive models to a clinical workflow. Co-analysing multi-stained WSIs requires registering differently stained WSIs together. To achieve high registration precision, both non-rigid and rigid transformations must be explored. The non-rigid transformation requires complex deep learning approaches. Using super-convergence on a small Convolutional Neural Network model, it is possible to achieve high precision compared to larger auto-encoders and other state-of-the-art models. High variation of WSIs from different sites heavily affects machine learning models in their predictions. The thesis presents an approach for adapting a pre-trained model using only a small number of samples from the new site. Therefore, re-training larger deep learning models is not required, which saves expert time for re-labelling as well as computational power. Finally, the lack of labelled data is one of the main issues in training any supervised machine learning or deep learning model. Using a Generative Adversarial Network (GAN) is an approach which can be easily implemented to avoid this issue. However, GANs are time-consuming and computationally expensive, and so are not applicable in a general clinical workflow. Therefore, this thesis presents an approach using a simpler GAN that can generate accurate labelled sample data. The synthetic data are used to train a classifier, and the thesis demonstrates that the predictive model achieves higher accuracy in the test environment. This thesis demonstrates that machine learning and deep learning models can be applied to a clinical workflow without exploiting expert time and high computing power.
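    A minimal sketch of the pre-trained-model adaptation step as a generic transfer-learning recipe (not the thesis's exact pipeline): freeze the backbone and re-train only a small head on the handful of labelled samples from the new site. `backbone`, `few_shot_loader`, and the feature/class sizes are assumed to be provided by the surrounding code.

```python
# Sketch: adapt a pre-trained model to a new site with few labelled samples
# by freezing the backbone and training only a small classification head.
import torch
import torch.nn as nn

def adapt_to_new_site(backbone, few_shot_loader, n_features, n_classes, epochs=5):
    for p in backbone.parameters():           # keep the expensive backbone fixed
        p.requires_grad = False
    head = nn.Linear(n_features, n_classes)   # only this small head is re-trained
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for patches, labels in few_shot_loader:
            with torch.no_grad():
                feats = backbone(patches)      # frozen feature extraction
            loss = loss_fn(head(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```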