
    ROBUST BACKGROUND SUBTRACTION FOR MOVING CAMERAS AND THEIR APPLICATIONS IN EGO-VISION SYSTEMS

    Background subtraction is the algorithmic process that segments the region of interest, often known as the foreground, from the background. Extensive literature and numerous algorithms exist in this domain, but most research has focused on videos captured by static cameras. The proliferation of portable platforms equipped with cameras has resulted in a large amount of video data being generated from moving cameras. This motivates the need for foundational algorithms for foreground/background segmentation in videos from moving cameras. In this dissertation, I propose three new types of background subtraction algorithms for moving cameras based on appearance, motion, and a combination of the two. Comprehensive evaluation of the proposed approaches on publicly available test sequences shows the superiority of our system over state-of-the-art algorithms. The first method is an appearance-based global modeling of foreground and background. Features are extracted by sliding a fixed-size window over the entire image without any spatial constraint, to accommodate arbitrary camera movements. A supervised learning method is then used to build foreground and background models. This method is suitable for limited-scene scenarios such as Pan-Tilt-Zoom surveillance cameras. The second method relies on motion. It comprises an innovative background motion approximation mechanism followed by spatial regulation through a Mega-Pixel denoising process. This work does not need to maintain any costly appearance models and is therefore appropriate for resource-constrained ego-vision systems. The proposed segmentation, combined with skin cues, is validated by a novel application to authenticating hand-gestured signatures captured by wearable cameras. The third method combines both motion and appearance. Foreground probabilities are jointly estimated from motion and appearance. After the mega-pixel denoising process, the probability estimates and gradient image are combined by Graph-Cut to produce the segmentation mask. This method is universal, as it can handle all types of moving cameras.

    Real-Time Noise Removal in Foveated Path Tracing

    Path tracing is a method for rendering photorealistic two-dimensional images of three-dimensional scenes based on computing intersections between the scene geometry and light rays traveling through the scene. The rise in parallel computation resources in devices such as graphics processing units (GPUs) has made it increasingly viable to do path tracing in real time. To achieve real-time performance, path tracing can be further optimized by using foveated rendering, where the properties of the human visual system are exploited to reduce the number of rays outside the central point of vision (fovea), where the human eye cannot discern fine detail. The reduction in the number of rays can, however, lead to several issues. Noise appears in the image as a result of an inadequate number of path tracing samples allocated to each pixel. Furthermore, the variation in the noise from one animation frame to the next appears as flicker. Finally, artifacts can appear when the spatially subsampled image is upsampled to a uniform resolution for display. In this thesis, solutions to the aforementioned issues are explored by implementing three noise removal methods in a foveated path tracing rendering system. The computational performance and the visual quality of the implemented methods are evaluated. Of the implemented methods, the cross-bilateral filter provides the best quality, but its runtime does not scale well to large filter sizes. Large filter sizes are enabled by the À-Trous approximation of the cross-bilateral filter, at the cost of generating more artifacts in the result. Overall, while the implemented methods are able to provide visually pleasing results in some scenarios, improvements in the algorithms (e.g., local filter parameter selection) are needed to reach the quality seen in offline methods.
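To make the cross-bilateral filter named in this abstract concrete, here is a minimal one-dimensional sketch in Python. It is an illustration under stated assumptions, not the thesis implementation: the function name `cross_bilateral_1d`, the parameters `radius`, `sigma_s`, and `sigma_r`, and the use of a single scalar guide signal are all hypothetical choices. In a renderer the guide would typically come from noise-free feature buffers such as depth or normals.

```python
import math


def cross_bilateral_1d(noisy, guide, radius=2, sigma_s=1.5, sigma_r=0.1):
    """Smooth `noisy` with weights computed from a separate `guide` signal."""
    n = len(noisy)
    out = []
    for i in range(n):
        total, weight_sum = 0.0, 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            # Spatial weight: nearby samples count more.
            spatial = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
            # Range weight is taken from the guide, not from the noisy input;
            # this is what makes the filter "cross" (or "joint") bilateral.
            similarity = math.exp(-((guide[i] - guide[j]) ** 2) / (2 * sigma_r ** 2))
            w = spatial * similarity
            total += w * noisy[j]
            weight_sum += w
        # weight_sum >= 1 because the centre sample always has weight 1.
        out.append(total / weight_sum)
    return out


# Toy data: a step edge in the guide, with noise on the colour signal.
guide = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
noisy = [0.1, -0.1, 0.0, 1.1, 0.9, 1.0]
filtered = cross_bilateral_1d(noisy, guide)
```

Each output sample costs work proportional to the window size, which is consistent with the abstract's remark that the runtime scales poorly to large filters. The À-Trous variant is not shown here.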

    The NIRS Analysis Package: Noise Reduction and Statistical Inference

    Near infrared spectroscopy (NIRS) is a non-invasive optical imaging technique that can be used to measure cortical hemodynamic responses to specific stimuli or tasks. While analyses of NIRS data are normally adapted from established fMRI techniques, there are nevertheless substantial differences between the two modalities. Here, we investigate the impact of NIRS-specific noise (e.g., systemic physiological noise, motion-related artifacts, and serial autocorrelations) upon the validity of statistical inference within the framework of the general linear model. We present a comprehensive framework for noise reduction and statistical inference, which is custom-tailored to the noise characteristics of NIRS. These methods have been implemented in a public-domain MATLAB toolbox, the NIRS Analysis Package (NAP). Finally, we validate NAP using both simulated and actual data, showing marked improvement in the detection power and reliability of NIRS.

    Sparse Modeling for Image and Vision Processing

    In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection, that is, automatically selecting a simple model among a large collection of them. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. Subsequently, the corresponding tools have been widely adopted by several scientific communities such as neuroscience, bioinformatics, or computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts. Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Visio
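As an illustration of "representing data with linear combinations of a few dictionary elements", the following Python sketch computes a sparse code with ISTA (iterative shrinkage-thresholding) for the l1-penalised least-squares problem. This is one common formulation, chosen here as an assumption; the monograph's own algorithms are not reproduced. The names `sparse_code` and `soft_threshold`, the fixed toy dictionary `D`, and the values of `lam` and `step` are hypothetical.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(D, x, lam=0.1, step=0.4, iters=500):
    """Approximately minimise 0.5 * ||x - D a||^2 + lam * ||a||_1 with ISTA.

    D is a list of rows (signal_dim x n_atoms); x has length signal_dim.
    `step` should not exceed 1 / (largest eigenvalue of D^T D).
    """
    n_rows, n_atoms = len(D), len(D[0])
    a = [0.0] * n_atoms
    for _ in range(iters):
        # Residual r = D a - x.
        r = [sum(D[i][j] * a[j] for j in range(n_atoms)) - x[i]
             for i in range(n_rows)]
        # Gradient of the quadratic term: D^T r.
        g = [sum(D[i][j] * r[i] for i in range(n_rows))
             for j in range(n_atoms)]
        # Gradient step followed by shrinkage, which zeroes small coefficients.
        a = [soft_threshold(a[j] - step * g[j], step * lam)
             for j in range(n_atoms)]
    return a


# Overcomplete toy dictionary: three atoms (columns) in a 2-D signal space.
D = [[1.0, 0.0, 0.7071],
     [0.0, 1.0, 0.7071]]
x = [1.5, 0.0]
codes = sparse_code(D, x)  # only the first atom should remain active
```

Here the dictionary is fixed by hand. In the setting the abstract emphasises, the dictionary itself would also be learned from data, which this sketch does not attempt.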

    Image Deblurring and Super-resolution by Adaptive Sparse Domain Selection and Adaptive Regularization

    As a powerful statistical image modeling technique, sparse representation has been successfully used in various image restoration applications. The success of sparse representation is owed to the development of l1-norm optimization techniques and to the fact that natural images are intrinsically sparse in some domain. The image restoration quality largely depends on whether the employed sparse domain can represent the underlying image well. Considering that the contents can vary significantly across different images or different patches in a single image, we propose to learn various sets of bases from a pre-collected dataset of example image patches; then, for a given patch to be processed, one set of bases is adaptively selected to characterize the local sparse domain. We further introduce two adaptive regularization terms into the sparse representation framework. First, a set of autoregressive (AR) models is learned from the dataset of example image patches. The best-fitting AR models for a given patch are adaptively selected to regularize the local image structures. Second, the non-local self-similarity of the image is introduced as another regularization term. In addition, the sparsity regularization parameter is adaptively estimated for better image restoration performance. Extensive experiments on image deblurring and super-resolution validate that, by using adaptive sparse domain selection and adaptive regularization, the proposed method achieves much better results than many state-of-the-art algorithms in terms of both PSNR and visual perception. Comment: 35 pages. This paper is under review in IEEE TI

    Analyzing satellite images by apply deep learning instance segmentation of agricultural fields

    This research focuses on multi-exposure satellite images of agricultural fields, using image analysis and deep learning techniques. The development of image edge-smoothing systems using CNNs is being actively pursued, with special attention given to smoothing all the edges of an image. Given its high propensity to meta-size, going hand in hand with severe decreases in preservation rates, the high inter-edge variability in image appearance, and a strong requirement on the training of the physician, properly de-noising an image can be considered a daunting task. The purpose of this research is to apply a deep learning and image analysis pipeline to multi-exposure satellite images in order to segment the edges in an image using hybrid techniques from deep learning and imaging. A literature review of papers covering different imaging model architectures was conducted. A custom CNN model was created for the task and was used with different levels of fine-tuning of hybrid satellite image analysis techniques. Screening with a high edge filter to identify edges at high accuracy has been under debate. The custom deep learning model architectures were designed to represent different depths. Additionally, a deep learning CNN model was created to represent the traditional automated image analysis approach. The study also attempts to find solutions to practical deep learning challenges such as low training speed and lack of transparency, and reports an accuracy of 98.17%.

    Sparse and low rank approximations for action recognition

    Action recognition is a crucial area of research in computer vision, with a wide range of applications in surveillance, patient-monitoring systems, video indexing, Human-Computer Interaction and many more. These applications require automated action recognition. Robust classification methods are still sought after despite influential research in this field over the past decade. Data resources have grown tremendously owing to the advances of the digital revolution and cannot be compared to the meagre resources of the past. The main limitation on a system dealing with video data is the computational burden due to large dimensions and data redundancy. Sparse and low rank approximation methods have evolved recently; they aim at a concise and meaningful representation of data. This thesis explores the application of sparse and low rank approximation methods in the context of video data classification, with the following contributions: 1. An approach for solving the problem of action and gesture classification is proposed within the sparse representation domain, effectively dealing with large feature dimensions. 2. A low rank matrix completion approach is proposed to jointly classify more than one action. 3. Deep features are proposed for robust classification of multiple actions within a matrix completion framework that can handle data deficiencies. This thesis starts with the applicability of sparse representation based classification methods to the problem of action and gesture recognition. Random projection is used to reduce the dimensionality of the features. These are referred to as compressed features in this thesis. The dictionary formed with compressed features has proved to be efficient for the classification task, achieving results comparable to the state of the art. Next, this thesis addresses the more promising problem of simultaneous classification of multiple actions. This is treated as a matrix completion problem in a transductive setting. Matrix completion methods are considered the generic extension of sparse representation methods from a compressed sensing point of view. The features and corresponding labels of the training and test data are concatenated and placed as columns of a matrix. The unknown test labels are the missing entries in that matrix. This is solved using rank minimization techniques, based on the assumption that the underlying complete matrix is low rank. This approach has achieved results better than the state of the art on datasets of varying complexity. This thesis then extends the matrix completion framework for joint classification of actions to handle missing features in addition to missing test labels. In this context, deep features from a convolutional neural network are proposed. A convolutional neural network is trained on the training data, and features are extracted from the training and test data using the trained network. The performance of the deep features has proved to be promising when compared to state-of-the-art hand-crafted features.
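The transductive matrix-completion setup described in this abstract (features and labels stacked as columns, unknown test labels as missing entries) can be sketched on a toy example. The Python code below is an assumption-laden illustration: it swaps the thesis's rank minimization for a much simpler fixed rank-1 alternating least squares fit, and the function name `complete_rank1`, the use of `None` for missing entries, and the toy data are all hypothetical.

```python
def complete_rank1(M, iters=20):
    """Fill None entries of M with a rank-1 fit u v^T (alternating least squares).

    Assumes the first row is fully observed and that the observed data are
    non-degenerate, so the denominators below are nonzero.
    """
    n_rows, n_cols = len(M), len(M[0])
    v = [float(val) for val in M[0]]  # initialise v from the first row
    u = [0.0] * n_rows
    for _ in range(iters):
        # Update each row factor using only the observed entries of that row.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if M[i][j] is not None]
            u[i] = sum(M[i][j] * v[j] for j in obs) / sum(v[j] ** 2 for j in obs)
        # Update each column factor using only the observed entries of that column.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if M[i][j] is not None]
            v[j] = sum(M[i][j] * u[i] for i in obs) / sum(u[i] ** 2 for i in obs)
    return [[M[i][j] if M[i][j] is not None else u[i] * v[j]
             for j in range(n_cols)] for i in range(n_rows)]


# Columns are samples: two training samples followed by two test samples.
# Rows 0-1 hold features, row 2 holds labels (+1 / -1); test labels are missing.
M = [
    [1.0, -1.0, 1.0, -1.0],
    [2.0, -2.0, 2.0, -2.0],
    [1.0, -1.0, None, None],
]
completed = complete_rank1(M)
predicted_labels = [1 if completed[2][j] > 0 else -1 for j in (2, 3)]
```

The sign of each filled-in label entry is read off as the predicted class. Real data would not be exactly rank 1, which is where a rank-minimizing (for example, nuclear-norm based) solver would replace this toy fit.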

    Adaptive Representations for Image Restoration

    In the field of image processing, building good representation models for natural images is crucial for various applications, such as image restoration, sampling, segmentation, etc. Adaptive image representation models are designed to describe the intrinsic structures of natural images. In classical Bayesian inference, this representation is often known as the prior of the intensity distribution of the input image. Early image priors have forms such as the total variation norm, Markov Random Fields (MRF), and wavelets. Recently, image priors obtained from machine learning techniques tend to be more adaptive, aiming to capture natural image models by learning from larger databases. In this thesis, we study adaptive representations of natural images for image restoration. The purpose of image restoration is to remove the artifacts which degrade an image. The degradation comes in many forms, such as image blur, noise, and codec artifacts. Take image denoising as an example. There are several classic representation methods which can generate state-of-the-art results. The first is the assumption of image self-similarity. However, this representation has the issue that the self-similarity assumption sometimes fails because of high noise levels or unique image contents. The second is the wavelet-based nonlocal representation, which also has a problem in that the fixed basis function is not adaptive enough for arbitrary types of input images. The third is sparse coding using overcomplete dictionaries, which does not have a hierarchical structure similar to the one in the human visual system and is therefore prone to denoising artifacts. My research started from image denoising. Through a thorough review and evaluation of state-of-the-art denoising methods, it was found that the representation of images is substantially important for the denoising technique. At the same time, an improvement on one of the nonlocal denoising methods was proposed, which improves the representation of images through the integration of Gaussian blur, clustering and Rotationally Invariant Block Matching. Inspired by the successful application of sparse coding in compressive sensing, we exploited image self-similarity by using a sparse representation based on wavelet coefficients in a nonlocal and hierarchical way, which generates competitive results compared to state-of-the-art denoising algorithms. Meanwhile, another adaptive local filter learned by Genetic Programming (GP) was proposed for efficient image denoising. In this work, we employed GP to find the optimal representations for local image patches through training on massive datasets, which yields competitive results compared to state-of-the-art local denoising filters. After successfully dealing with the denoising part, we moved to parameter estimation for image degradation models. For instance, image blur identification uses deep learning, which has recently been proposed as a popular image representation approach. This work has also been extended to blur estimation by replacing the second step of the framework with a general regression neural network. In short, in this thesis, spatial correlations, sparse coding, genetic programming, and deep learning are explored as adaptive image representation models for both image restoration and parameter estimation. We conclude this thesis by considering methods based on machine learning to be the best adaptive representations for natural images. We have shown that they can generate better results than conventional representation models for the tasks of image denoising and deblurring.

    Motion-capture-based hand gesture recognition for computing and control

    This dissertation focuses on the study and development of algorithms that enable the analysis and recognition of hand gestures in a motion capture environment. Central to this work is the study of unlabeled point sets in a more abstract sense. Evaluations of proposed methods focus on examining their generalization to users not encountered during system training. In an initial exploratory study, we compare various classification algorithms based upon multiple interpretations and feature transformations of point sets, including those based upon aggregate features (e.g. mean) and a pseudo-rasterization of the capture space. We find aggregate feature classifiers to be balanced across multiple users but relatively limited in maximum achievable accuracy. Certain classifiers based upon the pseudo-rasterization performed best among tested classification algorithms. We follow this study with targeted examinations of certain subproblems. For the first subproblem, we introduce the a fortiori expectation-maximization (AFEM) algorithm for computing the parameters of a distribution from which unlabeled, correlated point sets are presumed to be generated. Each unlabeled point is assumed to correspond to a target with independent probability of appearance but correlated positions. We propose replacing the expectation phase of the algorithm with a Kalman filter modified within a Bayesian framework to account for the unknown point labels which manifest as uncertain measurement matrices. We also propose a mechanism to reorder the measurements in order to improve parameter estimates. In addition, we use a state-of-the-art Markov chain Monte Carlo sampler to efficiently sample measurement matrices. In the process, we indirectly propose a constrained k-means clustering algorithm. Simulations verify the utility of AFEM against a traditional expectation-maximization algorithm in a variety of scenarios. 
In the second subproblem, we consider the application of positive definite kernels and the earth mover's distance (EMD) to our work. Positive definite kernels are an important tool in machine learning that enable efficient solutions to otherwise difficult or intractable problems by implicitly linearizing the problem geometry. We develop a set-theoretic interpretation of EMD and propose earth mover's intersection (EMI), a positive definite analog to EMD. We offer proof of EMD's negative definiteness and provide necessary and sufficient conditions for EMD to be conditionally negative definite, including approximations that guarantee negative definiteness. In particular, we show that EMD is related to various min-like kernels. We also present a positive definite preserving transformation that can be applied to any kernel and can be used to derive positive definite EMD-based kernels, and we show that the Jaccard index is simply the result of this transformation applied to set intersection. Finally, we evaluate kernels based on EMI and the proposed transformation versus EMD in various computer vision tasks and show that EMD is generally inferior even with indefinite kernel techniques. Finally, we apply deep learning to our problem. We propose neural network architectures for hand posture and gesture recognition from unlabeled marker sets in a coordinate system local to the hand. As a means of ensuring data integrity, we also propose an extended Kalman filter for tracking the rigid pattern of markers on which the local coordinate system is based. We consider fixed- and variable-size architectures including convolutional and recurrent neural networks that accept unlabeled marker input. We also consider a data-driven approach to labeling markers with a neural network and a collection of Kalman filters. 
Experimental evaluations with posture and gesture datasets show promising results for the proposed architectures with unlabeled markers, which outperform the alternative data-driven labeling method
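The abstract states that the Jaccard index results from applying a kernel transformation to set intersection, but it does not give the formula. One transformation consistent with that statement, assumed here purely for illustration, maps a kernel k to k(x, y) / (k(x, x) + k(y, y) - k(x, y)). The Python sketch below checks this on small sets; the function names `intersection_kernel`, `normalize_kernel`, and `jaccard` are hypothetical.

```python
def intersection_kernel(a, b):
    """Set-intersection kernel: the number of shared elements."""
    return len(set(a) & set(b))


def normalize_kernel(k, x, y):
    """Assumed transformation: k(x, y) / (k(x, x) + k(y, y) - k(x, y))."""
    kxy = k(x, y)
    denom = k(x, x) + k(y, y) - kxy
    return kxy / denom if denom else 0.0


def jaccard(a, b):
    """Jaccard index |A intersect B| / |A union B|, computed directly."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


x, y = {1, 2, 3}, {2, 3, 4}
via_transform = normalize_kernel(intersection_kernel, x, y)  # 2 / (3 + 3 - 2)
direct = jaccard(x, y)                                       # 2 / 4
```

For set intersection the denominator equals the size of the union, so the two values coincide. Whether this is exactly the transformation proposed in the dissertation is not confirmed by the abstract.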

    Statistical Diffusion Tensor Imaging

    Magnetic resonance diffusion tensor imaging (DTI) allows the ultrastructure of living tissue to be inferred. In brain mapping, neural fiber trajectories can be identified by exploiting the anisotropy of diffusion processes. A variety of statistical methods may be linked into the comprehensive processing chain that spans from DTI raw images to the reliable visualization of fibers. In this work, a space varying coefficients model (SVCM) using penalized B-splines was developed to integrate diffusion tensor estimation, regularization and interpolation into a unified framework. The implementation challenges originating in multiple 3D space varying coefficient surfaces and the large dimensions of realistic datasets were met by incorporating matrix sparsity and efficient model approximation. Superiority of the B-spline based SVCM over the standard approach was demonstrated in simulation studies in terms of the precision and accuracy of the individual tensor elements. The integration with a probabilistic fiber tractography algorithm and application to real brain data revealed that the unified approach is at least equivalent to the serial application of voxelwise estimation, smoothing and interpolation. From the error analysis using boxplots and visual inspection, the conclusion was drawn that both the standard approach and the B-spline based SVCM may suffer from low local adaptivity. Therefore, wavelet basis functions were employed for filtering diffusion tensor fields. While excellent local smoothing was indeed achieved by combining voxelwise tensor estimation with wavelet filtering, no immediate improvement was gained for fiber tracking. However, the thresholding strategy needs to be refined, and the proposed incorporation of wavelets into an SVCM needs to be implemented, to finally assess their utility for DTI data processing. 
In summary, an SVCM with specific consideration of the demands of human brain DTI data was developed and implemented, eventually representing a unified postprocessing framework. This represents an experimental and statistical platform to further improve the reliability of tractography