212 research outputs found
ROBUST BACKGROUND SUBTRACTION FOR MOVING CAMERAS AND THEIR APPLICATIONS IN EGO-VISION SYSTEMS
Background subtraction is the algorithmic process that segments the region of interest, often known as the foreground, out from the background. Extensive literature and numerous algorithms exist in this domain, but most research has focused on videos captured by static cameras. The proliferation of portable platforms equipped with cameras has resulted in a large amount of video data being generated from moving cameras, which motivates the need for foundational algorithms for foreground/background segmentation in videos from moving cameras. In this dissertation, I propose three new types of background subtraction algorithms for moving cameras, based on appearance, motion, and a combination of the two. Comprehensive evaluation of the proposed approaches on publicly available test sequences shows the superiority of our system over state-of-the-art algorithms.
The first method is an appearance-based global modeling of foreground and background. Features are extracted by sliding a fixed-size window over the entire image without any spatial constraint, in order to accommodate arbitrary camera movements. A supervised learning method is then used to build the foreground and background models. This method is suitable for limited-scene scenarios such as Pan-Tilt-Zoom surveillance cameras. The second method relies on motion. It comprises an innovative background motion approximation mechanism followed by spatial regulation through a Mega-Pixel denoising process. This method does not need to maintain any costly appearance models and is therefore appropriate for resource-constrained ego-vision systems. The proposed segmentation, combined with skin cues, is validated by a novel application: authenticating hand-gestured signatures captured by wearable cameras. The third method combines both motion and appearance: foreground probabilities are jointly estimated from the two cues. After the Mega-Pixel denoising process, the probability estimates and the gradient image are combined by Graph-Cut to produce the segmentation mask. This method is universal, as it can handle all types of moving cameras.
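As a point of reference for the static-camera baseline that this dissertation generalizes, a minimal sketch of appearance-based background subtraction with a running-average background model might look as follows. This is a generic illustration only; the function name, parameters, and update rule are illustrative and far simpler than the dissertation's learned models.

```python
import numpy as np

def subtract_background(frames, alpha=0.05, thresh=30.0):
    """Running-average background model with per-pixel thresholding.

    frames: iterable of 2-D grayscale arrays.
    Returns a list of boolean foreground masks, one per frame.
    """
    bg = None
    masks = []
    for f in frames:
        f = f.astype(np.float64)
        if bg is None:
            bg = f.copy()                 # initialize the model from the first frame
        mask = np.abs(f - bg) > thresh    # large deviation from the model = foreground
        # update only background pixels so foreground does not pollute the model
        bg[~mask] = (1 - alpha) * bg[~mask] + alpha * f[~mask]
        masks.append(mask)
    return masks
```

Such a per-pixel model is exactly what breaks down under camera motion, which is why the dissertation's methods avoid fixed spatial correspondence between frames.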
Real-Time Noise Removal in Foveated Path Tracing
Path tracing is a method for rendering photorealistic two-dimensional images of three-dimensional scenes based on computing intersection between the scene geometry and light rays traveling through the scene. The rise in parallel computation resources in devices such as graphics processing units (GPUs) has made it more and more viable to do path tracing in real time. To achieve real-time performance, path tracing can be further optimized by using foveated rendering, where the properties of the human visual system are exploited to reduce the number of rays outside the central point of vision (fovea), where the human eye cannot discern fine detail.
The reduction in the number of rays can, however, lead to several issues. Noise appears in the image as a result of an inadequate number of path tracing samples allocated to each pixel. Furthermore, the variation in the noise from one animation frame to the next appears as flicker. Finally, artifacts can appear when the spatially subsampled image is upsampled to a uniform resolution for display.
In this thesis, solutions to the aforementioned issues are explored by implementing three noise removal methods in a foveated path tracing rendering system. The computational performance and the visual quality of the implemented methods are evaluated. Of the implemented methods, the cross-bilateral filter provides the best quality, but its runtime does not scale well to large filter sizes. Large filter sizes are enabled by the À-Trous approximation of the cross-bilateral filter, at the cost of generating more artifacts in the result. Overall, while the implemented methods are able to provide visually pleasing results in some scenarios, improvements in the algorithms (e.g., local filter parameter selection) are needed to reach the quality seen in offline methods.
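To illustrate the filter family discussed above, here is a brute-force sketch of a single-channel bilateral filter. Weights combine spatial closeness (sigma_s) and intensity similarity (sigma_r); a cross-bilateral variant would take the range weights from a separate guide image instead of the image itself. This toy version is a generic illustration and makes no claim to match the thesis implementation.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=25.0):
    """Brute-force bilateral filter on a 2-D grayscale image."""
    img = img.astype(np.float64)
    pad = np.pad(img, radius, mode='edge')
    out = np.zeros_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys**2 + xs**2) / (2 * sigma_s**2))   # fixed spatial kernel
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # range kernel: penalize intensity differences, preserving edges
            rng = np.exp(-(win - img[i, j])**2 / (2 * sigma_r**2))
            wgt = spatial * rng
            out[i, j] = (wgt * win).sum() / wgt.sum()
    return out
```

The quadratic growth of the window with `radius` is the scaling problem the thesis notes; the À-Trous scheme approximates large kernels with a few sparse passes instead.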
The NIRS Analysis Package: Noise Reduction and Statistical Inference
Near infrared spectroscopy (NIRS) is a non-invasive optical imaging technique that can be used to measure cortical hemodynamic responses to specific stimuli or tasks. While analyses of NIRS data are normally adapted from established fMRI techniques, there are nevertheless substantial differences between the two modalities. Here, we investigate the impact of NIRS-specific noise, e.g., systemic (physiological) noise, motion-related artifacts, and serial autocorrelations, upon the validity of statistical inference within the framework of the general linear model. We present a comprehensive framework for noise reduction and statistical inference, which is custom-tailored to the noise characteristics of NIRS. These methods have been implemented in a public domain Matlab toolbox, the NIRS Analysis Package (NAP). Finally, we validate NAP using both simulated and actual data, showing marked improvement in the detection power and reliability of NIRS.
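The general linear model mentioned above can be sketched as an ordinary least-squares fit of the measured signal against predicted responses. This is a generic illustration, not NAP's actual pipeline (which is in Matlab and applies NIRS-specific noise reduction, e.g. handling of serial autocorrelations, before inference); the function name and arguments are hypothetical.

```python
import numpy as np

def glm_fit(y, X):
    """Ordinary least-squares fit of the general linear model y = X b + e.

    y: (T,) measured signal; X: (T, p) design matrix whose columns are
    predicted responses (e.g. stimulus onsets convolved with a canonical
    hemodynamic response function) plus nuisance terms.
    Returns (beta, residuals).
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, resid
```

With serially autocorrelated noise, as in NIRS, the naive residual variance understates uncertainty, which is precisely why pre-whitening or autocorrelation correction precedes this step in practice.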
Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection---that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts. Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Vision.
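The core idea of sparse coding, representing a signal as a linear combination of a few dictionary atoms, can be illustrated with a minimal greedy orthogonal matching pursuit. This is a generic textbook sketch (unit-norm atoms assumed), not any specific algorithm from the monograph.

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal matching pursuit: approximate x with at most k columns
    (atoms) of dictionary D, whose columns are assumed unit-norm."""
    resid = x.copy()
    support = []
    code = np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ resid)))   # atom best correlated with residual
        if j not in support:
            support.append(j)
        # re-fit coefficients on the current support by least squares
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        resid = x - D[:, support] @ coef
    code[support] = coef
    return code
```

Dictionary *learning*, the monograph's focus, would additionally adapt the columns of `D` to training data rather than fixing them in advance.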
Image Deblurring and Super-resolution by Adaptive Sparse Domain Selection and Adaptive Regularization
As a powerful statistical image modeling technique, sparse representation has
been successfully used in various image restoration applications. The success
of sparse representation owes to the development of l1-norm optimization
techniques, and the fact that natural images are intrinsically sparse in some
domain. The image restoration quality largely depends on whether the employed
sparse domain can represent well the underlying image. Considering that the
contents can vary significantly across different images or different patches in
a single image, we propose to learn various sets of bases from a pre-collected
dataset of example image patches, and then for a given patch to be processed,
one set of bases is adaptively selected to characterize the local sparse
domain. We further introduce two adaptive regularization terms into the sparse
representation framework. First, a set of autoregressive (AR) models are
learned from the dataset of example image patches. The AR models best fitted to
a given patch are adaptively selected to regularize the local image structures.
Second, the image non-local self-similarity is introduced as another
regularization term. In addition, the sparsity regularization parameter is
adaptively estimated for better image restoration performance. Extensive
experiments on image deblurring and super-resolution validate that by using
adaptive sparse domain selection and adaptive regularization, the proposed
method achieves much better results than many state-of-the-art algorithms in
terms of both PSNR and visual perception. Comment: 35 pages. This paper is under review in IEEE TI
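The l1-norm optimization underpinning such sparse representation frameworks can be sketched with the iterative shrinkage-thresholding algorithm (ISTA). In the paper's terms, the adaptive step would amount to estimating the regularization parameter `lam` per patch; this generic sketch keeps it fixed and is not the paper's solver.

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam, n_iter=200):
    """Iterative shrinkage-thresholding for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # gradient step on the data term, then shrinkage on the l1 term
        x = soft(x + A.T @ (y - A @ x) / L, lam / L)
    return x
```

When `A` is the identity, the solution reduces to soft-thresholding the observation, which makes the shrinkage behavior of the sparsity prior easy to see.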
Analyzing satellite images by applying deep learning instance segmentation to agricultural fields
This research focuses on multi-exposure satellite images of agricultural fields analyzed with image analysis and deep learning techniques. An image edge smoothening system based on a convolutional neural network (CNN) is developed, with special attention given to smoothening all the edges of an image. Given the high inter-edge variability in image appearance and the expertise required to de-noise an image properly, the task is a daunting one. The purpose of this research is to build a deep learning and image analysis pipeline for segmenting edges in multi-exposure satellite images using hybrid deep learning and imaging techniques. A literature review of papers using different imaging model architectures was conducted. A custom CNN model was created for the task and applied with different levels of fine-tuning of hybrid satellite image analysis techniques. High-accuracy edge filtering to identify edges was examined. Custom deep learning model architectures were designed to represent different depths, and an additional CNN model was created to represent the traditional automated image analysis approach. The study also attempts to address practical deep learning challenges such as low training speed and lack of transparency, reporting an accuracy of 98.17%.
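The basic operation of the CNN layers referenced above is a sliding-window convolution. As a hedged illustration (not the paper's model), the sketch below implements the operation directly and applies a hand-crafted Sobel kernel, the kind of edge filter such a network can learn from data.

```python
import numpy as np

def conv2d(img, kernel):
    """'Valid' 2-D cross-correlation, the core operation of a CNN layer."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# horizontal-gradient Sobel kernel: responds strongly to vertical edges
sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
```

A trained CNN stacks many such filters with nonlinearities, letting the network learn edge detectors (and more abstract features) rather than fixing them by hand.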
Sparse and low rank approximations for action recognition
Action recognition is a crucial area of research in computer vision, with a wide range of
applications in surveillance, patient-monitoring systems, video indexing, Human-Computer
Interaction, and many more. These applications require automated action recognition.
Robust classification methods are sought after despite influential research in this field
over the past decade. Owing to the digital revolution, data resources have grown
tremendously compared to the meagre resources of the past. The main limitation on a
system when dealing with video data is the computational burden due to large dimensions
and data redundancy. Sparse and low-rank approximation methods, which aim at a concise
and meaningful representation of data, have evolved recently. This thesis explores the
application of sparse and low-rank approximation methods in the context of video data
classification, with the following contributions.
1. An approach for solving the problem of action and gesture classification is
proposed within the sparse representation domain, effectively dealing with
large feature dimensions.
2. A low-rank matrix completion approach is proposed to jointly classify more
than one action.
3. Deep features are proposed for robust classification of multiple actions
within a matrix completion framework that can handle data deficiencies.
This thesis starts with the applicability of sparse-representation-based
classification methods to the problem of action and gesture recognition. Random
projection is used to reduce the dimensionality of the features; these are
referred to as compressed features in this thesis. The dictionary formed with
compressed features has proved to be efficient for the classification task,
achieving results comparable to the state of the art.
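The compressed-feature idea can be sketched as a Gaussian random projection, whose Johnson-Lindenstrauss property approximately preserves pairwise distances between feature vectors. The function name, output dimension, and seed below are illustrative only, not the thesis's configuration.

```python
import numpy as np

def random_projection(X, d_out, seed=0):
    """Project the rows of X (n, d) to d_out dimensions with a random
    Gaussian matrix; pairwise distances are approximately preserved."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # scaling by 1/sqrt(d_out) keeps expected squared norms unchanged
    R = rng.standard_normal((d, d_out)) / np.sqrt(d_out)
    return X @ R
```

Because the projection is data-independent, it is cheap to apply at test time, which is what makes it attractive for reducing large video feature dimensions before dictionary-based classification.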
Next, this thesis addresses the more promising problem of simultaneous
classification of multiple actions. This is treated as a matrix completion
problem under a transductive setting. Matrix completion methods can be
considered a generic extension of sparse representation methods from a
compressed sensing point of view. The features and corresponding labels of the
training and test data are concatenated and placed as columns of a matrix, so
that the unknown test labels become the missing entries of that matrix. The
problem is solved using rank minimization techniques, based on the assumption
that the underlying complete matrix is of low rank. This approach has achieved
results better than the state of the art on datasets of varying complexity.
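The rank-minimization step described above can be sketched with a generic soft-impute iteration: singular values are soft-thresholded to encourage low rank, while observed entries are clamped to their known values. This is an illustrative stand-in under those assumptions, not the thesis's exact solver.

```python
import numpy as np

def soft_impute(M, mask, lam=0.1, n_iter=100):
    """Fill the missing entries of M (where mask is False) by iterative
    singular-value soft-thresholding, encouraging a low-rank completion."""
    X = np.where(mask, M, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        # shrink all singular values toward zero (nuclear-norm proximal step)
        X = U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt
        X[mask] = M[mask]          # keep the observed entries fixed
    return X
```

In the transductive classification setting, the "missing entries" are the unknown test labels, so reading them off the completed matrix yields the predictions directly.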
This thesis then extends the matrix completion framework for joint
classification of actions to handle missing features in addition to missing
test labels. In this context, deep features from a convolutional neural network
are proposed. A convolutional neural network is trained on the training data,
and features are extracted from the training and test data using the trained
network. The performance of the deep features has proved to be promising when
compared to state-of-the-art hand-crafted features.
Adaptive Representations for Image Restoration
In the field of image processing, building good representation models for
natural images is crucial for various applications, such as image restoration,
sampling, and segmentation. Adaptive image representation models are designed
to describe the intrinsic structures of natural images. In classical Bayesian
inference, this representation is often known as the prior on the intensity
distribution of the input image. Early image priors have forms such as the
total variation norm, Markov Random Fields (MRF), and wavelets. Recently,
image priors obtained from machine learning techniques have tended to be more
adaptive, aiming to capture natural image models by learning from larger
databases. In this thesis, we study adaptive representations of natural images
for image restoration.
The purpose of image restoration is to remove the artifacts that degrade an
image. The degradation comes in many forms, such as image blur, noise, and
codec artifacts. Take image denoising as an example. There are several classic
representation methods that can generate state-of-the-art results. The first
one is the assumption of image self-similarity. However, this representation
has the issue that the self-similarity assumption sometimes fails because of
high noise levels or unique image content. The second one is the wavelet-based
nonlocal representation, which also has a problem in that the fixed basis
functions are not adaptive enough for arbitrary types of input images. The
third is sparse coding using overcomplete dictionaries, which lacks the
hierarchical structure similar to that of the human visual system and is
therefore prone to denoising artifacts.
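The self-similarity prior mentioned first can be illustrated with a toy non-local means filter, in which each pixel is replaced by a weighted average of pixels whose surrounding patches look similar. The parameters and naive double loop below are illustrative, not any specific method from the thesis.

```python
import numpy as np

def nlm_denoise(img, patch=1, search=3, h=10.0):
    """Toy non-local means exploiting patch self-similarity.

    patch: half-width of the comparison patch; search: half-width of the
    search window; h: bandwidth controlling how quickly weights decay
    with patch dissimilarity.
    """
    img = img.astype(np.float64)
    pad = np.pad(img, patch + search, mode='reflect')
    out = np.zeros_like(img)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            ci, cj = i + patch + search, j + patch + search
            ref = pad[ci - patch:ci + patch + 1, cj - patch:cj + patch + 1]
            num = den = 0.0
            for di in range(-search, search + 1):
                for dj in range(-search, search + 1):
                    qi, qj = ci + di, cj + dj
                    cand = pad[qi - patch:qi + patch + 1, qj - patch:qj + patch + 1]
                    # weight by similarity of the surrounding patches
                    w = np.exp(-np.sum((ref - cand) ** 2) / (h * h))
                    num += w * pad[qi, qj]
                    den += w
            out[i, j] = num / den
    return out
```

The failure mode the thesis notes is visible here: under heavy noise or for unique structures, no candidate patch is genuinely similar, so the weights degenerate and the average blurs detail.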
My research started from image denoising. Through a thorough review and
evaluation of state-of-the-art denoising methods, it was found that the
representation of images is of substantial importance for the denoising
technique. At the same time, an improvement on one of the nonlocal denoising
methods was proposed, which improves the representation of images through the
integration of Gaussian blur, clustering, and Rotationally Invariant Block
Matching. Enlightened by the successful application of sparse coding in
compressive sensing, we exploited image self-similarity by using a sparse
representation based on wavelet coefficients in a nonlocal and hierarchical
way, which generates competitive results compared to state-of-the-art
denoising algorithms. Meanwhile, an adaptive local filter learned by Genetic
Programming (GP) was proposed for efficient image denoising. In this work, we
employed GP to find the optimal representations for local image patches
through training on massive datasets, which yields competitive results
compared to state-of-the-art local denoising filters. After successfully
dealing with denoising, we moved to parameter estimation for image degradation
models. For instance, image blur identification is performed using deep
learning, which has recently become a popular image representation approach.
This work has also been extended to blur estimation by replacing the second
step of the framework with a general regression neural network. In short, in
this thesis, spatial correlations, sparse coding, genetic programming, and
deep learning are explored as adaptive image representation models for both
image restoration and parameter estimation.
We conclude this thesis by considering methods based on machine learning to be
the best adaptive representations for natural images. We have shown that they
can generate better results than conventional representation models for the
tasks of image denoising and deblurring.
Motion-capture-based hand gesture recognition for computing and control
This dissertation focuses on the study and development of algorithms that enable the analysis and recognition of hand gestures in a motion capture environment. Central to this work is the study of unlabeled point sets in a more abstract sense. Evaluations of proposed methods focus on examining their generalization to users not encountered during system training.
In an initial exploratory study, we compare various classification algorithms based upon multiple interpretations and feature transformations of point sets, including those based upon aggregate features (e.g. mean) and a pseudo-rasterization of the capture space. We find aggregate feature classifiers to be balanced across multiple users but relatively limited in maximum achievable accuracy. Certain classifiers based upon the pseudo-rasterization performed best among tested classification algorithms. We follow this study with targeted examinations of certain subproblems.
For the first subproblem, we introduce the a fortiori expectation-maximization (AFEM) algorithm for computing the parameters of a distribution from which unlabeled, correlated point sets are presumed to be generated. Each unlabeled point is assumed to correspond to a target with independent probability of appearance but correlated positions. We propose replacing the expectation phase of the algorithm with a Kalman filter modified within a Bayesian framework to account for the unknown point labels which manifest as uncertain measurement matrices. We also propose a mechanism to reorder the measurements in order to improve parameter estimates. In addition, we use a state-of-the-art Markov chain Monte Carlo sampler to efficiently sample measurement matrices. In the process, we indirectly propose a constrained k-means clustering algorithm. Simulations verify the utility of AFEM against a traditional expectation-maximization algorithm in a variety of scenarios.
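For reference, one predict/update cycle of the standard linear Kalman filter that AFEM modifies can be sketched as follows. This is the generic textbook form, without the Bayesian treatment of uncertain measurement matrices described above; names and shapes are the usual conventions, not the dissertation's code.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter.

    x, P: state mean and covariance; z: measurement; F, H: dynamics and
    measurement matrices; Q, R: process and measurement noise covariances.
    """
    # predict: propagate the state and its uncertainty through the dynamics
    x = F @ x
    P = F @ P @ F.T + Q
    # update: blend the prediction with the measurement via the Kalman gain
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```

In AFEM's setting, the measurement matrix `H` is itself uncertain because the point labels are unknown, which is exactly the part the modified filter handles.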
In the second subproblem, we consider the application of positive definite kernels and the earth mover's distance (EMD) to our work. Positive definite kernels are an important tool in machine learning that enable efficient solutions to otherwise difficult or intractable problems by implicitly linearizing the problem geometry. We develop a set-theoretic interpretation of EMD and propose the earth mover's intersection (EMI), a positive definite analog to EMD. We offer proof of EMD's negative definiteness and provide necessary and sufficient conditions for EMD to be conditionally negative definite, including approximations that guarantee negative definiteness. In particular, we show that EMD is related to various min-like kernels. We also present a positive-definite-preserving transformation that can be applied to any kernel and can be used to derive positive definite EMD-based kernels, and we show that the Jaccard index is simply the result of this transformation applied to set intersection. Finally, we evaluate kernels based on EMI and the proposed transformation versus EMD in various computer vision tasks and show that EMD is generally inferior even with indefinite kernel techniques.
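Two of the quantities above are easy to make concrete. In one dimension, the earth mover's distance between histograms on common, unit-spaced bins reduces to the L1 distance between their cumulative sums, and histogram intersection is a classic min-like positive definite kernel. This small sketch is a generic illustration, not the dissertation's EMI construction.

```python
import numpy as np

def emd_1d(p, q):
    """EMD between two 1-D histograms on the same unit-spaced bins:
    in one dimension it equals the L1 distance between the CDFs."""
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

def hist_intersection(p, q):
    """Histogram intersection, a min-like positive definite kernel."""
    return np.minimum(p, q).sum()
```

The contrast is instructive: EMD is a *distance* (and only conditionally negative definite under restrictions), while intersection is directly a kernel, which is the gap EMI is designed to bridge.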
Finally, we apply deep learning to our problem. We propose neural network architectures for hand posture and gesture recognition from unlabeled marker sets in a coordinate system local to the hand. As a means of ensuring data integrity, we also propose an extended Kalman filter for tracking the rigid pattern of markers on which the local coordinate system is based. We consider fixed- and variable-size architectures, including convolutional and recurrent neural networks, that accept unlabeled marker input. We also consider a data-driven approach to labeling markers with a neural network and a collection of Kalman filters. Experimental evaluations with posture and gesture datasets show promising results for the proposed architectures with unlabeled markers, which outperform the alternative data-driven labeling method.
Statistical Diffusion Tensor Imaging
Magnetic resonance diffusion tensor imaging (DTI) makes it possible to infer the ultrastructure of living tissue. In brain mapping, neural fiber trajectories can be identified by exploiting the anisotropy of diffusion processes. Numerous statistical methods can be linked into the comprehensive processing chain that spans between raw DTI images and the reliable visualization of fibers. In this work, a space-varying coefficients model (SVCM) using penalized B-splines was developed to integrate diffusion tensor estimation, regularization, and interpolation into a unified framework. The implementation challenges originating in multiple 3D space-varying coefficient surfaces and the large dimensions of realistic datasets were met by incorporating matrix sparsity and efficient model approximation. The superiority of the B-spline-based SVCM over the standard approach was demonstrated in simulation studies in terms of the precision and accuracy of the individual tensor elements. The integration with a probabilistic fiber tractography algorithm and application to real brain data revealed that the unified approach is at least equivalent to the serial application of voxelwise estimation, smoothing, and interpolation. From the error analysis using boxplots and visual inspection, the conclusion was drawn that both the standard approach and the B-spline-based SVCM may suffer from low local adaptivity. Therefore, wavelet basis functions were employed for filtering diffusion tensor fields. While excellent local smoothing was indeed achieved by combining voxelwise tensor estimation with wavelet filtering, no immediate improvement was gained for fiber tracking. However, the thresholding strategy needs to be refined, and the proposed incorporation of wavelets into an SVCM needs to be implemented to finally assess their utility for DTI data processing.
In summary, an SVCM with specific consideration of the demands of human brain DTI data was developed and implemented, eventually representing a unified postprocessing framework. This represents an experimental and statistical platform to further improve the reliability of tractography.
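As a concrete illustration of the voxelwise estimation step that the SVCM generalizes, the diffusion tensor can be fitted by log-linear least squares from the Stejskal-Tanner model S_i = S0 * exp(-b_i * g_i^T D g_i). This generic single-voxel sketch is not the thesis's penalized B-spline estimator, and the function name and interface are illustrative.

```python
import numpy as np

def fit_tensor(signals, bvals, bvecs):
    """Voxelwise log-linear least-squares fit of the diffusion tensor.

    signals: (N,) diffusion-weighted signals; bvals: (N,) b-values;
    bvecs: (N, 3) unit gradient directions. Returns (D, S0) with D a
    symmetric 3x3 tensor.
    """
    G = np.asarray(bvecs, dtype=float)
    b = np.asarray(bvals, dtype=float)
    gx, gy, gz = G[:, 0], G[:, 1], G[:, 2]
    # design matrix for the unknowns [Dxx, Dyy, Dzz, Dxy, Dxz, Dyz, ln S0]
    A = np.column_stack([-b * gx**2, -b * gy**2, -b * gz**2,
                         -2 * b * gx * gy, -2 * b * gx * gz, -2 * b * gy * gz,
                         np.ones_like(b)])
    coef, *_ = np.linalg.lstsq(A, np.log(signals), rcond=None)
    dxx, dyy, dzz, dxy, dxz, dyz, ln_s0 = coef
    D = np.array([[dxx, dxy, dxz],
                  [dxy, dyy, dyz],
                  [dxz, dyz, dzz]])
    return D, np.exp(ln_s0)
```

Fitting each voxel independently like this is exactly the "serial" baseline the thesis compares against; the SVCM instead couples neighboring voxels through penalized spline coefficients.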