678 research outputs found
Feature regularization and learning for human activity recognition.
Doctoral Degree. University of KwaZulu-Natal, Durban.
Feature extraction is an essential component in the design of a human activity
recognition model. However, relying on extracted features alone for learning often makes the model suboptimal. Therefore, this research
work seeks to address this potential problem by investigating feature regularization. Feature regularization is used for encapsulating discriminative
patterns that are needed for better and efficient model learning. Firstly, a
within-class subspace regularization approach is proposed for eigenfeatures
extraction and regularization in human activity recognition. In this approach,
the within-class subspace is modelled using more eigenvalues from
the reliable subspace to obtain a four-parameter modelling scheme. This
model enables a better and more faithful estimation of the eigenvalues that are distorted by the small sample size effect. This regularization is done in one
piece, thereby avoiding the undue complexity of modelling different parts of the eigenspectrum separately. The whole eigenspace is used for performance evaluation because
feature extraction and dimensionality reduction are done at a later stage
of the evaluation process. Results show that the proposed approach has
better discriminative capacity than several other subspace approaches for
human activity recognition. Secondly, using a likelihood prior probability, a new regularization scheme that improves the loss function of a deep
convolutional neural network is proposed. The results obtained from this
work demonstrate that well-regularized features yield better class discrimination in human activity recognition. The major contribution of the
thesis is the development of feature extraction strategies for determining
discriminative patterns needed for efficient model learning.
Sketch-based 3D Shape Retrieval using Convolutional Neural Networks
Retrieving 3D models from 2D human sketches has received considerable
attention in the areas of graphics, image retrieval, and computer vision.
State-of-the-art approaches almost always compute a large number of "best views"
for 3D models, in the hope that the query sketch matches one of
these 2D projections of 3D models using predefined features.
We argue that this two-stage approach (view selection -- matching) is
pragmatic but also problematic because the "best views" are subjective and
ambiguous, which makes the matching inputs obscure. This imprecise nature of
matching further makes it challenging to choose features manually. Instead of
relying on the elusive concept of "best views" and the hand-crafted features,
we propose to define our views using a minimalist approach and learn features
for both sketches and views. Specifically, we drastically reduce the number of
views to only two predefined directions for the whole dataset. Then, we learn
two Siamese Convolutional Neural Networks (CNNs), one for the views and one for
the sketches. The loss function is defined on the within-domain as well as the
cross-domain similarities. Our experiments on three benchmark datasets
demonstrate that our method is significantly better than state-of-the-art
approaches, and outperforms them in all conventional metrics.
Comment: CVPR 201
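A loss defined on both within-domain and cross-domain similarities can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the exact loss, so a standard margin-based contrastive loss is used here, and the function names (`contrastive_loss`, `siamese_loss`), the `margin` parameter, and the choice of summing three terms are all hypothetical. Embeddings are plain lists of floats standing in for CNN outputs.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a, b, same_class, margin=1.0):
    """Margin-based contrastive loss (assumed form, not taken from the abstract).

    Pairs from the same class are pulled together; pairs from different
    classes are pushed apart until they are at least `margin` away.
    """
    d = euclidean(a, b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


def siamese_loss(sketch1, view1, sketch2, view2, same_class, margin=1.0):
    """Sum of two within-domain terms and one cross-domain term (assumed)."""
    within_sketch = contrastive_loss(sketch1, sketch2, same_class, margin)
    within_view = contrastive_loss(view1, view2, same_class, margin)
    cross_domain = contrastive_loss(sketch1, view2, same_class, margin)
    return within_sketch + within_view + cross_domain
```

For example, `siamese_loss(s1, v1, s2, v2, same_class=True)` is zero only when the two sketch embeddings coincide, the two view embeddings coincide, and the sketch embedding matches the view embedding, which is the sense in which the sketch and view domains are aligned.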
Multimodal Adversarial Learning
Deep Convolutional Neural Networks (DCNNs) have proven to be an exceptional tool for object recognition, generative modelling, and multi-modal learning in various computer vision applications. However, recent findings have shown that such state-of-the-art models can be easily deceived by inserting slight, imperceptible perturbations into key pixels of the input. A good target detection system can accurately identify targets by localizing their coordinates on the input image of interest. This is ideally achieved by labeling each pixel in an image as either a background pixel or a potential target pixel. Prior research nevertheless confirms that such state-of-the-art target detection models are susceptible to adversarial attacks. In the case of generative models, facial sketches drawn by artists, which are mostly used by law enforcement agencies, depend on the ability of the artist to clearly replicate all the key facial features that aid in capturing the true identity of a subject. Recent works have attempted to synthesize these sketches into plausible visual images to improve visual recognition and identification. Synthesizing photo-realistic images from sketches proves to be an even more challenging task, especially for sensitive applications such as suspect identification. However, the incorporation of hybrid discriminators, which perform attribute classification of multiple target attributes, and of a quality-guided encoder, which minimizes the perceptual dissimilarity between the latent space embeddings of the synthesized and real images at different layers in the network, has been shown to be a powerful tool towards better multi-modal learning techniques. In general, our approach aims to improve target detection systems and the visual appeal of synthesized images while incorporating multiple attribute assignment into the generator without compromising the identity of the synthesized image.
We synthesized sketches using the XDoG filter for the CelebA, Multi-modal and CelebA-HQ datasets and from an auxiliary generator trained on sketches from the CUHK, IIT-D and FERET datasets. Across the different model applications, our results compare favorably with the current state of the art.
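To illustrate how a small input perturbation can flip a model's decision, here is a minimal sketch. It uses the fast gradient sign method (FGSM), a standard attack that the abstract does not name, applied to a toy logistic classifier standing in for a DCNN. The weights, input values, and `epsilon` are made up for illustration, and, unlike an attack on a few "key pixels", this sketch perturbs every input coordinate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of class 1 under a toy logistic classifier."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_perturb(weights, bias, x, label, epsilon):
    """Shift each input by epsilon in the direction that increases the loss.

    For logistic loss, the gradient with respect to the input is
    (p - label) * w, where p is the predicted probability of class 1.
    """
    p = predict(weights, bias, x)
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical toy values, chosen only to show a flipped prediction.
weights, bias = [1.0, -2.0, 0.5], 0.0
x = [0.2, -0.1, 0.4]
x_adv = fgsm_perturb(weights, bias, x, label=1, epsilon=0.3)
```

With these toy values the clean input is classified as class 1 (probability above 0.5), while the perturbed input, which differs by at most `epsilon` per coordinate, falls below 0.5. Real attacks on image models use much smaller perturbations relative to the pixel range.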
- …