473 research outputs found
Learning Sparse Adversarial Dictionaries For Multi-Class Audio Classification
Audio events are quite often overlapping in nature, and more prone to noise
than visual signals. There has been increasing evidence for the superior
performance of representations learned using sparse dictionaries for
applications like audio denoising and speech enhancement. This paper
concentrates on modifying the traditional reconstructive dictionary learning
algorithms, by incorporating a discriminative term into the objective function
in order to learn class-specific adversarial dictionaries that are good at
representing samples of their own class while being poor at representing
samples belonging to any other class. We quantitatively demonstrate the
effectiveness of our learned dictionaries as a stand-alone solution for both
binary and multi-class audio classification problems. Comment: Accepted in Asian Conference of Pattern Recognition (ACPR-2017)
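The idea of class-specific dictionaries that represent their own class well and other classes poorly can be illustrated with a residual-based classifier. The following is a minimal sketch, not the paper's algorithm: it assumes the dictionaries are already learned, uses plain least squares in place of sparse coding, and the names `residual`, `classify`, `D0`, and `D1` are hypothetical.

```python
import numpy as np

def residual(D, x):
    """Least-squares reconstruction error of x using the columns (atoms) of D."""
    coef, *_ = np.linalg.lstsq(D, x, rcond=None)
    return float(np.linalg.norm(x - D @ coef))

def classify(dictionaries, x):
    """Return the index of the class dictionary with the smallest residual."""
    errors = [residual(D, x) for D in dictionaries]
    return int(np.argmin(errors))

# Toy data (assumed for illustration): two dictionaries spanning
# different two-dimensional subspaces of R^4.
D0 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
D1 = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

# This sample lies almost entirely in the span of D1.
x = np.array([0.1, 0.0, 2.0, -1.0])
label = classify([D0, D1], x)
```

A sample is assigned to the class whose dictionary reconstructs it best, which is why a discriminative term that widens the gap between own-class and other-class residuals would help classification.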
Deep Decision Trees for Discriminative Dictionary Learning with Adversarial Multi-Agent Trajectories
With the explosion in the availability of spatio-temporal tracking data in
modern sports, there is an enormous opportunity to better analyse, learn and
predict important events in adversarial group environments. In this paper, we
propose a deep decision tree architecture for discriminative dictionary
learning from adversarial multi-agent trajectories. We first build up a
hierarchy for the tree structure by adding each layer and performing feature
weight based clustering in the forward pass. We then fine tune the player role
weights using back propagation. The hierarchical architecture ensures the
interpretability and the integrity of the group representation. The resulting
architecture is a decision tree, with leaf-nodes capturing a dictionary of
multi-agent group interactions. Due to the ample volume of data available, we
focus on soccer tracking data, although our approach can be used in any
adversarial multi-agent domain. We present applications of the proposed method for
simulating soccer games as well as evaluating and quantifying team strategies. Comment: To appear in 4th International Workshop on Computer Vision in Sports
(CVsports) at CVPR 201
Deep Learning frameworks for Image Quality Assessment
Deep learning is advancing technology and finds wide application in image
processing. Deep learning alone can be sufficient to outperform statistical methods. In this
research work, I implemented image quality assessment techniques using deep learning. I
propose two full-reference image quality assessment algorithms and two no-reference image quality
assessment algorithms. Of the two algorithms of each type, one is supervised and the other is
unsupervised.
The first proposed method is full-reference image quality assessment using an autoencoder. Existing
literature shows that the statistical features of pristine images are altered in the presence of distortion.
It is more advantageous if the algorithm itself learns the distortion-discriminating features. Complexity
also grows with the feature length. An autoencoder is therefore trained on a large number of
pristine images, since an autoencoder gives a compact lower-dimensional representation of the input.
It is shown that the encoded distance features have good distortion discrimination properties. The
proposed algorithm delivers competitive performance on standard databases.
If both the reference and the distorted image are given to a model that learns by itself
and outputs the scores, the burden of extracting features and post-processing is reduced. However, the
model must be capable of discriminating the features by itself. The second method I propose is
full-reference and no-reference image quality assessment using deep convolutional neural networks.
A network is trained in a supervised manner with subjective scores as targets. The algorithm
performs efficiently for the distortions that are learned while training the model.
The last proposed method is classification-based no-reference image quality assessment. The distortion
level in an image may vary from one region to another. Distortion may not be visible
in some parts but may be present in others. A classification model can tell whether a
given input patch is of low quality or high quality. It is shown that the aggregate of the patch quality
scores has a high correlation with the subjective scores.
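The patch-based scheme described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the author's implementation: the abstract does not say how patches are extracted or aggregated, so non-overlapping patches and a simple mean are assumed here, and `extract_patches`, `image_quality`, and `toy_scorer` are hypothetical names. The scorer stands in for a trained patch classifier.

```python
import numpy as np

def extract_patches(image, size):
    """Split a 2-D image into non-overlapping size x size patches (edges are cropped)."""
    h, w = image.shape
    rows, cols = h // size, w // size
    cropped = image[:rows * size, :cols * size]
    return (cropped.reshape(rows, size, cols, size)
                   .swapaxes(1, 2)
                   .reshape(rows * cols, size, size))

def image_quality(image, patch_scorer, size=32):
    """Aggregate per-patch scores into one image-level score (mean is an assumption)."""
    patches = extract_patches(image, size)
    scores = np.array([patch_scorer(p) for p in patches])
    return float(scores.mean())

# Stand-in for a trained classifier: 1.0 for "high quality", 0.0 for "low quality".
def toy_scorer(patch):
    return float(patch.mean() > 0.5)

# Toy 4x4 image in which only the top-left 2x2 patch counts as high quality.
image = np.zeros((4, 4))
image[:2, :2] = 1.0
score = image_quality(image, toy_scorer, size=2)
```

Scoring patches independently lets the image-level score reflect distortion that is confined to part of the image.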
Robust subspace learning for static and dynamic affect and behaviour modelling
Machine analysis of human affect and behavior in naturalistic contexts has attracted growing attention in the last decade from various disciplines ranging from social and cognitive sciences to machine learning and computer vision. Endowing machines with the ability to seamlessly detect, analyze, model, predict as well as simulate and synthesize manifestations of internal emotional and behavioral states in real-world data is deemed essential for the deployment of next-generation, emotionally- and socially-competent human-centered interfaces. In this thesis, we are primarily motivated by the problem of modeling, recognizing and predicting spontaneous expressions of non-verbal human affect and behavior manifested through either low-level facial attributes in static images or high-level semantic events in image sequences. Both visual data and annotations of naturalistic affect and behavior naturally contain noisy measurements of unbounded magnitude at random locations, commonly referred to as ‘outliers’. We present here machine learning methods that are robust to such gross, sparse noise. First, we deal with static analysis of face images, viewing the latter as a superposition of mutually-incoherent, low-complexity components corresponding to facial attributes, such as facial identity, expressions and activation of atomic facial muscle actions. We develop a robust, discriminant dictionary learning framework to extract these components from grossly corrupted training data and combine it with sparse representation to recognize the associated attributes. We demonstrate that our framework can jointly address interrelated classification tasks such as face and facial expression recognition. Inspired by the well-documented importance of the temporal aspect in perceiving affect and behavior, we direct the bulk of our research efforts into continuous-time modeling of dimensional affect and social behavior.
Having identified a gap in the literature, namely the lack of data containing annotations of social attitudes in continuous time and scale, we first curate a new audio-visual database of multi-party conversations from political debates annotated frame-by-frame in terms of real-valued conflict intensity and use it to conduct the first study on continuous-time conflict intensity estimation. Our experimental findings corroborate previous evidence indicating the inability of existing classifiers to capture the hidden temporal structures of affective and behavioral displays. We present here a novel dynamic behavior analysis framework which models temporal dynamics in an explicit way, based on the natural assumption that continuous-time annotations of smoothly-varying affect or behavior can be viewed as outputs of a low-complexity linear dynamical system when behavioral cues (features) act as system inputs. A novel robust structured rank minimization framework is proposed to estimate the system parameters in the presence of gross corruptions and partially missing data. Experiments on prediction of dimensional conflict and affect as well as multi-object tracking from detection validate the effectiveness of our predictive framework and demonstrate for the first time that complex human behavior and affect can be learned and predicted based on small training sets of person(s)-specific observations. Open Access
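The modeling assumption in the abstract above, annotations as outputs of a linear dynamical system driven by behavioral features, can be written in one common discrete-time state-space form. This is a generic sketch and not necessarily the thesis's exact formulation; the symbols $x_t$, $u_t$, $y_t$, $A$, $B$, and $C$ are assumptions introduced here.

```latex
\begin{aligned}
x_{t+1} &= A\,x_t + B\,u_t, \\
y_t     &= C\,x_t,
\end{aligned}
\qquad x_t \in \mathbb{R}^{n},\quad u_t \in \mathbb{R}^{m},\quad y_t \in \mathbb{R}^{p}
```

Here $u_t$ would be the feature vector at frame $t$, $y_t$ the continuous annotation (for example, conflict intensity), and $x_t$ a hidden state. Under this reading, "low-complexity" would correspond to a small state dimension $n$.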