Sparse Representation-based Image Quality Assessment
A successful approach to image quality assessment involves comparing the
structural information between a distorted image and its reference. However,
extracting structural information that is perceptually important to our visual
system is a challenging task. This paper addresses this issue by employing a
sparse representation-based approach and proposes a new metric called the
\emph{sparse representation-based quality} (SPARQ) \emph{index}. The proposed
method learns the inherent structures of the reference image as a set of basis
vectors, such that any structure in the image can be represented by a linear
combination of only a few of those basis vectors. This sparse strategy is
employed because it is known to generate basis vectors that are qualitatively
similar to the receptive field of the simple cells present in the mammalian
primary visual cortex. The visual quality of the distorted image is estimated
by comparing the structures of the reference and the distorted images in terms
of the learnt basis vectors resembling cortical cells. Our approach is
evaluated on six publicly available subject-rated image quality assessment
datasets. The proposed SPARQ index consistently exhibits high correlation with
the subjective ratings on all datasets and performs better than, or on par with,
the state of the art.
Comment: 10 pages, 3 figures, 3 tables, submitted to a journal.
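The sketch below illustrates the general idea under stated assumptions: basis vectors are learnt from patches of the reference image with sparse coding, both images are coded over the learnt dictionary, and the codes of co-located patches are compared. The pooling rule (a mean normalized correlation of coefficient vectors) and all parameter values are illustrative assumptions, not the paper's exact SPARQ definition.

```python
# Minimal sketch of a sparse-representation quality index in the spirit of SPARQ.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d

def sparq_like_index(reference, distorted, patch=8, n_atoms=64, k=5, seed=0):
    """reference, distorted: 2-D grayscale images of the same size."""
    ref = extract_patches_2d(reference, (patch, patch), max_patches=2000,
                             random_state=seed)
    dis = extract_patches_2d(distorted, (patch, patch), max_patches=2000,
                             random_state=seed)   # same seed -> co-located patches
    X_ref = ref.reshape(len(ref), -1).astype(float)
    X_dis = dis.reshape(len(dis), -1).astype(float)
    X_ref -= X_ref.mean(axis=1, keepdims=True)    # remove DC, keep structure
    X_dis -= X_dis.mean(axis=1, keepdims=True)

    # Learn basis vectors from the reference image only; codes are k-sparse.
    dico = MiniBatchDictionaryLearning(n_components=n_atoms,
                                       transform_algorithm='omp',
                                       transform_n_nonzero_coefs=k,
                                       random_state=seed)
    codes_ref = dico.fit(X_ref).transform(X_ref)
    codes_dis = dico.transform(X_dis)

    # Pool a normalized similarity of the sparse codes (assumed pooling rule).
    num = np.sum(codes_ref * codes_dis, axis=1)
    den = (np.linalg.norm(codes_ref, axis=1) *
           np.linalg.norm(codes_dis, axis=1) + 1e-8)
    return float(np.mean(num / den))
```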
On the role of head motion in affective expression
Non-verbal behavioral cues, such as head movement, play a significant role in human communication and affective expression. Although facial expressions and gestures have been extensively studied in the context of emotion understanding, head motion (which accompanies both) remains relatively less understood. This paper studies the significance of head movement in adults' affect communication using videos from movies. These videos are taken from the Acted Facial Expressions in the Wild (AFEW) database and are labeled with seven basic emotion categories: anger, disgust, fear, joy, neutral, sadness, and surprise. Considering the human head as a rigid body, we estimate the head pose at each video frame in terms of the three Euler angles and obtain a time-series representation of head motion. First, we investigate the importance of the energy of angular head motion dynamics (displacement, velocity and acceleration) in discriminating among emotions. Next, we analyze the temporal variation of head motion by fitting an autoregressive model to the head motion time series. We observe that head motion carries sufficient information to distinguish any emotion from the rest with high accuracy, and that this information is complementary to facial expression, as it helps improve emotion recognition accuracy.
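A minimal sketch of the kind of features the study describes is given below: energies of angular displacement, velocity and acceleration, plus autoregressive coefficients fitted to each Euler-angle series. The exact feature definitions, AR order, and frame rate are assumptions.

```python
# Illustrative head-motion feature extraction from per-frame Euler angles.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

def head_motion_features(euler_angles, fps=25.0, ar_order=3):
    """euler_angles: (T, 3) array of pitch/yaw/roll per frame (radians)."""
    feats = []
    dt = 1.0 / fps
    for k in range(3):                       # one channel per Euler angle
        theta = euler_angles[:, k]
        disp = theta - theta.mean()          # displacement around the mean pose
        vel = np.gradient(theta, dt)         # angular velocity
        acc = np.gradient(vel, dt)           # angular acceleration
        for s in (disp, vel, acc):           # energy of each dynamic
            feats.append(float(np.mean(s ** 2)))
        ar = AutoReg(theta, lags=ar_order).fit()   # temporal-variation model
        feats.extend(np.asarray(ar.params).tolist())
    return np.asarray(feats)
```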
Multichannel Attention Network for Analyzing Visual Behavior in Public Speaking
Public speaking is an important aspect of human communication and
interaction. The majority of computational work on public speaking concentrates
on analyzing the spoken content and the verbal behavior of the speakers. While
the success of public speaking largely depends on the content of the talk and
the verbal behavior, non-verbal (visual) cues, such as gestures and physical
appearance, also play a significant role. This paper investigates the importance
of visual cues by estimating their contribution towards predicting the
popularity of a public lecture. For this purpose, we constructed a large
database of TED talk videos. As a measure of popularity of the
TED talks, we leverage the corresponding (online) viewers' ratings from
YouTube. Visual cues related to facial and physical appearance, facial
expressions, and pose variations are extracted from the video frames using
convolutional neural network (CNN) models. Thereafter, an attention-based long
short-term memory (LSTM) network is proposed to predict the video popularity
from the sequence of visual features. The proposed network achieves
state-of-the-art prediction accuracy indicating that visual cues alone contain
highly predictive information about the popularity of a talk. Furthermore, our
network learns a human-like attention mechanism, which is particularly useful
for interpretability, i.e., it shows how attention varies over time and across
different visual cues, indicating their relative importance.
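The sketch below shows a single-channel attention-LSTM regressor of the kind described: per-frame CNN features are summarized by an LSTM, a softmax attention layer weights the time steps, and the attended representation predicts the popularity score. Layer sizes, the single-channel simplification, and the regression head are assumptions.

```python
# Minimal attention-LSTM regressor over per-frame visual features.
import torch
import torch.nn as nn

class AttentionLSTM(nn.Module):
    def __init__(self, feat_dim=512, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)      # one attention score per time step
        self.head = nn.Linear(hidden, 1)      # popularity (rating) score

    def forward(self, x):                     # x: (batch, time, feat_dim)
        h, _ = self.lstm(x)                   # (batch, time, hidden)
        a = torch.softmax(self.attn(h).squeeze(-1), dim=1)   # (batch, time)
        context = torch.bmm(a.unsqueeze(1), h).squeeze(1)    # attention-weighted sum
        return self.head(context).squeeze(-1), a             # prediction + weights
```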
Learning spontaneity to improve emotion recognition in speech
We investigate the effect and usefulness of spontaneity (i.e., whether a given speech is spontaneous or not) in the context of emotion recognition. We hypothesize that emotional content in speech is interrelated with its spontaneity, and use spontaneity classification as an auxiliary task to the problem of emotion recognition. We propose two supervised learning settings that utilize spontaneity to improve speech emotion recognition: a hierarchical model that performs spontaneity detection before performing emotion recognition, and a multitask learning model that jointly learns to recognize both spontaneity and emotion. Through various experiments on the well-known IEMOCAP database, we show that by using spontaneity detection as an additional task, significant improvement can be achieved over emotion recognition systems that are unaware of spontaneity. We achieve state-of-the-art emotion recognition accuracy (4-class, 69.1%) on the IEMOCAP database, outperforming several relevant and competitive baselines.
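A minimal sketch of the multitask variant is shown below: a shared recurrent speech encoder with one head for emotion and one for spontaneity, trained with a weighted sum of the two cross-entropy losses. The encoder architecture, acoustic feature front-end, and loss weight are assumptions.

```python
# Sketch of a multitask emotion + spontaneity model with a shared encoder.
import torch
import torch.nn as nn

class MultitaskSER(nn.Module):
    def __init__(self, feat_dim=40, hidden=128, n_emotions=4):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.emotion_head = nn.Linear(hidden, n_emotions)
        self.spontaneity_head = nn.Linear(hidden, 2)   # spontaneous vs. acted

    def forward(self, x):                      # x: (batch, frames, feat_dim)
        _, h = self.encoder(x)                 # final hidden state: (1, batch, hidden)
        h = h.squeeze(0)
        return self.emotion_head(h), self.spontaneity_head(h)

def multitask_loss(emo_logits, spo_logits, emo_y, spo_y, alpha=0.3):
    # Emotion is the main task; spontaneity acts as a weighted auxiliary task.
    ce = nn.functional.cross_entropy
    return ce(emo_logits, emo_y) + alpha * ce(spo_logits, spo_y)
```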
A trajectory clustering approach to crowd flow segmentation in videos
This work proposes a trajectory clustering-based approach for segmenting flow patterns in high-density crowd videos. The goal is to produce a pixel-wise segmentation of a video sequence (static camera), where each segment corresponds to a different motion pattern. Unlike previous studies that use only motion vectors, we extract full trajectories so as to capture the complete temporal evolution of each region (block) in a video sequence. The extracted trajectories are dense, complex and often overlapping. A novel clustering algorithm is developed to group these trajectories, taking into account the trajectories' shape, location, and the density of trajectory patterns in a spatial neighborhood. Once the trajectories are clustered, final motion segments are obtained by grouping the resulting trajectory clusters on the basis of their area of overlap and average flow direction. The proposed method is validated on a set of crowd videos that are commonly used in this field. In comparison with several state-of-the-art techniques, our method achieves better overall accuracy.
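As a rough illustration of clustering trajectories by shape and location, the sketch below defines a simple combined distance and applies off-the-shelf hierarchical clustering. The paper's own algorithm additionally models the density of trajectory patterns in a spatial neighborhood, which this stand-in omits; the distance weights are assumptions.

```python
# Simplified stand-in: pairwise shape+location distance, then average-linkage clustering.
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

def traj_distance(a, b, w_shape=1.0, w_loc=0.5):
    """a, b: (T, 2) trajectories resampled to a common length T."""
    shape_d = np.mean(np.linalg.norm((a - a.mean(0)) - (b - b.mean(0)), axis=1))
    loc_d = np.linalg.norm(a.mean(0) - b.mean(0))
    return w_shape * shape_d + w_loc * loc_d

def cluster_trajectories(trajs, n_clusters=5):
    n = len(trajs)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = traj_distance(trajs[i], trajs[j])
    Z = linkage(squareform(D), method='average')      # hierarchical clustering
    return fcluster(Z, t=n_clusters, criterion='maxclust')
```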
An online algorithm for constrained face clustering in videos
We address the problem of face clustering in long, real-world videos. This is a challenging task because faces in such videos exhibit wide variability in scale, pose, illumination, expressions, and may also be partially occluded. The majority of the existing face clustering algorithms are offline, i.e., they assume the availability of the entire data at once. However, in many practical scenarios, complete data may not be available at the same time, may be too large to process, or may exhibit significant variation in the data distribution over time. We propose an online clustering algorithm that processes data sequentially in short segments of variable length. The faces detected in each segment are either assigned to an existing cluster or are used to create a new one. Our algorithm uses several spatiotemporal constraints and a convolutional neural network (CNN) to obtain a robust representation of the faces in order to achieve high clustering accuracy on two benchmark video databases (82.1% and 93.8%). Despite being an online method (usually known to have lower accuracy), our algorithm achieves results comparable to or better than state-of-the-art offline and online methods.
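The sketch below captures the core online loop under assumed details: each incoming face embedding joins its nearest existing cluster unless the distance exceeds a threshold or a cannot-link constraint (e.g., faces detected together in the same frame) forbids it, in which case a new cluster is created. The threshold and the running-mean update are assumptions.

```python
# Minimal online, constraint-aware face clustering loop (illustrative only).
import numpy as np

class OnlineFaceClusterer:
    def __init__(self, threshold=0.6):
        self.centroids = []        # running mean embedding per cluster
        self.counts = []
        self.threshold = threshold

    def add(self, embedding, cannot_link=frozenset()):
        """embedding: L2-normalized CNN face descriptor.
        cannot_link: cluster ids this face must not join (spatiotemporal constraint)."""
        best, best_d = None, np.inf
        for cid, c in enumerate(self.centroids):
            if cid in cannot_link:
                continue
            d = 1.0 - float(np.dot(embedding, c) / (np.linalg.norm(c) + 1e-8))
            if d < best_d:
                best, best_d = cid, d
        if best is None or best_d > self.threshold:
            self.centroids.append(embedding.copy())   # start a new cluster
            self.counts.append(1)
            return len(self.centroids) - 1
        n = self.counts[best]                          # update the matched cluster
        self.centroids[best] = (n * self.centroids[best] + embedding) / (n + 1)
        self.counts[best] += 1
        return best
```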
Multi-camera trajectory forecasting : pedestrian trajectory prediction in a network of cameras
We introduce the task of multi-camera trajectory forecasting (MCTF), where the future trajectory of an object is predicted in a network of cameras. Prior works consider forecasting trajectories in a single camera view. Our work is the first to consider the challenging scenario of forecasting across multiple non-overlapping camera views. This has wide applicability in tasks such as re-identification and multi-target multi-camera tracking. To facilitate research in this new area, we release the Warwick-NTU Multi-camera Forecasting Database (WNMF), a unique dataset of multi-camera pedestrian trajectories from a network of 15 synchronized cameras. To accurately label this large dataset (600 hours of video footage), we also develop a semi-automated annotation method. An effective MCTF model should proactively anticipate where and when a person will re-appear in the camera network. In this paper, we consider the task of predicting the next camera in which a pedestrian will re-appear after leaving the view of the current camera, and present several baseline approaches for this. The labeled database is available online at https://github.com/olly-styles/Multi-Camera-Trajectory-Forecasting.
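A simple illustrative baseline for the next-camera prediction task is sketched below: a multinomial classifier over hand-crafted departure features (last observed position and exit velocity) of the single-camera track. The feature choices and the classifier are assumptions made only to make the task setup concrete, not the paper's baselines.

```python
# Hypothetical next-camera prediction baseline from departure features.
import numpy as np
from sklearn.linear_model import LogisticRegression

def departure_features(track):
    """track: (T, 2) image-plane trajectory before the person leaves the view."""
    last = track[-1]                                       # exit position
    vel = track[-1] - track[-5] if len(track) >= 5 else track[-1] - track[0]
    return np.concatenate([last, vel])                     # position + exit velocity

def train_next_camera_baseline(tracks, next_camera_ids):
    X = np.stack([departure_features(t) for t in tracks])
    y = np.asarray(next_camera_ids)          # id of the camera of re-appearance
    clf = LogisticRegression(max_iter=1000)  # multinomial classifier over cameras
    return clf.fit(X, y)
```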
A dynamic latent variable model for source separation
We propose a novel latent variable model for learning latent bases for time-varying non-negative data. Our model uses a mixture multinomial as the likelihood function and a Dirichlet distribution with dynamic parameters as the prior, which we call the dynamic Dirichlet prior. An expectation-maximization (EM) algorithm is developed for estimating the parameters of the proposed model. Furthermore, we connect our proposed dynamic Dirichlet latent variable model (dynamic DLVM) to two popular latent basis learning methods: probabilistic latent component analysis (PLCA) and non-negative matrix factorization (NMF). We show that (i) PLCA is a special case of the dynamic DLVM, and (ii) dynamic DLVM can be interpreted as a dynamic version of NMF. The effectiveness of the proposed model is demonstrated through extensive experiments on speaker source separation and speech-noise separation. In both cases, our method performs better than relevant and competitive baselines. For speaker separation, dynamic DLVM shows a 1.38 dB improvement in source-to-interference ratio and a 1 dB improvement in source-to-artifact ratio.
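For context, the sketch below implements EM updates for plain PLCA on a non-negative spectrogram, the special case that the dynamic DLVM is said to reduce to; the dynamic Dirichlet prior on the activations is omitted, and the initialization and iteration count are assumptions.

```python
# EM updates for plain PLCA (the special case of dynamic DLVM without the prior).
import numpy as np

def plca(V, n_bases=20, n_iter=100, eps=1e-12, seed=0):
    """V: (F, T) non-negative spectrogram. Returns spectral bases P(f|z) and activations P_t(z)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    Pf_z = rng.random((F, n_bases)); Pf_z /= Pf_z.sum(0, keepdims=True)
    Pz_t = rng.random((n_bases, T)); Pz_t /= Pz_t.sum(0, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior over latent bases enters through the ratio V / approximation.
        approx = Pf_z @ Pz_t + eps                    # current model of V
        R = V / approx
        # M-step: re-estimate bases and activations from the same posterior.
        num_f = Pf_z * (R @ Pz_t.T)                   # ∝ Σ_t V(f,t) P_t(z|f)
        num_z = Pz_t * (Pf_z.T @ R)                   # ∝ Σ_f V(f,t) P_t(z|f)
        Pf_z = num_f / (num_f.sum(0, keepdims=True) + eps)
        Pz_t = num_z / (num_z.sum(0, keepdims=True) + eps)
    return Pf_z, Pz_t
```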
Variational recurrent sequence-to-sequence retrieval for stepwise illustration
We address and formalise the task of sequence-to-sequence (seq2seq) cross-modal retrieval. Given a sequence of text passages as the query, the goal is to retrieve a sequence of images that best describes and aligns with the query. This new task extends traditional cross-modal retrieval, where each image-text pair is treated independently, ignoring the broader context. We propose a novel variational recurrent seq2seq (VRSS) retrieval model for this seq2seq task. Unlike most cross-modal methods, we generate an image vector corresponding to the latent topic obtained from combining the text semantics and context. This synthetic image embedding point associated with every text embedding point can then be employed for either image generation or image retrieval as desired. We evaluate the model for the application of stepwise illustration of recipes, where a sequence of relevant images is retrieved to best match the steps described in the text. To this end, we build and release a new Stepwise Recipe dataset for research purposes, containing 10K recipes (sequences of image-text pairs) with a total of 67K image-text pairs. To our knowledge, it is the first publicly available dataset to offer rich semantic descriptions in a focused category such as food or recipes. Our model is shown to outperform several competitive and relevant baselines in the experiments. Through human evaluation and comparison with relevant existing methods, we also provide a qualitative analysis of how semantically meaningful the results produced by our model are.
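The sketch below illustrates the retrieval step under assumed architectural details: a recurrent encoder carries context across text passages, maps each step to a point in the image-embedding space, and retrieves the closest gallery image by cosine similarity. The paper's model is variational and trained end-to-end, which this simplified version does not reproduce.

```python
# Simplified stepwise text-to-image retrieval (context carried by a GRU).
import torch
import torch.nn as nn

class StepwiseRetriever(nn.Module):
    def __init__(self, text_dim=300, img_dim=512, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(text_dim, hidden, batch_first=True)
        self.to_image = nn.Linear(hidden, img_dim)   # predicted image embedding

    def forward(self, text_seq, gallery):
        """text_seq: (1, steps, text_dim); gallery: (N, img_dim) image vectors."""
        h, _ = self.rnn(text_seq)                    # context flows across steps
        q = nn.functional.normalize(self.to_image(h[0]), dim=-1)   # (steps, img_dim)
        g = nn.functional.normalize(gallery, dim=-1)
        scores = q @ g.t()                           # cosine similarity per step
        return scores.argmax(dim=-1)                 # retrieved image index per step
```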
Music tempo estimation using sub-band synchrony
Tempo estimation aims to estimate the pace of a musical piece measured in beats per minute. This paper presents a new tempo estimation method that utilizes coherent energy changes across multiple frequency sub-bands to identify the onsets. A new measure, called sub-band synchrony, is proposed to detect and quantify the coherent amplitude changes across multiple sub-bands. Given a musical piece, our method first detects the onsets using the sub-band synchrony measure. The periodicity of the resulting onset curve, measured using the autocorrelation function, is used to estimate the tempo value. The performance of the sub-band synchrony-based tempo estimation method is evaluated on two music databases. Experimental results indicate a reasonable improvement in performance when compared to conventional methods of tempo estimation.
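A rough sketch of the described pipeline follows: the signal is split into frequency sub-bands, an onset curve counts how many bands show a simultaneous energy increase (a simple stand-in for the sub-band synchrony measure), and the tempo is read from the autocorrelation of that curve. Window sizes, the band grouping, the synchrony rule, and the tempo range are assumptions.

```python
# Illustrative tempo estimation from coherent sub-band energy changes.
import numpy as np
from scipy.signal import stft

def estimate_tempo(x, sr, n_bands=8, bpm_range=(60, 200)):
    f, t, Z = stft(x, fs=sr, nperseg=2048, noverlap=2048 - 441)   # short hop
    mag = np.abs(Z)
    bands = np.array_split(mag, n_bands, axis=0)           # group bins into sub-bands
    flux = np.stack([np.maximum(np.diff(b.sum(0)), 0) for b in bands])
    onset = (flux > flux.mean(1, keepdims=True)).sum(0)    # no. of synchronous bands
    onset = onset - onset.mean()
    ac = np.correlate(onset, onset, mode='full')[len(onset) - 1:]
    hop = t[1] - t[0]                                      # frame period in seconds
    lags = np.arange(len(ac)) * hop
    valid = (lags > 60.0 / bpm_range[1]) & (lags < 60.0 / bpm_range[0])
    best_lag = lags[valid][np.argmax(ac[valid])]           # strongest periodicity
    return 60.0 / best_lag                                 # tempo in BPM
```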
