Search CORE

162,857 research outputs found

Single Image Action Recognition by Predicting Space-Time Saliency

Author: Foroosh Hassan
Safaei Marjaneh
Publication venue
Publication date: 12/05/2017
Field of study

We propose a novel approach based on deep Convolutional Neural Networks (CNN) to recognize human actions in still images by predicting the future motion, and detecting the shape and location of the salient parts of the image. We make the following major contributions to this important area of research: (i) We use the predicted future motion in the static image (Walker et al., 2015) as a means of compensating for the missing temporal information, while using the saliency map to represent the the spatial information in the form of location and shape of what is predicted as significant. (ii) We cast action classification in static images as a domain adaptation problem by transfer learning. We first map the input static image to a new domain that we refer to as the Predicted Optical Flow-Saliency Map domain (POF-SM), and then fine-tune the layers of a deep CNN model trained on classifying the ImageNet dataset to perform action classification in the POF-SM domain. (iii) We tested our method on the popular Willow dataset. But unlike existing methods, we also tested on a more realistic and challenging dataset of over 2M still images that we collected and labeled by taking random frames from the UCF-101 video dataset. We call our dataset the UCF Still Image dataset or UCFSI-101 in short. Our results outperform the state of the art

arXiv.org e-Print Archive

Human Action Recognition and Prediction: A Survey

Author: Fu Yun
Kong Yu
Publication venue
Publication date: 01/07/2018
Field of study

Derived from rapid advances in computer vision and machine learning, video analysis tasks have been moving from inferring the present state to predicting the future state. Vision-based action recognition and prediction from videos are such tasks, where action recognition is to infer human actions (present state) based upon complete action executions, and action prediction to predict human actions (future state) based upon incomplete action executions. These two tasks have become particularly prevalent topics recently because of their explosively emerging real-world applications, such as visual surveillance, autonomous driving vehicle, entertainment, and video retrieval, etc. Many attempts have been devoted in the last a few decades in order to build a robust and effective framework for action recognition and prediction. In this paper, we survey the complete state-of-the-art techniques in the action recognition and prediction. Existing models, popular algorithms, technical difficulties, popular action databases, evaluation protocols, and promising future directions are also provided with systematic discussions

arXiv.org e-Print Archive

Volumetric Super-Resolution of Multispectral Data

Author: Aydin Vildan Atalay
Foroosh Hassan
Publication venue
Publication date: 13/05/2017
Field of study

Most multispectral remote sensors (e.g. QuickBird, IKONOS, and Landsat 7 ETM+) provide low-spatial high-spectral resolution multispectral (MS) or high-spatial low-spectral resolution panchromatic (PAN) images, separately. In order to reconstruct a high-spatial/high-spectral resolution multispectral image volume, either the information in MS and PAN images are fused (i.e. pansharpening) or super-resolution reconstruction (SRR) is used with only MS images captured on different dates. Existing methods do not utilize temporal information of MS and high spatial resolution of PAN images together to improve the resolution. In this paper, we propose a multiframe SRR algorithm using pansharpened MS images, taking advantage of both temporal and spatial information available in multispectral imagery, in order to exceed spatial resolution of given PAN images. We first apply pansharpening to a set of multispectral images and their corresponding PAN images captured on different dates. Then, we use the pansharpened multispectral images as input to the proposed wavelet-based multiframe SRR method to yield full volumetric SRR. The proposed SRR method is obtained by deriving the subband relations between multitemporal MS volumes. We demonstrate the results on Landsat 7 ETM+ images comparing our method to conventional techniques.Comment: arXiv admin note: text overlap with arXiv:1705.0125

arXiv.org e-Print Archive

View-Invariant Recognition of Action Style Self-Dissimilarity

Author: Foroosh Hassan
Shen Yuping
Publication venue
Publication date: 22/05/2017
Field of study

Self-similarity was recently introduced as a measure of inter-class congruence for classification of actions. Herein, we investigate the dual problem of intra-class dissimilarity for classification of action styles. We introduce self-dissimilarity matrices that discriminate between same actions performed by different subjects regardless of viewing direction and camera parameters. We investigate two frameworks using these invariant style dissimilarity measures based on Principal Component Analysis (PCA) and Fisher Discriminant Analysis (FDA). Extensive experiments performed on IXMAS dataset indicate remarkably good discriminant characteristics for the proposed invariant measures for gender recognition from video data

arXiv.org e-Print Archive

Non-Linear Phase-Shifting of Haar Wavelets for Run-Time All-Frequency Lighting

Author: Alnasser Mais
Foroosh Hassan
Publication venue
Publication date: 20/05/2017
Field of study

This paper focuses on real-time all-frequency image-based rendering using an innovative solution for run-time computation of light transport. The approach is based on new results derived for non-linear phase shifting in the Haar wavelet domain. Although image-based methods for real-time rendering of dynamic glossy objects have been proposed, they do not truly scale to all possible frequencies and high sampling rates without trading storage, glossiness, or computational time, while varying both lighting and viewpoint. This is due to the fact that current approaches are limited to precomputed radiance transfer (PRT), which is prohibitively expensive in terms of memory requirements and real-time rendering when both varying light and viewpoint changes are required together with high sampling rates for high frequency lighting of glossy material. On the other hand, current methods cannot handle object rotation, which is one of the paramount issues for all PRT methods using wavelets. This latter problem arises because the precomputed data are defined in a global coordinate system and encoded in the wavelet domain, while the object is rotated in a local coordinate system. At the root of all the above problems is the lack of efficient run-time solution to the nontrivial problem of rotating wavelets (a non-linear phase-shift), which we solve in this paper

arXiv.org e-Print Archive

A Comprehensive Survey of Deep Learning for Image Captioning

Author: Hossain Md. Zakir
Laga Hamid
Shiratuddin Mohd Fairuz
Sohel Ferdous
Publication venue
Publication date: 14/10/2018
Field of study

Generating a description of an image is called image captioning. Image captioning requires to recognize the important objects, their attributes and their relationships in an image. It also needs to generate syntactically and semantically correct sentences. Deep learning-based techniques are capable of handling the complexities and challenges of image captioning. In this survey paper, we aim to present a comprehensive review of existing deep learning-based image captioning techniques. We discuss the foundation of the techniques to analyze their performances, strengths and limitations. We also discuss the datasets and the evaluation metrics popularly used in deep learning based automatic image captioning.Comment: 36 Pages, Accepted as a Journal Paper in ACM Computing Surveys (October 2018

arXiv.org e-Print Archive

Visual Affordance and Function Understanding: A Survey

Author: Hassanin Mohammed
Khan Salman
Tahtali Murat
Publication venue
Publication date: 18/07/2018
Field of study

Nowadays, robots are dominating the manufacturing, entertainment and healthcare industries. Robot vision aims to equip robots with the ability to discover information, understand it and interact with the environment. These capabilities require an agent to effectively understand object affordances and functionalities in complex visual domains. In this literature survey, we first focus on Visual affordances and summarize the state of the art as well as open problems and research gaps. Specifically, we discuss sub-problems such as affordance detection, categorization, segmentation and high-level reasoning. Furthermore, we cover functional scene understanding and the prevalent functional descriptors used in the literature. The survey also provides necessary background to the problem, sheds light on its significance and highlights the existing challenges for affordance and functionality learning.Comment: 26 pages, 22 image

arXiv.org e-Print Archive

Super-Resolution via Deep Learning

Author: Hayat Khizar
Publication venue: 'Elsevier BV'
Publication date: 27/06/2017
Field of study

The recent phenomenal interest in convolutional neural networks (CNNs) must have made it inevitable for the super-resolution (SR) community to explore its potential. The response has been immense and in the last three years, since the advent of the pioneering work, there appeared too many works not to warrant a comprehensive survey. This paper surveys the SR literature in the context of deep learning. We focus on the three important aspects of multimedia - namely image, video and multi-dimensions, especially depth maps. In each case, first relevant benchmarks are introduced in the form of datasets and state of the art SR methods, excluding deep learning. Next is a detailed analysis of the individual works, each including a short description of the method and a critique of the results with special reference to the benchmarking done. This is followed by minimum overall benchmarking in the form of comparison on some common dataset, while relying on the results reported in various works

arXiv.org e-Print Archive

Beyond Pixels: A Comprehensive Survey from Bottom-up to Semantic Image Segmentation and Cosegmentation

Author: Cai Jianfei
Lu Shijian
Meng Fanman
Zhu Hongyuan
Publication venue
Publication date: 02/02/2015
Field of study

Image segmentation refers to the process to divide an image into nonoverlapping meaningful regions according to human perception, which has become a classic topic since the early ages of computer vision. A lot of research has been conducted and has resulted in many applications. However, while many segmentation algorithms exist, yet there are only a few sparse and outdated summarizations available, an overview of the recent achievements and issues is lacking. We aim to provide a comprehensive review of the recent progress in this field. Covering 180 publications, we give an overview of broad areas of segmentation topics including not only the classic bottom-up approaches, but also the recent development in superpixel, interactive methods, object proposals, semantic image parsing and image cosegmentation. In addition, we also review the existing influential datasets and evaluation metrics. Finally, we suggest some design flavors and research directions for future research in image segmentation.Comment: submitted to Elsevier Journal of Visual Communications and Image Representatio

arXiv.org e-Print Archive

Causes of discomfort in stereoscopic content: a review

Author: Hansard Miles
Terzic Kasim
Publication venue
Publication date: 14/03/2017
Field of study

This paper reviews the causes of discomfort in viewing stereoscopic content. These include objective factors, such as misaligned images, as well as subjective factors, such as excessive disparity. Different approaches to the measurement of visual discomfort are also reviewed, in relation to the underlying physiological and psychophysical processes. The importance of understanding these issues, in the context of new display technologies, is emphasized

arXiv.org e-Print Archive