
    DIY Human Action Data Set Generation

    The recent successes in applying deep learning techniques to standard computer vision problems have inspired researchers to propose new computer vision problems in different domains. As previously established in the field, training data itself plays a significant role in the machine learning process, especially for deep learning approaches, which are data-hungry. To solve each new problem with decent performance, a large amount of data must be captured, which in many cases poses logistical difficulties. Therefore, the ability to generate de novo data, or to expand an existing data set however small, in order to satisfy the data requirements of current networks may be invaluable. Herein, we introduce a novel way to partition an action video clip into action, subject, and context. Each part is manipulated separately and reassembled with our proposed video generation technique. Furthermore, our novel human skeleton trajectory generation, combined with our proposed video generation technique, enables us to generate unlimited action recognition training data. These techniques enable us to generate video action clips from a small set without costly and time-consuming data acquisition. Lastly, we demonstrate through an extensive set of experiments on two small human action recognition data sets that this new data generation technique can improve the performance of current action recognition neural networks.
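The partition-and-reassemble idea can be illustrated with a minimal sketch: separate a frame into subject and context via a mask, then composite the subject into a new context. This is only a toy illustration of the reassembly step; the paper's actual pipeline (skeleton-trajectory generation, learned video synthesis) is far richer, and all array shapes and the masking scheme here are assumptions.

```python
import numpy as np

def composite_frame(context, subject, mask):
    """Paste subject pixels onto a context frame using a binary mask.

    Toy sketch of the reassembly step only: where mask == 1 the subject
    pixels are kept, elsewhere the (possibly new) context shows through.
    """
    mask = mask.astype(bool)[..., None]      # H x W -> H x W x 1 for broadcasting
    return np.where(mask, subject, context)  # subject where mask, else context

# Toy frames: 4x4 RGB black context, white subject occupying the top-left 2x2.
context = np.zeros((4, 4, 3), dtype=np.uint8)
subject = np.full((4, 4, 3), 255, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[:2, :2] = 1
frame = composite_frame(context, subject, mask)
```

Swapping in a different `context` frame (or a subject re-rendered along a generated skeleton trajectory) yields a new synthetic training clip frame by frame.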

    Fully Point-wise Convolutional Neural Network for Modeling Statistical Regularities in Natural Images

    Modeling statistical regularity plays an essential role in ill-posed image processing problems. Recently, deep learning based methods have been presented to implicitly learn the statistical representation of pixel distributions in natural images and leverage it as a constraint to facilitate subsequent tasks, such as color constancy and image dehazing. However, the existing CNN architecture is prone to variability and diversity of pixel intensity within and between local regions, which may result in an inaccurate statistical representation. To address this problem, this paper presents a novel fully point-wise CNN architecture for modeling statistical regularities in natural images. Specifically, we propose to randomly shuffle the pixels in the original images and leverage the shuffled image as input, making the CNN concentrate on the statistical properties. Moreover, since the pixels in the shuffled image are independent and identically distributed, we can replace all the large convolution kernels in the CNN with point-wise (1×1) convolution kernels while maintaining the representation ability. Experimental results on two applications, color constancy and image dehazing, demonstrate the superiority of our proposed network over the existing architectures, i.e., using 1/10 to 1/100 of the network parameters and computational cost while achieving comparable performance. Comment: 9 pages, 7 figures. To appear in ACM MM 201
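The core trick, random pixel shuffling followed by 1×1 (point-wise) convolution, can be sketched in a few lines: after shuffling, spatial layout carries no information, so a per-pixel linear map over channels is all a convolution can usefully do. This is a hedged NumPy illustration of the idea, not the paper's network; the shapes and the single linear layer are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def shuffle_pixels(img, rng):
    """Randomly permute the spatial positions of an image's pixels.

    After shuffling, the value at any position is (approximately) an i.i.d.
    draw from the image's intensity distribution, so spatial context is
    destroyed while per-channel statistics are preserved exactly.
    """
    h, w, c = img.shape
    flat = img.reshape(h * w, c)
    return flat[rng.permutation(h * w)].reshape(h, w, c)

def pointwise_conv(img, weight, bias):
    """A 1x1 convolution is just a per-pixel linear map over channels:
    out[i, j, :] = weight @ img[i, j, :] + bias."""
    return img @ weight.T + bias

img = rng.random((8, 8, 3))
shuffled = shuffle_pixels(img, rng)
weight = rng.random((4, 3))   # 4 output channels, 3 input channels
bias = np.zeros(4)
out = pointwise_conv(shuffled, weight, bias)
```

Note that the shuffle preserves each channel's value multiset, which is exactly the statistical information the network is meant to model.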

    Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos

    We propose a novel deep multi-modality neural network for restoring very low bit-rate videos of talking heads. Such video content is very common in social media, teleconferencing, distance education, tele-medicine, etc., and often needs to be transmitted with limited bandwidth. The proposed CNN method exploits the correlations among three modalities, video, audio, and the emotion state of the speaker, to remove the video compression artifacts caused by spatial down-sampling and quantization. The deep learning approach turns out to be ideally suited for the video restoration task, as the complex non-linear cross-modality correlations are very difficult to model analytically and explicitly. The new method is a video post-processor that can significantly boost the perceptual quality of aggressively compressed talking-head videos, while being fully compatible with all existing video compression standards. Comment: Accepted by Proceedings of the 28th ACM International Conference on Multimedia (ACM MM), 202
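The multi-modality idea can be sketched as feature-level fusion: features from the video, audio, and emotion streams are combined into one joint representation that the restoration network conditions on. The paper's CNN learns the cross-modality correlations end-to-end; the concatenation, the dimensions, and the single linear map below are illustrative assumptions only.

```python
import numpy as np

def fuse_modalities(video_feat, audio_feat, emotion_feat, weight):
    """Late fusion by concatenation: stack the three modality feature
    vectors and project them into one joint representation.

    A hedged sketch of the fusion step, not the paper's architecture.
    """
    joint = np.concatenate([video_feat, audio_feat, emotion_feat])
    return weight @ joint  # joint code a restoration decoder would consume

rng = np.random.default_rng(0)
video_feat = rng.random(16)    # e.g. features from compressed frames
audio_feat = rng.random(8)     # e.g. features from the speech track
emotion_feat = rng.random(4)   # e.g. an emotion-state embedding
weight = rng.random((10, 16 + 8 + 4))
fused = fuse_modalities(video_feat, audio_feat, emotion_feat, weight)
```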

    Estimating the colour of the illuminant using specular reflection and exemplar-based method

    In this thesis, we propose methods for estimating the colour of the illuminant. First, we investigate the effects of bright pixels on several current colour constancy algorithms. Then we use bright pixels to extend the seminal Gamut Mapping Colour Constancy algorithm. Here we define the White-Patch Gamut as a new extension to this method, comprising the bright pixels of the image. This approach adds new constraints to the standard ones and improves the estimates. Motivated by the effect of bright pixels on illumination estimation, we go on to incorporate consideration of specular reflection per se, which tends to generate bright pixels. To this effect we present a new and effective physics-based colour constancy representation, called the Zeta-Image, which makes use of a novel log-relative-chromaticity planar constraint. This method is fast and requires no training or tunable parameters; moreover, and importantly, it can be useful for removing highlights. We then go on to present a new camera calibration method aimed at finding a straight-line locus, in a special colour feature space, that is traversed by daylights and approximately by specular points. The aim of the calibration is to enable recovering the colour of the illuminant. Finally, we address colour constancy in a novel approach by utilizing unsupervised learning of a model for each training surface in training images. We call this new method Exemplar-Based Colour Constancy. In this method, we find nearest-neighbour models for each test surface and estimate its illumination based on comparing the statistics of nearest-neighbour surfaces and the target surface. We also extend our method to overcome the multiple-illuminant problem.
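The exemplar-based step can be illustrated with a minimal nearest-neighbour sketch: describe each surface by a small statistical feature vector, find the closest training exemplars, and average their known illuminants. The feature choice, Euclidean distance, and averaging below are assumptions for illustration, not the thesis's exact surface models.

```python
import numpy as np

def estimate_illuminant(test_feat, train_feats, train_illums, k=1):
    """Nearest-neighbour illuminant estimation (exemplar-based sketch).

    Finds the k training surfaces whose statistics are closest to the
    test surface and averages their known illuminants, returning a
    unit-norm estimate of the illuminant colour.
    """
    dists = np.linalg.norm(train_feats - test_feat, axis=1)  # distance to each exemplar
    nearest = np.argsort(dists)[:k]                          # k nearest exemplars
    est = train_illums[nearest].mean(axis=0)
    return est / np.linalg.norm(est)                         # normalise to unit length

# Toy exemplars: 2-d surface statistics paired with known RGB illuminants.
train_feats = np.array([[0.2, 0.3], [0.8, 0.1], [0.5, 0.5]])
train_illums = np.array([[1.0, 0.9, 0.8], [0.7, 1.0, 1.1], [1.0, 1.0, 1.0]])
est = estimate_illuminant(np.array([0.21, 0.29]), train_feats, train_illums)
```

Per-surface estimates like this one can then be aggregated across an image, which is also how the method extends naturally to scenes with multiple illuminants.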