DIY Human Action Data Set Generation
The recent successes in applying deep learning techniques to standard
computer vision problems have inspired researchers to propose new computer
vision problems in different domains. As previously established in the field,
training data itself plays a significant role in the machine learning process,
especially for deep learning approaches, which are data-hungry. To solve each
new problem with decent performance, a large amount of data must be captured,
which in many cases poses logistical difficulties. Therefore, the ability to
generate de novo data, or to expand an existing data set, however small, to
satisfy the data requirements of current networks, can be invaluable. Herein,
we introduce a novel way to partition an action video clip into action,
subject, and context. Each part is manipulated separately and reassembled with
our proposed video generation technique. Furthermore, our novel human skeleton
trajectory generation, together with our proposed video generation technique,
enables us to generate unlimited action recognition training data. These
techniques enable us to generate video action clips from a small set without
costly and time-consuming data acquisition. Lastly, we demonstrate through an
extensive set of experiments on two small human action recognition data sets
that this new data generation technique can improve the performance of current
action recognition neural networks.
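The abstract does not spell out how the reassembled clips are produced; as a rough illustration of the compositing idea only (hypothetical helper names, not the authors' actual pipeline), a segmented subject can be pasted onto a new context with a per-frame binary mask:

```python
import numpy as np

def composite_action_clip(subject_frames, subject_masks, context_frame):
    """Paste a segmented subject onto a new background, frame by frame.

    subject_frames: (T, H, W, 3) float array of the original clip
    subject_masks:  (T, H, W) binary masks of the actor (1 = actor)
    context_frame:  (H, W, 3) float array used as the new context
    """
    masks = subject_masks[..., None]  # broadcast mask over colour channels
    # Per frame: keep actor pixels, replace everything else with the context
    return masks * subject_frames + (1 - masks) * context_frame

# Toy example: 2 frames of a 4x4 clip, actor occupies the top-left 2x2 block
frames = np.ones((2, 4, 4, 3))
masks = np.zeros((2, 4, 4))
masks[:, :2, :2] = 1
context = np.full((4, 4, 3), 0.5)
out = composite_action_clip(frames, masks, context)
```

Swapping the context frame or transforming the subject before compositing is what allows one small clip to yield many distinct training samples.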
Fully Point-wise Convolutional Neural Network for Modeling Statistical Regularities in Natural Images
Modeling statistical regularity plays an essential role in ill-posed image
processing problems. Recently, deep learning based methods have been presented
to implicitly learn statistical representation of pixel distributions in
natural images and leverage it as a constraint to facilitate subsequent tasks,
such as color constancy and image dehazing. However, existing CNN
architectures are sensitive to the variability and diversity of pixel
intensities within and between local regions, which may result in an
inaccurate statistical representation. To address this problem, this paper presents a novel fully
point-wise CNN architecture for modeling statistical regularities in natural
images. Specifically, we propose to randomly shuffle the pixels of the
original images and use the shuffled image as input, making the CNN focus on
the statistical properties. Moreover, since the pixels in the shuffled image
are independent and identically distributed, we can replace all the large
convolution kernels in the CNN with point-wise (1×1) convolution kernels while
maintaining the representation ability. Experimental results on two
applications: color constancy and image dehazing, demonstrate the superiority
of our proposed network over the existing architectures, i.e., using
1/10 to 1/100 of the network parameters and computational cost while achieving
comparable performance. Comment: 9 pages, 7 figures. To appear in ACM MM 201
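As a rough sketch of the two ideas in this abstract (pixel shuffling and point-wise convolution), and not the paper's actual network, both operations can be written in a few lines of numpy:

```python
import numpy as np

rng = np.random.default_rng(0)

def shuffle_pixels(image, rng):
    """Randomly permute pixel locations, destroying spatial structure
    while preserving the per-channel intensity statistics."""
    h, w, c = image.shape
    flat = image.reshape(-1, c)
    return flat[rng.permutation(h * w)].reshape(h, w, c)

def pointwise_conv(image, weights):
    """A 1x1 convolution: each output channel is a linear mix of the input
    channels at the same pixel, i.e. a matmul over the channel axis."""
    return image @ weights  # (H, W, C_in) @ (C_in, C_out) -> (H, W, C_out)

img = rng.random((8, 8, 3))
shuffled = shuffle_pixels(img, rng)
w = rng.random((3, 16))        # toy weights standing in for learned ones
feat = pointwise_conv(shuffled, w)
```

Because shuffling removes spatial correlations, larger kernels would see essentially random neighbourhoods, which is why 1×1 kernels lose no representational power here.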
Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos
We propose a novel deep multi-modality neural network for restoring very low
bit-rate videos of talking heads. Such video content is very common in social
media, teleconferencing, distance education, telemedicine, etc., and often
need to be transmitted with limited bandwidth. The proposed CNN method exploits
the correlations among three modalities, video, audio and emotion state of the
speaker, to remove the video compression artifacts caused by spatial
down-sampling and quantization. The deep learning approach turns out to be ideally
suited for the video restoration task, as the complex non-linear cross-modality
correlations are very difficult to model analytically and explicitly. The new
method is a video post-processor that can significantly boost the perceptual
quality of aggressively compressed talking-head videos, while being fully
compatible with all existing video compression standards. Comment: Accepted by
Proceedings of the 28th ACM International Conference on Multimedia (ACM MM), 2020
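The abstract does not describe the fusion architecture; as a minimal sketch of one plausible cross-modality conditioning scheme (tiling per-frame audio and emotion vectors over the spatial grid, an assumption for illustration rather than the paper's design):

```python
import numpy as np

def fuse_modalities(video_feat, audio_feat, emotion_logits):
    """Late fusion: tile the per-frame audio vector and the emotion
    distribution across the spatial grid, then stack them with the
    video feature map along the channel axis."""
    h, w, _ = video_feat.shape
    audio_map = np.broadcast_to(audio_feat, (h, w, audio_feat.shape[-1]))
    # Softmax turns emotion logits into a probability distribution
    emotion = np.exp(emotion_logits) / np.exp(emotion_logits).sum()
    emotion_map = np.broadcast_to(emotion, (h, w, emotion.shape[-1]))
    return np.concatenate([video_feat, audio_map, emotion_map], axis=-1)

# Toy shapes: 16x16 feature map with 32 channels, 8-dim audio, 4 emotions
fused = fuse_modalities(np.zeros((16, 16, 32)), np.ones(8), np.zeros(4))
```

A restoration network operating on such a fused tensor can condition its deblocking and upsampling on what is being said and how, which is the intuition the abstract appeals to.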
Estimating the colour of the illuminant using specular reflection and exemplar-based method
In this thesis, we propose methods for estimating the colour of the illuminant. First, we investigate the effects of bright pixels on several current colour constancy algorithms. Then we use bright pixels to extend the seminal Gamut Mapping Colour Constancy algorithm. Here we define the White-Patch Gamut as a new extension to this method, comprising the bright pixels of the image. This approach adds new constraints to the standard ones and improves the estimates. Motivated by the effect of bright pixels in illumination estimation, we go on to incorporate consideration of specular reflection per se, which tends to generate bright pixels. To this effect we present a new and effective physics-based colour constancy representation, called the Zeta-Image, which makes use of a novel log-relative-chromaticity planar constraint. This method is fast and requires no training or tunable parameters; moreover, and importantly, it can be useful for removing highlights. We then go on to present a new camera calibration method aimed at finding a straight-line locus, in a special colour feature space, that is traversed by daylights and, approximately, by specular points. The aim of the calibration is to enable recovery of the colour of the illuminant. Finally, we address colour constancy in a novel approach by utilizing unsupervised learning of a model for each training surface in the training images. We call this new method Exemplar-Based Colour Constancy. In this method, we find nearest-neighbour models for each test surface and estimate its illumination by comparing the statistics of the nearest-neighbour surfaces and the target surface. We also extend our method to address the multiple-illuminant problem.
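The nearest-neighbour idea behind Exemplar-Based Colour Constancy can be sketched in a simplified, non-learned form (toy mean-chromaticity statistics and toy data standing in for the thesis's learned surface models):

```python
import numpy as np

def estimate_illuminant(test_surface, exemplar_surfaces, exemplar_illums):
    """Nearest-neighbour illuminant estimate: compare a simple colour
    statistic (mean chromaticity) of the test surface against exemplar
    surfaces captured under known lights, and return the closest light."""
    def stats(patch):
        mean = patch.reshape(-1, 3).mean(axis=0)
        return mean / mean.sum()  # chromaticity: intensity-independent

    target = stats(test_surface)
    dists = [np.linalg.norm(stats(s) - target) for s in exemplar_surfaces]
    return exemplar_illums[int(np.argmin(dists))]

# Toy exemplars: a reddish and a bluish surface with known illuminants
exemplars = [np.tile([0.6, 0.3, 0.1], (4, 4, 1)),
             np.tile([0.1, 0.3, 0.6], (4, 4, 1))]
illums = np.array([[1.0, 0.8, 0.6], [0.6, 0.8, 1.0]])
est = estimate_illuminant(np.tile([0.5, 0.3, 0.2], (4, 4, 1)),
                          exemplars, illums)
```

Here a reddish test surface matches the reddish exemplar, so its known illuminant is returned; the thesis's method replaces the toy statistic with learned per-surface models and aggregates estimates over many surfaces.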