6,138 research outputs found
Deformable GANs for Pose-based Human Image Generation
In this paper we address the problem of generating person images conditioned
on a given pose. Specifically, given an image of a person and a target pose, we
synthesize a new image of that person in the novel pose. In order to deal with
pixel-to-pixel misalignments caused by the pose differences, we introduce
deformable skip connections in the generator of our Generative Adversarial
Network. Moreover, a nearest-neighbour loss is proposed instead of the common
L1 and L2 losses in order to match the details of the generated image with the
target image. We test our approach using photos of persons in different poses
and we compare our method with previous work in this area showing
state-of-the-art results in two benchmarks. Our method can be applied to the
wider field of deformable object generation, provided that the pose of the
articulated object can be extracted using a keypoint detector.Comment: CVPR 2018 versio
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.Comment: 15 pages, 2 pdf figure
Adaptive Deep Learning through Visual Domain Localization
A commercial robot, trained by its manufacturer to recognize a predefined number and type of objects, might be used in many settings, that will in general differ in their illumination conditions, background, type and degree of clutter, and so on. Recent computer vision works tackle this generalization issue through domain adaptation methods, assuming as source the visual domain where the system is trained and as target the domain of deployment. All approaches assume to have access to images from all classes of the target during training, an unrealistic condition in robotics applications. We address this issue proposing an algorithm that takes into account the specific needs of robot vision. Our intuition is that the nature of the domain shift experienced mostly in robotics is local. We exploit this through the learning of maps that spatially ground the domain and quantify the degree of shift, embedded into an end-to-end deep domain adaptation architecture. By explicitly localizing the roots of the domain shift we significantly reduce the number of parameters of the architecture to tune, we gain the flexibility necessary to deal with subset of categories in the target domain at training time, and we provide a clear feedback on the rationale behind any classification decision, which can be exploited in human-robot interactions. Experiments on two different settings of the iCub World database confirm the suitability of our method for robot vision
Adversarial Attacks on Deep Neural Networks for Time Series Classification
Time Series Classification (TSC) problems are encountered in many real life
data mining tasks ranging from medicine and security to human activity
recognition and food safety. With the recent success of deep neural networks in
various domains such as computer vision and natural language processing,
researchers started adopting these techniques for solving time series data
mining problems. However, to the best of our knowledge, no previous work has
considered the vulnerability of deep learning models to adversarial time series
examples, which could potentially make them unreliable in situations where the
decision taken by the classifier is crucial such as in medicine and security.
For computer vision problems, such attacks have been shown to be very easy to
perform by altering the image and adding an imperceptible amount of noise to
trick the network into wrongly classifying the input image. Following this line
of work, we propose to leverage existing adversarial attack mechanisms to add a
special noise to the input time series in order to decrease the network's
confidence when classifying instances at test time. Our results reveal that
current state-of-the-art deep learning time series classifiers are vulnerable
to adversarial attacks which can have major consequences in multiple domains
such as food safety and quality assurance.Comment: Accepted at IJCNN 201
- …