High-Resolution Mammogram Synthesis using Progressive Generative Adversarial Networks
The ability to generate synthetic medical images is useful for data
augmentation, domain transfer, and out-of-distribution detection. However,
generating realistic, high-resolution medical images is challenging,
particularly for Full Field Digital Mammograms (FFDM), due to the textural
heterogeneity, fine structural details and specific tissue properties. In this
paper, we explore the use of progressively trained generative adversarial
networks (GANs) to synthesize mammograms, overcoming the underlying
instabilities when training such adversarial models. This work is the first to
show that generation of realistic synthetic medical images is feasible at up to
1280x1024 pixels, the highest resolution achieved for medical image synthesis,
enabling visualizations within standard mammographic hanging protocols. We hope
this work can serve as a useful guide and facilitate further research on GANs
in the medical imaging domain.
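Progressive training grows the generator one resolution at a time. In the usual progressive-growing recipe, a newly added higher-resolution block is faded in by blending its output with an upsampled copy of the previous stage's output while a weight alpha ramps from 0 to 1. The sketch below shows only that blending step. It assumes grayscale images stored as nested lists and nearest-neighbour upsampling; the function names are hypothetical, and the abstract does not state the exact schedule used for the mammograms.

```python
from typing import List

Image = List[List[float]]


def upsample_nearest(img: Image) -> Image:
    """Double height and width by repeating each pixel (nearest neighbour)."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fade_in(prev_stage: Image, new_stage: Image, alpha: float) -> Image:
    """Blend the upsampled low-res output with the new high-res block.

    alpha = 0 uses only the old (upsampled) path; alpha = 1 uses only the
    new block. During training alpha is ramped up gradually.
    """
    up = upsample_nearest(prev_stage)
    if len(up) != len(new_stage) or len(up[0]) != len(new_stage[0]):
        raise ValueError("new_stage must be twice the size of prev_stage")
    return [
        [(1.0 - alpha) * u + alpha * n for u, n in zip(urow, nrow)]
        for urow, nrow in zip(up, new_stage)
    ]


low = [[0.0, 1.0], [1.0, 0.0]]          # 2x2 output of the previous stage
high = [[0.5] * 4 for _ in range(4)]    # 4x4 output of the new block
blended = fade_in(low, high, alpha=0.25)
print(blended[0])  # [0.125, 0.125, 0.875, 0.875]
```

Ramping alpha smoothly, rather than switching the new block on abruptly, is what keeps the already-trained low-resolution layers from being disrupted when the resolution increases.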
Smart, Sparse Contours to Represent and Edit Images
We study the problem of reconstructing an image from information stored at
contour locations. We show that high-quality reconstructions with high fidelity
to the source image can be obtained from sparse input, e.g., comprising less
than of image pixels. This is a significant improvement over existing
contour-based reconstruction methods that require much denser input to capture
subtle texture information and to ensure image quality. Our model, based on
generative adversarial networks, synthesizes texture and details in regions
where no input information is provided. The semantic knowledge encoded into our
model and the sparsity of the input allow contours to serve as an intuitive
interface for semantically-aware image manipulation: local edits in contour
domain translate to long-range and coherent changes in pixel space. We can
perform complex structural changes such as changing facial expression by simple
edits of contours. Our experiments demonstrate that humans as well as a face
recognition system mostly cannot distinguish between our reconstructions and
the source images.
Comment: Accepted to CVPR'18; Project page: contour2im.github.i
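The abstract does not say how contour locations are selected. As an illustration only, the sketch below keeps a small fraction of pixels with the largest local gradient magnitude (a crude stand-in for a contour detector) and records their positions and intensities. This is the kind of sparse input a reconstruction network could be conditioned on. The function name and the `keep_fraction` parameter are hypothetical.

```python
from typing import Dict, List, Tuple


def sparse_edge_samples(
    img: List[List[float]], keep_fraction: float
) -> Dict[Tuple[int, int], float]:
    """Keep the keep_fraction of pixels with the strongest local gradient.

    Returns {(row, col): intensity}; all other pixels are treated as missing
    and would have to be synthesised by the generator.
    """
    h, w = len(img), len(img[0])
    scored = []
    for r in range(h):
        for c in range(w):
            # Forward differences, clamped at the image border.
            dx = img[r][min(c + 1, w - 1)] - img[r][c]
            dy = img[min(r + 1, h - 1)][c] - img[r][c]
            scored.append((dx * dx + dy * dy, r, c))
    k = max(1, int(keep_fraction * h * w))
    scored.sort(reverse=True)  # strongest gradients first
    return {(r, c): img[r][c] for _, r, c in scored[:k]}


# 4x4 image with a vertical step edge between columns 1 and 2.
img = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
samples = sparse_edge_samples(img, keep_fraction=0.25)
print(sorted(samples))  # [(0, 1), (1, 1), (2, 1), (3, 1)]
```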
VR-Goggles for Robots: Real-to-sim Domain Adaptation for Visual Control
In this paper, we deal with the reality gap from a novel perspective,
targeting the transfer of Deep Reinforcement Learning (DRL) policies learned in
simulated environments to the real world for visual control tasks.
Instead of adopting the common solution of increasing the visual fidelity of
the synthetic images output by simulators during the training phase, we tackle
the problem by translating the real-world image streams back to the synthetic
domain during the deployment phase, to make the robot feel at home. We propose
this as a lightweight, flexible, and efficient
solution for visual control, as 1) no extra transfer steps are required during
the expensive training of DRL agents in simulation; 2) the trained DRL agents
will not be constrained to being deployable in only one specific real-world
environment; 3) the policy training and the transfer operations are decoupled,
and can be conducted in parallel. In addition, we propose a simple yet
effective shift loss that is agnostic to the downstream task and enforces
consistency between subsequent frames, which is important for consistent policy
outputs. We validate the shift loss on artistic style transfer for videos and
domain adaptation, and validate our visual control approach in indoor and
outdoor robotics experiments.
Comment: IEEE RA-L 2019 to appear. The first two authors contributed equally.
Video and supplementary material are available on the project
page (https://goo.gl/KcvmRm)
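The abstract describes the shift loss only as a task-agnostic constraint on consistency between subsequent frames. One plausible formulation, assumed here rather than taken from the paper, asks the image translator to commute with small spatial shifts, since consecutive frames from a moving camera differ roughly by such shifts. Under this formulation, translating and then shifting should give the same result as shifting and then translating. In the sketch, shifts are implemented by cropping, and both translators are hypothetical toy functions.

```python
from typing import Callable, List

Image = List[List[float]]


def crop_shift(img: Image, dr: int, dc: int) -> Image:
    """Shift the view by dropping the first dr rows and dc columns."""
    return [row[dc:] for row in img[dr:]]


def shift_loss(translate: Callable[[Image], Image], img: Image,
               dr: int, dc: int) -> float:
    """Mean squared gap between 'translate then shift' and 'shift then translate'."""
    a = crop_shift(translate(img), dr, dc)
    b = translate(crop_shift(img, dr, dc))
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def pixelwise(img: Image) -> Image:
    """Position-independent toy translator: shift-consistent by construction."""
    return [[0.5 * v + 0.1 for v in row] for row in img]


def position_dependent(img: Image) -> Image:
    """Toy translator that adds a ramp tied to the absolute column index."""
    return [[v + 0.1 * c for c, v in enumerate(row)] for row in img]


frame = [[float(r + c) for c in range(4)] for r in range(4)]
print(shift_loss(pixelwise, frame, 1, 1))           # 0.0
print(shift_loss(position_dependent, frame, 1, 1))  # approximately 0.01
```

A translator whose output depends on absolute image position is penalised, which is the property that would make consecutive translated frames, and hence the policy's inputs, flicker less.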
ACE: Adapting to Changing Environments for Semantic Segmentation
Deep neural networks exhibit exceptional accuracy when they are trained and
tested on the same data distributions. However, neural classifiers are often
extremely brittle when confronted with domain shift, i.e., changes in the input
distribution that occur over time. We present ACE, a framework for semantic
segmentation that dynamically adapts to changing environments over time. By
aligning the distribution of labeled training data from the original source
domain with the distribution of incoming data in a shifted domain, ACE
synthesizes labeled training data for environments as it sees them. This
stylized data is then used to update a segmentation model so that it performs
well in new environments. To avoid forgetting knowledge from past environments,
we introduce a memory that stores feature statistics from previously seen
domains. These statistics can be used to replay images in any of the previously
observed domains, thus preventing catastrophic forgetting. In addition to
standard batch training using stochastic gradient descent (SGD), we also
experiment with fast adaptation methods based on adaptive meta-learning.
Extensive experiments on two datasets from SYNTHIA demonstrate the
effectiveness of the proposed approach when adapting to a number of tasks.
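ACE keeps a memory of feature statistics from previously seen domains and uses them to replay images in those domains. The abstract does not name the operation. One natural way to use such statistics is to re-normalise features to stored per-channel means and standard deviations, as in adaptive instance normalisation (AdaIN), and that is what the sketch assumes. Features are represented as one list of activations per channel. `StyleMemory` and its methods are hypothetical names, and a real system would decode the re-normalised features back into an image.

```python
import math
from typing import Dict, List, Tuple

Features = List[List[float]]          # one list of activations per channel
Stats = List[Tuple[float, float]]     # (mean, std) per channel


def channel_stats(feats: Features) -> Stats:
    stats = []
    for ch in feats:
        mu = sum(ch) / len(ch)
        var = sum((v - mu) ** 2 for v in ch) / len(ch)
        stats.append((mu, math.sqrt(var)))
    return stats


def adain(content: Features, style: Stats, eps: float = 1e-5) -> Features:
    """Re-normalise each content channel to the stored (mean, std).

    eps avoids division by zero for constant channels.
    """
    out = []
    for ch, (mu_s, sd_s), (mu_c, sd_c) in zip(content, style, channel_stats(content)):
        out.append([(v - mu_c) / (sd_c + eps) * sd_s + mu_s for v in ch])
    return out


class StyleMemory:
    """Hypothetical memory holding only per-domain feature statistics."""

    def __init__(self) -> None:
        self.bank: Dict[str, Stats] = {}

    def store(self, domain: str, feats: Features) -> None:
        self.bank[domain] = channel_stats(feats)

    def replay(self, source_feats: Features, domain: str) -> Features:
        return adain(source_feats, self.bank[domain])


memory = StyleMemory()
memory.store("night", [[10.0, 12.0, 14.0], [0.0, 0.0, 3.0]])
replayed = memory.replay([[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]], "night")
print([round(m, 3) for m, _ in channel_stats(replayed)])  # [12.0, 1.0]
```

Storing two numbers per channel, rather than past images, is what makes such a memory cheap enough to keep for every domain seen.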
Applying Visual Domain Style Transfer and Texture Synthesis Techniques to Audio - Insights and Challenges
Style transfer is a technique for combining two images based on the
activations and feature statistics in a deep learning neural network
architecture. This paper studies the analogous task in the audio domain and
takes a critical look at the problems that arise when adapting the original
vision-based framework to handle spectrogram representations. We conclude that
CNN architectures with features based on 2D representations and convolutions
are better suited for visual images than for time-frequency representations of
audio. Despite the awkward fit, experiments show that the "style" determined by
the Gram matrix for audio is more closely aligned with timbral signatures
without temporal structure, whereas the network layer activity that determines
audio "content" seems to capture more of the pitch and rhythmic structures. We
offer insight into several reasons for the domain differences with illustrative
examples. We motivate the use of several types of one-dimensional CNNs that
generate results that are better aligned with intuitive notions of audio
texture than those based on existing architectures built for images. These
ideas also prompt an exploration of audio texture synthesis with architectural
variants for extensions to infinite textures, multi-textures, parametric
control of receptive fields and the constant-Q transform as an alternative
frequency scaling for the spectrogram.
Comment: Post-peer-review, pre-copyedit version of an article to be published
in Neural Computing and Applications. 11 figures
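In the image style-transfer framework the abstract builds on, "style" is summarised by Gram matrices of layer activations. A Gram matrix contains the inner products between channel activations, summed over positions. When a one-dimensional CNN processes a spectrogram, the channels run along frequency and the sum runs over time frames, so the Gram matrix discards temporal order. This fits the abstract's observation that it tracks timbre rather than rhythm. The minimal sketch below assumes activations given as a channels-by-frames nested list. Normalising by the number of frames is one common choice, not necessarily the paper's.

```python
from typing import List


def gram_matrix(acts: List[List[float]]) -> List[List[float]]:
    """G[i][j] = mean over time frames of act_i(t) * act_j(t)."""
    n_frames = len(acts[0])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / n_frames for cj in acts]
        for ci in acts
    ]


clip = [[1.0, 0.0, 2.0, 0.0],   # channel 0 over 4 frames
        [0.0, 3.0, 0.0, 1.0]]   # channel 1
reversed_clip = [ch[::-1] for ch in clip]   # same frames, opposite order
print(gram_matrix(clip))                                # [[1.25, 0.0], [0.0, 2.5]]
print(gram_matrix(clip) == gram_matrix(reversed_clip))  # True
```

Playing the clip backwards leaves the Gram matrix unchanged. This illustrates why a Gram-based "style" cannot capture rhythmic structure.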
A Robust Approach for Securing Audio Classification Against Adversarial Attacks
An adversarial audio attack can be considered a small perturbation,
imperceptible to human ears, that is intentionally added to an audio signal to
cause a machine learning model to make mistakes. This poses a security concern
for machine learning models, since adversarial attacks can fool such models
into making wrong predictions. In this paper we first review
some strong adversarial attacks that may affect both audio signals and their 2D
representations, and evaluate the resiliency of the most common machine learning
models, namely deep learning models and support vector machines (SVMs) trained
on 2D audio representations such as the short-time Fourier transform (STFT),
discrete wavelet transform (DWT) and cross recurrent plot (CRP), against several
state-of-the-art adversarial attacks. Next, we propose a novel approach based
on pre-processed DWT representation of audio signals and SVM to secure audio
systems against adversarial attacks. The proposed architecture has several
preprocessing modules for generating and enhancing spectrograms including
dimension reduction and smoothing. We extract features from small patches of
the spectrograms using the speeded-up robust features (SURF) algorithm; these
features are then used to generate a codebook with the K-Means++ algorithm.
Finally, the codewords are used to train an SVM on the codebook representation
of the SURF-generated vectors. Together, these steps yield a novel approach to
audio classification that provides a good trade-off between accuracy and
resilience. Experimental results on three environmental sound datasets show the
competitive performance of the proposed approach compared to deep neural
networks, both in terms of accuracy and robustness against strong adversarial
attacks.
Comment: Paper Accepted for Publication in IEEE Transactions on Information
Forensics and Security
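The final stages of the pipeline follow the bag-of-visual-words pattern: descriptors from small spectrogram patches, a K-Means++ codebook, and an SVM trained on the encoded vectors. The sketch below covers only the codebook and encoding steps, in pure Python. Raw 2-D vectors stand in for SURF descriptors, and each clip is represented as a normalised histogram of nearest codewords, which would then be the SVM's input. All function names are hypothetical. The paper's preprocessing (DWT, dimension reduction, smoothing) and the SVM itself are omitted.

```python
import random
from typing import List, Sequence

Vec = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points: List[Vec], k: int, rng: random.Random) -> List[Vec]:
    """K-Means++ seeding: each new centre drawn with probability ~ D(x)^2."""
    centres = [list(rng.choice(points))]
    while len(centres) < k:
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        if sum(d2) == 0:  # every point already coincides with a centre
            centres.append(list(rng.choice(points)))
            continue
        centres.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centres


def build_codebook(points: List[Vec], k: int, iters: int = 20, seed: int = 0) -> List[Vec]:
    """Lloyd iterations starting from K-Means++ seeds."""
    rng = random.Random(seed)
    centres = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        clusters: List[List[Vec]] = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: sq_dist(p, centres[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster empties
                centres[i] = [sum(col) / len(members) for col in zip(*members)]
    return centres


def encode(descriptors: List[Vec], codebook: List[Vec]) -> List[float]:
    """Normalised histogram of nearest-codeword assignments (SVM input)."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[min(range(len(codebook)), key=lambda i: sq_dist(d, codebook[i]))] += 1
    return [h / len(descriptors) for h in hist]


points = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
          [10.0, 10.0], [9.8, 10.1], [10.2, 9.9]]
codebook = build_codebook(points, k=2)
hist = encode([[0.1, 0.0], [9.9, 10.0], [10.0, 10.1], [0.0, 0.2]], codebook)
print(sorted(hist))  # [0.5, 0.5]
```

Each clip becomes a fixed-length histogram regardless of how many patches it has. This fixed-length representation is what lets a standard SVM be trained on it.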
ShapeAdv: Generating Shape-Aware Adversarial 3D Point Clouds
We introduce ShapeAdv, a novel framework to study shape-aware adversarial
perturbations that reflect the underlying shape variations (e.g., geometric
deformations and structural differences) in the 3D point cloud space. We
develop shape-aware adversarial 3D point cloud attacks by leveraging the
learned latent space of a point cloud auto-encoder where the adversarial noise
is applied in the latent space. Specifically, we propose three different
variants including an exemplar-based one by guiding the shape deformation with
auxiliary data, such that the generated point cloud resembles the shape
morphing between objects in the same category. Different from prior works, the
resulting adversarial 3D point clouds reflect the shape variations in the 3D
point cloud space while still being close to the original one. In addition,
experimental evaluations on the ModelNet40 benchmark demonstrate that our
adversaries are more difficult to defend against with existing point cloud defense
methods and exhibit a higher attack transferability across classifiers. Our
shape-aware adversarial attacks are orthogonal to existing point cloud based
attacks and shed light on the vulnerability of 3D deep neural networks.
Comment: 3D Point Clouds, Adversarial Learning
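The core idea of ShapeAdv is to perturb an auto-encoder's latent code rather than the point coordinates themselves. A deliberately tiny model can illustrate this. The sketch assumes a linear decoder and a linear classifier score, so the gradient with respect to the latent code has the closed form D^T w. It takes a single signed-gradient step of size `eps` in latent space and decodes the result. This FGSM-style step is chosen here for brevity and is not ShapeAdv's own optimisation. The decoder matrix, weights and function names are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


# Hypothetical toy models: the decoder maps a 2-D latent code to 3 flattened
# point coordinates; the classifier score is w . points (positive = class A).
decoder = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [1.0, 0.5, 0.5]


def score(z: List[float]) -> float:
    return sum(a * b for a, b in zip(w, matvec(decoder, z)))


def latent_attack(z: List[float], eps: float) -> List[float]:
    """Step against the score gradient in latent space: d score / dz = D^T w."""
    grad = matvec(transpose(decoder), w)
    return [zi - eps * sign(g) for zi, g in zip(z, grad)]


z = [0.5, 0.5]
z_adv = latent_attack(z, eps=0.75)
print(score(z), score(z_adv))  # 1.25 -0.625
print(matvec(decoder, z_adv))  # perturbed points, still a decoder output
```

Because the perturbed points are produced by the decoder, they stay on the decoder's output manifold. With a learned auto-encoder, this is why such perturbations appear as plausible shape changes rather than scattered point noise.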
Cross-Resolution Person Re-identification with Deep Antithetical Learning
Images with different resolutions are ubiquitous in public person
re-identification (ReID) datasets and real-world scenes, so a person ReID model
must handle image resolution variations in order to generalize well. However,
most existing person ReID methods pay little
attention to this resolution discrepancy problem. One paradigm to deal with
this problem is to use complicated methods that map all images into an
artificial image space; however, this disrupts the natural image
distribution and requires heavy image preprocessing. In this paper, we analyze
the deficiencies of several widely-used objective functions handling image
resolution discrepancies and propose a new framework called deep antithetical
learning that directly learns from the natural image space rather than creating
an arbitrary one. We first quantify and categorize original training images
according to their resolutions. Then we create an antithetical training set and
make sure that original training images have counterparts with antithetical
resolutions in this new set. Finally, a novel Contrastive Center Loss (CCL) is
proposed to learn from images with different resolutions without being
affected by their resolution discrepancies. Extensive experimental analyses
and evaluations indicate that the proposed framework, even using a vanilla deep
ReID network, exhibits remarkable performance improvements. Without bells and
whistles, our approach outperforms previous state-of-the-art methods by a large
margin.
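The abstract does not define its Contrastive Center Loss. For orientation, the sketch implements a widely used contrastive-center form. In that form, each feature's squared distance to its own class centre is divided by the sum of its squared distances to all other centres, plus a small constant `delta`. Minimising this ratio pulls a sample towards its identity centre and pushes it away from the others. Whether the paper's CCL matches this form is an assumption. The centres here are fixed toy values rather than learned parameters.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def contrastive_center_loss(feats: List[List[float]], labels: List[int],
                            centres: List[List[float]], delta: float = 1.0) -> float:
    """0.5 * sum_i ||x_i - c_{y_i}||^2 / (sum_{j != y_i} ||x_i - c_j||^2 + delta)."""
    total = 0.0
    for x, y in zip(feats, labels):
        intra = sq_dist(x, centres[y])
        inter = sum(sq_dist(x, c) for j, c in enumerate(centres) if j != y)
        total += intra / (inter + delta)
    return 0.5 * total


centres = [[0.0, 0.0], [4.0, 0.0]]       # toy identity centres
near = contrastive_center_loss([[1.0, 0.0]], [0], centres)
far = contrastive_center_loss([[3.0, 0.0]], [0], centres)
print(near, far)  # 0.05 2.25
```

A feature of identity 0 that drifts towards the other identity's centre is penalised sharply, because its numerator grows while its denominator shrinks.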
Dynamic-Net: Tuning the Objective Without Re-training for Synthesis Tasks
One of the key ingredients for successful optimization of modern CNNs is
identifying a suitable objective. To date, the objective is fixed a priori at
training time, and any variation to it requires re-training a new network. In
this paper we present a first attempt at alleviating the need for re-training.
Rather than fixing the network at training time, we train a "Dynamic-Net" that
can be modified at inference time. Our approach considers an "objective-space"
as the space of all linear combinations of two objectives, and the Dynamic-Net
emulates traversal of this objective-space at test time, without any
further training. We show that this upgrades pre-trained networks by providing
an out-of-learning extension, while maintaining the performance quality. The
solution we propose is fast and allows a user to interactively modify the
network, in real time, to obtain the desired result. We show
the benefits of such an approach via several different applications.
Comment: version update
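The abstract describes the mechanism only as emulating traversal of a space of linear combinations of two objectives at test time. One way to obtain such a continuous control knob, assumed here for illustration, is to add a learned tuning branch whose output is scaled by a user-chosen `alpha` and added to a frozen main path. The real network would place such blocks inside a CNN. This scalar version only shows how changing `alpha` changes the output without retraining. All function names are hypothetical stand-ins for trained layers.

```python
from typing import Callable

Layer = Callable[[float], float]


def dynamic_block(main: Layer, tuning: Layer, alpha: float) -> Layer:
    """Main path plus an alpha-scaled tuning branch.

    alpha = 0 reproduces the network trained for the first objective;
    larger alpha moves the response towards the second objective.
    """
    def forward(x: float) -> float:
        return main(x) + alpha * tuning(x)
    return forward


def main_path(x: float) -> float:       # stand-in for the frozen main network
    return 2.0 * x


def tuning_branch(x: float) -> float:   # stand-in for a learned tuning block
    return 1.0 - x


for alpha in (0.0, 0.5, 1.0):
    print(alpha, dynamic_block(main_path, tuning_branch, alpha)(3.0))
# prints 0.0 6.0, then 0.5 5.0, then 1.0 4.0
```

Only `alpha` changes between the three calls, so a user can sweep it interactively at inference cost alone.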
Energy-Efficient Hybrid Stochastic-Binary Neural Networks for Near-Sensor Computing
Recent advances in neural networks (NNs) exhibit unprecedented success at
transforming large, unstructured data streams into compact higher-level
semantic information for tasks such as handwriting recognition, image
classification, and speech recognition. Ideally, systems would employ
near-sensor computation to execute these tasks at sensor endpoints to maximize
data reduction and minimize data movement. However, near-sensor computing
presents its own set of challenges such as operating power constraints, energy
budgets, and communication bandwidth capacities. In this paper, we propose a
stochastic-binary hybrid design which splits the computation between the
stochastic and binary domains for near-sensor NN applications. In addition, our
design uses a new stochastic adder and multiplier that are significantly more
accurate than existing adders and multipliers. We also show that retraining the
binary portion of the NN computation can compensate for precision losses
introduced by shorter stochastic bit-streams, allowing faster run times at
minimal accuracy losses. Our evaluation shows that our hybrid stochastic-binary
design can achieve 9.8x energy efficiency savings, and application-level
accuracies within 0.05% compared to conventional all-binary designs.
Comment: 6 pages, 3 figures, Design, Automation and Test in Europe (DATE) 201
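In stochastic computing, a value p in [0, 1] is encoded as a random bit-stream whose fraction of ones is p. A bitwise AND of two independent streams multiplies their values. A multiplexer driven by a select stream of probability 1/2 computes the scaled sum (a + b) / 2. The sketch shows these conventional unipolar operations, not the paper's new adder and multiplier. It illustrates the abstract's point that shorter streams are faster but less precise. Function names are hypothetical.

```python
import random
from typing import List


def to_stream(p: float, length: int, rng: random.Random) -> List[int]:
    """Unipolar encoding: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_stream(bits: List[int]) -> float:
    return sum(bits) / len(bits)


def sc_multiply(a: List[int], b: List[int]) -> List[int]:
    """AND gate: P(a & b) = P(a) * P(b) for independent streams."""
    return [x & y for x, y in zip(a, b)]


def sc_scaled_add(a: List[int], b: List[int], sel: List[int]) -> List[int]:
    """MUX with a p = 0.5 select stream: output value (P(a) + P(b)) / 2."""
    return [x if s else y for x, y, s in zip(a, b, sel)]


rng = random.Random(42)
for length in (16, 4096):
    a, b = to_stream(0.5, length, rng), to_stream(0.8, length, rng)
    sel = to_stream(0.5, length, rng)
    print(length, from_stream(sc_multiply(a, b)), from_stream(sc_scaled_add(a, b, sel)))
# exact values: 0.5 * 0.8 = 0.4 and (0.5 + 0.8) / 2 = 0.65;
# the estimation error typically shrinks like 1 / sqrt(length)
```

Each gate is a single logic element, which is where the energy savings come from. The cost is that precision depends on stream length, and this is the loss that the abstract says retraining the binary portion can compensate for.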