62 research outputs found
Asynchronous Parallel Stochastic Gradient Descent - A Numeric Core for Scalable Distributed Machine Learning Algorithms
The implementation of a vast majority of machine learning (ML) algorithms
boils down to solving a numerical optimization problem. In this context,
Stochastic Gradient Descent (SGD) methods have long proven to provide good
results, both in terms of convergence and accuracy. Recently, several
parallelization approaches have been proposed in order to scale SGD to solve
very large ML problems. At their core, most of these approaches follow a
map-reduce scheme. This paper presents a novel parallel updating algorithm for
SGD, which utilizes the asynchronous single-sided communication paradigm.
Compared to existing methods, Asynchronous Parallel Stochastic Gradient Descent
(ASGD) provides faster (or at least equal) convergence, close to linear scaling,
and stable accuracy.
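To make the asynchronous principle concrete, the following is a minimal sketch of lock-free asynchronous SGD updates in the spirit of shared-memory Hogwild!-style training: several workers apply gradient steps to a shared parameter vector without any global barrier. The toy data, the loss_grad helper, and the use of Python threads are illustrative assumptions only; ASGD itself operates across distributed nodes via one-sided communication, which this sketch does not model.

    import threading
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 10))        # toy inputs
    y = X @ rng.standard_normal(10)            # toy linear targets
    w = np.zeros(10)                           # shared parameter vector

    def loss_grad(w, i):
        # gradient of the squared error on a single sample (illustrative loss)
        return (X[i] @ w - y[i]) * X[i]

    def worker(w, steps, lr=0.01):
        r = np.random.default_rng()
        for _ in range(steps):
            i = r.integers(len(X))
            w -= lr * loss_grad(w, i)          # in-place update: no lock, no barrier

    threads = [threading.Thread(target=worker, args=(w, 2000)) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()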
Balancing the Communication Load of Asynchronously Parallelized Machine Learning Algorithms
Stochastic Gradient Descent (SGD) is the standard numerical method used to
solve the core optimization problem for the vast majority of machine learning
(ML) algorithms. In the context of large scale learning, as utilized by many
Big Data applications, efficient parallelization of SGD is the focus of
active research. Recently, we were able to show that the asynchronous
communication paradigm can be applied to achieve a fast and scalable
parallelization of SGD. Asynchronous Stochastic Gradient Descent (ASGD)
outperforms other, mostly MapReduce based, parallel algorithms solving large
scale machine learning problems. In this paper, we investigate the impact of
asynchronous communication frequency and message size on the performance of
ASGD applied to large scale ML on HTC clusters and cloud environments. We
introduce a novel algorithm for the automatic balancing of the asynchronous
communication load, which allows ASGD to adapt to changing network bandwidths
and latencies.
Comment: arXiv admin note: substantial text overlap with arXiv:1505.0495
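As a rough illustration of such automatic balancing, the sketch below adapts the number of local SGD steps between asynchronous exchanges to the measured transfer time, so that a slow or congested link triggers less frequent communication. The thresholds and the send_fn callback are hypothetical; the paper derives its balancing from measured bandwidths and latencies rather than from fixed constants like these.

    import time

    def adapt_interval(interval, send_fn, min_interval=1, max_interval=256):
        # interval = number of local SGD steps between asynchronous exchanges
        t0 = time.perf_counter()
        send_fn()                           # one asynchronous parameter exchange
        elapsed = time.perf_counter() - t0
        if elapsed > 0.05:                  # slow link: communicate less often
            return min(interval * 2, max_interval)
        if elapsed < 0.005:                 # fast link: communicate more often
            return max(interval // 2, min_interval)
        return interval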
Using GPI-2 for Distributed Memory Parallelization of the Caffe Toolbox to Speed up Deep Neural Network Training
Deep Neural Networks (DNNs) are currently of great interest in research and
application. The training of these networks is a compute intensive and time
consuming task. To reduce training times to a bearable amount at reasonable
cost, we extend the popular Caffe toolbox for DNNs with an efficient distributed
memory communication pattern. To achieve good scalability we emphasize the
overlap of computation and communication and prefer fine granular
synchronization patterns over global barriers. To implement these
communication patterns we rely on the Global Address Space Programming
Interface version 2 (GPI-2) communication library. This interface provides a
light-weight set of asynchronous one-sided communication primitives
supplemented by non-blocking fine granular data synchronization mechanisms.
CaffeGPI is the name of our parallel version of Caffe. First benchmarks
demonstrate better scaling behavior compared with other extensions, e.g.,
Intel Caffe. Even within a single symmetric multiprocessing machine with
four graphics processing units, CaffeGPI scales better than the standard
Caffe toolbox. These first results demonstrate that the use of standard
High Performance Computing (HPC) hardware is a valid cost saving approach
to train large DNNs. I/O is another bottleneck when working with DNNs in a
standard parallel HPC setting, which we will consider in more detail in a
forthcoming paper.
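The overlap pattern described above can be sketched as follows: while the gradient of one layer is still being exchanged in the background, the backward pass already computes the next layer, and fine granular waits replace a global barrier. The DummyLayer class and the thread pool below are stand-ins for real layers and for GPI-2's one-sided primitives, which this sketch does not use.

    from concurrent.futures import ThreadPoolExecutor

    class DummyLayer:
        def backward(self):
            return [0.0]                    # placeholder gradient

    def backward_with_overlap(layers, exchange_gradient):
        pending = []
        with ThreadPoolExecutor(max_workers=2) as pool:
            for layer in reversed(layers):
                grad = layer.backward()                               # compute this layer
                pending.append(pool.submit(exchange_gradient, grad))  # exchange in background
            for f in pending:               # per-gradient waits instead of a global barrier
                f.result()

    backward_with_overlap([DummyLayer() for _ in range(3)], lambda g: None)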
Local Facial Attribute Transfer through Inpainting
The term attribute transfer refers to the task of altering images in such a
way that the semantic interpretation of a given input image is shifted towards
an intended direction, which is quantified by semantic attributes. Prominent
example applications are photorealistic changes of facial features and
expressions, like changing the hair color, adding a smile, enlarging the nose
or altering the entire context of a scene, like transforming a summer landscape
into a winter panorama. Recent advances in attribute transfer are mostly based
on generative deep neural networks, using various techniques to manipulate
images in the latent space of the generator. In this paper, we present a novel
method for the common sub-task of local attribute transfers, where only parts
of a face have to be altered in order to achieve semantic changes (e.g.
removing a mustache). In contrast to previous methods, where such local changes
have been implemented by generating new (global) images, we propose to
formulate local attribute transfers as an inpainting problem. Removing and
regenerating only parts of the image, our Attribute Transfer Inpainting
Generative Adversarial Network (ATI-GAN) is able to utilize local context
information, which yields visually sound results.
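Conceptually, the inpainting formulation can be sketched in a few lines: the region to be changed is masked out, a generator fills it in conditioned on the target attribute, and the result is composited back so the surrounding context stays untouched. The generator G and the array conventions below are assumptions for illustration, not the ATI-GAN architecture itself.

    import numpy as np

    def local_attribute_transfer(image, mask, attribute, G):
        # image: H x W x 3 float array; mask: H x W bool, True where we regenerate
        keep = (~mask)[..., None]
        masked = image * keep                  # erase the local region (e.g. the mustache area)
        generated = G(masked, attribute)       # inpaint it, conditioned on the target attribute
        return image * keep + generated * mask[..., None]  # composite: context stays untouched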
TensorQuant - A Simulation Toolbox for Deep Neural Network Quantization
Recent research implies that training and inference of deep neural networks
(DNNs) can be performed with low precision numerical representations of the
training/test data, weights, and gradients without a general loss in accuracy.
The benefit of such compact representations is twofold: they allow a
significant reduction of the communication bottleneck in distributed DNN
training and faster neural network implementations on hardware accelerators
like FPGAs. Several quantization methods have been proposed to map the original
32-bit floating point problem to low-bit representations. While most related
publications validate the proposed approach on a single DNN topology, it
appears evident that the optimal choice of the quantization method and
number of coding bits is topology dependent. To date, there is no general
theory available which would allow users to derive the optimal quantization
during the design of a DNN topology. In this paper, we present a quantization
toolbox for the TensorFlow framework. TensorQuant allows a transparent
quantization simulation of existing DNN topologies during training and
inference. TensorQuant supports generic quantization methods and allows
experimental evaluation of the impact of the quantization on single layers as
well as on the full topology. In a first series of experiments with
TensorQuant, we present an analysis of fixed-point quantizations of popular
CNN topologies.
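As a reference point for what such a fixed-point simulation does, the sketch below rounds values to a grid with a fixed number of fractional bits and clips them to the representable signed range. It is written in plain NumPy for brevity; TensorQuant itself hooks into TensorFlow graphs, and the default bit widths here are arbitrary choices.

    import numpy as np

    def fixed_point(x, word_bits=8, frac_bits=4):
        # simulate a signed fixed-point format with word_bits total bits,
        # frac_bits of which sit behind the binary point
        scale = 2.0 ** frac_bits
        lo = -(2.0 ** (word_bits - 1)) / scale          # most negative representable value
        hi = (2.0 ** (word_bits - 1) - 1) / scale       # most positive representable value
        return np.clip(np.round(x * scale) / scale, lo, hi)

    print(fixed_point(np.array([0.30, 1.77, -9.0])))    # -> [ 0.3125  1.75   -8.    ]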
Unbalanced tree search on a manycore system using the GPI programming model
Recent developments in computer architectures progress towards systems with large core counts (Manycore) which expose more parallelism to applications. Some applications, termed irregular and unbalanced, demand a dynamic and asynchronous load balancing implementation to utilize the full performance of a Manycore system. For example, the recently established Graph500 benchmark aims at such applications. The UTS benchmark characterizes the performance of such irregular and unbalanced computations with a tree-structured search space that requires continuous dynamic load balancing. GPI is a PGAS API that delivers the full performance of RDMA-enabled networks directly to the application. Its programming model focuses on the use of one-sided asynchronous communication, overlapping computation and communication. In this paper we address the dynamic load balancing requirements of unbalanced applications using the GPI programming model. Using the UTS benchmark, we detail the implementation of a work stealing algorithm using GPI and present the performance results. Our performance evaluation shows significant improvements when compared with the optimized MPI version, with a maximum performance of 9.5 billion nodes per second on 3072 cores.
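The work-stealing scheme at the heart of such an implementation can be sketched as follows: each worker expands nodes from its own deque and, when it runs dry, steals from a random victim. The shared deques and the simplified termination check below stand in for GPI's one-sided RDMA accesses to remote queues and for a proper distributed termination-detection protocol.

    import random
    from collections import deque

    def tree_search(local, all_queues, children_of):
        visited = 0
        while True:
            if local:
                node = local.pop()                     # depth-first from the local deque
                visited += 1
                local.extend(children_of(node))
            else:
                victims = [q for q in all_queues if q is not local and q]
                if not victims:
                    return visited                     # simplified: all queues empty
                local.append(random.choice(victims).popleft())  # steal the oldest work item

Stealing the oldest entries (the far end of the victim's deque) tends to transfer nodes close to the root, i.e. whole subtrees, which amortizes the communication cost of each steal.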
SpectralDefense: Detecting Adversarial Attacks on CNNs in the Fourier Domain
Despite the success of convolutional neural networks (CNNs) in many computer
vision and image analysis tasks, they remain vulnerable to so-called
adversarial attacks: Small, crafted perturbations in the input images can lead
to false predictions. A possible defense is to detect adversarial examples. In
this work, we show how analysis in the Fourier domain of input images and
feature maps can be used to distinguish benign test samples from adversarial
images. We propose two novel detection methods: Our first method employs the
magnitude spectrum of the input images to detect an adversarial attack. This
simple and robust classifier can successfully detect adversarial perturbations
of three commonly used attack methods. The second method builds upon the first
and additionally extracts the phase of the Fourier coefficients of feature maps
at different layers of the network. With this extension, we are able to improve
adversarial detection rates compared to state-of-the-art detectors on five
different attack methods.
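A minimal version of the first detector can be sketched as follows: compute the log-magnitude spectrum of each input image and feed it to a simple binary classifier. The choice of logistic regression and the grayscale, fixed-size input convention are our assumptions for illustration; the paper's detectors may differ in both respects.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def magnitude_features(images):
        # images: N x H x W grayscale batch; one log-magnitude spectrum per image
        spectra = np.abs(np.fft.fft2(images, axes=(1, 2)))
        return np.log1p(spectra).reshape(len(images), -1)

    # assumed inputs: `benign` and `adversarial` are N x H x W image arrays
    # X = np.vstack([magnitude_features(benign), magnitude_features(adversarial)])
    # y = np.r_[np.zeros(len(benign)), np.ones(len(adversarial))]
    # detector = LogisticRegression(max_iter=1000).fit(X, y)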
Fake or JPEG? Revealing Common Biases in Generated Image Detection Datasets
The widespread adoption of generative image models has highlighted the urgent
need to detect artificial content, which is a crucial step in combating
widespread manipulation and misinformation. Consequently, numerous detectors
and associated datasets have emerged. However, many of these datasets
inadvertently introduce undesirable biases, thereby impacting the effectiveness
and evaluation of detectors. In this paper, we emphasize that many datasets for
AI-generated image detection contain biases related to JPEG compression and
image size. Using the GenImage dataset, we demonstrate that detectors indeed
learn from these undesired factors. Furthermore, we show that removing the
named biases substantially increases robustness to JPEG compression and
significantly alters the cross-generator performance of evaluated detectors.
Specifically, it leads to an increase of more than 11 percentage points in
cross-generator performance for ResNet50 and Swin-T detectors on the GenImage
dataset, achieving state-of-the-art results.
We provide the dataset and source code of this paper on the anonymous
website: https://www.unbiased-genimage.or
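One straightforward way to remove such biases, in the spirit of what the paper describes, is to apply the same compression and resizing distribution to both real and generated images before training, so a detector cannot key on JPEG artifacts or image size. The target size and quality range below are illustrative choices, not the paper's exact preprocessing.

    import io
    import random
    from PIL import Image

    def debias(img: Image.Image) -> Image.Image:
        img = img.convert("RGB").resize((256, 256))        # uniform size for both classes
        buf = io.BytesIO()
        quality = random.randint(70, 100)                  # matched compression distribution
        img.save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        return Image.open(buf).convert("RGB")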