Efficient Per-Example Gradient Computations in Convolutional Neural Networks
Deep learning frameworks leverage GPUs to perform massively-parallel
computations over batches of many training examples efficiently. However, for
certain tasks, one may be interested in performing per-example computations,
for instance using per-example gradients to evaluate a quantity of interest
unique to each example. One notable application comes from the field of
differential privacy, where per-example gradients must be norm-bounded in order
to limit the impact of each example on the aggregated batch gradient. In this
work, we discuss how per-example gradients can be efficiently computed in
convolutional neural networks (CNNs). We compare existing strategies by
performing a few steps of differentially-private training on CNNs of varying
sizes. We also introduce a new strategy for per-example gradient calculation,
which is shown to be advantageous depending on the model architecture and how
the model is trained. This is a first step in making differentially-private
training of CNNs practical.
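The clipping step the abstract refers to can be illustrated with a toy sketch. This is not the paper's strategy, just the generic differentially-private clipping idea for a one-parameter linear model with squared loss; the function name and signature are hypothetical.

```python
def per_example_clipped_grad(w, xs, ys, clip_norm):
    """Toy sketch of per-example gradient clipping for y ~ w*x with
    squared loss: each example's gradient is norm-bounded before the
    batch gradient is aggregated (DP-SGD-style)."""
    grads = []
    for x, y in zip(xs, ys):
        g = 2.0 * (w * x - y) * x   # d/dw of (w*x - y)^2 for one example
        norm = abs(g)
        if norm > clip_norm:        # bound this example's influence
            g *= clip_norm / norm
        grads.append(g)
    return sum(grads) / len(grads)  # aggregated batch gradient
```

With real CNNs the difficulty is that frameworks sum gradients over the batch by default, which is exactly the per-example bookkeeping problem the paper addresses.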
LAGC: Lazily Aggregated Gradient Coding for Straggler-Tolerant and Communication-Efficient Distributed Learning
Gradient-based distributed learning in Parameter Server (PS) computing
architectures is subject to random delays due to straggling worker nodes, as
well as to possible communication bottlenecks between PS and workers. Solutions
have been recently proposed to separately address these impairments based on
the ideas of gradient coding, worker grouping, and adaptive worker selection.
This paper provides a unified analysis of these techniques in terms of
wall-clock time, communication, and computation complexity measures.
Furthermore, in order to combine the benefits of gradient coding and grouping
in terms of robustness to stragglers with the communication and computation
load gains of adaptive selection, novel strategies, named Lazily Aggregated
Gradient Coding (LAGC) and Grouped-LAG (G-LAG), are introduced. Analysis and
results show that G-LAG provides the best wall-clock time and communication
performance, while maintaining a low computational cost, for two representative
distributions of the computing times of the worker nodes.
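The "lazy aggregation" idea behind LAG-style methods can be sketched in a few lines. This is a deliberate simplification with an illustrative threshold test, not the paper's exact selection criterion, and the function name is hypothetical.

```python
def lazy_aggregate(fresh_grads, stale_grads, threshold):
    """Lazily aggregated gradients (scalar sketch): a worker re-sends its
    gradient only if it changed enough since the last communicated round;
    otherwise the server reuses the stale copy, saving communication."""
    sent = 0
    aggregated = []
    for g_new, g_old in zip(fresh_grads, stale_grads):
        if abs(g_new - g_old) > threshold:  # significant change: communicate
            aggregated.append(g_new)
            sent += 1
        else:                               # reuse stale gradient
            aggregated.append(g_old)
    return sum(aggregated) / len(aggregated), sent
```

Combining such a rule with gradient coding over groups of workers is what gives G-LAG its simultaneous robustness to stragglers and low communication load.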
An Efficient Framework For Fast Computer Aided Design of Microwave Circuits Based on the Higher-Order 3D Finite-Element Method
In this paper, an efficient computational framework for the full-wave design by optimization of complex microwave passive devices, such as antennas, filters, and multiplexers, is described. The framework consists of a computational engine, a 3D object modeler, and a graphical user interface. The computational engine, which is based on a finite element method with curvilinear higher-order tetrahedral elements, is coupled with built-in or external gradient-based optimization procedures. For speed, a model order reduction technique is used, and the gradient computation is achieved by perturbation with geometry deformation, processed at the level of the individual mesh nodes. To maximize performance, the framework is targeted at multicore CPU architectures, and its extended version can also use multiple GPUs. To illustrate the accuracy and high efficiency of the framework, we provide examples of simulations of a dielectric resonator antenna and full-wave design by optimization of two diplexers involving tens of unknowns, and show that the design can be completed within the duration of a few simulations using industry-standard FEM solvers. The accuracy of the design is confirmed by measurements.
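The gradient-by-perturbation step mentioned in the abstract can be illustrated with a generic central-difference sketch. This is not the framework's mesh-deformation machinery, only the underlying numerical idea; the cost function, parameter vector, and step size here are all illustrative.

```python
def perturbation_gradient(cost, params, eps=1e-6):
    """Approximate the gradient of `cost` by perturbing each parameter
    in turn (central differences), a simplified analogue of perturbing
    the geometry at individual mesh nodes."""
    grad = []
    for i in range(len(params)):
        p_plus = list(params)
        p_plus[i] += eps
        p_minus = list(params)
        p_minus[i] -= eps
        # central difference in the i-th coordinate
        grad.append((cost(p_plus) - cost(p_minus)) / (2.0 * eps))
    return grad
```

In the actual framework each cost evaluation is a full-wave FEM solve, which is why model order reduction and multicore/GPU execution matter for keeping such gradient evaluations affordable.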