55 research outputs found
Distributed training of deep neural networks with spark: The MareNostrum experience
Deploying a distributed deep learning technology stack on a large parallel system is a complex process, involving the integration and configuration of several layers of both general-purpose and custom software. The details of such deployments are rarely described in the literature. This paper presents the experiences observed during the deployment of a technology stack to enable deep learning workloads on MareNostrum, a petascale supercomputer. The components of a layered architecture based on Apache Spark are described, and the performance and scalability of the resulting system are evaluated. This is followed by a discussion of the impact of different configurations, including parallelism, storage, and networking alternatives, as well as other aspects of executing deep learning workloads on a traditional HPC setup. The derived conclusions should be useful in guiding similarly complex deployments in the future.
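As a purely illustrative sketch of the data-parallel pattern such a stack implements (model held on the driver, per-partition gradients computed on executors and averaged), here is a minimal PySpark loop on a toy least-squares problem. It is not the paper's MareNostrum stack; the toy model, app name, and hyperparameters are invented for this sketch.

```python
# Illustrative only: synchronous data-parallel SGD with PySpark, standing in
# for the kind of Spark-driven training loop the paper deploys at scale.
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sgd-sketch").getOrCreate()
sc = spark.sparkContext

# Toy least-squares problem standing in for a deep-learning workload.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = X @ rng.normal(size=10)
n = len(y)
data = sc.parallelize(list(zip(X, y)), numSlices=4)  # 4 simulated workers

w = np.zeros(10)  # model parameters live on the driver
for _ in range(50):
    # Each action re-serialises the closure, shipping the current w to the
    # executors; partitions compute per-sample gradients that are summed.
    grad = (data.map(lambda p, w=w: (p[0] @ w - p[1]) * p[0])
                .reduce(lambda a, b: a + b)) / n
    w -= 0.1 * grad  # synchronous driver-side update

print(w)  # close to the generating weights
spark.stop()
```

Real deployments replace the toy gradient with a deep-learning framework's backward pass and tune partitioning, storage, and networking, which is precisely the configuration space the paper evaluates.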
Convergence Analysis of Opto-Electronic Oscillator based Coherent Ising Machines
Ising machines are purported to be better than conventional von Neumann computers at solving large-scale combinatorial optimisation problems. However, these machines are widely believed to be heuristics, whose promise is observed empirically rather than established theoretically. We bridge this gap by considering an opto-electronic oscillator based coherent Ising machine (OEO-CIM) and providing the first analytical proof that, under reasonable assumptions, the OEO-CIM is not a heuristic approach. We find and prove bounds
on its performance in terms of the expected difference between the objective
value at the final iteration and the optimal one, and on the number of
iterations it requires. In the process, we highlight some of its limitations, such as the inability to handle asymmetric coupling between spins and the absence of an external magnetic field applied to the spins (both of which are
necessary in many optimisation problems), along with some issues in its
convergence. We overcome these limitations by proposing suitable adjustments
and prove that the improved architecture is guaranteed to converge to the
optimum of the relaxed objective function.
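For context, here is a minimal sketch of the Ising objective the abstract refers to, in its standard form with couplings J and external field h (the notation is assumed, not taken from the paper). It also illustrates why symmetrising the couplings loses nothing: s^T J s depends only on the symmetric part of J.

```python
# Hedged sketch of the standard Ising objective: minimise
#   H(s) = -1/2 * s^T J s - h^T s   over s in {-1, +1}^n,
# where J couples spins and h is the external field the abstract says the
# baseline OEO-CIM cannot represent. Notation is assumed, not the paper's.
import itertools
import numpy as np

def ising_energy(J: np.ndarray, h: np.ndarray, s: np.ndarray) -> float:
    return float(-0.5 * s @ J @ s - h @ s)

def symmetrise(J: np.ndarray) -> np.ndarray:
    # s^T J s == s^T ((J + J^T) / 2) s for every s, so replacing an
    # asymmetric J by its symmetric part leaves the objective unchanged.
    return 0.5 * (J + J.T)

rng = np.random.default_rng(1)
J = symmetrise(rng.normal(size=(6, 6)))
np.fill_diagonal(J, 0.0)  # no self-coupling
h = rng.normal(size=6)

# Brute-force the ground state on this toy instance for reference.
best = min(itertools.product((-1, 1), repeat=6),
           key=lambda s: ising_energy(J, h, np.array(s)))
print(best, ising_energy(J, h, np.array(best)))
```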
Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions
Stochastic gradient descent (SGD) is a popular and efficient method with wide
applications in training deep neural nets and other nonconvex models. While the
behavior of SGD is well understood in the convex learning setting, the existing
theoretical results for SGD applied to nonconvex objective functions are far
from mature. For example, existing results require imposing a nontrivial assumption of uniformly bounded gradients for all iterates encountered in the learning process, which is hard to verify in practical implementations.
In this paper, we establish a rigorous theoretical foundation for SGD in
nonconvex learning by showing that this boundedness assumption can be removed
without affecting convergence rates. In particular, we establish sufficient
conditions for almost sure convergence as well as optimal convergence rates for
SGD applied to both general nonconvex objective functions and
gradient-dominated objective functions. Linear convergence is further derived in the case of zero variance.
Comment: Accepted by IEEE Transactions on Neural Networks and Learning Systems. DOI: 10.1109/TNNLS.2019.295221
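As a toy illustration of the setting (the test function, noise model, and step sizes below are illustrative choices, not the paper's): SGD with the classic decreasing step size on a nonconvex but gradient-dominated objective whose gradient is unbounded, plus the zero-variance case reducing to plain gradient descent.

```python
# Hedged sketch: SGD on f(x) = x^2 + 3*sin(x)^2, a standard example of a
# nonconvex function satisfying a Polyak-Lojasiewicz (gradient dominance)
# inequality. Step sizes and noise scale are illustrative only.
import numpy as np

def grad(x: float) -> float:
    # f'(x) = 2x + 6*sin(x)*cos(x) = 2x + 3*sin(2x); note f' is unbounded,
    # so a uniform bound on gradients would fail for this objective.
    return 2.0 * x + 3.0 * np.sin(2.0 * x)

rng = np.random.default_rng(0)

x = 3.0
for t in range(1, 10_001):
    g = grad(x) + rng.normal(scale=0.1)  # unbiased stochastic gradient
    x -= (0.5 / t) * g                   # decreasing step size eta_t ~ 1/t
print(x)  # drifts toward the global minimiser x = 0

# Zero variance: the same recursion is plain gradient descent, which
# converges linearly on gradient-dominated objectives.
x = 3.0
for _ in range(100):
    x -= 0.1 * grad(x)
print(x)
```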