
    Distributed training of deep neural networks with Spark: The MareNostrum experience

    Deployment of a distributed deep learning technology stack on a large parallel system is a very complex process, involving the integration and configuration of several layers of both general-purpose and custom software. The details of such deployments are rarely described in the literature. This paper presents the experiences gathered during the deployment of a technology stack to enable deep learning workloads on MareNostrum, a petascale supercomputer. The components of a layered architecture, based on the use of Apache Spark, are described, and the performance and scalability of the resulting system are evaluated. This is followed by a discussion of the impact of different configurations, including parallelism, storage and networking alternatives, and other aspects related to the execution of deep learning workloads on a traditional HPC setup. The derived conclusions should be useful for guiding similarly complex deployments in the future.
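    The abstract does not spell out the paper's actual Spark-based stack, so the following is only a minimal sketch of synchronous data-parallel SGD on Spark, for a linear model on synthetic data; the model, data, learning rate, and 8-partition layout are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of synchronous data-parallel SGD on Apache Spark.
# Assumes pyspark and numpy are available; all specifics are illustrative.
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("distributed-sgd-sketch").getOrCreate()
sc = spark.sparkContext

# Synthetic linear-regression data, partitioned across workers.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
true_w = rng.normal(size=20)
y = X @ true_w + 0.1 * rng.normal(size=10_000)
data = sc.parallelize(list(zip(X, y)), numSlices=8).cache()

def local_gradient(rows, w):
    """Sum of squared-error gradients over one partition."""
    g = np.zeros_like(w)
    n = 0
    for x, target in rows:
        g += (x @ w - target) * x
        n += 1
    yield (g, n)

w = np.zeros(20)
lr = 0.01
for step in range(50):
    w_b = sc.broadcast(w)  # ship current weights to the executors
    grad_sum, n_total = (
        data.mapPartitions(lambda rows: local_gradient(rows, w_b.value))
            .reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]))
    )
    w = w - lr * grad_sum / n_total  # synchronous update on the driver

spark.stop()
```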

    Convergence Analysis of Opto-Electronic Oscillator based Coherent Ising Machines

    Ising machines are purported to be better than conventional von Neumann computers at solving large-scale combinatorial optimisation problems. However, these Ising machines are widely believed to be heuristics, whose promise has been observed empirically rather than established theoretically. We bridge this gap by considering an opto-electronic oscillator based coherent Ising machine (OEO-CIM) and providing the first analytical proof that, under reasonable assumptions, the OEO-CIM is not a heuristic approach. We find and prove bounds on its performance, in terms of the expected difference between the objective value at the final iteration and the optimal one, and on the number of iterations it requires. In the process, we highlight some of its limitations, such as the inability to handle asymmetric coupling between spins and the absence of an external magnetic field applied to them (both of which are necessary in many optimisation problems), along with some issues in its convergence. We overcome these limitations by proposing suitable adjustments and prove that the improved architecture is guaranteed to converge to the optimum of the relaxed objective function.
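    For reference, the standard Ising objective makes the two limitations concrete: the basic OEO-CIM described above assumes a symmetric coupling matrix J and omits the external-field term h. The notation below is the common convention, not necessarily the paper's.

```latex
% Standard Ising objective: symmetric couplings J_{ij} and external fields h_i.
\begin{equation}
  H(\sigma) \;=\; -\sum_{i<j} J_{ij}\,\sigma_i \sigma_j \;-\; \sum_{i} h_i \sigma_i,
  \qquad \sigma_i \in \{-1, +1\}.
\end{equation}
```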

    Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions

    Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing theoretical results for SGD applied to nonconvex objective functions are far from mature. For example, existing results require imposing a nontrivial assumption of uniform boundedness of gradients for all iterates encountered in the learning process, which is hard to verify in practical implementations. In this paper, we establish a rigorous theoretical foundation for SGD in nonconvex learning by showing that this boundedness assumption can be removed without affecting convergence rates. In particular, we establish sufficient conditions for almost sure convergence as well as optimal convergence rates for SGD applied to both general nonconvex objective functions and gradient-dominated objective functions. Linear convergence is further derived in the case of zero variance. Comment: Accepted by IEEE Transactions on Neural Networks and Learning Systems. DOI: 10.1109/TNNLS.2019.295221
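    As a reminder of the setting (in standard notation, which may differ from the paper's), the SGD update and the gradient-dominance (Polyak-Lojasiewicz) condition referred to above are:

```latex
% SGD update on a stochastic objective F(w) = E_z[f(w; z)] with minimum F^*,
% and the gradient-dominance (PL) condition with parameter \mu > 0.
\begin{align}
  w_{t+1} &= w_t - \eta_t\, \nabla f(w_t; z_t), & z_t &\sim \mathcal{D} \ \text{i.i.d.},\\
  \tfrac{1}{2}\,\|\nabla F(w)\|^2 &\ge \mu\,( F(w) - F^{*} ) & &\text{(gradient-dominated objective).}
\end{align}
```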
    • …