
    Asynchronous Parallel Stochastic Gradient Descent - A Numeric Core for Scalable Distributed Machine Learning Algorithms

    Full text link
    The implementation of a vast majority of machine learning (ML) algorithms boils down to solving a numerical optimization problem. In this context, Stochastic Gradient Descent (SGD) methods have long proven to provide good results, both in terms of convergence and accuracy. Recently, several parallelization approaches have been proposed in order to scale SGD to solve very large ML problems. At their core, most of these approaches follow a map-reduce scheme. This paper presents a novel parallel updating algorithm for SGD, which utilizes the asynchronous single-sided communication paradigm. Compared to existing methods, Asynchronous Parallel Stochastic Gradient Descent (ASGD) provides faster (or at least equal) convergence, close-to-linear scaling, and stable accuracy.
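    The abstract describes the ASGD update scheme only at a high level. As a rough illustration of the asynchronous paradigm it refers to, the following minimal Python sketch lets several workers apply gradient updates to a shared parameter vector without any global barrier; the thread-based setup, the toy least-squares objective, and all names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of asynchronous parallel SGD (illustrative, not the paper's code):
# several workers update a shared parameter vector without global synchronization.
import threading
import numpy as np

def async_sgd(X, y, n_workers=4, epochs=5, lr=0.01):
    """Toy least-squares problem; each worker applies its gradient to the shared
    weight vector as soon as it is computed, with no barrier between workers."""
    w = np.zeros(X.shape[1])                       # shared model, updated in place

    def worker(rng):
        nonlocal w                                 # all workers touch the same array
        for _ in range(epochs * len(X) // n_workers):
            i = rng.integers(len(X))               # draw one sample
            grad = (X[i] @ w - y[i]) * X[i]        # gradient of 0.5 * (x.w - y)^2
            w -= lr * grad                         # unsynchronized in-place update

    threads = [threading.Thread(target=worker, args=(np.random.default_rng(s),))
               for s in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w
```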

    Using GPI-2 for Distributed Memory Parallelization of the Caffe Toolbox to Speed up Deep Neural Network Training

    Full text link
    Deep Neural Networks (DNNs) are currently of great interest in research and application. The training of these networks is a compute-intensive and time-consuming task. To reduce training times to a bearable amount at reasonable cost, we extend the popular Caffe toolbox for DNNs with an efficient distributed memory communication pattern. To achieve good scalability we emphasize the overlap of computation and communication and prefer fine-granular synchronization patterns over global barriers. To implement these communication patterns we rely on the Global Address Space Programming Interface version 2 (GPI-2) communication library. This interface provides a light-weight set of asynchronous one-sided communication primitives supplemented by non-blocking, fine-granular data synchronization mechanisms. CaffeGPI is the name of our parallel version of Caffe. First benchmarks demonstrate better scaling behavior compared with other extensions, e.g., Intel Caffe. Even within a single symmetric multiprocessing machine with four graphics processing units, CaffeGPI scales better than the standard Caffe toolbox. These first results demonstrate that the use of standard High Performance Computing (HPC) hardware is a valid cost-saving approach to train large DNNs. I/O is another bottleneck to working with DNNs in a standard parallel HPC setting, which we will consider in more detail in a forthcoming paper.
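    GPI-2 itself is a C library, and the abstract gives no code; the sketch below only mimics the overlap of computation and communication that CaffeGPI relies on, using a background Python thread and a queue in place of asynchronous one-sided writes. All names (sender, transmit, train_step) are hypothetical and are not part of the GPI-2 or Caffe APIs.

```python
# Sketch of overlapping computation with asynchronous gradient transfer, mimicking
# the barrier-free pattern described above with a background thread and a queue.
# This is NOT the GPI-2 C API; transmit() is a placeholder for a one-sided write.
import queue
import threading
import numpy as np

send_queue = queue.Queue()

def transmit(chunk):
    pass                                           # real network transfer would go here

def sender():
    """Drains gradient chunks in the background while the next layer is computed."""
    while True:
        chunk = send_queue.get()
        if chunk is None:                          # sentinel: backward pass finished
            break
        transmit(chunk)

def train_step(layers):
    """Backward pass over a list of weight arrays; each gradient is handed to the
    sender thread immediately instead of waiting for a global barrier."""
    comm = threading.Thread(target=sender)
    comm.start()
    grads = []
    for layer in reversed(layers):                 # layer-by-layer backward pass
        g = np.random.randn(*layer.shape)          # stand-in for the real gradient
        grads.append(g)
        send_queue.put(g)                          # hand off and keep computing
    send_queue.put(None)
    comm.join()
    return grads
```

    The point of the pattern is that gradients of already-finished layers are transferred while later layers are still being computed, so communication cost is hidden rather than serialized behind a global barrier.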

    Balancing the Communication Load of Asynchronously Parallelized Machine Learning Algorithms

    Full text link
    Stochastic Gradient Descent (SGD) is the standard numerical method used to solve the core optimization problem for the vast majority of machine learning (ML) algorithms. In the context of large-scale learning, as utilized by many Big Data applications, efficient parallelization of SGD is a focus of active research. Recently, we were able to show that the asynchronous communication paradigm can be applied to achieve a fast and scalable parallelization of SGD. Asynchronous Stochastic Gradient Descent (ASGD) outperforms other, mostly MapReduce-based, parallel algorithms for solving large-scale machine learning problems. In this paper, we investigate the impact of asynchronous communication frequency and message size on the performance of ASGD applied to large-scale ML in HTC cluster and cloud environments. We introduce a novel algorithm for the automatic balancing of the asynchronous communication load, which allows ASGD to adapt to changing network bandwidths and latencies. Comment: arXiv admin note: substantial text overlap with arXiv:1505.0495
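    The abstract does not spell out the balancing algorithm; the following Python sketch illustrates one plausible feedback rule of the kind described, adapting how many local updates are accumulated per message to the measured send time. Function names, thresholds, and the target latency are assumptions for illustration only, not the paper's method.

```python
# Hypothetical feedback rule for balancing asynchronous communication load
# (illustration only): when sends become slow, accumulate more local updates per
# message; when sends are fast, communicate more often.
import time

def adapt_interval(interval, send_time, target=0.01, lo=1, hi=1024, factor=1.5):
    """Return a new accumulation interval (local updates per message)."""
    if send_time > target and interval < hi:
        return min(hi, int(interval * factor))     # network slow: communicate less often
    if send_time < target / 2 and interval > lo:
        return max(lo, int(interval / factor))     # network fast: communicate more often
    return interval

def training_loop(n_steps, local_step, send_update):
    """local_step performs one SGD update; send_update pushes accumulated updates."""
    interval, pending = 16, 0
    for step in range(n_steps):
        local_step(step)
        pending += 1
        if pending >= interval:
            t0 = time.perf_counter()
            send_update()
            interval = adapt_interval(interval, time.perf_counter() - t0)
            pending = 0
```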

    Personal and Political: A Micro-history of the “Red Column” Collective Farm, 1935-36

    Get PDF
    This article investigates the confluence of personal interests and official policy on collective farms in the mid-1930s, a period that has received far less scholarly attention than the collectivization drive. The current historiography on collective farmers’ relationship with the state is one-sided, presenting peasants either as passive victims of or idealized resisters to state policies. Both views minimize the complex realities that governed the everyday lives of collective farmers, for whom state policies often were secondary to local concerns. This paper, which draws upon rich archival materials in Kirov Krai, employs a micro-historical approach to study the struggle to remove the chairman of the “Red Column” collective farm in Kirov Krai in 1935-36. It demonstrates that local and personal issues (family ties, grudges, and personality traits) had more influence on how collective farmers reacted to state campaigns and investigations than did official state policy and rhetoric. The chairman’s rude and arrogant behavior, mistreatment of the collective farmers, and flaunting of material goods led to his downfall. But to strengthen their arguments, his opponents accused him of associating with kulaks and White Guardists. The chairman and his supporters struck back, alleging that his detractors were themselves White Guardists and kulaks who sought revenge for having been expelled from the collective farm. Such a micro-historical approach reveals the importance of popular opinion, attitudes, and behavior on collective farms and the level of control that collective farmers had over shaping the implementation of state policies. This paper enables one to appreciate that peasants knew well how to manipulate official labels, such as kulak or class enemy, as weapons to achieve goals of local and personal importance. It enriches the historiography by offering a different way to appreciate peasant attitudes and behavior, and collective farm life in the mid-1930s.

    Optimization of Computationally and I/O Intense Patterns in Electronic Structure and Machine Learning Algorithms

    Get PDF
    Development of scalable High-Performance Computing (HPC) applications is already a challenging task, even in the pre-Exascale era. Utilization of the full potential of (near-)future supercomputers will most likely require mastery of massively parallel, heterogeneous architectures with multi-tier persistence systems, ideally in fault-tolerant mode. With the change in hardware architectures, HPC applications are also widening their scope to 'Big Data' processing and analytics using machine learning algorithms and neural networks. In this work, in cooperation with the INTERTWinE FET-HPC project, we demonstrate how the GASPI (Global Address Space Programming Interface) programming model helps to address these Exascale challenges, using tensor contraction, K-means, and Terasort algorithms as examples.
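    The abstract names the kernels but does not show their structure. As a loose illustration, the Python sketch below shows the data-parallel shape of the K-means step mentioned above: each data partition computes partial cluster sums independently, and the reduction of those partial results is the communication step a PGAS runtime such as GASPI could overlap with computation. The partitioning and all function names are illustrative, not taken from the paper.

```python
# Illustrative data-parallel K-means step: per-partition partial sums, then a
# combine step that plays the role of the cross-node reduction (names illustrative).
import numpy as np

def partial_kmeans_step(points, centroids):
    """One local pass: assign this partition's points to the nearest centroid and
    return per-cluster sums and counts."""
    k, dim = centroids.shape
    sums, counts = np.zeros((k, dim)), np.zeros(k)
    labels = np.argmin(((points[:, None, :] - centroids) ** 2).sum(axis=2), axis=1)
    for j in range(k):
        mask = labels == j
        sums[j] = points[mask].sum(axis=0)
        counts[j] = mask.sum()
    return sums, counts

def combine(partials, centroids):
    """Reduce partial sums from all partitions into updated centroids."""
    total_sums = sum(p[0] for p in partials)
    total_counts = sum(p[1] for p in partials)
    nonzero = total_counts > 0
    centroids[nonzero] = total_sums[nonzero] / total_counts[nonzero, None]
    return centroids
```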

    A Theory of Partitioned Global Address Spaces

    Get PDF
    Partitioned global address space (PGAS) is a parallel programming model for the development of applications on clusters. It provides a global address space partitioned among the cluster nodes, and is supported in programming languages like C, C++, and Fortran by means of APIs. In this paper we provide a formal model for the semantics of single instruction, multiple data programs using PGAS APIs. Our model reflects the main features of popular real-world APIs such as SHMEM, ARMCI, GASNet, GPI, and GASPI. A key feature of PGAS is the support for one-sided communication: a node may directly read and write the memory located at a remote node, without explicit synchronization with the processes running on the remote side. One-sided communication increases performance by decoupling process synchronization from data transfer, but requires the programmer to reason about appropriate synchronization between reads and writes. As a second contribution, we propose and investigate robustness, a criterion for correct synchronization of PGAS programs. Robustness corresponds to acyclicity of a suitable happens-before relation defined on PGAS computations. The requirement is finer than classical data race freedom and rules out most false error reports. Our main result is an algorithm for checking robustness of PGAS programs. The algorithm makes use of two insights. Using combinatorial arguments, we first show that if a PGAS program is not robust, then there are computations in a certain normal form that violate happens-before acyclicity. Intuitively, normal-form computations delay remote accesses in an ordered way. We then devise an algorithm that checks for cyclic normal-form computations. Essentially, the algorithm is an emptiness check for a novel automaton model that accepts normal-form computations in streaming fashion. Altogether, we prove that the robustness problem is PSpace-complete.
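    The paper's actual decision procedure works on normal-form computations via an automaton emptiness check; as a much simpler illustration of the underlying criterion, the Python sketch below builds a happens-before graph over the events of a single computation and tests it for acyclicity. The event names and the graph encoding are hypothetical.

```python
# Naive illustration of the robustness criterion described above: a computation is
# considered robust only if its happens-before relation is acyclic. This sketch just
# performs the cycle check; it is not the paper's normal-form/automaton algorithm.
def has_cycle(edges, nodes):
    """Depth-first search for a cycle in the happens-before graph."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def visit(n):
        color[n] = GRAY
        for m in edges.get(n, ()):
            if color[m] == GRAY:                   # back edge: happens-before cycle
                return True
            if color[m] == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)

# Hypothetical example: remote accesses whose completion order contradicts program
# order show up as a cycle, so the computation is reported as not robust.
events = ["write_A", "read_B", "write_B", "read_A"]
happens_before = {"write_A": ["read_B"], "read_B": ["write_B"],
                  "write_B": ["read_A"], "read_A": ["write_A"]}
print("robust:", not has_cycle(happens_before, events))
```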

    Stalin’s Constitution

    Get PDF
    Upon its adoption in December 1936, Soviet leaders hailed the new so-called Stalin Constitution as the most democratic in the world. Scholars have long scoffed at this claim, noting that the mass repression of 1937-1938 that followed rendered it a hollow document. This book focuses on the six-month-long popular discussion of the draft Constitution that preceded its formal adoption in December 1936. Drawing on rich archival sources, it uses the discussion of the draft 1936 Constitution to examine the discourse between the central state leadership and citizens about the new Soviet social contract, which delineated the roles the state and citizens should play in developing socialism.