398 research outputs found

    The Potential of the Intel Xeon Phi for Supervised Deep Learning

    Full text link
    Supervised learning of Convolutional Neural Networks (CNNs), also known as supervised Deep Learning, is a computationally demanding process. To find the most suitable parameters of a network for a given application, numerous training sessions are required. Therefore, reducing the training time per session is essential to fully utilize CNNs in practice. While numerous research groups have addressed the training of CNNs using GPUs, so far not much attention has been paid to the Intel Xeon Phi coprocessor. In this paper we investigate empirically and theoretically the potential of the Intel Xeon Phi for supervised learning of CNNs. We design and implement a parallelization scheme named CHAOS that exploits both the thread- and SIMD-parallelism of the coprocessor. Our approach is evaluated on the Intel Xeon Phi 7120P using the MNIST dataset of handwritten digits for various thread counts and CNN architectures. Results show a 103.5x speed up when training our large network for 15 epochs using 244 threads, compared to one thread on the coprocessor. Moreover, we develop a performance model and use it to assess our implementation and answer what-if questions.Comment: The 17th IEEE International Conference on High Performance Computing and Communications (HPCC 2015), Aug. 24 - 26, 2015, New York, US

    BrainFrame: A node-level heterogeneous accelerator platform for neuron simulations

    Full text link
    Objective: The advent of High-Performance Computing (HPC) in recent years has led to its increasing use in brain study through computational models. The scale and complexity of such models are constantly increasing, leading to challenging computational requirements. Even though modern HPC platforms can often deal with such challenges, the vast diversity of the modeling field does not permit for a single acceleration (or homogeneous) platform to effectively address the complete array of modeling requirements. Approach: In this paper we propose and build BrainFrame, a heterogeneous acceleration platform, incorporating three distinct acceleration technologies, a Dataflow Engine, a Xeon Phi and a GP-GPU. The PyNN framework is also integrated into the platform. As a challenging proof of concept, we analyze the performance of BrainFrame on different instances of a state-of-the-art neuron model, modeling the Inferior- Olivary Nucleus using a biophysically-meaningful, extended Hodgkin-Huxley representation. The model instances take into account not only the neuronal- network dimensions but also different network-connectivity circumstances that can drastically change application workload characteristics. Main results: The synthetic approach of three HPC technologies demonstrated that BrainFrame is better able to cope with the modeling diversity encountered. Our performance analysis shows clearly that the model directly affect performance and all three technologies are required to cope with all the model use cases.Comment: 16 pages, 18 figures, 5 table
    corecore