Search CORE

17,831 research outputs found

A neural compiler

Author: Gruau Frédéric
Ratajszczak Jean-Yves
Wiber Gilles
Publication venue: Published by Elsevier B.V.
Publication date: 17/04/1995
Field of study

AbstractThis paper describes a neural compiler. The input of the compiler is a PASCAL Program. The compiler produces a neural network that computes what is specified by the PASCAL program. The compiler generates an intermediate code called cellular code

Elsevier - Publisher Connector

Efficiently modeling neural networks on massively parallel computers

Author: Farber Robert M.
Publication venue
Publication date
Field of study

Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors)). We can efficiently model very large neural networks which have many neurons and interconnects and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent processor interprocessor communications. This paper will consider the simulation of only feed forward neural network although this method is extendable to recurrent networks

NASA Technical Reports Server

Theano: new features and speed improvements

Author: Bastien Frédéric
Bengio Yoshua
Bergeron Arnaud
Bergstra James
Bouchard Nicolas
Goodfellow Ian
Lamblin Pascal
Pascanu Razvan
Warde-Farley David
Publication venue
Publication date: 23/11/2012
Field of study

Theano is a linear algebra compiler that optimizes a user's symbolically-specified mathematical computations to produce efficient low-level implementations. In this paper, we present new features and efficiency improvements to Theano, and benchmarks demonstrating Theano's performance relative to Torch7, a recently introduced machine learning library, and to RNNLM, a C++ library targeted at recurrent neural networks.Comment: Presented at the Deep Learning Workshop, NIPS 201

arXiv.org e-Print Archive

CiteSeerX

swTVM: Exploring the Automated Compilation for Deep Learning on Sunway Architecture

Author: Gan Lin
Liu Changxi
Luan Zhongzhi
Qian Depei
Sun Rujun
Yang Guangwen
Yang Hailong
Publication venue
Publication date: 18/04/2019
Field of study

The flourish of deep learning frameworks and hardware platforms has been demanding an efficient compiler that can shield the diversity in both software and hardware in order to provide application portability. Among the exiting deep learning compilers, TVM is well known for its efficiency in code generation and optimization across diverse hardware devices. In the meanwhile, the Sunway many-core processor renders itself as a competitive candidate for its attractive computational power in both scientific and deep learning applications. This paper combines the trends in these two directions. Specifically, we propose swTVM that extends the original TVM to support ahead-of-time compilation for architecture requiring cross-compilation such as Sunway. In addition, we leverage the architecture features during the compilation such as core group for massive parallelism, DMA for high bandwidth memory transfer and local device memory for data locality, in order to generate efficient code for deep learning application on Sunway. The experimental results show the ability of swTVM to automatically generate code for various deep neural network models on Sunway. The performance of automatically generated code for AlexNet and VGG-19 by swTVM achieves 6.71x and 2.45x speedup on average than hand-optimized OpenACC implementations on convolution and fully connected layers respectively. This work is the first attempt from the compiler perspective to bridge the gap of deep learning and high performance architecture particularly with productivity and efficiency in mind. We would like to open source the implementation so that more people can embrace the power of deep learning compiler and Sunway many-core processor

arXiv.org e-Print Archive