4,336 research outputs found

swTVM: Exploring the Automated Compilation for Deep Learning on Sunway Architecture

Authors: Gan Lin, Liu Changxi, Luan Zhongzhi, Qian Depei, Sun Rujun, Yang Guangwen, Yang Hailong
Publication date: 18/04/2019

The flourishing of deep learning frameworks and hardware platforms demands an efficient compiler that can shield the diversity in both software and hardware in order to provide application portability. Among the existing deep learning compilers, TVM is well known for its efficiency in code generation and optimization across diverse hardware devices. Meanwhile, the Sunway many-core processor presents itself as a competitive candidate for its attractive computational power in both scientific and deep learning applications. This paper combines the trends in these two directions. Specifically, we propose swTVM, which extends the original TVM to support ahead-of-time compilation for architectures requiring cross-compilation, such as Sunway. In addition, we leverage architecture features during compilation, such as the core group for massive parallelism, DMA for high-bandwidth memory transfer, and local device memory for data locality, in order to generate efficient code for deep learning applications on Sunway. The experimental results show the ability of swTVM to automatically generate code for various deep neural network models on Sunway. The code automatically generated by swTVM for AlexNet and VGG-19 achieves average speedups of 6.71x and 2.45x over hand-optimized OpenACC implementations on convolution and fully connected layers, respectively. This work is the first attempt from the compiler perspective to bridge the gap between deep learning and high-performance architecture, particularly with productivity and efficiency in mind. We would like to open source the implementation so that more people can embrace the power of deep learning compilers and the Sunway many-core processor.

arXiv.org e-Print Archive

Calcineurin B-Like Proteins CBL4 and CBL10 Mediate Two Independent Salt Tolerance Pathways in Arabidopsis.

Authors: Lan Wen-Zhi, Luan Sheng, Tang Ren-Jie, Xu Hai-Xia, Yang Yang, Zhang Chi, Zhao Fugeng
Publication venue: eScholarship, University of California
Publication date: 01/05/2019

In Arabidopsis, the salt overly sensitive (SOS) pathway, consisting of calcineurin B-like protein 4 (CBL4/SOS3), CBL-interacting protein kinase 24 (CIPK24/SOS2) and SOS1, has been well defined as a crucial mechanism to control cellular ion homeostasis by extruding Na+ to the extracellular space, thus conferring salt tolerance in plants. CBL10 also plays a critical role in salt tolerance, possibly by activating Na+ compartmentation into the vacuole. However, the functional relationship between the SOS and CBL10-regulated processes remains unclear. Here, we analyzed the genetic interaction between CBL4 and CBL10 and found that the cbl4 cbl10 double mutant was dramatically more sensitive to salt than the cbl4 and cbl10 single mutants, suggesting that CBL4 and CBL10 each direct a different salt-tolerance pathway. Furthermore, the cbl4 cbl10 and cipk24 cbl10 double mutants were more sensitive than the cipk24 single mutant, suggesting that CBL10 directs a process involving CIPK24 and other partners different from the SOS pathway. Although the cbl4 cbl10, cipk24 cbl10, and sos1 cbl10 double mutants showed a salt-sensitive phenotype comparable to that of sos1 at the whole-plant level, they all accumulated much less Na+ than sos1 under high-salt conditions, suggesting that CBL10 regulates additional unknown transport processes that play roles distinct from that of SOS1 in Na+ homeostasis.

eScholarship - University of California

Stochastic given-time H∞ consensus over Markov jump networks with disturbance constraint

Authors: Ding Zhengtao, Liu Fei, Luan Xiaoli, Min Yang
Publication date: 01/01/2016

The University of Manchester - Institutional Repository

Interference management in underlay in-band D2D-enhanced cellular networks: (Invited Paper)

Authors: Ding M, Luan TH, Mao G, Yang J
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/02/2019

© 2018 IEEE. Recently, the 3rd Generation Partnership Project (3GPP) [1] has standardized that device-to-device (D2D) communications should use uplink resources when coexisting with conventional cellular communications. With uplink resource sharing, both cellular and D2D links cause significant co-channel interference. In this paper, we consider a D2D mode selection criterion based on the maximum received signal strength (MRSS) for each user equipment (UE) to control the D2D-to-cellular interference. Specifically, a UE will operate in cellular mode if its received signal strength from the strongest base station (BS) is larger than a threshold β; otherwise, it will operate in D2D mode. Furthermore, in our study, cellular UEs, D2D transmit UEs and D2D receiver UEs constitute the entire UE set, which is a more practical assumption than that of existing works, which drop additional UEs for D2D reception only. The coverage probability and the area spectral efficiency (ASE) are derived for both the cellular network and the D2D network. Through our theoretical and numerical analyses, we quantify the performance gains brought by D2D communications and provide guidelines for selecting the parameters for network operations.
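
The MRSS mode selection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the names `Mode`, `select_mode`, `rss_from_bs_dbm` and `beta_dbm`, and the choice of dBm as the unit, are assumptions made only for illustration.

```python
from enum import Enum
from typing import Sequence


class Mode(Enum):
    CELLULAR = "cellular"
    D2D = "d2d"


def select_mode(rss_from_bs_dbm: Sequence[float], beta_dbm: float) -> Mode:
    """Pick a UE's operating mode from its received signal strengths.

    rss_from_bs_dbm: received signal strength from each candidate base
        station (hypothetical input; units assumed to be dBm).
    beta_dbm: the threshold β from the abstract, in the same units.
    """
    if not rss_from_bs_dbm:
        raise ValueError("at least one base station measurement is required")
    # Maximum received signal strength (MRSS): the strongest base station.
    strongest = max(rss_from_bs_dbm)
    # Cellular mode only if the strongest signal is strictly larger than β;
    # otherwise the UE operates in D2D mode.
    return Mode.CELLULAR if strongest > beta_dbm else Mode.D2D


if __name__ == "__main__":
    print(select_mode([-95.0, -80.0, -110.0], beta_dbm=-85.0))
    print(select_mode([-95.0, -90.0], beta_dbm=-85.0))
```

Under this rule, raising β pushes more UEs into D2D mode, which is the lever the abstract describes for controlling D2D-to-cellular interference.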

Crossref

OPUS - University of Technology Sydney