swTVM: Exploring the Automated Compilation for Deep Learning on Sunway Architecture

Author: Gan Lin
Liu Changxi
Luan Zhongzhi
Qian Depei
Sun Rujun
Yang Guangwen
Yang Hailong
Publication date: 18/04/2019

The flourishing of deep learning frameworks and hardware platforms demands an efficient compiler that can hide the diversity in both software and hardware in order to provide application portability. Among the existing deep learning compilers, TVM is well known for its efficiency in code generation and optimization across diverse hardware devices. Meanwhile, the Sunway many-core processor presents itself as a competitive candidate thanks to its attractive computational power in both scientific and deep learning applications. This paper combines the trends in these two directions. Specifically, we propose swTVM, which extends the original TVM to support ahead-of-time compilation for architectures requiring cross-compilation, such as Sunway. In addition, we leverage architecture features during compilation, such as core groups for massive parallelism, DMA for high-bandwidth memory transfer, and local device memory for data locality, in order to generate efficient code for deep learning applications on Sunway. The experimental results show the ability of swTVM to automatically generate code for various deep neural network models on Sunway. The code automatically generated by swTVM for AlexNet and VGG-19 achieves average speedups of 6.71x and 2.45x over hand-optimized OpenACC implementations on convolution and fully connected layers, respectively. This work is the first attempt from the compiler perspective to bridge the gap between deep learning and high-performance architectures, particularly with productivity and efficiency in mind. We would like to open source the implementation so that more people can embrace the power of deep learning compilers and the Sunway many-core processor.

arXiv.org e-Print Archive

A study of topologies and protocols for fiber optic local area network

Author: Gerla M.
Rodrigues P.
Yeh C.

The emergence of new applications requiring high data traffic necessitates the development of high-speed local area networks. Optical fiber is selected as the transmission medium due to its inherent advantages over other possible media, and the dual optical bus architecture is shown to be the most suitable topology. Asynchronous access protocols, including token, random, hybrid random/token, and virtual token schemes, are developed and analyzed. Exact expressions for insertion delay and utilization at light and heavy load are derived, and intermediate load behavior is investigated by simulation. A new tokenless adaptive scheme whose control depends only on the detection of activity on the channel is shown to outperform round-robin schemes under uneven loads and multipacket traffic and to perform optimally at light load. An approximate solution to the queueing delay for an oscillating polling scheme under chaining is obtained, and results are compared with simulation. Solutions to the problem of building systems with a large number of stations are presented, including maximization of the number of optical couplers and the use of passive star/bus topologies, bridges, and gateways.

NASA Technical Reports Server

On-Chip Transparent Wire Pipelining (invited paper)

Author: Casu Mario Roberto
Macchiarulo Luca
Publication venue: IEEE Computer Society
Publication date: 01/01/2004

Wire pipelining has been proposed as a viable means to overcome the discrepancy between decreasing gate delays and increasing wire delays in deep-submicron technologies. Far from being a straightforwardly applicable technique, this methodology requires a number of design modifications before it can be inserted seamlessly into the current design flow. In this paper we briefly survey the methods presented by other researchers in the field, and then we thoroughly analyze the solutions we recently proposed, ranging from system-level wire pipelining to physical design aspects.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)


Real-time detection of grid bulk transfer traffic

Author: Paisley J.
Sventek J.S.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2006

The current practice of physical science research has yielded a continuously growing demand for interconnection network bandwidth to support the sharing of large datasets. Academic research networks and internet service providers have provisioned their networks to handle this type of load, which generates prolonged, high-volume traffic between nodes on the network. Maintaining QoS for all network users demands that the onset of these (Grid bulk) transfers be detected so that they can be reengineered through resources specifically provisioned to handle this type of traffic. This paper describes a real-time detector that operates at full line rate on Gb/s links, sustains high connection rates, and can track the use of ephemeral or non-standard ports.

Crossref

Enlighten

Spread spectrum communication link using surface wave devices

Author: Fugit B. B.
Hunsinger B. J.

A fast lock-up spread spectrum communication link breadboard with an 8-MHz bandwidth and an 8,000 bit-per-second data rate is described. It is implemented using surface wave devices as the primary signal generators and signal processing elements, with surface wave tapped delay lines generating the signals in the transmitter and detecting them in the receiver. The breadboard provides a measured processing gain for Gaussian noise of 31.5 dB, which is within 1 dB of the theoretical optimum. This development demonstrates that spread spectrum receivers implemented with surface wave devices have sensitivities and complexities comparable to those of serial correlation receivers, but synchronization search times that are two to three orders of magnitude smaller.

NASA Technical Reports Server

Preliminary basic performance analysis of the Cedar multiprocessor memory system

Author: Gallivan K.
Jalby W.
Turner S.
Veidenbaum A.
Wijshoff H.

Some preliminary basic results on the performance of the Cedar multiprocessor memory system are presented. Empirical results are presented and used to calibrate a memory system simulator, which is then used to discuss the scalability of the system.

NASA Technical Reports Server

Evaluation of the Cedar memory system: Configuration of 16 by 16

Author: Gallivan K.
Jalby W.
Wijshoff H.

Some basic results on the performance of the Cedar multiprocessor system are presented. Empirical results on the 16-processor, 16-memory-bank system configuration, which show the behavior of the Cedar system under different modes of operation, are presented.

NASA Technical Reports Server