Optimizing GPU Memory Transactions for Convolution Operations

Author: Lu G
Wang Z
Zhang W
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 14/09/2020

Convolution computation is a common operation in deep neural networks (DNNs) and is often responsible for performance bottlenecks during training and inferencing. Existing approaches for accelerating convolution operations aim to reduce computational complexity. However, these strategies often increase the memory footprint with extra memory accesses, thereby leaving much room for performance improvement. This paper presents a novel approach to optimize memory access for convolution operations, specifically targeting GPU execution. Our approach leverages two optimization techniques to reduce the number of memory operations for convolution operations performed on the width and height dimensions. For convolution computations on the width dimension, we exploit shuffle instructions to exchange the overlapped columns of the input for reducing the number of memory transactions. For convolution operations on the height dimension, we multiply each overlapped row of the input with multiple rows of a filter to compute multiple output elements to improve the data locality of row elements. We apply our approach to 2D and multi-channel 2D convolutions on an NVIDIA 2080Ti GPU. For 2D convolution, our approach delivers over 2× faster performance than the state-of-the-art image processing libraries. For multi-channel 2D convolutions, we obtain up to 1.3× speedups over the quickest algorithm of cuDNN.
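The height-dimension idea in the abstract above (read each input row once and multiply it with several filter rows) can be illustrated with a minimal CPU-side sketch. This is not the authors' GPU kernel: the function names `conv2d_direct` and `conv2d_row_reuse` and the `row_loads` counter are hypothetical, and plain Python lists stand in for GPU registers and global memory.

```python
def conv2d_direct(image, kernel):
    """Reference 'valid' 2D convolution (cross-correlation form, as in DNNs)."""
    H, W = len(image), len(image[0])
    K, L = len(kernel), len(kernel[0])
    out = [[0.0] * (W - L + 1) for _ in range(H - K + 1)]
    for y in range(H - K + 1):
        for x in range(W - L + 1):
            for i in range(K):
                for j in range(L):
                    out[y][x] += image[y + i][x + j] * kernel[i][j]
    return out


def conv2d_row_reuse(image, kernel):
    """Read each input row once and apply it to every filter row it overlaps.

    Returns the output and the number of input-row reads (an illustrative
    counter, assumed here as a stand-in for memory transactions).
    """
    H, W = len(image), len(image[0])
    K, L = len(kernel), len(kernel[0])
    out_h, out_w = H - K + 1, W - L + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    row_loads = 0
    for r in range(H):
        row = image[r]  # the only read of input row r
        row_loads += 1
        # Input row r meets filter row i in output row y = r - i.
        for i in range(K):
            y = r - i
            if 0 <= y < out_h:
                for x in range(out_w):
                    acc = 0.0
                    for j in range(L):
                        acc += row[x + j] * kernel[i][j]
                    out[y][x] += acc
    return out, row_loads


if __name__ == "__main__":
    image = [[float(r * 5 + c) for c in range(5)] for r in range(4)]
    kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    out, loads = conv2d_row_reuse(image, kernel)
    print(out == conv2d_direct(image, kernel), loads)
```

In this sketch a 4-row input is read 4 times, whereas a loop that reloads rows per output row would read 2 × 3 = 6 rows; the width-dimension shuffle technique is GPU-specific and is not modelled here.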

Crossref

White Rose Research Online

MOF-BC: A Memory Optimized and Flexible BlockChain for Large Scale Networks

Author: Dorri Ali
Jurdak Raja
Kanhere Salil S.
Publication date: 13/01/2018

BlockChain (BC) immutability ensures BC resilience against modification or removal of the stored data. In large scale networks like the Internet of Things (IoT), however, this feature significantly increases BC storage size and raises privacy challenges. In this paper, we propose a Memory Optimized and Flexible BC (MOF-BC) that enables the IoT users and service providers to remove or summarize their transactions and age their data and to exercise the "right to be forgotten". To increase privacy, a user may employ multiple keys for different transactions. To allow for the removal of stored transactions, all keys would need to be stored, which complicates key management and storage. MOF-BC introduces the notion of a Generator Verifier (GV), which is a signed hash of a Generator Verifier Secret (GVS). The GV changes for each transaction to provide privacy yet is signed by a unique key, thus minimizing the information that needs to be stored. A flexible transaction fee model and a reward mechanism are proposed to incentivize users to participate in optimizing memory consumption. Qualitative security and privacy analysis demonstrates that MOF-BC is resilient against several security attacks. Evaluation results show that MOF-BC decreases BC memory consumption by up to 25% and the user cost by more than two orders of magnitude compared to conventional BC instantiations.

arXiv.org e-Print Archive

Queensland University of Technology ePrints Archive

Optimizing Transactions for Captured Memory

Author: Adl-Tabatabai Ali-Reza
Dragojevic Aleksandar
Ni Yang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2009

In this paper, we identify transaction-local memory as a major source of overhead from compiler instrumentation in software transactional memory (STM). Transaction-local memory is memory allocated inside a transaction, which cannot escape (i.e., is captured by) the allocating transaction. Accesses to such memory do not require calls to STM memory access functions (i.e., STM barriers). A compiler unaware of that may translate accesses to captured memory into expensive STM barriers. This presents us with opportunities to improve STM performance. Our measurements with the STAMP benchmark suite (version 0.9.9) revealed that as many as 60% of the STM barriers generated by our baseline compiler access captured memory, including 90% of the write barriers and 45% of the read barriers. We propose runtime and compiler optimizations to elide STM barriers to captured memory. These techniques can also elide barriers for accesses to thread-local and read-only data. We implemented those optimizations in the Intel C++ STM compiler. Our experiments with the STAMP benchmark suite on an Intel Dunnington system (with 24 cores in a 4-node SMP system) show that these optimizations can improve performance by up to 18% at 16 threads.
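The notion of captured memory in the abstract above can be illustrated with a toy runtime check. This is a minimal sketch and not the Intel C++ STM compiler's implementation: the `Transaction` class, its `alloc`, `write` and `abort` methods, and the `barriers`/`elided` counters are all hypothetical, and a simple undo log stands in for a real STM write barrier.

```python
_MISSING = object()  # marks "key did not exist before the write"


class Transaction:
    """Toy transaction that skips the barrier for memory it allocated itself."""

    def __init__(self):
        self.captured = set()  # ids of objects allocated inside this transaction
        self._alive = []       # keep captured objects alive so ids stay unique
        self.undo_log = []     # (obj, key, old_value) for writes to shared memory
        self.barriers = 0      # writes that went through the (toy) STM barrier
        self.elided = 0        # writes to captured memory, barrier skipped

    def alloc(self):
        """Allocate an object that is captured by this transaction."""
        obj = {}
        self.captured.add(id(obj))
        self._alive.append(obj)
        return obj

    def write(self, obj, key, value):
        if id(obj) in self.captured:
            # Captured memory is invisible to other transactions and is simply
            # discarded on abort, so no logging is needed.
            self.elided += 1
        else:
            self.barriers += 1
            self.undo_log.append((obj, key, obj.get(key, _MISSING)))
        obj[key] = value

    def abort(self):
        """Roll back writes to shared memory in reverse order."""
        for obj, key, old in reversed(self.undo_log):
            if old is _MISSING:
                obj.pop(key, None)
            else:
                obj[key] = old
        self.undo_log.clear()


if __name__ == "__main__":
    shared = {"total": 1}
    tx = Transaction()
    node = tx.alloc()
    tx.write(node, "value", 5)    # captured: barrier elided
    tx.write(shared, "total", 2)  # shared: barrier taken
    tx.abort()
    print(tx.barriers, tx.elided, shared)
```

The sketch checks capture at run time on every write; the abstract also mentions compiler optimizations, which would decide statically that an access targets captured memory and are not modelled here.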

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Bidirectional optimization of the melting spinning process

Author: Ding Y
Hao K
Hone K
Liang X
Wang H
Wang Z
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/02/2014

This is the author's accepted manuscript (under the provisional title "Bi-directional optimization of the melting spinning process with an immune-enhanced neural network"). The final published article is available from the link below. Copyright 2014 @ IEEE. A bidirectional optimizing approach for the melting spinning process based on an immune-enhanced neural network is proposed. The proposed bidirectional model can not only reveal the internal nonlinear relationship between the process configuration and the quality indices of the fibers as final product, but also provide a tool for engineers to develop new fiber products with expected quality specifications. A neural network is taken as the basis for the bidirectional model, and an immune component is introduced to enlarge the searching scope of the solution field so that the neural network has a larger possibility to find the appropriate and reasonable solution, and the error of prediction can therefore be eliminated. The proposed intelligent model can also help to determine what kind of process configuration should be made in order to produce satisfactory fiber products. To make the proposed model practical to the manufacturing, a software platform is developed. Simulation results show that the proposed model can eliminate the approximation error raised by the neural network-based optimizing model, which is due to the extension of focusing scope by the artificial immune mechanism. Meanwhile, the proposed model with the corresponding software can conduct optimization in two directions, namely, the process optimization and category development, and the corresponding results outperform those with an ordinary neural network-based intelligent model.
It is also proved that the proposed model has the potential to act as a valuable tool from which the engineers and decision makers of the spinning process could benefit. Funding: National Nature Science Foundation of China, Ministry of Education of China, the Shanghai Committee of Science and Technology, and the Fundamental Research Funds for the Central Universities.

Crossref

Brunel University Research Archive

Optimization of Information Rate Upper and Lower Bounds for Channels with Memory

Author: Sadeghi Parastoo
Shams Ramtin
Vontobel Pascal O.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 27/11/2007

We consider the problem of minimizing upper bounds and maximizing lower bounds on information rates of stationary and ergodic discrete-time channels with memory. The channels we consider can have a finite number of states, such as partial response channels, or they can have an infinite state-space, such as time-varying fading channels. We optimize recently-proposed information rate bounds for such channels, which make use of auxiliary finite-state machine channels (FSMCs). Our main contribution in this paper is to provide iterative expectation-maximization (EM) type algorithms to optimize the parameters of the auxiliary FSMC to tighten these bounds. We provide an explicit, iterative algorithm that improves the upper bound at each iteration. We also provide an effective method for iteratively optimizing the lower bound. To demonstrate the effectiveness of our algorithms, we provide several examples of partial response and fading channels, where the proposed optimization techniques significantly tighten the initial upper and lower bounds. Finally, we compare our results with an improved variation of the simplex local optimization algorithm, called Soblex. This comparison shows that our proposed algorithms are superior to the Soblex method, both in terms of robustness in finding the tightest bounds and in computational efficiency. Interestingly, from a channel coding/decoding perspective, optimizing the lower bound is related to increasing the achievable mismatched information rate, i.e., the information rate of a communication system where the decoder at the receiver is matched to the auxiliary channel, and not to the original channel. Comment: Submitted to IEEE Transactions on Information Theory, November 24, 200

arXiv.org e-Print Archive

Crossref

The Australian National University