85,363 research outputs found

    Optimizing GPU Memory Transactions for Convolution Operations

    Get PDF
    Convolution computation is a common operation in deep neural networks (DNNs) and is often responsible for performance bottlenecks during training and inferencing. Existing approaches for accelerating convolution operations aim to reduce computational complexity. However, these strategies often increase the memory footprint with extra memory accesses, thereby leaving much room for performance improvement. This paper presents a novel approach to optimize memory access for convolution operations, specifically targeting GPU execution. Our approach leverages two optimization techniques to reduce the number of memory operations for convolution operations performed on the width and height dimensions. For convolution computations on the width dimension, we exploit shuffle instructions to exchange the overlapped columns of the input for reducing the number of memory transactions. For convolution operations on the height dimension, we multiply each overlapped row of the input with multiple rows of a filter to compute multiple output elements to improve the data locality of row elements. We apply our approach to 2D and multi-channel 2D convolutions on an NVIDIA 2080Ti GPU. For 2D convolution, our approach delivers over faster performance than the state-of-the-art image processing libraries. For multi-channel 2D convolutions, we obtain up to speedups over the quickest algorithm of cuDNN. We apply our approach to 2D and multi-channel 2D convolutions on an NVIDIA 2080Ti GPU. For 2D convolution, our approach delivers over 2Ă— faster performance than the state-of-the-art image processing libraries. For multi-channel 2D convolutions, we obtain up to 1.3Ă— speedups over the quickest algorithm of cuDNN

    MOF-BC: A Memory Optimized and Flexible BlockChain for Large Scale Networks

    Full text link
    BlockChain (BC) immutability ensures BC resilience against modification or removal of the stored data. In large scale networks like the Internet of Things (IoT), however, this feature significantly increases BC storage size and raises privacy challenges. In this paper, we propose a Memory Optimized and Flexible BC (MOF-BC) that enables the IoT users and service providers to remove or summarize their transactions and age their data and to exercise the "right to be forgotten". To increase privacy, a user may employ multiple keys for different transactions. To allow for the removal of stored transactions, all keys would need to be stored which complicates key management and storage. MOF-BC introduces the notion of a Generator Verifier (GV) which is a signed hash of a Generator Verifier Secret (GVS). The GV changes for each transaction to provide privacy yet is signed by a unique key, thus minimizing the information that needs to be stored. A flexible transaction fee model and a reward mechanism is proposed to incentivize users to participate in optimizing memory consumption. Qualitative security and privacy analysis demonstrates that MOF-BC is resilient against several security attacks. Evaluation results show that MOF-BC decreases BC memory consumption by up to 25\% and the user cost by more than two orders of magnitude compared to conventional BC instantiations

    Optimizing Transactions for Captured Memory

    Get PDF
    In this paper, we identify transaction-local memory as a major source of overhead from compiler instrumentation in software transactional memory (STM). Transaction-local memory is memory allocated inside a transaction, which cannot escape (i.e., is captured by) the allocating transaction. Accesses to such memory do not require calls to STM memory access functions (i.e., STM barriers). A compiler unaware of that may translate accesses to captured memory into expensive STM barriers. This presents us opportunities to improve STM performance. Our measurements with the STAMP benchmark suite (version 0.9.9) revealed that as many as 60% of the STM barriers generated by our baseline compiler access captured memory, including 90% of the write barriers and 45% of the read barriers. We propose runtime and compiler optimizations to elide STM barriers to captured memory. These techniques can also elide barriers for accesses to thread-local and read-only data. We implemented those optimizations in the Intel C++ STM compiler. Our experiments with the STAMP benchmark suite on a Intel Dunnington system (with 24 cores in a 4-node SMP system) show that these optimizations can improve performance by to 18% at 16 threads

    Bidirectional optimization of the melting spinning process

    Get PDF
    This is the author's accepted manuscript (under the provisional title "Bi-directional optimization of the melting spinning process with an immune-enhanced neural network"). The final published article is available from the link below. Copyright 2014 @ IEEE.A bidirectional optimizing approach for the melting spinning process based on an immune-enhanced neural network is proposed. The proposed bidirectional model can not only reveal the internal nonlinear relationship between the process configuration and the quality indices of the fibers as final product, but also provide a tool for engineers to develop new fiber products with expected quality specifications. A neural network is taken as the basis for the bidirectional model, and an immune component is introduced to enlarge the searching scope of the solution field so that the neural network has a larger possibility to find the appropriate and reasonable solution, and the error of prediction can therefore be eliminated. The proposed intelligent model can also help to determine what kind of process configuration should be made in order to produce satisfactory fiber products. To make the proposed model practical to the manufacturing, a software platform is developed. Simulation results show that the proposed model can eliminate the approximation error raised by the neural network-based optimizing model, which is due to the extension of focusing scope by the artificial immune mechanism. Meanwhile, the proposed model with the corresponding software can conduct optimization in two directions, namely, the process optimization and category development, and the corresponding results outperform those with an ordinary neural network-based intelligent model. It is also proved that the proposed model has the potential to act as a valuable tool from which the engineers and decision makers of the spinning process could benefit.National Nature Science Foundation of China, Ministry of Education of China, the Shanghai Committee of Science and Technology), and the Fundamental Research Funds for the Central Universities

    Optimization of Information Rate Upper and Lower Bounds for Channels with Memory

    Full text link
    We consider the problem of minimizing upper bounds and maximizing lower bounds on information rates of stationary and ergodic discrete-time channels with memory. The channels we consider can have a finite number of states, such as partial response channels, or they can have an infinite state-space, such as time-varying fading channels. We optimize recently-proposed information rate bounds for such channels, which make use of auxiliary finite-state machine channels (FSMCs). Our main contribution in this paper is to provide iterative expectation-maximization (EM) type algorithms to optimize the parameters of the auxiliary FSMC to tighten these bounds. We provide an explicit, iterative algorithm that improves the upper bound at each iteration. We also provide an effective method for iteratively optimizing the lower bound. To demonstrate the effectiveness of our algorithms, we provide several examples of partial response and fading channels, where the proposed optimization techniques significantly tighten the initial upper and lower bounds. Finally, we compare our results with an improved variation of the \emph{simplex} local optimization algorithm, called \emph{Soblex}. This comparison shows that our proposed algorithms are superior to the Soblex method, both in terms of robustness in finding the tightest bounds and in computational efficiency. Interestingly, from a channel coding/decoding perspective, optimizing the lower bound is related to increasing the achievable mismatched information rate, i.e., the information rate of a communication system where the decoder at the receiver is matched to the auxiliary channel, and not to the original channel.Comment: Submitted to IEEE Transactions on Information Theory, November 24, 200
    • …
    corecore