75 research outputs found

    Literature Study on Data Protection for Cloud Storage

    Get PDF
    Many data security and privacy incidents are observed in today's Cloud services. On the one hand, Cloud service providers deal with a large number of external attacks: in 2018, the non-medical personal data of 1.5 million SingHealth patients were stolen from the health system in Singapore. On the other hand, Cloud service providers cannot be entirely trusted either; personal data may be exploited maliciously, as in the Facebook and Cambridge Analytica data scandal that affected 87 million users in 2018. It therefore becomes increasingly important for end users to protect their data (texts, images, or videos) efficiently and independently of Cloud service providers. In this paper, presented as a literature study, we aim at a novel data protection scheme that combines fragmentation, encryption, and dispersion for high performance and an enhanced level of protection
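
    A minimal host-side sketch of the fragment-encrypt-disperse idea summarized in this abstract, under stated assumptions: the paper's actual scheme is not reproduced, the n-of-n XOR sharing below stands in for real fragmentation plus encryption (a production system would use an authenticated cipher such as AES-GCM and a cryptographic RNG), and store_fragment is a hypothetical upload hook.

    #include <cstdint>
    #include <cstdio>
    #include <random>
    #include <string>
    #include <vector>

    // Hypothetical upload hook: in a real deployment each fragment would go to
    // an independent storage provider, so no single provider holds all the data.
    static void store_fragment(int provider, const std::vector<uint8_t>& frag) {
        std::printf("provider %d receives %zu bytes\n", provider, frag.size());
    }

    int main() {
        std::string secret = "end-user data protected independently of the Cloud provider";
        const int n = 3;  // number of fragments / providers

        // 1. Fragmentation + masking: n-1 random shares, last share is the
        //    plaintext XORed with all of them (n-of-n XOR secret sharing).
        //    std::mt19937 is NOT a cryptographic RNG; illustration only.
        std::mt19937 rng(std::random_device{}());
        std::vector<std::vector<uint8_t>> frags(n, std::vector<uint8_t>(secret.size()));
        for (size_t i = 0; i < secret.size(); ++i) {
            uint8_t acc = 0;
            for (int f = 0; f < n - 1; ++f) {
                frags[f][i] = static_cast<uint8_t>(rng());
                acc ^= frags[f][i];
            }
            frags[n - 1][i] = acc ^ static_cast<uint8_t>(secret[i]);
        }

        // 2. Dispersion: each fragment is sent to a different provider.
        for (int f = 0; f < n; ++f) store_fragment(f, frags[f]);

        // 3. Reconstruction needs every fragment, so a single compromised
        //    provider learns nothing about the original data.
        std::string recovered(secret.size(), '\0');
        for (size_t i = 0; i < secret.size(); ++i) {
            uint8_t acc = 0;
            for (int f = 0; f < n; ++f) acc ^= frags[f][i];
            recovered[i] = static_cast<char>(acc);
        }
        std::printf("recovered: %s\n", recovered.c_str());
        return 0;
    }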

    Optimization of Tensor-product Operations in Nekbone on GPUs

    Full text link
    In the CFD solver Nek5000, the computation is dominated by the evaluation of small tensor operations. Nekbone is a proxy app for Nek5000 and has previously been ported to GPUs with a mixed OpenACC and CUDA approach. In this work, we continue this effort and further optimize the main tensor-product operation in Nekbone. Our optimization is done in CUDA and uses a different, 2D thread structure that performs the computations layer by layer. This enables us to use loop unrolling as well as to utilize registers and shared memory efficiently. Our implementation is then compared, on both the Pascal and Volta GPU architectures, to previous GPU versions of Nekbone as well as to a measured roofline. The results show that our implementation outperforms previous GPU Nekbone implementations by 6-10%. Compared to the measured roofline, we obtain 77-92% of the peak performance on both Nvidia P100 and V100 GPUs for inputs with 1024-4096 elements and polynomial degree 9. Comment: 4 pages, 4 figures
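
    The simplified CUDA kernel below is a sketch of the layer-by-layer, 2D-thread-block idea summarized above, not the authors' actual Nekbone code: one block per spectral element, an N x N thread layout over (i, j), the small derivative matrix kept in shared memory, and the innermost contraction unrolled. Only the r-direction derivative ur(i,j,k) = sum_l D(i,l) u(l,j,k) is shown; array layouts and names are assumptions.

    #include <cuda_runtime.h>

    #define N 10  // polynomial degree 9 -> N = 10 points per direction

    // One block per spectral element; threads (i, j) walk the element layer by
    // layer in k. Illustrative sketch only, not the Nekbone implementation.
    __global__ void grad_r(const double* __restrict__ u,
                           const double* __restrict__ D,
                           double* __restrict__ ur)
    {
        __shared__ double sD[N][N];    // small derivative matrix, reused by all layers
        __shared__ double slab[N][N];  // one k-layer of u for this element

        const int i = threadIdx.x;     // 2D thread structure: (i, j)
        const int j = threadIdx.y;
        const int e = blockIdx.x;      // element index
        const double* ue = u  + (size_t)e * N * N * N;
        double*       re = ur + (size_t)e * N * N * N;

        sD[j][i] = D[j * N + i];
        __syncthreads();

        for (int k = 0; k < N; ++k) {              // layer by layer
            slab[j][i] = ue[k * N * N + j * N + i];
            __syncthreads();

            double acc = 0.0;
    #pragma unroll
            for (int l = 0; l < N; ++l)            // unrolled small contraction
                acc += sD[i][l] * slab[j][l];

            re[k * N * N + j * N + i] = acc;
            __syncthreads();
        }
    }

    int main() {
        const int nelem = 1024;
        const size_t bytes = (size_t)nelem * N * N * N * sizeof(double);
        double *u, *D, *ur;
        cudaMalloc(&u, bytes);
        cudaMalloc(&ur, bytes);
        cudaMalloc(&D, N * N * sizeof(double));
        cudaMemset(u, 0, bytes);
        cudaMemset(D, 0, N * N * sizeof(double));
        grad_r<<<nelem, dim3(N, N)>>>(u, D, ur);   // one block per element
        cudaDeviceSynchronize();
        cudaFree(u); cudaFree(D); cudaFree(ur);
        return 0;
    }

    The s- and t-directions can be handled analogously, with the t-direction typically keeping the column u(i,j,:) in registers across layers instead of a shared-memory slab.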

    Speeding Up RSA Encryption Using GPU Parallelization

    Get PDF
    Due to the ever-increasing computing capability of high-performance computers, the key length used in RSA, a common and practical public-key cryptosystem, keeps growing to protect encrypted data from being cracked, which in turn increases the time spent executing the RSA algorithm. We also note that while CPU development has reached its limits, the graphics processing unit (GPU), a highly parallel programmable processor, has become an integral part of today's mainstream computing systems. It is therefore a savvy choice to take advantage of GPU computing to accelerate the RSA algorithm and enhance its applicability as well. After analyzing the RSA algorithm, we find that big-number operations consume most of the computing resources. As the benefit acquired from combining Montgomery multiplication with GPU-based parallel methods is not high enough, we further introduce the Fourier transform and Newton's method to design a new parallel algorithm that accelerates the computation of big numbers
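
    As a toy illustration of why big-number arithmetic parallelizes well on a GPU (independent of the paper's Montgomery/FFT/Newton construction, which is not reproduced here), the sketch below multiplies two multi-precision integers by assigning one CUDA thread per output column; the limb width, names, and host-side carry pass are assumptions made for the example only.

    #include <cstdint>
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    // One thread per output column of a schoolbook multiplication. Limbs are
    // 16-bit digits stored in 32-bit slots so a per-column sum cannot overflow
    // the 64-bit accumulator. Toy sketch, not the paper's algorithm.
    __global__ void mul_columns(const uint32_t* x, const uint32_t* y, int w,
                                unsigned long long* col)
    {
        int c = blockIdx.x * blockDim.x + threadIdx.x;  // output column index
        if (c >= 2 * w - 1) return;
        unsigned long long s = 0;
        for (int a = 0; a < w; ++a) {
            int b = c - a;
            if (b >= 0 && b < w) s += (unsigned long long)x[a] * y[b];
        }
        col[c] = s;
    }

    int main() {
        const int w = 4;  // 4 limbs of 16 bits each, little-endian
        std::vector<uint32_t> x = {0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF};  // 2^64 - 1
        std::vector<uint32_t> y = {0x0002, 0x0000, 0x0000, 0x0000};  // 2

        uint32_t *dx, *dy; unsigned long long* dc;
        cudaMalloc(&dx, w * sizeof(uint32_t));
        cudaMalloc(&dy, w * sizeof(uint32_t));
        cudaMalloc(&dc, 2 * w * sizeof(unsigned long long));
        cudaMemcpy(dx, x.data(), w * sizeof(uint32_t), cudaMemcpyHostToDevice);
        cudaMemcpy(dy, y.data(), w * sizeof(uint32_t), cudaMemcpyHostToDevice);

        mul_columns<<<1, 2 * w>>>(dx, dy, w, dc);

        std::vector<unsigned long long> col(2 * w, 0);
        cudaMemcpy(col.data(), dc, (2 * w - 1) * sizeof(unsigned long long),
                   cudaMemcpyDeviceToHost);

        // Sequential carry propagation on the host (base 2^16).
        unsigned long long carry = 0;
        std::vector<uint32_t> z(2 * w, 0);
        for (int c = 0; c < 2 * w; ++c) {
            unsigned long long t = col[c] + carry;
            z[c] = (uint32_t)(t & 0xFFFF);
            carry = t >> 16;
        }
        for (int c = 2 * w - 1; c >= 0; --c) std::printf("%04x", z[c]);
        std::printf("\n");  // prints 0000000000000001fffffffffffffffe = (2^64 - 1) * 2
        cudaFree(dx); cudaFree(dy); cudaFree(dc);
        return 0;
    }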

    Improving Scalability and Maintenance of Software for High-Performance Scientific Computing by Combining MDE and Frameworks

    Get PDF
    In recent years, numerical simulation has attracted increasing interest within industry and among academics. Paradoxically, the development and maintenance of high-performance scientific computing software has become more complex due to the diversification of hardware architectures and their related programming languages and libraries. In this paper, we share our experience in using model-driven development for numerical simulation software. Our approach, called MDE4HPC, proposes to tackle development complexity by using a domain-specific modeling language to describe abstract views of the software. We present and analyse the results obtained with its implementation when deriving this abstract model to target Arcane, a development framework for 2D and 3D numerical simulation software

    Hyper: Distributed Cloud Processing for Large-Scale Deep Learning Tasks

    Full text link
    Training and deploying deep learning models in real-world applications require processing large amounts of data. This becomes challenging when the amount of data grows to hundreds of terabytes or even petabytes. We introduce a hybrid distributed cloud framework with a unified view of multiple clouds and an on-premise infrastructure for processing tasks using both CPU and GPU compute instances at scale. The system implements a distributed file system and a failure-tolerant task processing scheduler, independent of the language and Deep Learning framework used. It allows utilizing cheap but unstable cloud resources to significantly reduce costs. We demonstrate the scalability of the framework by running pre-processing, distributed training, hyperparameter search and large-scale inference tasks utilizing 10,000 CPU cores and 300 GPU instances, with an overall processing power of 30 petaflops
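
    A toy host-side sketch of the failure-tolerant scheduling idea mentioned above (tasks are retried when their cheap, preemptible workers disappear); the task names, failure model, and retry policy are invented for the example and are not Hyper's API.

    #include <cstdio>
    #include <deque>
    #include <random>
    #include <string>

    // Tasks pulled from a queue are requeued whenever the (cheap, preemptible)
    // worker running them is lost. Toy model only.
    struct Task { std::string name; int attempts; };

    int main() {
        std::deque<Task> queue{{"preprocess", 0}, {"train-shard-0", 0}, {"infer-batch-7", 0}};
        std::mt19937 rng(42);
        std::bernoulli_distribution preempted(0.4);  // 40% chance the worker is lost
        const int max_attempts = 5;

        while (!queue.empty()) {
            Task t = queue.front(); queue.pop_front();
            ++t.attempts;
            if (preempted(rng) && t.attempts < max_attempts) {
                std::printf("%s: worker preempted, requeueing (attempt %d)\n",
                            t.name.c_str(), t.attempts);
                queue.push_back(t);                  // retry on another instance
            } else {
                std::printf("%s: completed after %d attempt(s)\n",
                            t.name.c_str(), t.attempts);
            }
        }
        return 0;
    }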

    Architecture-Aware Optimization on a 1600-core Graphics Processor

    Get PDF
    The graphics processing unit (GPU) continues to make significant strides as an accelerator in commodity cluster computing for high-performance computing (HPC). For example, three of the top five fastest supercomputers in the world, as ranked by the TOP500, employ GPUs as accelerators. Despite this increasing interest in GPUs, however, optimizing the performance of a GPU-accelerated compute node requires deep technical knowledge of the underlying architecture. Although significant literature exists on how to optimize GPU performance on the more mature NVIDIA CUDA architecture, comparatively little exists for OpenCL on the AMD GPU. Consequently, we present and evaluate architecture-aware optimizations for the AMD GPU. The most prominent optimizations include (i) explicit use of registers, (ii) use of vector types, (iii) removal of branches, and (iv) use of image memory for global data. We demonstrate the efficacy of our AMD GPU optimizations by applying each optimization in isolation as well as in concert to a large-scale molecular modeling application called GEM. Via these AMD-specific GPU optimizations, the AMD Radeon HD 5870 GPU delivers 65% better performance than with the well-known NVIDIA-specific optimizations
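
    The paper's kernels are OpenCL on AMD hardware; purely to stay consistent with the other sketches in this listing, the two ideas easiest to show in code (vector types and branch removal) are illustrated below in CUDA syntax, with a ReLU-style kernel invented for the example.

    #include <cuda_runtime.h>

    // Naive version: scalar loads and a divergent branch per element.
    __global__ void relu_branchy(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float v = in[i];
            if (v < 0.0f) out[i] = 0.0f; else out[i] = v;   // divergent branch
        }
    }

    // Optimized version: float4 vector accesses and a branchless select via fmaxf.
    __global__ void relu_vectorized(const float4* in, float4* out, int n4) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n4) {
            float4 v = in[i];
            v.x = fmaxf(v.x, 0.0f); v.y = fmaxf(v.y, 0.0f);  // no divergence
            v.z = fmaxf(v.z, 0.0f); v.w = fmaxf(v.w, 0.0f);
            out[i] = v;                                      // one 16-byte store
        }
    }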

    Determining Optimal Mining Work Size on the OpenCL Platform for the Ethereum Cryptocurrency

    Get PDF
    In terms of cryptocurrency, mining is the process of creating a new transaction block and adding it to the blockchain. The cryptocurrency protocol should ensure the reliability of new transaction blocks. One of the popular mining protocols is the Proof-of-Work protocol, which requires the miner to perform a certain amount of work to verify its right to add a new block to the blockchain. To perform this work, high-performance hardware such as GPUs is used. At the program level, this hardware needs a special computing framework, for example CUDA or OpenCL. In this article, we discuss Ethereum cryptocurrency mining using the OpenCL standard; Ethereum is the most popular cryptocurrency with GPU-based mining, and several open-source implementations of Ethereum miners exist. We consider the host part of the OpenCL miner, which makes the research results independent of the mining algorithm and allows applying them to the mining of other cryptocurrencies. During the research, we found problems that lead to a loss of mining productivity and looked for ways to resolve them and thus increase mining performance. As part of solving these problems, we developed an algorithm for the functioning of the miner and propose a methodology for determining the optimal OpenCL work size, which reduces the impact of the problems found and achieves maximum mining productivity with the OpenCL framework
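
    The paper tunes the OpenCL global work size on the host; the sketch below only mirrors that idea in CUDA host code, sweeping the launch size and timing each candidate, with a dummy kernel standing in for the hashing kernel (all names and constants are assumptions made for the example).

    #include <cstdio>
    #include <cuda_runtime.h>

    // Stand-in for the hashing kernel: burns a fixed amount of work per thread.
    __global__ void dummy_hash(unsigned long long* sink, int iters) {
        unsigned long long h = blockIdx.x * blockDim.x + threadIdx.x;
        for (int i = 0; i < iters; ++i)
            h = h * 6364136223846793005ULL + 1442695040888963407ULL;
        if (h == 0) *sink = h;   // keeps the compiler from removing the loop
    }

    int main() {
        unsigned long long* sink;
        cudaMalloc(&sink, sizeof(*sink));
        cudaEvent_t start, stop;
        cudaEventCreate(&start); cudaEventCreate(&stop);

        int best_blocks = 0; float best_rate = 0.0f;
        // Sweep the launch size (the CUDA analogue of the OpenCL global work
        // size) and keep the configuration with the highest throughput estimate.
        for (int blocks = 256; blocks <= 16384; blocks *= 2) {
            cudaEventRecord(start);
            dummy_hash<<<blocks, 256>>>(sink, 4096);
            cudaEventRecord(stop);
            cudaEventSynchronize(stop);
            float ms = 0.0f;
            cudaEventElapsedTime(&ms, start, stop);
            float rate = (blocks * 256.0f) / ms;     // threads per millisecond
            std::printf("blocks=%5d  %.1f threads/ms\n", blocks, rate);
            if (rate > best_rate) { best_rate = rate; best_blocks = blocks; }
        }
        std::printf("best work size: %d blocks of 256 threads\n", best_blocks);
        cudaEventDestroy(start); cudaEventDestroy(stop);
        cudaFree(sink);
        return 0;
    }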