75 research outputs found
Literature Study on Data Protection for Cloud Storage
Many data security and privacy incidents are observed in today's Cloud services. On the one hand, Cloud service providers face a large number of external attacks: in 2018, the non-medical personal data of 1.5 million SingHealth patients were stolen from Singapore's health system. On the other hand, Cloud service providers cannot be entirely trusted either; personal data may be exploited maliciously, as in the Facebook and Cambridge Analytica data scandal, which affected 87 million users in 2018. It therefore becomes increasingly important for end users to protect their data (texts, images, or videos) efficiently and independently of Cloud service providers. In this paper, we present, as a literature study, a novel data protection scheme that combines fragmentation, encryption, and dispersion to achieve high performance and an enhanced level of protection.
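As a rough illustration of how fragmentation, encryption, and dispersion compose, the following Python sketch splits a byte string, encrypts each fragment with a toy SHA-256-based keystream (a stand-in for a real cipher such as AES, purely for illustration), and assigns the fragments to hypothetical storage sites. All function names and the round-robin placement are our own assumptions, not the paper's scheme:

```python
import hashlib
import os

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream built from SHA-256 in counter mode (illustration
    # only -- a real deployment would use an authenticated cipher).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def fragment_encrypt_disperse(data: bytes, key: bytes, n_fragments: int):
    # 1. Fragment: split the data into roughly equal pieces.
    size = -(-len(data) // n_fragments)  # ceiling division
    fragments = [data[i:i + size] for i in range(0, len(data), size)]
    # 2. Encrypt: each fragment gets its own nonce and keystream.
    protected = []
    for frag in fragments:
        nonce = os.urandom(16)
        ct = bytes(a ^ b for a, b in zip(frag, keystream(key, nonce, len(frag))))
        protected.append((nonce, ct))
    # 3. Disperse: place fragments round-robin on independent sites, so
    #    no single provider holds the whole (even encrypted) object.
    sites = {i: [] for i in range(n_fragments)}
    for idx, item in enumerate(protected):
        sites[idx % n_fragments].append((idx, item))
    return sites

def reassemble(sites, key: bytes) -> bytes:
    # Collect fragments from all sites, restore order, decrypt, concatenate.
    items = sorted(it for frags in sites.values() for it in frags)
    plain = b""
    for _, (nonce, ct) in items:
        plain += bytes(a ^ b for a, b in zip(ct, keystream(key, nonce, len(ct))))
    return plain
```

The point of the sketch is the division of labor: even if one site (or the provider hosting it) is compromised, an attacker holds only some encrypted fragments.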
Optimization of Tensor-product Operations in Nekbone on GPUs
In the CFD solver Nek5000, the computation is dominated by the evaluation of small tensor operations. Nekbone is a proxy app for Nek5000 and has previously been ported to GPUs with a mixed OpenACC and CUDA approach. In this work, we continue this effort and further optimize the main tensor-product operation in Nekbone. Our optimization is done in CUDA and uses a different, 2D, thread structure to perform the computations layer by layer. This enables us to use loop unrolling as well as to utilize registers and shared memory efficiently. Our implementation is then compared, on both the Pascal and Volta GPU architectures, to previous GPU versions of Nekbone as well as to a measured roofline. The results show that our implementation outperforms previous GPU Nekbone implementations by 6-10%. Compared to the measured roofline, we obtain 77-92% of the peak performance on both Nvidia P100 and V100 GPUs for inputs with 1024-4096 elements and polynomial degree 9.
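To make the tensor-product operation concrete, here is a NumPy sketch (our own illustration, not the paper's CUDA code) of the small contractions that dominate this kind of spectral-element kernel, together with a layer-by-layer variant that mirrors the 2D thread structure described above:

```python
import numpy as np

def local_grad(u, D):
    # Apply the 1D derivative matrix D along each dimension of an
    # (n, n, n) element -- the small tensor-product contraction that
    # dominates the runtime.
    ur = np.einsum("ai,ijk->ajk", D, u)  # contract along dimension 0
    us = np.einsum("aj,ijk->iak", D, u)  # contract along dimension 1
    ut = np.einsum("ak,ijk->ija", D, u)  # contract along dimension 2
    return ur, us, ut

def local_grad_layered(u, D):
    # Layer-by-layer variant mirroring a 2D thread structure: each
    # k-layer is a dense (n, n) matrix product, so a 2D thread block
    # can keep D and one layer of u in registers / shared memory.
    n = u.shape[0]
    ur = np.empty_like(u)
    us = np.empty_like(u)
    for k in range(n):
        ur[:, :, k] = D @ u[:, :, k]    # r-derivative of layer k
        us[:, :, k] = u[:, :, k] @ D.T  # s-derivative of layer k
    ut = np.einsum("ak,ijk->ija", D, u)  # t-direction couples layers
    return ur, us, ut
```

With polynomial degree 9 the matrices are tiny (10x10 per layer), which is why register and shared-memory reuse, rather than raw FLOP rate, decides performance.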
Speeding Up RSA Encryption Using GPU Parallelization
Due to the ever-increasing computing capability of today's high-performance computers, the bit length used in RSA, a common and practical public-key cryptosystem, keeps growing to protect encrypted data from being cracked, resulting in increasing time spent executing the RSA algorithm. We also note that while the development of CPUs has reached its limits, the graphics processing unit (GPU), a highly parallel programmable processor, has become an integral part of today's mainstream computing systems. It is therefore a sensible choice to take advantage of GPU computing to accelerate the RSA algorithm and enhance its applicability. After analyzing the RSA algorithm, we find that big-number operations consume most of the computing resources. As the benefit of combining Montgomery multiplication with GPU-based parallel methods is not high enough, we further introduce the Fourier transform and Newton's method to design a new parallel algorithm that accelerates big-number computation.
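For reference, Montgomery multiplication, the baseline approach mentioned above, replaces the expensive modular reduction in RSA exponentiation with shifts and masks. A minimal Python sketch (our own, with a fixed R = 2^bits; not the paper's implementation) might look like this:

```python
def montgomery_setup(n: int, bits: int):
    # R = 2^bits with R > n; n must be odd (true for RSA moduli),
    # so gcd(R, n) = 1 and n' with n * n' == -1 (mod R) exists.
    R = 1 << bits
    n_prime = (-pow(n, -1, R)) % R
    return R, n_prime

def montgomery_mul(a_bar, b_bar, n, R, n_prime, bits):
    # REDC: computes a_bar * b_bar * R^-1 mod n with no division by n,
    # only masking by R - 1 and shifting by `bits`.
    t = a_bar * b_bar
    m = ((t & (R - 1)) * n_prime) & (R - 1)  # m = t * n' mod R
    u = (t + m * n) >> bits                  # exact division by R
    return u - n if u >= n else u

def montgomery_modexp(base, exp, n, bits=64):
    # Square-and-multiply carried out entirely in Montgomery form.
    R, n_prime = montgomery_setup(n, bits)
    x_bar = (base * R) % n  # convert base to Montgomery form
    result = R % n          # Montgomery form of 1
    while exp:
        if exp & 1:
            result = montgomery_mul(result, x_bar, n, R, n_prime, bits)
        x_bar = montgomery_mul(x_bar, x_bar, n, R, n_prime, bits)
        exp >>= 1
    # Multiplying by 1 converts back out of Montgomery form.
    return montgomery_mul(result, 1, n, R, n_prime, bits)
```

On a GPU the interesting part is how the big-number limbs of `t` and `m` are distributed across threads, which is where the FFT-based multiplication proposed in the paper comes in.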
Improving Scalability and Maintenance of Software for High-Performance Scientific Computing by Combining MDE and Frameworks
In recent years, numerical simulation has attracted increasing interest within industry and among academics. Paradoxically, the development and maintenance of high-performance scientific computing software has become more complex due to the diversification of hardware architectures and their related programming languages and libraries. In this paper, we share our experience in using model-driven development for numerical simulation software. Our approach, called MDE4HPC, proposes to tackle development complexity by using a domain-specific modeling language to describe abstract views of the software. We present and analyse the results obtained with its implementation when deriving this abstract model to target Arcane, a development framework for 2D and 3D numerical simulation software.
Hyper: Distributed Cloud Processing for Large-Scale Deep Learning Tasks
Training and deploying deep learning models in real-world applications require processing large amounts of data. This is a challenging task when the amount of data grows to a hundred terabytes or even petabyte scale. We introduce a hybrid distributed cloud framework with a unified view of multiple clouds and an on-premise infrastructure for processing tasks using both CPU and GPU compute instances at scale. The system implements a distributed file system and a failure-tolerant task-processing scheduler, independent of the language and deep learning framework used. It makes it possible to utilize unstable, cheap resources on the cloud to significantly reduce costs. We demonstrate the scalability of the framework by running pre-processing, distributed training, hyperparameter search, and large-scale inference tasks utilizing 10,000 CPU cores and 300 GPU instances, with an overall processing power of 30 petaflops.
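The core idea behind tolerating unstable (preemptible) instances is that a failed attempt simply requeues the task. A minimal sketch of such a retry scheduler, in Python with names of our own choosing (not the framework's API), could be:

```python
def run_with_retries(tasks, execute, max_attempts=3):
    # Minimal failure-tolerant scheduler: each task is retried on
    # failure, so a preempted cheap instance costs only a requeue,
    # not the whole job.
    pending = list(tasks)
    results = {}
    attempts = {t: 0 for t in tasks}
    while pending:
        task = pending.pop(0)
        attempts[task] += 1
        try:
            results[task] = execute(task)
        except RuntimeError:
            if attempts[task] >= max_attempts:
                raise  # persistent failure: surface it to the caller
            pending.append(task)  # requeue on a (possibly new) worker
    return results
```

A production scheduler would add timeouts, worker health tracking, and durable task state, but the requeue-on-failure loop is the part that makes cheap, unstable capacity usable.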
Architecture-Aware Optimization on a 1600-core Graphics Processor
The graphics processing unit (GPU) continues to make significant strides as an accelerator in commodity cluster computing for high-performance computing (HPC). For example, three of the top five fastest supercomputers in the world, as ranked by the TOP500, employ GPUs as accelerators. Despite this increasing interest in GPUs, however, optimizing the performance of a GPU-accelerated compute node requires deep technical knowledge of the underlying architecture. Although significant literature exists on how to optimize GPU performance on the more mature NVIDIA CUDA architecture, the converse is true for OpenCL on the AMD GPU.

Consequently, we present and evaluate architecture-aware optimizations for the AMD GPU. The most prominent optimizations include (i) explicit use of registers, (ii) use of vector types, (iii) removal of branches, and (iv) use of image memory for global data. We demonstrate the efficacy of our AMD GPU optimizations by applying each optimization in isolation as well as in concert to a large-scale molecular modeling application called GEM. Via these AMD-specific GPU optimizations, the AMD Radeon HD 5870 GPU delivers 65% better performance than with the well-known NVIDIA-specific optimizations.
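Branch removal, optimization (iii), can be illustrated with a scalar Python sketch of the idea (our own example, not the authors' OpenCL kernels). On a GPU, work-items in a wavefront that take different sides of a branch are serialized; predication-style arithmetic lets every work-item execute the same instruction stream:

```python
def clamp_branchy(x, lo, hi):
    # Divergent version: work-items in a wavefront may take different
    # paths, forcing the hardware to serialize them.
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

def clamp_branchless(x, lo, hi):
    # Branch-free version: boolean predicates become 0/1 multipliers,
    # so the same instructions execute regardless of the data.
    # (With lo <= hi, `below` and `above` are never both 1.)
    below = int(x < lo)
    above = int(x > hi)
    return below * lo + above * hi + (1 - below - above) * x
```

In real OpenCL code the same transformation is usually expressed with `select` or the ternary operator, which the compiler can lower to predicated instructions instead of a jump.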
Determining Optimal Mining Work Size on the OpenCL Platform for the Ethereum Cryptocurrency
In terms of cryptocurrency, mining is the process of creating a new transaction block and adding it to the blockchain. The cryptocurrency protocol should ensure the reliability of new transaction blocks. One popular mining protocol is Proof-of-Work, which requires the miner to perform a certain amount of work to verify its right to add a new block to the blockchain. To perform this work, high-performance hardware such as GPUs is used. At the program level, this hardware needs a special computing framework, for example CUDA or OpenCL. In this article, we discuss Ethereum cryptocurrency mining using the OpenCL standard; Ethereum is the most popular cryptocurrency with GPU-based mining, and there are several open-source implementations of Ethereum miners. We consider the host part of the OpenCL miner, which makes the research results independent of the mining algorithm and allows them to be applied to the mining of other cryptocurrencies. During the research, we found problems that lead to a loss of mining productivity, and we look for ways to resolve these problems and thus increase mining performance. As part of solving them, we developed an algorithm for the functioning of the miner and propose a methodology for determining the optimal OpenCL work size, which reduces the impact of the problems found and achieves maximum mining productivity with the OpenCL framework.
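The idea of picking a work size empirically can be sketched as a simple auto-tuning loop. The sketch below is our own illustration, not the article's methodology: `run_batch` stands in for one kernel launch of the real miner, and the tuner keeps whichever candidate size yields the highest measured throughput:

```python
import time

def tune_work_size(run_batch, candidates, target_seconds=0.05):
    # Measure hashes-per-second for each candidate global work size
    # and return the best one. Larger sizes amortize per-launch
    # overhead; too-large sizes hurt share-submission latency, which
    # is why an empirical sweep beats a hard-coded constant.
    best_size, best_rate = None, 0.0
    for size in candidates:
        done = 0
        start = time.perf_counter()
        while time.perf_counter() - start < target_seconds:
            run_batch(size)  # one "kernel launch" covering `size` hashes
            done += size
        rate = done / (time.perf_counter() - start)
        if rate > best_rate:
            best_size, best_rate = size, rate
    return best_size, best_rate
```

In an actual miner the same loop would wrap `clEnqueueNDRangeKernel` calls and could rerun periodically, since the optimal size shifts with driver, GPU load, and DAG epoch.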