    Extending heterogeneous applications to remote Co-processors with rOpenCL

    In heterogeneous computing systems, general-purpose CPUs are coupled with co-processors of different architectures, such as GPUs and FPGAs. Applications may take advantage of this heterogeneous device ensemble to accelerate execution. However, developing heterogeneous applications requires specific programming models, under which applications unfold into code components targeting different computing devices. OpenCL is one of the main programming models for heterogeneous applications, set apart from others by its openness, vendor independence and support for different co-processors. In the original OpenCL application model, a heterogeneous application starts on a certain host node and then resorts to the local co-processors attached to that host. Co-processors at other nodes, networked with the host node, are therefore inaccessible and cannot be used to accelerate the application. rOpenCL (remote OpenCL) overcomes this limitation for a significant subset of the OpenCL 1.2 API, offering OpenCL applications transparent access to remote devices through a TCP/IP-based network. This paper presents the architecture and the most relevant implementation details of rOpenCL, together with the results of a preliminary set of reference benchmarks. These demonstrate the stability of the current prototype and show that, in many scenarios, the network overhead is smaller than expected.
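
A common way to give applications transparent access to remote devices is API remoting: a client-side library intercepts each OpenCL call and forwards it over the network to a daemon on the remote node, which invokes the real local OpenCL implementation. The abstract does not describe rOpenCL's wire format, so the following is only a minimal sketch assuming a simple length-prefixed framing; the opcodes and the `encode_call`/`decode_call` helpers are hypothetical.

```python
import struct

# Hypothetical opcodes for forwarded OpenCL calls (not rOpenCL's real protocol).
OP_GET_PLATFORM_IDS = 1
OP_GET_DEVICE_IDS = 2

# Header: opcode (uint32) and payload length (uint32), network byte order.
HEADER = struct.Struct("!II")


def encode_call(opcode: int, payload: bytes) -> bytes:
    """Frame one forwarded API call as header + payload for a TCP stream."""
    return HEADER.pack(opcode, len(payload)) + payload


def decode_call(buf: bytes) -> tuple[int, bytes, bytes]:
    """Parse one framed call from the front of buf.

    Returns (opcode, payload, remaining_bytes). Raises ValueError if buf
    does not yet hold a complete message, so the caller should read more.
    """
    if len(buf) < HEADER.size:
        raise ValueError("incomplete header")
    opcode, length = HEADER.unpack_from(buf)
    end = HEADER.size + length
    if len(buf) < end:
        raise ValueError("incomplete payload")
    return opcode, buf[HEADER.size:end], buf[end:]


# Example: a client asks the remote daemon for up to 4 platform IDs.
msg = encode_call(OP_GET_PLATFORM_IDS, struct.pack("!I", 4))
op, payload, rest = decode_call(msg)
(num_entries,) = struct.unpack("!I", payload)
```

The length prefix matters because TCP delivers a byte stream rather than discrete messages; it lets the receiving daemon split back-to-back forwarded calls correctly.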

    Astro - A Low-Cost, Low-Power Cluster for CPU-GPU Hybrid Computing using the Jetson TK1

    With the rising costs of large-scale distributed systems, many researchers have begun looking at low-power architectures for clusters. In this paper, we describe our Astro cluster, which consists of 46 NVIDIA Jetson TK1 nodes, each equipped with an ARM Cortex-A15 CPU, a 192-core Kepler GPU, 2 GB of RAM, and 16 GB of flash storage. Compared with conventional clusters, the cluster has several advantages, including lower power usage, ambient cooling, shared memory between the CPU and GPU, and affordability. It is built from commodity hardware and can be set up at relatively low cost while providing up to 190 single-precision GFLOPS of computing power per node thanks to its combined GPU/CPU architecture. The cluster currently uses one 48-port Gigabit Ethernet switch and runs Linux for Tegra, a modified version of Ubuntu provided by NVIDIA, as its operating system. Common file systems such as PVFS, Ceph, and NFS are supported, and benchmarks such as HPL, LAPACK, and LAMMPS are used to evaluate the system. On the LINPACK benchmark, the cluster reaches a peak of 328 GFLOPS in double precision at a peak power draw of 810 W, placing it 324th on the Green500. Single-precision benchmarks reach a peak performance of 6800 GFLOPS. The Astro cluster is intended as a proof of concept for future low-power clusters with a similar architecture. It runs many of the same applications used by top supercomputers and is validated using several standard supercomputing benchmarks. We show that, with the rise of low-power CPUs and GPUs and the need for lower server costs, this cluster provides insight into how ARM and CPU-GPU hybrid chips will perform in high-performance computing.
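
The figures quoted in this abstract allow a quick consistency check. The arithmetic below uses only those numbers: the aggregate single-precision peak (46 nodes × 190 GFLOPS), the fraction of it reached by the measured 6800 GFLOPS, and an energy-efficiency figure in MFLOPS/W. Dividing by the 810 W peak power rather than the average power over the run is an assumption (Green500 rankings normally use average power), so the efficiency computed here is a conservative estimate.

```python
NODES = 46
SP_PEAK_PER_NODE_GFLOPS = 190   # "up to 190 single precision GFLOPS ... per node"
SP_MEASURED_GFLOPS = 6800       # measured single-precision peak
DP_LINPACK_GFLOPS = 328         # LINPACK double-precision result
PEAK_POWER_W = 810              # peak power draw during LINPACK (assumed denominator)

# Aggregate theoretical single-precision peak of the whole cluster.
sp_theoretical = NODES * SP_PEAK_PER_NODE_GFLOPS          # 8740 GFLOPS

# Fraction of that peak achieved by the single-precision benchmarks.
sp_efficiency = SP_MEASURED_GFLOPS / sp_theoretical

# Energy efficiency: GFLOPS -> MFLOPS, divided by watts.
mflops_per_watt = DP_LINPACK_GFLOPS * 1000 / PEAK_POWER_W

print(f"theoretical SP peak: {sp_theoretical} GFLOPS")
print(f"measured / theoretical SP: {sp_efficiency:.1%}")
print(f"DP efficiency: {mflops_per_watt:.0f} MFLOPS/W")
```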

    On Efficiency of Distributed Password Recovery

    One of the major challenges in digital forensics today is data encryption. Following leaked information about unlawful eavesdropping, many users decided to protect their data with encryption. In criminal cases, forensic experts face the challenge of deciphering a suspect's data under investigation. A common method for overcoming password-based protection is brute-force password recovery on GPU-accelerated hardware, an approach that appears to be expensive. This paper presents an alternative approach that distributes tasks using the BOINC platform. The cost, time, and energy efficiency of this approach are discussed and compared with the GPU-based solution.
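
The abstract does not say how the keyspace is divided among BOINC clients. A minimal sketch, assuming a brute-force attack over a fixed alphabet and password length, splits the candidate indices into fixed-size work units (the unit of work a BOINC server hands out) and lets each client decode its indices into passwords by mixed-radix conversion. `CHARSET`, `make_work_units`, `index_to_password`, and `run_work_unit` are hypothetical names for illustration.

```python
import hashlib
import string
from typing import Callable, Iterator, Optional

CHARSET = string.ascii_lowercase + string.digits  # assumed 36-symbol alphabet


def keyspace_size(length: int, charset: str = CHARSET) -> int:
    """Number of fixed-length candidates over the alphabet."""
    return len(charset) ** length


def index_to_password(index: int, length: int, charset: str = CHARSET) -> str:
    """Decode an index as a base-len(charset) number, most significant char first."""
    base = len(charset)
    chars = []
    for _ in range(length):
        index, digit = divmod(index, base)
        chars.append(charset[digit])
    return "".join(reversed(chars))


def make_work_units(total: int, unit_size: int) -> Iterator[tuple[int, int]]:
    """Yield half-open [start, end) index ranges, one per work unit."""
    for start in range(0, total, unit_size):
        yield start, min(start + unit_size, total)


def run_work_unit(start: int, end: int, length: int,
                  check: Callable[[str], bool]) -> Optional[str]:
    """What a volunteer client would do: test every candidate in its range."""
    for i in range(start, end):
        candidate = index_to_password(i, length)
        if check(candidate):
            return candidate
    return None


# Example: recover a 3-character password from its SHA-256 digest.
target_digest = hashlib.sha256(b"z9a").hexdigest()
total = keyspace_size(3)                      # 36**3 candidates
units = list(make_work_units(total, 10_000))  # the last unit is partial
results = [run_work_unit(s, e, 3,
                         lambda p: hashlib.sha256(p.encode()).hexdigest() == target_digest)
           for s, e in units]
found = [r for r in results if r is not None]
```

Because each work unit is just an index range, the server only needs to send two integers per task, and units can be reissued cheaply if a volunteer never reports back.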

    Data remanence and digital forensic investigation for CUDA Graphics Processing Units

    This paper investigates the practicality of memory attacks on commercial Graphics Processing Units (GPUs). Recent advances in the performance and viability of GPUs for highly parallel data processing tasks raise a number of security challenges. Unscrupulous software that later runs on the same GPU, whether launched by the same user or by another user on a multi-user system, may be able to access the contents of GPU memory, which still holds data from previous program executions. In certain use cases, where the GPU offloads intensive parallel processing such as pattern matching for an intrusion detection system, financial systems, or cryptographic algorithms, GPU memory may contain privileged data that would ordinarily be inaccessible to an unprivileged application running on the host computer. Because GPUs can potentially yield access to confidential information, this work builds on existing research in the field to investigate the practicality of extracting data from global, shared, and texture memory and retrieving it for further analysis. These techniques are implemented on various GPUs using three different Nvidia CUDA versions. A novel methodology for the digital forensic examination of GPU memory for remanent data is then proposed, along with suggestions and considerations regarding countermeasures and anti-forensic techniques.
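
The paper's extraction procedure is not reproduced here. As a minimal sketch of the "further analysis" step only, assuming a raw dump of GPU memory has already been copied to the host as bytes, an examiner can locate non-zero regions and printable strings, much like the Unix `strings` tool. `find_nonzero_regions` and `find_strings` are hypothetical helpers, and the dump below is simulated.

```python
import re


def find_nonzero_regions(dump: bytes, min_len: int = 16) -> list[tuple[int, int]]:
    """Return [start, end) offsets of runs of non-zero bytes at least min_len long.

    Freshly cleared memory is usually zero-filled, so long non-zero runs are
    candidate remnants of earlier program executions.
    """
    pattern = rb"[^\x00]{%d,}" % min_len
    return [(m.start(), m.end()) for m in re.finditer(pattern, dump)]


def find_strings(dump: bytes, min_len: int = 4) -> list[tuple[int, str]]:
    """Return (offset, text) for runs of printable ASCII, similar to `strings`."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [(m.start(), m.group().decode("ascii")) for m in re.finditer(pattern, dump)]


# Simulated dump: zeros, a leftover secret, zeros, binary residue, zeros.
dump = bytes(32) + b"password=hunter2" + bytes(8) + b"\x01\x02\xff" + bytes(16)
for offset, text in find_strings(dump):
    print(f"0x{offset:08x}: {text}")
```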

    Password Recovery in Distributed Environment

    The goal of this thesis is to design and implement a framework for password recovery in a distributed environment. The introductory part analyzes password security and the techniques used to attack passwords, and presents methods that protect against such attacks. It introduces Wrathion, a tool that performs password recovery accelerated on graphics cards through its integration of the OpenCL framework. It also analyzes the available environments for running computing tasks on multiple devices; based on this analysis, the OpenMPI platform is chosen for extending Wrathion. The individual modifications and added components are described, and the whole system is subjected to experiments measuring scalability and network traffic demands. Finally, the financial side of using Wrathion in a cloud-based distributed environment is discussed.
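
The abstract does not describe how Wrathion divides work among MPI processes. One common static scheme, shown here purely as an assumption, gives each rank a contiguous block of the candidate range and spreads the remainder over the first ranks, so block sizes differ by at most one. In an mpi4py program, `rank` and `size` would come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`; here they are plain arguments so the hypothetical `block_range` function can be tested without MPI.

```python
def block_range(rank: int, size: int, total: int) -> tuple[int, int]:
    """Half-open [start, end) slice of `total` candidates owned by `rank`.

    The first `extra` ranks each take one additional candidate, so all
    blocks differ in length by at most one and together cover [0, total).
    """
    base, extra = divmod(total, size)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


# Example: 4 ranks sharing 10 candidates.
for rank in range(4):
    print(rank, block_range(rank, 4, 10))
```

With static blocks, ranks only need to communicate at the start (to receive the target hash) and at the end (to report results); dynamic scheduling would balance GPUs of different speeds better at the cost of more messages.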

    Performance Improvements for a Large-scale Geological Simulation

    Geological models have been used successfully to identify and study geothermal energy resources. Many computer simulations based on these models are data-intensive applications. Large-scale geological simulations require high performance computing (HPC) techniques to run within reasonable time constraints and performance levels. One research area that can benefit greatly from HPC techniques is the modeling of heat flow beneath the Earth's surface. This paper describes the application of HPC techniques to increase the scale of research with a well-established geological model. A serial C++ application based on this geological model was recently ported to a parallel HPC application using MPI. One area of focus was improving the performance of the MPI version to enable state- or regional-scale simulations on large numbers of processors. First, synchronous communication among MPI processes was replaced by overlapping communication and computation (asynchronous communication). Asynchronous communication improved performance over synchronous communication by an average of 28% using 56 cores in one environment and 46% using 56 cores in another. Second, a load-balancing approach that repartitions the data at the start of the program improved runtime performance by 32% using 48 cores in the first environment and 14% using 24 cores in the second, compared with the asynchronous version. An additional feature, erosion modeling, was also added to the MPI code base; the performance improvement techniques were less effective when erosion was modeled.
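
The abstract says load balancing was achieved by repartitioning the data at program start but does not give the algorithm. One standard option, sketched here as an assumption, is weighted 1-D partitioning: estimate a cost for each grid row (for instance, the number of active cells), then cut the rows into contiguous blocks whose cumulative costs are as close as possible to equal shares. `repartition` and the cost values are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def repartition(costs: list[float], nprocs: int) -> list[int]:
    """Return nprocs + 1 row boundaries; process p owns rows [b[p], b[p + 1]).

    Each interior cut is placed just before or just after the row where the
    running cost first reaches p / nprocs of the total, whichever is closer.
    """
    prefix = list(accumulate(costs))  # prefix[i] = cost of rows 0..i
    total = prefix[-1]

    def cost_before(cut: int) -> float:
        # Cumulative cost of rows [0, cut).
        return prefix[cut - 1] if cut > 0 else 0.0

    bounds = [0]
    for p in range(1, nprocs):
        target = total * p / nprocs
        idx = bisect_left(prefix, target)
        cut = min((idx, idx + 1), key=lambda c: abs(cost_before(c) - target))
        bounds.append(max(cut, bounds[-1]))  # keep boundaries non-decreasing
    bounds.append(len(costs))
    return bounds


# Example: the last six rows are three times as expensive as the first six.
costs = [1] * 6 + [3] * 6
bounds = repartition(costs, 4)
block_costs = [sum(costs[a:b]) for a, b in zip(bounds, bounds[1:])]
```

With equal row counts (three rows per process) this input would give block costs of 3, 3, 9 and 9, so the slowest process sets the runtime; the cost-weighted cuts give each process the same total cost.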

    HETEROGENEOUS GPU&CPU CLUSTER FOR HIGH PERFORMANCE COMPUTING IN CRYPTOGRAPHY

    This paper addresses issues associated with distributed computing systems and the application of mixed GPU&CPU technology to data encryption and decryption algorithms. We describe HGCC, a heterogeneous cluster formed by two types of nodes (Intel processors with NVIDIA graphics processing units, and AMD processors with AMD, formerly ATI, graphics processing units), together with a novel software framework that hides the heterogeneity of the cluster and provides tools for solving complex scientific and engineering problems. Finally, we present the results of numerical experiments. The case study concerns parallel implementations of selected cryptanalysis algorithms. The main goal of the paper is to show the wide applicability of GPU&CPU technology to large-scale computation and data processing.
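
The abstract does not describe the framework's API. As a hedged sketch of the idea of hiding heterogeneity, a common backend interface lets a scheduler split a brute-force key search across nodes in proportion to their speed without knowing whether a node is Intel+NVIDIA or AMD+AMD. Both backend classes here are hypothetical stand-ins that run plain Python (a real system would wrap CUDA and OpenCL respectively), the relative speeds are invented, and the 16-bit XOR "cipher" is deliberately trivial.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional

KeyTest = Callable[[int], bool]


def _scan(start: int, end: int, test: KeyTest) -> Optional[int]:
    """Plain-Python exhaustive scan shared by the stand-in backends."""
    for key in range(start, end):
        if test(key):
            return key
    return None


class Backend(ABC):
    """Uniform interface the framework exposes regardless of node hardware."""

    def __init__(self, name: str, relative_speed: float):
        self.name = name
        self.relative_speed = relative_speed  # sizes this node's share of work

    @abstractmethod
    def search(self, start: int, end: int, test: KeyTest) -> Optional[int]:
        """Return the first key in [start, end) that passes test, else None."""


class NvidiaNode(Backend):
    # Hypothetical: a real implementation would launch a CUDA kernel here.
    def search(self, start, end, test):
        return _scan(start, end, test)


class AmdNode(Backend):
    # Hypothetical: a real implementation would enqueue an OpenCL kernel here.
    def search(self, start, end, test):
        return _scan(start, end, test)


def split_by_speed(total: int, backends: list[Backend]) -> list[tuple[Backend, int, int]]:
    """Give each backend a contiguous share of [0, total) proportional to its speed."""
    weight_sum = sum(b.relative_speed for b in backends)
    shares, start, acc = [], 0, 0.0
    for b in backends:
        acc += b.relative_speed
        end = round(total * acc / weight_sum)
        shares.append((b, start, end))
        start = end
    return shares


def find_key(total: int, backends: list[Backend], test: KeyTest) -> Optional[int]:
    # Sequential for clarity; a real framework would run the shares concurrently.
    for backend, start, end in split_by_speed(total, backends):
        key = backend.search(start, end, test)
        if key is not None:
            return key
    return None


# Toy known-plaintext attack on a 16-bit XOR "cipher".
PLAIN, SECRET = 0x1234, 0xBEEF
CIPHER = PLAIN ^ SECRET
cluster = [NvidiaNode("intel+nvidia", 3.0), AmdNode("amd+amd", 1.0)]
key = find_key(1 << 16, cluster, lambda k: (PLAIN ^ k) == CIPHER)
```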