
10,592 research outputs found

Using Naming Strategies to Make Massively Parallel Systems Work

Author
Publication venue: 'Hindawi Limited'
Publication date: 01/01/1994
Field of study

Strategies for protecting intellectual property when using CUDA applications on graphics processing units

Author: Cheng J.
Cook S.
Eilam E.
Huang H.
Ladakis E.
Makan K.
Reynaud D.
Wilt N.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 05/06/2016
Field of study

Recent advances in the massively parallel computational abilities of graphical processing units (GPUs) have increased their use for general purpose computation, as companies look to take advantage of big data processing techniques. This has given rise to the potential for malicious software targeting GPUs, which is of interest to forensic investigators examining the operation of software. The ability to carry out reverse-engineering of software is of great importance within the security and forensics fields, particularly when investigating malicious software or carrying out forensic analysis following a successful security breach. Due to the complexity of the Nvidia CUDA (Compute Unified Device Architecture) framework, it is not clear how best to approach the reverse engineering of a piece of CUDA software. We carry out a review of the different binary output formats which may be encountered from the CUDA compiler, and their implications on reverse engineering. We then demonstrate the process of carrying out disassembly of an example CUDA application, to establish the various techniques available to forensic investigators carrying out black-box disassembly and reverse engineering of CUDA binaries. We show that the Nvidia compiler, using default settings, leaks useful information. Finally, we demonstrate techniques to better protect intellectual property in CUDA algorithm implementations from reverse engineering.

Abertay Research Portal

Crossref

University of Strathclyde Institutional Repository

RORS: Enhanced Rule-based OWL Reasoning on Spark

Author: Feng Zhiyong
Liu Zhihui
Rao Guozheng
Wang Xin
Zhang Xiaowang
Publication venue
Publication date: 09/05/2016
Field of study

The rule-based OWL reasoning is to compute the deductive closure of an ontology by applying RDF/RDFS and OWL entailment rules. The performance of the rule-based OWL reasoning is often sensitive to the rule execution order. In this paper, we present an approach to enhancing the performance of the rule-based OWL reasoning on Spark based on a locally optimal executable strategy. Firstly, we divide all rules (27 in total) into four main classes, namely, SPO rules (5 rules), type rules (7 rules), sameAs rules (7 rules), and schema rules (8 rules) since, as we investigated, those triples corresponding to the first three classes of rules are overwhelming (e.g., over 99% in the LUBM dataset) in our practical world. Secondly, based on the interdependence among those entailment rules in each class, we pick out an optimal rule executable order of each class and then combine them into a new rule execution order of all rules. Finally, we implement the new rule execution order on Spark in a prototype called RORS. The experimental results show that the running time of RORS is improved by about 30% as compared to Kim & Park's algorithm (2015) using the LUBM200 (27.6 million triples). Comment: 12 pages
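The idea that a deductive closure is reached by applying entailment rules until nothing new is derived, and that the order of rule application affects the amount of work, can be illustrated with a minimal sketch. This is not the RORS implementation and does not use Spark: it assumes only two RDFS rules (rdfs11, subclass transitivity, and rdfs9, type inheritance), a tiny hand-written ontology, and hypothetical helper names (`saturate`, `closure`) chosen for illustration.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def subclass_transitivity(triples):
    """rdfs11: (a subClassOf b), (b subClassOf c) => (a subClassOf c)."""
    sub = [(s, o) for s, p, o in triples if p == SUBCLASS]
    return {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}


def type_inheritance(triples):
    """rdfs9: (x type a), (a subClassOf b) => (x type b)."""
    sub = [(s, o) for s, p, o in triples if p == SUBCLASS]
    typ = [(s, o) for s, p, o in triples if p == TYPE]
    return {(x, TYPE, b) for x, a in typ for a2, b in sub if a == a2}


def saturate(triples, rule):
    """Apply one rule repeatedly until it derives nothing new.

    Returns the enlarged triple set and the number of passes made.
    """
    closed = set(triples)
    passes = 0
    while True:
        passes += 1
        new = rule(closed) - closed
        if not new:
            return closed, passes
        closed |= new


def closure(triples, rule_order):
    """Saturate the rules in the given order until a global fixpoint."""
    closed = set(triples)
    total_passes = 0
    while True:
        before = len(closed)
        for rule in rule_order:
            closed, passes = saturate(closed, rule)
            total_passes += passes
        if len(closed) == before:
            return closed, total_passes


# Hypothetical toy ontology: A is a subclass of B, B of C, C of D; x is an A.
ontology = {
    ("A", SUBCLASS, "B"),
    ("B", SUBCLASS, "C"),
    ("C", SUBCLASS, "D"),
    ("x", TYPE, "A"),
}

schema_first, schema_first_passes = closure(
    ontology, [subclass_transitivity, type_inheritance])
type_first, type_first_passes = closure(
    ontology, [type_inheritance, subclass_transitivity])
```

Both orders reach the same closure; only the number of rule passes differs. In this toy case, completing the schema-level rule first lets the type rule finish in fewer passes.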

arXiv.org e-Print Archive

Crossref

Terrestrial applications: An intelligent Earth-sensing information system

Author
Publication venue
Publication date: 01/11/1982
Field of study

For Abstract see A82-2214

NASA Technical Reports Server

PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

Author: Ahmed Fasih
Andreas Klöckner
Bell
Bryan Catanzaro
Buck
Chandler
Dalcín
Eich
Feldman
Flanagan
Frigo
Group
Hestenes
Hesthaven
Kennedy
Klöckner
Lam
Langtangen
Lindholm
McCarthy
McCool
Nicolas Pinto
Oliphant
Owens
Paul Ivanov
Pinto
Pinto
Prud’homme
Reynders
Seiler
Stein
Valiant
van Hateren
Veldhuizen
Wang
Whaley
Yunsup Lee
Publication venue: 'Elsevier BV'
Publication date: 29/03/2011
Field of study

High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success. Comment: Submitted to Parallel Computing, Elsevier

arXiv.org e-Print Archive

Crossref

Content addressable memory project

Author: Hall J. Storrs
Levy Saul
Miyake Keith M.
Smith Donald E.
Publication venue
Publication date
Field of study

A parameterized version of the tree processor was designed and tested (by simulation). The leaf processor design is 90 percent complete. We expect to complete and test a combination of tree and leaf cell designs in the next period. Work is proceeding on algorithms for the computer aided manufacturing (CAM), and once the design is complete we will begin simulating algorithms for large problems. The following topics are covered: (1) the practical implementation of content addressable memory; (2) design of a LEAF cell for the Rutgers CAM architecture; (3) a circuit design tool user's manual; and (4) design and analysis of efficient hierarchical interconnection networks

NASA Technical Reports Server

Parallel Unsmoothed Aggregation Algebraic Multigrid Algorithms on GPUs

Author: A Krechel
D Goddeke
G Haase
G Karypis
G Karypis
GE Blelloch
H Grossauer
H Sterck De
J Bolz
N Bell
O Axelsson
O Axelsson
PS Vassilevski
R Courant
TV Kolev
VE Henson
W Joubert
Publication venue
Publication date: 11/02/2013
Field of study

We design and implement a parallel algebraic multigrid method for isotropic graph Laplacian problems on multicore Graphical Processing Units (GPUs). The proposed AMG method is based on the aggregation framework. The setup phase of the algorithm uses a parallel maximal independent set algorithm in forming aggregates and the resulting coarse level hierarchy is then used in a K-cycle iteration solve phase with an $\ell^1$-Jacobi smoother. Numerical tests of a parallel implementation of the method for graphics processors are presented to demonstrate its effectiveness. Comment: 18 pages, 3 figures

arXiv.org e-Print Archive

Crossref

Some strategies for the simulation of vocabulary agreement in multi-agent communities

Author: Alfonseca Manuel
Lara Juan de
Publication venue: J A S S S, University of Surrey, Department of Sociology
Publication date: 01/01/2000
Field of study

In this paper, we present several experiments of belief propagation in multi-agent communities. Each agent in the simulation has an initial random vocabulary (4 words) corresponding to each possible movement (north, south, east and west). Agents move and communicate the associated word to the surrounding agents, which can be convinced by the 'speaking agent', and change their corresponding word by 'imitation'. Vocabulary uniformity is achieved, but strong interactions and competition can occur between dominant words. Several moving and trusting strategies as well as agent roles are analyzed. This paper has been sponsored by the Spanish Interdepartmental Commission of Science and Technology (CICYT), project number TEL1999-0181.
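The move, speak, and imitate loop described in this abstract can be sketched as a small simulation. This is a rough illustration, not the authors' model. The following are all assumptions made here: the wrap-around grid, the neighbourhood of one cell in any direction, the fixed adoption probability `p_adopt`, the three-letter random words, and every function name.

```python
import random

# The four possible movements and their grid offsets.
DIRECTIONS = {"north": (0, -1), "south": (0, 1), "east": (1, 0), "west": (-1, 0)}


def random_word(rng, length=3):
    """Invent a random word (assumed representation: lowercase letters)."""
    return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(length))


def make_agents(n, size, rng):
    """Create n agents, each with a random position and a 4-word vocabulary."""
    return [
        {
            "pos": (rng.randrange(size), rng.randrange(size)),
            "vocab": {d: random_word(rng) for d in DIRECTIONS},
        }
        for _ in range(n)
    ]


def wrapped_distance(a, b, size):
    """Distance between two coordinates on a wrap-around axis."""
    d = abs(a - b)
    return min(d, size - d)


def step(agents, size, rng, p_adopt=0.5):
    """Each agent moves once and speaks the word for that movement.

    Agents within one cell of the speaker adopt the spoken word with
    probability p_adopt (a stand-in for the 'trusting strategies').
    """
    for speaker in agents:
        direction = rng.choice(list(DIRECTIONS))
        dx, dy = DIRECTIONS[direction]
        x, y = speaker["pos"]
        speaker["pos"] = ((x + dx) % size, (y + dy) % size)
        word = speaker["vocab"][direction]
        sx, sy = speaker["pos"]
        for listener in agents:
            if listener is speaker:
                continue
            lx, ly = listener["pos"]
            near = max(wrapped_distance(lx, sx, size),
                       wrapped_distance(ly, sy, size)) <= 1
            if near and rng.random() < p_adopt:
                listener["vocab"][direction] = word  # imitation


def distinct_words(agents):
    """Number of different words in use for each movement."""
    return {d: len({a["vocab"][d] for a in agents}) for d in DIRECTIONS}


rng = random.Random(0)
agents = make_agents(n=20, size=8, rng=rng)
initial = distinct_words(agents)
for _ in range(200):
    step(agents, size=8, rng=rng)
final = distinct_words(agents)
```

Because imitation only copies words already in circulation, the number of distinct words per movement can never grow. How quickly it shrinks toward a single shared word depends on the assumed parameters.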

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo