Search CORE

3,308 research outputs found

A hierarchically blocked Jacobi SVD algorithm for single and multiple graphics processing units

Author: Novaković Vedran
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 27/09/2014

We present a hierarchically blocked one-sided Jacobi algorithm for the singular value decomposition (SVD), targeting both single and multiple graphics processing units (GPUs). The blocking structure reflects the levels of the GPU's memory hierarchy. The algorithm may outperform MAGMA's dgesvd, while retaining high relative accuracy. To this end, we developed a family of parallel pivot strategies on the GPU's shared address space, which are also applicable to inter-GPU communication. Unlike common hybrid approaches, our algorithm in a single-GPU setting needs a CPU for control purposes only, while utilizing the GPU's resources to the fullest extent permitted by the hardware. When required by the problem size, the algorithm, in principle, scales to an arbitrary number of GPU nodes. The scalability is demonstrated by a more than twofold speedup for sufficiently large matrices on a Tesla S2050 system with four GPUs vs. a single Fermi card.

Comment: Accepted for publication in SIAM Journal on Scientific Computing
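For readers unfamiliar with the one-sided (Hestenes) Jacobi method that this algorithm blocks and parallelizes, a minimal sequential sketch follows. It is an illustration under stated assumptions, not the paper's method: it uses an unblocked, cyclic row-by-row pivot order on a single CPU thread, a relative orthogonality tolerance chosen here, and returns only the singular values. The hierarchical blocking, the parallel pivot strategies and the GPU mapping described above are not modeled, and the function name is hypothetical.

```python
import math


def one_sided_jacobi_svd(a, tol=1e-12, max_sweeps=60):
    """Return the singular values (descending) of an m x n matrix given as rows.

    Hypothetical helper: plain cyclic one-sided (Hestenes) Jacobi. Column pairs
    are orthogonalized by plane rotations until every pair is numerically
    orthogonal; the column norms are then the singular values.
    """
    m, n = len(a), len(a[0])
    # Work column-wise: cols[j] holds column j of A.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        # One sweep visits all n(n-1)/2 pivot pairs in row-cyclic order (assumption).
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(x * x for x in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                # Skip pairs that are already orthogonal to relative tolerance.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Smaller root t of t^2 + 2*zeta*t - 1 = 0 gives the stable rotation.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                cp, cq = cols[p], cols[q]
                cols[p] = [c * x - s * y for x, y in zip(cp, cq)]
                cols[q] = [s * x + c * y for x, y in zip(cp, cq)]
        # Converged once a whole sweep performs no rotation.
        if not rotated:
            break
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)


print(one_sided_jacobi_svd([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
```

Each rotation makes one column pair orthogonal while preserving the Frobenius norm. In general, parallel pivot strategies group disjoint pairs so that their rotations can be applied concurrently, which is the kind of freedom the paper's strategies exploit on the GPU.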

arXiv.org e-Print Archive

CiteSeerX

Stealthy Deception Attacks Against SCADA Systems

Author: A Kleinmann
B Mukherjee
C Alcaraz
F Pasqualetti
G Liang
IN Fovino
N Erez
N Goldenberg
R Langner
Y Liu
Y Mo
Publication date: 28/06/2017

SCADA protocols for Industrial Control Systems (ICS) are vulnerable to network attacks such as session hijacking. Hence, research focuses on network anomaly detection based on metadata (message sizes, timing, command sequence), or on the state values of the physical process. In this work we present a class of semantic network-based attacks against SCADA systems that are undetectable by the above-mentioned anomaly detection. After hijacking the communication channels between the Human Machine Interface (HMI) and Programmable Logic Controllers (PLCs), our attacks cause the HMI to present a fake view of the industrial process, deceiving the human operator into taking manual actions. Our most advanced attack also manipulates the messages generated by the operator's actions, reversing their semantic meaning while causing the HMI to present a view that is consistent with the attempted human actions. The attacks are totally stealthy because the message sizes and timing, the command sequences, and the data values of the ICS's state all remain legitimate. We implemented and tested several attack scenarios in the test lab of our local electric company, against a real HMI and real PLCs, separated by a commercial-grade firewall. We developed a real-time security assessment tool that can simultaneously manipulate the communication to multiple PLCs and cause the HMI to display a coherent system-wide fake view. Our tool is configured with message-manipulating rules written in an ICS Attack Markup Language (IAML) we designed, which may be of independent interest. Our semantic attacks all successfully fooled the operator and brought the system to states of blackout and possible equipment damage.

arXiv.org e-Print Archive

Crossref

Volume ray casting techniques and applications using general purpose computations on graphics processing units

Author: Romero Michael
Publication venue: RIT Scholar Works
Publication date: 01/06/2009

Traditional 3D computer graphics focus on rendering the exterior of objects. Volume rendering is a technique used to visualize information corresponding to the interior of an object, commonly used in medical imaging and other fields. Visualization of such data may be accomplished by ray casting, an embarrassingly parallel algorithm also commonly used in ray tracing. There has been growing interest in performing general-purpose computations on graphics processing units (GPGPU), which are capable of exploiting parallel applications and yielding far greater performance than sequential implementations on CPUs. Modern GPUs allow for rapid acceleration of volume rendering applications, offering affordable high-performance visualization systems. This thesis explores volume ray casting performance and visual quality enhancements using the NVIDIA CUDA platform, and demonstrates how high-quality volume renderings can be produced at interactive and real-time frame rates on modern commodity graphics hardware. A number of techniques are employed in this effort, including early ray termination, supersampling and texture filtering. In a performance comparison of a sequential versus a CUDA implementation on high-end hardware, the latter is capable of rendering 60 frames per second, with a price-performance ratio heavily favoring GPUs. A number of unique volume rendering applications are explored, including multiple volume rendering capable of arbitrary placement and rigid volume registration, hypertexturing and stereoscopic anaglyphs, each greatly enhanced by the real-time interaction of volume data. The techniques and applications discussed in this thesis may prove to be invaluable tools in fields such as medical and molecular imaging, flow and scientific visualization, engineering drawing and many others.
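Early ray termination, one of the techniques listed above, can be illustrated with front-to-back alpha compositing along a single ray. The sketch below is a plain Python assumption, not the thesis's CUDA code: samples are pre-classified (grayscale color, opacity) pairs produced by a caller-supplied transfer function, the 0.99 opacity cutoff is a hypothetical default, and the function names are hypothetical.

```python
def composite_ray(samples, termination=0.99):
    """Front-to-back compositing of (color, alpha) samples along one ray.

    `samples` is ordered from the eye outward. Returns a tuple
    (accumulated_color, accumulated_alpha, samples_used).
    """
    color_acc = 0.0
    alpha_acc = 0.0
    used = 0
    for color, alpha in samples:
        # Each sample is attenuated by the opacity already accumulated in front of it.
        weight = (1.0 - alpha_acc) * alpha
        color_acc += weight * color
        alpha_acc += weight
        used += 1
        # Early ray termination: samples behind a nearly opaque front barely
        # contribute, so stop marching once the cutoff is reached.
        if alpha_acc >= termination:
            break
    return color_acc, alpha_acc, used


def cast_ray(density, origin, direction, step, max_steps, transfer):
    """March from `origin` along `direction`, sampling the scalar field `density`.

    The generator is lazy, so samples after early termination are never evaluated.
    """
    def samples():
        for i in range(max_steps):
            t = i * step
            point = tuple(o + t * d for o, d in zip(origin, direction))
            yield transfer(density(point))

    return composite_ray(samples())


# A uniformly dense medium with opacity 0.5 per sample terminates after 7 samples.
print(cast_ray(lambda p: 1.0, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 0.1, 100,
               lambda d: (1.0, 0.5 * d)))
```

On a GPU, each thread would typically march one such ray independently, which is what makes ray casting embarrassingly parallel; the per-ray loop is the same.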

RIT Scholar Works

Performance counter-based strategies to improve data locality on multiprocessor systems: reordering and page migration techniques

Author: Lorenzo del Castillo Juan Ángel
Publication date: 01/01/2012

In this dissertation we study Precise Event-Based Sampling (PEBS) techniques to improve the performance of applications on a NUMA, Itanium2-based system. We demonstrate that low-cost PEBS profiling can support strategies that improve, at runtime, the performance of an important group of computational and scientific codes. In addition, the accurate information provided by the new Event Address Registers (EAR) of the Intel Itanium architecture helps foster the development of new data allocation strategies. Following this line, we have also developed a series of dynamic page migration PEBS strategies. Specifically, two problems are addressed: how to improve the performance of locality optimisation techniques for irregular codes at runtime, with particular focus on the Sparse Matrix-Vector product kernel, and how to develop strategies for dynamic page migration. To summarise, the main contributions of this dissertation are: 1. A study of the different factors that affect performance, as well as data and thread allocation policies, on the FinisTerrae supercomputer, the target platform on which this thesis relies. 2. The implementation of a performance model for FinisTerrae. 3. The development of hardware counter-based strategies to assist reordering techniques for irregular codes in order to reduce their cost and improve their behaviour. 4. The development of novel hardware counter-guided, dynamic page migration algorithms that take advantage of the new features provided by PEBS. As a software contribution, we present a user-level page-migration framework to monitor, sample and control an application at runtime.
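The Sparse Matrix-Vector product kernel and the effect of a locality reordering can be sketched with the compressed sparse row (CSR) format. This is an illustrative assumption: the dissertation's own data layout is not stated here, the permutation is supplied by hand rather than derived from PEBS or EAR samples, and both function names are hypothetical.

```python
def csr_spmv(indptr, indices, data, x):
    """y = A @ x for A stored in CSR form (row pointers, column indices, values)."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # The indirect access x[indices[k]] is what makes SpMV irregular.
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def permute_csr(indptr, indices, data, perm):
    """Symmetric reordering B = P A P^T, with B[i][j] = A[perm[i]][perm[j]]."""
    n = len(indptr) - 1
    inv = [0] * n
    for new, old in enumerate(perm):
        inv[old] = new
    new_indptr, new_indices, new_data = [0], [], []
    for new_row in range(n):
        old_row = perm[new_row]
        # Relabel the columns of the old row and keep them sorted.
        entries = sorted((inv[indices[k]], data[k])
                         for k in range(indptr[old_row], indptr[old_row + 1]))
        for col, val in entries:
            new_indices.append(col)
            new_data.append(val)
        new_indptr.append(len(new_indices))
    return new_indptr, new_indices, new_data


# A = [[4, 0, 1], [0, 3, 0], [2, 0, 5]] in CSR form.
indptr, indices, data = [0, 2, 3, 5], [0, 2, 1, 0, 2], [4, 1, 3, 2, 5]
x = [1, 2, 3]
print(csr_spmv(indptr, indices, data, x))
```

A reordering changes the order in which `x` is accessed, and hence cache and page locality, but not the result: multiplying B by the permuted vector returns the permuted product, since P A P^T P x = P A x.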

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional da Universidade de Santiago de Compostela

Power models, energy models and libraries for energy-efficient concurrent data structures and algorithms

Author: Atalar Aras
Gidenstam Anders
Ha Hoai Phuong
Renaud-Goud Paul
Tran Vi Ngoc-Nha
Tsigas Philippas
Umar Ibrahim
Walulya Ivan
Publication venue: The EXCESS Consortium
Publication date: 01/01/2016

EXCESS deliverable D2.3. More information is available at http://www.excess-project.eu/. This deliverable reports the results of the power models, energy models and libraries for energy-efficient concurrent data structures and algorithms as available by project month 30 of Work Package 2 (WP2). It reports i) the latest results of Task 2.2-2.4 on providing programming abstractions and libraries for developing energy-efficient data structures and algorithms and ii) the improved results of Task 2.1 on investigating and modeling the trade-off between energy and performance of concurrent data structures and algorithms. The work has been conducted on two main EXCESS platforms: Intel platforms with recent Intel multicore CPUs and Movidius Myriad platforms.

Munin - Open Research Archive

Scalable ray tracing with multiple GPGPUs

Author: Urra Rodrigo A.
Publication venue: RIT Scholar Works
Publication date: 01/02/2009
Field of study

Rapid development in the field of computer graphics over the last 40 years has brought forth different techniques to render scenes. Rasterization is today's most widely used technique, which in its most basic form sequentially draws thousands of polygons and applies textures to them. Ray tracing is an alternative method that mimics light transport by using rays to sample a scene in memory and render the color found at each ray's scene intersection point. Although mainstream hardware directly supports rasterization, ray tracing would be the preferred technique due to its ability to produce highly crisp and realistic graphics, if hardware were not a limitation. Making an immediate hardware transition from rasterization to ray tracing would have a severe impact on the computer graphics industry, since it would require redevelopment of existing 3D graphics software, so any transition to ray tracing would be gradual. Previous efforts to perform ray tracing on mainstream rasterizing hardware platforms with a single processor have performed poorly. This thesis explores how a multiple GPGPU system can be used to render scenes via ray tracing. A ray tracing engine and API groundwork was developed using NVIDIA's CUDA (Compute Unified Device Architecture) GPGPU programming environment and was used to evaluate performance scalability across a multi-GPGPU system. This engine supports triangle, sphere, disc, rectangle, and torus rendering. It also allows independent activation of graphics features including procedural texturing, Phong illumination, reflections, translucency, and shadows. Correctness of rendered images validates the ray traced results, and timing of rendered scenes benchmarks performance. The main test scene contains all object types, has a total of 32 abstract objects, and applies all graphics features. Ray tracing this scene using two GPGPUs outperformed the single-GPGPU and single-CPU systems, yielding respective speedups of up to 1.8 and 31.25. The results demonstrate how much potential exists in treating a modern dual-GPU architecture as a dual-GPGPU system in order to facilitate a transition from rasterization to ray tracing.
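Among the primitives the engine supports, the sphere has the simplest intersection test: substituting the ray o + t·d into |p - c|^2 = r^2 gives a quadratic in t, whose smallest positive root is the nearest hit. The sketch below is plain Python rather than CUDA; the function name is hypothetical, and the small `t_min` offset (to avoid self-intersection on secondary rays) is an assumption, not taken from the thesis.

```python
import math


def ray_sphere_hit(origin, direction, center, radius, t_min=1e-6):
    """Return the nearest t > t_min where the ray origin + t*direction meets the sphere, else None."""
    oc = [o - c for o, c in zip(origin, center)]
    # Quadratic a*t^2 + 2*half_b*t + c = 0 with the "half b" simplification.
    a = sum(d * d for d in direction)
    half_b = sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = half_b * half_b - a * c
    if disc < 0.0:
        return None  # The ray misses the sphere.
    root = math.sqrt(disc)
    # Try the nearer root first; fall back to the farther one (origin inside the sphere).
    for t in ((-half_b - root) / a, (-half_b + root) / a):
        if t > t_min:
            return t
    return None  # Both intersections lie behind the ray origin.


print(ray_sphere_hit((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 5.0), 1.0))
```

Because each primary ray is tested independently, this kind of per-ray intersection routine is what a multi-GPGPU renderer can distribute across devices, for example by splitting the image between them.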

RIT Scholar Works