3,308 research outputs found
A hierarchically blocked Jacobi SVD algorithm for single and multiple graphics processing units
We present a hierarchically blocked one-sided Jacobi algorithm for the
singular value decomposition (SVD), targeting both single and multiple graphics
processing units (GPUs). The blocking structure reflects the levels of GPU's
memory hierarchy. The algorithm may outperform MAGMA's dgesvd, while retaining
high relative accuracy. To this end, we developed a family of parallel pivot
strategies on GPU's shared address space, but applicable also to inter-GPU
communication. Unlike common hybrid approaches, our algorithm in a single GPU
setting needs a CPU for the controlling purposes only, while utilizing GPU's
resources to the fullest extent permitted by the hardware. When required by the
problem size, the algorithm, in principle, scales to an arbitrary number of GPU
nodes. The scalability is demonstrated by more than twofold speedup for
sufficiently large matrices on a Tesla S2050 system with four GPUs vs. a single
Fermi card.Comment: Accepted for publication in SIAM Journal on Scientific Computin
Stealthy Deception Attacks Against SCADA Systems
SCADA protocols for Industrial Control Systems (ICS) are vulnerable to
network attacks such as session hijacking. Hence, research focuses on network
anomaly detection based on meta--data (message sizes, timing, command
sequence), or on the state values of the physical process. In this work we
present a class of semantic network-based attacks against SCADA systems that
are undetectable by the above mentioned anomaly detection. After hijacking the
communication channels between the Human Machine Interface (HMI) and
Programmable Logic Controllers (PLCs), our attacks cause the HMI to present a
fake view of the industrial process, deceiving the human operator into taking
manual actions. Our most advanced attack also manipulates the messages
generated by the operator's actions, reversing their semantic meaning while
causing the HMI to present a view that is consistent with the attempted human
actions. The attacks are totaly stealthy because the message sizes and timing,
the command sequences, and the data values of the ICS's state all remain
legitimate.
We implemented and tested several attack scenarios in the test lab of our
local electric company, against a real HMI and real PLCs, separated by a
commercial-grade firewall. We developed a real-time security assessment tool,
that can simultaneously manipulate the communication to multiple PLCs and cause
the HMI to display a coherent system--wide fake view. Our tool is configured
with message-manipulating rules written in an ICS Attack Markup Language (IAML)
we designed, which may be of independent interest. Our semantic attacks all
successfully fooled the operator and brought the system to states of blackout
and possible equipment damage
Volume ray casting techniques and applications using general purpose computations on graphics processing units
Traditional 3D computer graphics focus on rendering the exterior of objects. Volume rendering is a technique used to visualize information corresponding to the interior of an object, commonly used in medical imaging and other fields. Visualization of such data may be accomplished by ray casting; an embarrassingly parallel algorithm also commonly used in ray tracing. There has been growing interest in performing general purpose computations on graphics processing units (GPGPU), which are capable exploiting parallel applications and yielding far greater performance than sequential implementations on CPUs. Modern GPUs allow for rapid acceleration of volume rendering applications, offering affordable high performance visualization systems. This thesis explores volume ray casting performance and visual quality enhancements using the NVIDIA CUDA platform, and demonstrates how high quality volume renderings can be produced with interactive and real time frame rates on modern commodity graphics hardware. A number of techniques are employed in this effort, including early ray termination, super sampling and texture filtering. In a performance comparison of a sequential versus CUDA implementation on high-end hardware, the latter is capable of rendering 60 frames per second with an impressive price-performance ratio heavily favoring GPUs. A number of unique volume rendering applications are explored including multiple volume rendering capable of arbitrary placement and rigid volume registration, hypertexturing and stereoscopic anaglyphs, each greatly enhanced by the real time interaction of volume data. The techniques and applications discussed in this thesis may prove to be invaluable tools in fields such as medical and molecular imaging, flow and scientific visualization, engineering drawing and many others
Performance counter-based strategies to improve data locality on multiprocessor systems: reordering and page migration techniques
In this dissertation we approach the study of Precise Event-Based Sampling (PEBS) techniques to improve the performance of applications on a NUMA, Itanium2-based system. We demonstrate that a low-cost, PEBS profiling can support strategies to improve the performance of an important group of computational and scientific codes in runtime. In addition, the accurate information provided by the new Event Adress Registers (EAR) of the Intel Itanium architecture helps foster the development of new data allocation strategies. Following this line, we have also developed a series of dynamic page migration PEBS strategies. Specifically, two problems are addressed: how to improve the performance of locality optimisation techniques for irregular codes in runtime, particularising for the Sparse Matrix-Vector product kernel, and how to develop strategies for dynamic page migration.
To summarise, the main contributions of this dissertation are:
1. A study of the different factors that affect the performance, as well as data and thread allocation policies, in the FinisTerrae supercomputer, the target platform in which this thesis relies on.
2. The implementation of a performance model for FinisTerrae.
3. The development of hardware counter-based strategies to assist reordering techniques
for irregular codes in order to reduce their cost and improve their behaviour.
4. The development of novel hardware counter-guided, dynamic page migration
algorithms that take advantage of the new features provided by the PEBS.
As a software contribution, we present a user-level page-migration framework to monitor,
sample and control an application in runtime
Power models, energy models and libraries for energy-efficient concurrent data structures and algorithms
EXCESS deliverable D2.3. More information at http://www.excess-project.eu/This deliverable reports the results of the power models, energy models and librariesfor energy-efficient concurrent data structures and algorithms as available by projectmonth 30 of Work Package 2 (WP2). It reports i) the latest results of Task 2.2-2.4 onproviding programming abstractions and libraries for developing energy-efficient datastructures and algorithms and ii) the improved results of Task 2.1 on investigating andmodeling the trade-off between energy and performance of concurrent data structuresand algorithms. The work has been conducted on two main EXCESS platforms: Intelplatforms with recent Intel multicore CPUs and Movidius Myriad platforms
Scalable ray tracing with multiple GPGPUs
Rapid development in the field of computer graphics over the last 40 years has brought forth different techniques to render scenes. Rasterization is today’s most widely used technique, which in its most basic form sequentially draws thousands of polygons and applies texture on them. Ray tracing is an alternative method that mimics light transport by using rays to sample a scene in memory and render the color found at each ray’s scene intersection point. Although mainstream hardware directly supports rasterization, ray tracing would be the preferred technique due to its ability to produce highly crisp and realistic graphics, if hardware were not a limitation. Making an immediate hardware transition from rasterization to ray tracing would have a severe impact on the computer graphics industry since it would require redevelopment of existing 3D graphics-employing software, so any transition to ray tracing would be gradual. Previous efforts to perform ray tracing on mainstream rasterizing hardware platforms with a single processor have performed poorly. This thesis explores how a multiple GPGPU system can be used to render scenes via ray tracing. A ray tracing engine and API groundwork was developed using NVIDIA’s CUDA (Compute Unified Device Architecture) GPGPU programming environment and was used to evaluate performance scalability across a multi-GPGPU system. This engine supports triangle, sphere, disc, rectangle, and torus rendering. It also allows independent activation of graphics features including procedural texturing, Phong illumination, reflections, translucency, and shadows. Correctness of rendered images validates the ray traced results, and timing of rendered scenes benchmarks performance. The main test scene contains all object types, has a total of 32 Abstract objects, and applies all graphics features. Ray tracing this scene using two GPGPUs outperformed the single-GPGPU and single-CPU systems, yielding respective speedups of up to 1.8 and 31.25. The results demonstrate how much potential exists in treating a modern dual-GPU architecture as a dual-GPGPU system in order to facilitate a transition from rasterization to ray tracing
- …