33 research outputs found
Iso-energy-efficiency: An approach to power-constrained parallel computation
Future large-scale high-performance supercomputer systems require high energy efficiency to achieve exaflops computational power and beyond. Despite the need to understand energy efficiency in high-performance systems, there are few techniques to evaluate energy efficiency at scale. In this paper, we propose a system-level iso-energy-efficiency model to analyze, evaluate, and predict the energy-performance of data-intensive parallel applications with various execution patterns running on large-scale power-aware clusters. Our analytical model can help users explore the effects of machine- and application-dependent characteristics on system energy efficiency and isolate efficient ways to scale system parameters (e.g., processor count, CPU power/frequency, workload size, and network bandwidth) to balance energy use and performance. We derive our iso-energy-efficiency model and apply it to the NAS Parallel Benchmarks on two power-aware clusters. Our results indicate that the model accurately predicts total system energy consumption within 5% error on average for parallel applications with various execution and communication patterns. We demonstrate effective use of the model for various application contexts and in scalability decision-making.
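The flavor of such a system-level model can be sketched with a toy energy function; the function below, its coefficients, and the cubic frequency-power term are illustrative assumptions, not the paper's actual iso-energy-efficiency model:

```python
def total_energy(n_procs, freq_ghz, workload, p_static=20.0, c_dyn=5.0,
                 t_comm_per_proc=0.1):
    """Toy system-level energy model (illustrative assumptions only).

    Compute time shrinks with processor count and frequency, while
    communication time grows with processor count; total energy is
    per-node power times runtime, summed over all nodes.
    """
    t_compute = workload / (n_procs * freq_ghz)   # ideal parallel compute time
    t_comm = t_comm_per_proc * n_procs            # simplistic comm overhead
    p_node = p_static + c_dyn * freq_ghz ** 3     # dynamic power ~ f^3 (CMOS approximation)
    return n_procs * p_node * (t_compute + t_comm)
```

Even this toy version exhibits the tradeoff the model targets: scaling up the processor count leaves compute energy roughly constant but inflates communication energy, so there is an energy-efficient operating scale to isolate.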
PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications
Energy efficiency is a major concern in modern high-performance computing system design. In the past few years, there has been mounting evidence that power usage limits system scale and computing density, and thus, ultimately, system performance. However, despite the impact of power and energy on the computer systems community, few studies provide insight into where and how power is consumed on high-performance systems and applications. In previous work, we designed a framework called PowerPack that was the first tool to isolate the power consumption of devices including disks, memory, NICs, and processors in a high-performance cluster and to correlate these measurements to application functions. In this work, we extend our framework to support systems with multicore, multiprocessor-based nodes, and then provide in-depth analyses of the energy consumption of parallel applications on clusters of these systems. These analyses include the impact of chip multiprocessing on power and energy efficiency and its interaction with application execution. In addition, we use PowerPack to study the power dynamics and energy efficiency of dynamic voltage and frequency scaling (DVFS) techniques on clusters. Our experiments reveal conclusively how intelligent DVFS scheduling can enhance system energy efficiency while maintaining performance.
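The DVFS tradeoff that such scheduling exploits can be sketched in a few lines; the power model and constants below are textbook-style approximations for illustration, not PowerPack's measured values:

```python
def energy_at_frequency(work_gcycles, freq_ghz, p_static_w=30.0, c_eff=4.0):
    """Estimate energy (J) to complete `work_gcycles` at a fixed frequency,
    using the common approximation P_dyn ~ C * f^3 (illustrative constants).

    Static power is paid for the entire runtime, so running too slowly can
    *increase* total energy -- the balance a DVFS scheduler must strike.
    """
    runtime_s = work_gcycles / freq_ghz
    p_total_w = p_static_w + c_eff * freq_ghz ** 3
    return p_total_w * runtime_s
```

Sweeping the frequency in this sketch shows an energy-minimal point between the extremes: the highest frequency wastes dynamic power, while the lowest stretches runtime until static power dominates.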
MalFox: Camouflaged Adversarial Malware Example Generation Based on Conv-GANs Against Black-Box Detectors
Deep learning is a thriving field with many practical applications and active research topics. It allows computers to learn from experience and to understand the world in terms of a hierarchy of concepts, each defined through its relations to simpler concepts. Relying on the strong capabilities of deep learning, we propose a convolutional generative adversarial network (Conv-GAN) based framework named MalFox, targeting adversarial malware example generation against third-party black-box malware detectors. Motivated by the adversarial game between malware authors and malware detectors, MalFox adopts a confrontational approach to produce perturbation paths, each formed by up to three methods (namely Obfusmal, Stealmal, and Hollowmal), to generate adversarial malware examples. To demonstrate the effectiveness of MalFox, we collect a large dataset consisting of both malware and benignware programs, and investigate the performance of MalFox in terms of the accuracy, detection rate, and evasive rate of the generated adversarial malware examples. Our evaluation indicates that the accuracy can be as high as 99.0%, significantly outperforming 12 other well-known learning models. Furthermore, the detection rate is decreased by 56.8% on average, and the evasive rate is improved by up to 56.2%.
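The two evaluation metrics can be sketched as follows; these are common definitions assumed for illustration, and the paper's exact formulations may differ:

```python
def detection_rate(detector_flags):
    """Fraction of submitted samples a detector flags as malicious."""
    return sum(detector_flags) / len(detector_flags)

def evasive_rate(before_flags, after_flags):
    """Fraction of originally-detected samples that evade detection after
    adversarial perturbation (one common definition, assumed here)."""
    detected = [i for i, flag in enumerate(before_flags) if flag]
    evaded = sum(1 for i in detected if not after_flags[i])
    return evaded / len(detected) if detected else 0.0
```

Under these definitions, a perturbation method improves its evasive rate exactly when samples a black-box detector previously flagged stop being flagged after perturbation.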
MSREP: A Fast yet Light Sparse Matrix Framework for Multi-GPU Systems
Sparse linear algebra kernels play a critical role in numerous applications, ranging from exascale scientific simulation to large-scale data analytics. Offloading linear algebra kernels to a single GPU is no longer viable in these applications, simply because the rapidly growing data volume may exceed the memory capacity and computing power of one GPU. Multi-GPU systems, now ubiquitous in supercomputers and data centers, present great potential for scaling up large sparse linear algebra kernels. In this work, we design a novel sparse matrix representation framework for multi-GPU systems called MSREP, to scale sparse linear algebra operations based on our augmented sparse matrix formats in a balanced pattern. Unlike dense operations, sparsity significantly intensifies the difficulty of distributing the computation workload among multiple GPUs in a balanced manner. We enhance three mainstream sparse data formats -- CSR, CSC, and COO -- to enable fine-grained data distribution. We take sparse matrix-vector multiplication (SpMV) as an example to demonstrate the efficiency of our MSREP framework. In addition, MSREP can easily be extended to support other sparse linear algebra kernels based on the three fundamental formats (i.e., CSR, CSC, and COO).
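The load-balancing difficulty can be made concrete with standard CSR SpMV and a nonzero-aware row split; this is a plain-Python sketch of the general idea, not MSREP's augmented formats or its GPU implementation:

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Sparse matrix-vector product y = A @ x with A in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for r in range(len(row_ptr) - 1):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += vals[k] * x[col_idx[k]]
    return y

def balanced_nnz_split(row_ptr, n_parts):
    """Split rows so each part holds roughly equal nonzeros.

    Splitting by row count alone can leave one GPU with most of the work
    when the nonzeros are skewed; partitioning by nonzero count is the
    kind of fine-grained distribution a multi-GPU framework needs.
    Returns row boundaries: part p covers rows bounds[p]..bounds[p+1]-1.
    """
    nnz = row_ptr[-1]
    target = nnz / n_parts
    bounds, part = [0], 1
    for r in range(1, len(row_ptr)):
        if row_ptr[r] >= part * target and part < n_parts:
            bounds.append(r)
            part += 1
    bounds.append(len(row_ptr) - 1)
    return bounds
```

For a matrix whose rows hold 4, 1, 1, and 2 nonzeros, a two-way split by row count would assign 5 vs. 3 nonzeros, whereas the nonzero-aware split assigns 4 vs. 4.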