11,954 research outputs found
Making extreme computations possible with virtual machines
State-of-the-art algorithms generate scattering amplitudes for high-energy
physics at leading order for high-multiplicity processes as compiled code (in
Fortran, C or C++). For complicated processes the size of these libraries can
become tremendous (many GiB). We show that amplitudes can be translated to
byte-code instructions, which even reduce the size by one order of magnitude.
The byte-code is interpreted by a Virtual Machine with runtimes comparable to
compiled code and a better scaling with additional legs. We study the
properties of this algorithm, as an extension of the Optimizing Matrix Element
Generator (O'Mega). The bytecode matrix elements are available as alternative
input for the event generator WHIZARD. The bytecode interpreter can be
implemented very compactly, which will help with a future implementation on
massively parallel GPUs.Comment: 5 pages, 2 figures. arXiv admin note: substantial text overlap with
arXiv:1411.383
Performance comparison between Java and JNI for optimal implementation of computational micro-kernels
General purpose CPUs used in high performance computing (HPC) support a
vector instruction set and an out-of-order engine dedicated to increase the
instruction level parallelism. Hence, related optimizations are currently
critical to improve the performance of applications requiring numerical
computation. Moreover, the use of a Java run-time environment such as the
HotSpot Java Virtual Machine (JVM) in high performance computing is a promising
alternative. It benefits from its programming flexibility, productivity and the
performance is ensured by the Just-In-Time (JIT) compiler. Though, the JIT
compiler suffers from two main drawbacks. First, the JIT is a black box for
developers. We have no control over the generated code nor any feedback from
its optimization phases like vectorization. Secondly, the time constraint
narrows down the degree of optimization compared to static compilers like GCC
or LLVM. So, it is compelling to use statically compiled code since it benefits
from additional optimization reducing performance bottlenecks. Java enables to
call native code from dynamic libraries through the Java Native Interface
(JNI). Nevertheless, JNI methods are not inlined and require an additional cost
to be invoked compared to Java ones. Therefore, to benefit from better static
optimization, this call overhead must be leveraged by the amount of computation
performed at each JNI invocation. In this paper we tackle this problem and we
propose to do this analysis for a set of micro-kernels. Our goal is to select
the most efficient implementation considering the amount of computation defined
by the calling context. We also investigate the impact on performance of
several different optimization schemes which are vectorization, out-of-order
optimization, data alignment, method inlining and the use of native memory for
JNI methods.Comment: Part of ADAPT Workshop proceedings, 2015 (arXiv:1412.2347
A Compiler and Runtime Infrastructure for Automatic Program Distribution
This paper presents the design and the implementation of a compiler and runtime infrastructure for automatic program distribution. We are building a research infrastructure that enables experimentation with various program partitioning and mapping strategies and the study of automatic distribution's effect on resource consumption (e.g., CPU, memory, communication). Since many optimization techniques are faced with conflicting optimization targets (e.g., memory and communication), we believe that it is important to be able to study their interaction.
We present a set of techniques that enable flexible resource modeling and program distribution. These are: dependence analysis, weighted graph partitioning, code and communication generation, and profiling. We have developed these ideas in the context of the Java language. We present in detail the design and implementation of each of the techniques as part of our compiler and runtime infrastructure. Then, we evaluate our design and present preliminary experimental data for each component, as well as for the entire system
Recommended from our members
Automatically bridging the semantic gap in machine introspection
Disclosed are various embodiments that facilitate automatically bridging the semantic gap in machine introspection. It may be determined that a program executed by a first virtual machine is requested to introspect a second virtual machine. A system call execution context of the program may be determined in response to determining that the program is requested to introspect the second virtual machine. Redirectable data in a memory of the second virtual machine may be identified based at least in part on the system call execution context of the program. The program may be configured to access the redirectable data. In various embodiments, the program may be able to modify the redirectable data, thereby facilitating configuration, reconfiguration, and recovery operations to be performed on the second virtual machine from within the first virtual machine.Board of Regents, University of Texas Syste
Enhancing R with Advanced Compilation Tools and Methods
I describe an approach to compiling common idioms in R code directly to
native machine code and illustrate it with several examples. Not only can this
yield significant performance gains, but it allows us to use new approaches to
computing in R. Importantly, the compilation requires no changes to R itself,
but is done entirely via R packages. This allows others to experiment with
different compilation strategies and even to define new domain-specific
languages within R. We use the Low-Level Virtual Machine (LLVM) compiler
toolkit to create the native code and perform sophisticated optimizations on
the code. By adopting this widely used software within R, we leverage its
ability to generate code for different platforms such as CPUs and GPUs, and
will continue to benefit from its ongoing development. This approach
potentially allows us to develop high-level R code that is also fast, that can
be compiled to work with different data representations and sources, and that
could even be run outside of R. The approach aims to both provide a compiler
for a limited subset of the R language and also to enable R programmers to
write other compilers. This is another approach to help us write high-level
descriptions of what we want to compute, not how.Comment: Published in at http://dx.doi.org/10.1214/13-STS462 the Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
MPF: A portable message passing facility for shared memory multiprocessors
The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications, linear systems solution, and iterative solution of partial differential equations
Vulnerable GPU Memory Management: Towards Recovering Raw Data from GPU
In this paper, we present that security threats coming with existing GPU
memory management strategy are overlooked, which opens a back door for
adversaries to freely break the memory isolation: they enable adversaries
without any privilege in a computer to recover the raw memory data left by
previous processes directly. More importantly, such attacks can work on not
only normal multi-user operating systems, but also cloud computing platforms.
To demonstrate the seriousness of such attacks, we recovered original data
directly from GPU memory residues left by exited commodity applications,
including Google Chrome, Adobe Reader, GIMP, Matlab. The results show that,
because of the vulnerable memory management strategy, commodity applications in
our experiments are all affected
- …