Performance comparison between Java and JNI for optimal implementation of computational micro-kernels
General purpose CPUs used in high performance computing (HPC) support a
vector instruction set and an out-of-order engine dedicated to increase the
instruction level parallelism. Hence, related optimizations are currently
critical to improve the performance of applications requiring numerical
computation. Moreover, the use of a Java run-time environment such as the
HotSpot Java Virtual Machine (JVM) in high performance computing is a promising
alternative. It offers programming flexibility and productivity, while
performance is ensured by the Just-In-Time (JIT) compiler. However, the JIT
compiler suffers from two main drawbacks. First, the JIT is a black box for
developers. We have no control over the generated code nor any feedback from
its optimization phases like vectorization. Secondly, the time constraint
narrows down the degree of optimization compared to static compilers like GCC
or LLVM. So, it is compelling to use statically compiled code since it benefits
from additional optimization reducing performance bottlenecks. Java allows
native code in dynamic libraries to be called through the Java Native Interface
(JNI). Nevertheless, JNI methods are not inlined and cost more to invoke than
Java ones. Therefore, to benefit from better static
optimization, this call overhead must be amortized by the amount of computation
performed at each JNI invocation. In this paper we tackle this problem and we
propose to do this analysis for a set of micro-kernels. Our goal is to select
the most efficient implementation considering the amount of computation defined
by the calling context. We also investigate the impact on performance of
several different optimization schemes which are vectorization, out-of-order
optimization, data alignment, method inlining and the use of native memory for
JNI methods.
Comment: Part of ADAPT Workshop proceedings, 2015 (arXiv:1412.2347)
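The trade-off described in this abstract, a fixed JNI call overhead that must be offset by the work done per invocation, can be illustrated with a simple linear cost model. The sketch below is not taken from the paper: the function names, the linear model, and all timing parameters are hypothetical assumptions chosen only to show how a break-even problem size might be estimated.

```python
import math


def break_even_elements(call_overhead_ns, java_ns_per_elem, native_ns_per_elem):
    """Smallest element count at which the native kernel is no slower than Java.

    Assumed (hypothetical) linear cost model:
        t_java(n) = java_ns_per_elem * n
        t_jni(n)  = call_overhead_ns + native_ns_per_elem * n
    Returns None when the native code is not faster per element, because the
    call overhead can then never be amortized.
    """
    gain_per_elem = java_ns_per_elem - native_ns_per_elem
    if gain_per_elem <= 0:
        return None
    return math.ceil(call_overhead_ns / gain_per_elem)


def pick_implementation(n, call_overhead_ns, java_ns_per_elem, native_ns_per_elem):
    """Choose "java" or "jni" for a call that processes n elements."""
    threshold = break_even_elements(
        call_overhead_ns, java_ns_per_elem, native_ns_per_elem
    )
    if threshold is None or n < threshold:
        return "java"
    return "jni"
```

With measured values substituted for the placeholder parameters, `pick_implementation` gives a rough rule for choosing an implementation from the calling context; real kernels may not scale linearly, so the model is only a first approximation.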
Julia: A Fresh Approach to Numerical Computing
Bridging cultures that have often been distant, Julia combines expertise from
the diverse fields of computer science and computational science to create a
new approach to numerical computing. Julia is designed to be easy and fast.
Julia questions notions generally held as "laws of nature" by practitioners of
numerical computing:
1. High-level dynamic programs have to be slow.
2. One must prototype in one language and then rewrite in another language
for speed or deployment, and
3. There are parts of a system for the programmer, and other parts best left
untouched as they are built by the experts.
We introduce the Julia programming language and its design --- a dance
between specialization and abstraction. Specialization allows for custom
treatment. Multiple dispatch, a technique from computer science, picks the
right algorithm for the right circumstance. Abstraction, what good computation
is really about, recognizes what remains the same after differences are
stripped away. Abstractions in mathematics are captured as code through another
technique from computer science, generic programming.
Julia shows that one can have machine performance without sacrificing human
convenience.
Comment: 37 pages
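Multiple dispatch, mentioned in this abstract, selects an implementation based on the types of all arguments rather than only the first. The following is a minimal Python sketch of the idea, not Julia code and not Julia's actual mechanism: the `dispatch` decorator and the `combine` function are hypothetical names, and the lookup matches exact argument types only, whereas Julia selects the most specific applicable method over a type hierarchy.

```python
# Table mapping (function name, argument types) to an implementation.
_methods = {}


def dispatch(*arg_types):
    """Register the decorated function for one exact tuple of argument types."""

    def register(func):
        name = func.__name__
        _methods[(name, arg_types)] = func

        def caller(*args):
            # Look up the implementation using the runtime types of ALL arguments.
            key = (name, tuple(type(a) for a in args))
            impl = _methods.get(key)
            if impl is None:
                raise TypeError(f"no method {name} for argument types {key[1]}")
            return impl(*args)

        return caller

    return register


@dispatch(int, int)
def combine(a, b):
    return a + b  # numeric addition


@dispatch(str, int)
def combine(a, b):
    return a * b  # string repetition


@dispatch(list, list)
def combine(a, b):
    return a + b  # list concatenation
```

Calling `combine(2, 3)`, `combine("ab", 2)`, or `combine([1], [2])` routes to a different implementation each time, while an unregistered combination such as `combine(1.0, 2)` raises `TypeError`.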
Tupleware: Redefining Modern Analytics
There is a fundamental discrepancy between the targeted and actual users of
current analytics frameworks. Most systems are designed for the data and
infrastructure of the Googles and Facebooks of the world---petabytes of data
distributed across large cloud deployments consisting of thousands of cheap
commodity machines. Yet, the vast majority of users operate clusters ranging
from a few to a few dozen nodes, analyze relatively small datasets of up to a
few terabytes, and perform primarily compute-intensive operations. Targeting
these users fundamentally changes the way we should build analytics systems.
This paper describes the design of Tupleware, a new system specifically aimed
at the challenges faced by the typical user. Tupleware's architecture brings
together ideas from the database, compiler, and programming languages
communities to create a powerful end-to-end solution for data analysis. We
propose novel techniques that consider the data, computations, and hardware
together to achieve maximum performance on a case-by-case basis. Our
experimental evaluation quantifies the impact of our novel techniques and shows
orders of magnitude performance improvement over alternative systems
The Scalable Brain Atlas: instant web-based access to public brain atlases and related content
The Scalable Brain Atlas (SBA) is a collection of web services that provide
unified access to a large collection of brain atlas templates for different
species. Its main component is an atlas viewer that displays brain atlas data
as a stack of slices in which stereotaxic coordinates and brain regions can be
selected. These are subsequently used to launch web queries to resources that
require coordinates or region names as input. It supports plugins which run
inside the viewer and respond when a new slice, coordinate or region is
selected. It contains 20 atlas templates in six species, and plugins to compute
coordinate transformations, display anatomical connectivity and fiducial
points, and retrieve properties, descriptions, definitions and 3d
reconstructions of brain regions. The ambition of SBA is to provide a unified
representation of all publicly available brain atlases directly in the web
browser, while remaining a responsive and light weight resource that
specializes in atlas comparisons, searches, coordinate transformations and
interactive displays.
Comment: Rolf Kötter sadly passed away on June 9th, 2010. He co-initiated
this project and played a crucial role in the design and quality assurance of
the Scalable Brain Atlas
Personalized Fuzzy Text Search Using Interest Prediction and Word Vectorization
In this paper we study the personalized text search problem. The keyword-based
search method in conventional algorithms is inefficient at
understanding users' intentions, since semantic meaning, user profiles, and user
interests are not always considered. Firstly, we propose a novel text search
algorithm using an inverse filtering mechanism that is very efficient for
label-based item search. Secondly, we adopt a Bayesian network to implement
user interest prediction for improved personalized search. According to user
input, it searches for related items using keyword information and predicted user
interest. Thirdly, word vectorization is used to discover potential targets
according to the semantic meaning. Experimental results show that the proposed
search engine has an improved efficiency and accuracy and it can operate on
embedded devices with very limited computational resources
Julia Programming Language Benchmark Using a Flight Simulation
Julia's goal of providing scripting-language ease of coding with compiled-language speed is explored. The runtime speed of the relatively new Julia programming language is assessed against other commonly used languages including Python, Java, and C++. An industry-standard missile and rocket simulation, coded in multiple languages, was used as a test bench for runtime speed. All language versions of the simulation, including Julia, were coded to a highly developed object-oriented simulation architecture tailored specifically for time-domain flight simulation. A second dimension, speed of coding, is plotted against runtime for each language to portray a space that characterizes Julia's scripting-language efficiencies in the context of the other languages. With caveats, Julia runtime speed was found to be in the class of compiled or semi-compiled languages. However, some factors that affect runtime speed at the cost of ease of coding are shown. Julia's built-in functionality for multi-core processing is briefly examined as a means of obtaining even faster runtime speed. The major contribution of this research to the extensive language benchmarking body of work is comparing Julia to other mainstream languages using a complex flight simulation as opposed to benchmarking with single algorithms