2,099 research outputs found
Emulating and evaluating hybrid memory for managed languages on NUMA hardware
Non-volatile memory (NVM) has the potential to become a mainstream memory technology and challenge DRAM. Researchers evaluating the speed, endurance, and abstractions of hybrid memories with DRAM and NVM typically use simulation, making it easy to evaluate the impact of different hardware technologies and parameters. Simulation is, however, extremely slow, limiting the applications and datasets in the evaluation. Simulation also precludes critical workloads, especially those written in managed languages such as Java and C#. Good methodology embraces a variety of techniques for evaluating new ideas, expanding the experimental scope, and uncovering new insights.
This paper introduces a platform to emulate hybrid memory for managed languages using commodity NUMA servers. Emulation complements simulation but offers richer software experimentation. We use a thread-local socket to emulate DRAM and a remote socket to emulate NVM. We use standard C library routines to allocate heap memory on the DRAM and NVM sockets for use with explicit memory management or garbage collection. We evaluate the emulator using various configurations of write-rationing garbage collectors that improve NVM lifetimes by limiting writes to NVM, using 15 applications and various datasets and workload configurations. We show emulation and simulation confirm each other's trends in terms of writes to NVM for different software configurations, increasing our confidence in predicting future system effects. Emulation brings novel insights, such as the non-linear effects of multi-programmed workloads on NVM writes, and that Java applications write significantly more than their C++ equivalents. We make our software infrastructure publicly available to advance the evaluation of novel memory management schemes on hybrid memories
Towards co-designed optimizations in parallel frameworks: A MapReduce case study
The explosion of Big Data was followed by the proliferation of numerous
complex parallel software stacks whose aim is to tackle the challenges of data
deluge. A drawback of a such multi-layered hierarchical deployment is the
inability to maintain and delegate vital semantic information between layers in
the stack. Software abstractions increase the semantic distance between an
application and its generated code. However, parallel software frameworks
contain inherent semantic information that general purpose compilers are not
designed to exploit.
This paper presents a case study demonstrating how the specific semantic
information of the MapReduce paradigm can be exploited on multicore
architectures. MR4J has been implemented in Java and evaluated against
hand-optimized C and C++ equivalents. The initial observed results led to the
design of a semantically aware optimizer that runs automatically without
requiring modification to application code.
The optimizer is able to speedup the execution time of MR4J by up to 2.0x.
The introduced optimization not only improves the performance of the generated
code, during the map phase, but also reduces the pressure on the garbage
collector. This demonstrates how semantic information can be harnessed without
sacrificing sound software engineering practices when using parallel software
frameworks.Comment: 8 page
JVM-hosted languages: They talk the talk, but do they walk the walk?
The rapid adoption of non-Java JVM languages is impressive: major international corporations are staking critical parts of their software infrastructure on components built from languages such as
Scala and Clojure. However with the possible exception of Scala,
there has been little academic consideration and characterization
of these languages to date. In this paper, we examine four nonJava JVM languages and use exploratory data analysis techniques
to investigate differences in their dynamic behavior compared to
Java. We analyse a variety of programs and levels of behavior to
draw distinctions between the different programming languages.
We brieļ¬y discuss the implications of our ļ¬ndings for improving
the performance of JIT compilation and garbage collection on the
JVM platform
Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models
The upcoming many-core architectures require software developers to exploit
concurrency to utilize available computational power. Today's high-level
language virtual machines (VMs), which are a cornerstone of software
development, do not provide sufficient abstraction for concurrency concepts. We
analyze concrete and abstract concurrency models and identify the challenges
they impose for VMs. To provide sufficient concurrency support in VMs, we
propose to integrate concurrency operations into VM instruction sets.
Since there will always be VMs optimized for special purposes, our goal is to
develop a methodology to design instruction sets with concurrency support.
Therefore, we also propose a list of trade-offs that have to be investigated to
advise the design of such instruction sets.
As a first experiment, we implemented one instruction set extension for
shared memory and one for non-shared memory concurrency. From our experimental
results, we derived a list of requirements for a full-grown experimental
environment for further research
A Closer Look at Fedora's Ingest Performance
4th International Conference on Open RepositoriesThis presentation was part of the session : Fedora User Group PresentationsDate: 2009-05-21 10:30 AM ā 12:00 PMIt is of paramount importance for large-scale applications that Fedora can handle huge amounts of data efficiently. While Fedora is generally known to be stable and reliable, there appears to be a lack of data and experience regarding large-scale installations and the performance implications thereof. FIZ Karlsruhe is currently working on several projects with large-scale Fedora repositories holding several million complex objects. We conducted extensive performance and scalability tests with the current Fedora software (mostly version 3.0), focusing on ingest operations. Our goal was to prove that Fedora actually scales well enough for our use cases. Our test runs provided us with data which helped us identifying limits and constraints, and devising some optimization recommendations
- ā¦