July 11th, 2017
Performance clearly matters to users. The most common software update on the App Store *by far* is "Bug fixes and performance enhancements." Now that the Moore's Law free lunch has ended, programmers have to work hard to get high performance for their applications. But why is performance so hard to deliver? I will first explain why our current approaches to evaluating and optimizing performance don't work, especially on modern hardware and for modern applications. I will then present two systems that address these challenges. Stabilizer is a tool that enables statistically sound performance evaluation, making it possible to understand the impact of optimizations and to conclude, for example, that the -O2 and -O3 optimization levels are statistically indistinguishable from noise (unfortunately true).
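A minimal sketch of the statistical side of this argument, assuming a hypothetical `workload` standing in for the program under test (Stabilizer's actual contribution, repeatedly re-randomizing code, stack, and heap layout during execution, is not shown): time many runs and compare configurations by confidence intervals rather than by single measurements.

```cpp
// Sound performance comparison needs distributions, not single numbers:
// run the workload many times and report a mean with a confidence interval.
#include <chrono>
#include <cmath>
#include <cstdio>
#include <vector>

static volatile long sink;  // keeps the compiler from deleting the work

static void workload() {    // hypothetical stand-in for the code under test
    long acc = 0;
    for (long i = 0; i < 50'000'000; ++i) acc += i % 7;
    sink = acc;
}

int main() {
    const int runs = 30;
    std::vector<double> samples;
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        workload();
        auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(stop - start).count());
    }
    double mean = 0;
    for (double s : samples) mean += s;
    mean /= runs;
    double var = 0;
    for (double s : samples) var += (s - mean) * (s - mean);
    var /= (runs - 1);
    // Normal-approximation 95% interval: mean +/- 1.96 * standard error.
    double half = 1.96 * std::sqrt(var / runs);
    std::printf("mean %.4f s, 95%% CI +/- %.4f s\n", mean, half);
    return 0;
}
```

Without layout re-randomization, such samples remain biased by incidental layout effects (link order, environment size, and so on), which is precisely the problem Stabilizer addresses.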
Since compiler optimizations have largely run out of steam, we need better profiling support, especially for modern concurrent, multi-threaded applications. Coz is a novel "causal profiler" that lets programmers optimize for throughput or latency, and that pinpoints and accurately predicts the impact of optimizations. Coz's approach unlocks numerous previously unknown optimization opportunities. Guided by Coz, we improved the performance of Memcached by 9% and SQLite by 25%, and accelerated six PARSEC applications by as much as 68%; in most cases, these optimizations involved modifying under 10 lines of code. This talk is based on work with Charlie Curtsinger published at ASPLOS 2013 (Stabilizer) and SOSP 2015 (Coz), which received a Best Paper Award and was selected as a CACM Research Highlight.
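As a concrete illustration of how a program is prepared for causal profiling, Coz's header (coz.h, shipped with the profiler) provides progress-point macros; the sketch below marks one completed unit of work per iteration so Coz can measure throughput. The `process` function is a hypothetical stand-in for real work.

```cpp
// Throughput profiling with Coz: mark each completed unit of work with a
// progress point. Coz virtually speeds up one line at a time and reports
// the predicted effect on the rate at which progress points are reached.
#include <coz.h>  // provided by Coz (https://github.com/plasma-umass/coz)

static long process(long item) {  // hypothetical stand-in for real work
    long acc = 0;
    for (long i = 0; i < item % 1000; ++i) acc += i;
    return acc;
}

int main() {
    long total = 0;
    for (long item = 0; item < 1'000'000; ++item) {
        total += process(item);
        COZ_PROGRESS;  // one transaction finished; Coz measures this rate
    }
    return total == 0;
}
```

Assuming a standard install, the program is built with debug information and run under the profiler, e.g. `g++ -g -O2 app.cpp -o app && coz run --- ./app` (exact link flags, such as -ldl for the dlsym-based progress points, vary by platform); Coz's COZ_BEGIN/COZ_END macros can be used instead to optimize the latency between two points.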
Mesh: automatically compacting memory for C/C++ applications
Programs written in C and C++ — and languages implemented in C, like Python and Ruby — can suffer from serious memory fragmentation, leading to low memory utilization, degraded performance, and application failure due to memory exhaustion. This talk introduces Mesh, a plug-in replacement for malloc that, for the first time, eliminates fragmentation in unmodified applications through compaction. A key challenge is that, unlike in garbage-collected environments, the addresses of allocated objects in C/C++ are directly exposed to programmers, and applications may do things like stash addresses in integers and store flags in the low bits of aligned addresses. This hostile environment makes it impossible to safely relocate objects, as the runtime cannot precisely locate and update pointers.
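The two pointer-hiding idioms named above can be made concrete with a short C++ sketch (an illustration, not code from Mesh) showing why a relocating runtime could never safely move the allocation:

```cpp
// Why relocation is unsafe in C/C++: these disguised pointers are invisible
// to the runtime, so it could never find and update them after moving p.
#include <cassert>
#include <cstdint>
#include <cstdlib>

int main() {
    void* p = std::malloc(64);

    // Idiom 1: stash the address in an integer; nothing marks it as a pointer.
    uintptr_t hidden = reinterpret_cast<uintptr_t>(p);

    // Idiom 2: store a flag in the low bit of the (16-byte-aligned) address.
    uintptr_t tagged = reinterpret_cast<uintptr_t>(p) | 0x1;

    // The program can recover the real pointer whenever it likes...
    void* recovered = reinterpret_cast<void*>(tagged & ~uintptr_t{1});
    assert(recovered == p);

    // ...so moving the object would silently leave `hidden` and `tagged`
    // dangling. Mesh therefore compacts memory without relocating objects.
    std::free(reinterpret_cast<void*>(hidden));
    return 0;
}
```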
30 Years of Software Refactoring Research: A Systematic Literature Review
Due to the growing complexity of software systems, the last ten years have seen a dramatic increase in research on, and industry demand for, tools and techniques for software refactoring, traditionally defined as a set of program transformations intended to improve the system design while preserving its behavior. Refactoring studies have expanded beyond code-level restructuring: refactoring is applied at different levels (architecture, model, requirements, etc.), adopted in many domains beyond the object-oriented paradigm (cloud computing, mobile, web, etc.), used in industrial settings, and aimed at objectives beyond improving the design, including other non-functional requirements (e.g., performance and security). The challenges addressed by refactoring work thus now extend beyond code transformation to include, among others, scheduling the opportune time to carry out refactoring, recommending specific refactoring activities, detecting refactoring opportunities, and testing the correctness of applied refactorings. As a result, refactoring research efforts are fragmented across several research communities, domains, and objectives. To structure the field and its existing results, this paper provides a systematic literature review, analyzing 3183 research papers on refactoring from the last three decades to offer the most comprehensive review of existing refactoring research studies. Based on this survey, we created a taxonomy to classify the existing research, identified research trends, and highlighted gaps in the literature and avenues for further research.
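To ground the traditional definition above, here is a minimal, hypothetical C++ example of one classic behavior-preserving transformation, Extract Function; both versions compute the same result, and only the structure changes:

```cpp
// Extract Function: give a tangled condition a name and a single home.
#include <vector>

// Before: the validity rule is buried inside the loop.
double invoiceTotalBefore(const std::vector<double>& prices) {
    double total = 0;
    for (double price : prices) {
        if (price > 0) total += price;  // skip bogus entries
    }
    return total;
}

// After: the rule is extracted; behavior is preserved.
static bool isValidPrice(double price) { return price > 0; }

double invoiceTotalAfter(const std::vector<double>& prices) {
    double total = 0;
    for (double price : prices) {
        if (isValidPrice(price)) total += price;
    }
    return total;
}
```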
GPU LSM: A Dynamic Dictionary Data Structure for the GPU
We develop a dynamic dictionary data structure for the GPU, supporting fast insertions and deletions, based on the Log-Structured Merge tree (LSM). Our implementation on an NVIDIA K40c GPU has an average update (insertion or deletion) rate of 225 M elements/s, 13.5x faster than merging items into a sorted array. The GPU LSM supports the retrieval operations lookup, count, and range query at average rates of 75 M, 32 M, and 23 M queries/s, respectively. The trade-off for dynamic updates is that the sorted array is almost twice as fast on retrievals. We believe that our GPU LSM is the first dynamic general-purpose dictionary data structure for the GPU. (11 pages; accepted to appear in the Proceedings of the IEEE International Parallel and Distributed Processing Symposium, IPDPS'18.)
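The log-structured merge idea behind these rates can be sketched sequentially (a simplified CPU illustration of the technique, keys only, without the paper's massively parallel batching or deletion handling): insert batches are sorted, then cascaded down levels of sorted runs, merging whenever a level is already occupied.

```cpp
// Minimal LSM dictionary: levels[i] is empty or a sorted run; an insert
// batch is sorted, then merged downward until it finds an empty level.
#include <algorithm>
#include <cstdio>
#include <iterator>
#include <vector>

struct LsmDictionary {
    std::vector<std::vector<int>> levels;

    void insertBatch(std::vector<int> batch) {
        std::sort(batch.begin(), batch.end());
        for (size_t i = 0; ; ++i) {
            if (i == levels.size()) levels.emplace_back();
            if (levels[i].empty()) {          // free slot: park the run here
                levels[i] = std::move(batch);
                return;
            }
            // Level occupied: merge both sorted runs and carry the result down.
            std::vector<int> merged;
            merged.reserve(levels[i].size() + batch.size());
            std::merge(levels[i].begin(), levels[i].end(),
                       batch.begin(), batch.end(), std::back_inserter(merged));
            levels[i].clear();
            batch = std::move(merged);
        }
    }

    bool lookup(int key) const {              // probe every non-empty run
        for (const auto& run : levels)
            if (std::binary_search(run.begin(), run.end(), key)) return true;
        return false;
    }
};

int main() {
    LsmDictionary d;
    d.insertBatch({5, 3, 9});
    d.insertBatch({1, 7, 2});
    std::printf("%d %d\n", d.lookup(7), d.lookup(4));  // prints "1 0"
}
```

Lookups must probe each run with a binary search, which is why a single sorted array is roughly twice as fast on retrievals (one probe instead of one per level) while the LSM wins decisively on updates.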
Washington University Record, September 22, 2000