
    Adaptive sampling-based profiling techniques for optimizing the distributed JVM runtime

    Extending the standard Java virtual machine (JVM) for cluster-awareness is a transparent approach to scaling out multithreaded Java applications. While this clustering solution has been gaining momentum in recent years, efficient runtime support for fine-grained object sharing over the distributed JVM remains a challenge. System efficiency is strongly tied to the global object sharing profile, which determines the overall communication cost. Once the sharing or correlation between threads is known, access locality can be optimized by collocating highly correlated threads via dynamic thread migration. Although correlation tracking techniques have been studied in some page-based software DSM systems, they would entail prohibitively high overheads and low accuracy if ported to fine-grained object-based systems. In this paper, we propose a lightweight sampling-based profiling technique for tracking inter-thread sharing. To preserve locality across migrations, we also propose a stack sampling mechanism for profiling the set of objects tightly coupled with a migrant thread. Sampling rates in both techniques can vary adaptively to strike a balance between precision and overhead. Such adaptive techniques are particularly useful for applications whose sharing patterns change dynamically. The profiling results can be exploited for effective thread-to-core placement and dynamic load balancing in a distributed object sharing environment. We present the design and preliminary performance results of our distributed JVM with the profiling implemented. Experimental results show that the profiling obtains over 95% accurate global sharing profiles at a cost of only a few percent of additional execution time for fine- to medium-grained applications. © 2010 IEEE. The 24th IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2010), Atlanta, GA, 19-23 April 2010. In Proceedings of the 24th IPDPS, 2010, p. 1-1.
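    The profiling idea can be sketched in a few lines. The following is a minimal, hypothetical illustration rather than the paper's implementation: sample a fraction of object accesses, count cross-thread accesses to the same object as sharing weight, and adapt the sampling rate as evidence accumulates. All names (SharingProfiler, recordAccess, adaptRate) and the adaptation thresholds are assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of sampling-based inter-thread sharing profiling.
// Not the paper's implementation; names and adaptation policy are illustrative.
final class SharingProfiler {
    // objectId -> id of the last sampled thread that accessed it
    private final Map<Long, Long> lastAccessor = new ConcurrentHashMap<>();
    // "t1:t2" -> number of sampled cross-thread accesses (sharing weight)
    private final Map<String, LongAdder> sharing = new ConcurrentHashMap<>();
    // Probability of sampling an access; adapted at runtime.
    private volatile double sampleRate = 0.05;

    void recordAccess(long objectId, long threadId) {
        if (ThreadLocalRandom.current().nextDouble() >= sampleRate) return; // unsampled access
        Long prev = lastAccessor.put(objectId, threadId);
        if (prev != null && prev != threadId) {
            long a = Math.min(prev, threadId), b = Math.max(prev, threadId);
            sharing.computeIfAbsent(a + ":" + b, k -> new LongAdder()).increment();
        }
    }

    // Crude adaptation: lower the rate once enough evidence has accumulated,
    // raise it again if the profile looks sparse (the pattern may have changed).
    void adaptRate() {
        long samples = sharing.values().stream().mapToLong(LongAdder::sum).sum();
        if (samples > 100_000) sampleRate = Math.max(0.005, sampleRate / 2);
        else if (samples < 1_000) sampleRate = Math.min(0.2, sampleRate * 2);
    }

    Map<String, LongAdder> profile() { return sharing; }
}
```

    A distributed JVM could feed the resulting pairwise weights into its thread-placement and migration decisions; the accuracy/overhead trade-off is governed entirely by the sampling rate.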

    Interprocedural Type Specialization of JavaScript Programs Without Type Analysis

    Dynamically typed programming languages such as Python and JavaScript defer type checking to run time. VM implementations can improve performance by eliminating redundant dynamic type checks. However, type inference analyses are often costly and involve trade-offs between compilation time and resulting precision. This has led to the creation of increasingly complex multi-tiered VM architectures. Lazy basic block versioning is a simple JIT compilation technique that effectively removes redundant type checks from critical code paths. This novel approach lazily generates type-specialized versions of basic blocks on the fly while propagating context-dependent type information. It does not require costly program analyses and is not restricted by the precision limitations of traditional type analyses. This paper extends lazy basic block versioning to propagate type information interprocedurally, across function call boundaries. Our implementation in a JavaScript JIT compiler shows that, across 26 benchmarks, interprocedural basic block versioning eliminates more type tag tests on average than is achievable with static type analysis, without resorting to code transformations. On average, 94.3% of type tag tests are eliminated, yielding speedups of up to 56%. We also show that our implementation outperforms Truffle/JS on several benchmarks, both in execution time and compilation time. Comment: 10 pages, 10 figures, submitted to CGO 201
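    A rough sketch of the basic-block-versioning idea (single block, intraprocedural, and not the paper's compiler): cache one specialized version per (block, type context) pair, generate it lazily on first use, and elide tag tests for variables whose types the context already fixes. The type tags, class names, and "compiled code" representation below are all illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the lazy basic-block-versioning idea: keep one
// compiled version of a basic block per incoming type context, generated on
// first use, so tag tests already implied by the context can be skipped.
// Names (TypeTag, BlockVersion, Versioner) are illustrative, not the paper's API.
final class Versioner {
    enum TypeTag { INT, FLOAT, STRING, UNKNOWN }

    record BlockVersion(String blockId, Map<String, TypeTag> context, String code) {}
    record Key(String blockId, Map<String, TypeTag> context) {}

    private final Map<Key, BlockVersion> cache = new HashMap<>();

    // Return the version of `blockId` specialized for `context`, compiling it
    // lazily the first time this context is seen.
    BlockVersion versionFor(String blockId, Map<String, TypeTag> context) {
        Key key = new Key(blockId, Map.copyOf(context));
        return cache.computeIfAbsent(key, k -> compile(k.blockId(), k.context()));
    }

    private BlockVersion compile(String blockId, Map<String, TypeTag> context) {
        // A real JIT would emit machine code; here we only record which
        // variables still need a run-time tag test under this context.
        StringBuilder code = new StringBuilder("block " + blockId + ":\n");
        for (var e : context.entrySet()) {
            if (e.getValue() == TypeTag.UNKNOWN)
                code.append("  tag_test ").append(e.getKey()).append('\n');
            else
                code.append("  // ").append(e.getKey())
                    .append(" known ").append(e.getValue()).append(", test elided\n");
        }
        return new BlockVersion(blockId, context, code.toString());
    }
}
```

    The paper's interprocedural extension additionally threads such contexts through call and return boundaries; that part is omitted from the sketch.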

    PerfBlower: Quickly Detecting Memory-Related Performance Problems via Amplification

    Performance problems in managed languages are extremely difficult to find. Despite many efforts to find such problems, most existing work focuses on debugging a user-provided test execution in which performance problems already manifest. It remains largely unknown how to effectively find performance bugs before software release. As a result, performance bugs often escape to production runs, hurting software reliability and user experience. This paper describes PerfBlower, a general performance testing framework that allows developers to quickly test Java programs for memory-related performance problems. PerfBlower provides (1) a novel specification language, ISL, to describe a general class of performance problems that have observable symptoms; (2) an automated test oracle via virtual amplification; and (3) precise reference-path-based diagnostic information via object mirroring. Using this framework, we have amplified three different types of problems. Our experimental results demonstrate that (1) ISL is expressive enough to describe various memory-related performance problems; (2) PerfBlower successfully distinguishes executions with and without problems, and 8 previously unknown problems are quickly discovered under small workloads; and (3) PerfBlower outperforms existing detectors and does not miss any bugs previously studied in the literature.
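    The amplification idea, stripped to its essence, can be sketched as follows. This is neither ISL nor PerfBlower's implementation; the symptom, the allocation-site-keyed penalty, and the threshold-based oracle are assumptions chosen purely for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.DoubleAdder;

// Illustrative sketch of the amplification idea only: accumulate a virtual
// penalty for objects matching a symptom (say, "allocated but never read"),
// scaled by an amplification factor so that small workloads still produce a
// visible signal for the test oracle.
final class Amplifier {
    private final Map<String, DoubleAdder> penaltyBySite = new ConcurrentHashMap<>();
    private final double amplification;

    Amplifier(double amplification) { this.amplification = amplification; }

    // Called whenever the runtime observes the symptom on an object
    // allocated at `site`.
    void reportSymptom(String site, long bytes) {
        penaltyBySite.computeIfAbsent(site, k -> new DoubleAdder())
                     .add(bytes * amplification);
    }

    // Test oracle: any allocation site whose amplified penalty exceeds the
    // threshold is flagged as a likely memory-related performance problem.
    void report(double threshold) {
        penaltyBySite.forEach((site, p) -> {
            if (p.sum() > threshold)
                System.out.printf("suspicious site %s, virtual penalty %.0f%n", site, p.sum());
        });
    }
}
```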

    Automatic partitioning of database applications

    Database-backed applications are nearly ubiquitous in our daily lives. Applications that make many small accesses to the database create two challenges for developers: increased latency and wasted resources from numerous network round trips. A well-known technique to improve transactional database application performance is to convert part of the application into stored procedures that are executed on the database server. Unfortunately, this conversion is often difficult. In this paper we describe Pyxis, a system that takes database-backed applications and automatically partitions their code into two pieces, one executed on the application server and the other on the database server. Pyxis profiles the application and server loads, statically analyzes the code's dependencies, and produces a partitioning that minimizes the number of control transfers as well as the amount of data sent during each transfer. Our experiments using TPC-C and TPC-W show that Pyxis is able to generate partitions with up to a 3x reduction in latency and a 1.7x improvement in throughput compared to a traditional non-partitioned implementation, with performance comparable to that of a custom stored procedure implementation. National Science Foundation (U.S.) Graduate Research Fellowship.
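    The round-trip problem Pyxis targets is easy to see in plain JDBC. The sketch below contrasts a chatty client-side loop with a single stored-procedure call; the order_line table, add_order procedure, and array-typed parameter are hypothetical, and Pyxis derives such a split automatically rather than requiring it to be hand-written.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;

// Illustrates the round-trip problem that automatic partitioning addresses;
// this is not Pyxis output. Table and procedure names are hypothetical.
final class OrderDao {

    // Chatty version: one network round trip per line item.
    void insertOrderChatty(Connection c, long orderId, long[] itemIds) throws SQLException {
        try (PreparedStatement ps = c.prepareStatement(
                "INSERT INTO order_line(order_id, item_id) VALUES (?, ?)")) {
            for (long item : itemIds) {
                ps.setLong(1, orderId);
                ps.setLong(2, item);
                ps.executeUpdate();            // one round trip per item
            }
        }
    }

    // Server-side version: the loop runs inside a stored procedure, so the
    // application pays a single round trip regardless of the number of items.
    void insertOrderServerSide(Connection c, long orderId, long[] itemIds) throws SQLException {
        Long[] boxed = Arrays.stream(itemIds).boxed().toArray(Long[]::new);
        try (CallableStatement cs = c.prepareCall("{call add_order(?, ?)}")) {
            cs.setLong(1, orderId);
            cs.setArray(2, c.createArrayOf("BIGINT", boxed));  // assumes array parameter support
            cs.execute();
        }
    }
}
```

    Pyxis chooses where each statement runs using static dependency analysis plus load profiling; the sketch only shows the two extremes between which it interpolates.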

    Liveness-Based Garbage Collection for Lazy Languages

    We consider the problem of reducing the memory required to run lazy first-order functional programs. Our approach is to analyze programs for liveness of heap-allocated data. The result of the analysis is used to preserve only live data, a subset of reachable data, during garbage collection. The result is an increase in the garbage reclaimed and a reduction in the peak memory requirement of programs. While this technique has already been shown to yield benefits for eager first-order languages, the lack of a statically determinable execution order and the presence of closures pose new challenges for lazy languages. These require changes both in the liveness analysis itself and in the design of the garbage collector. To show the effectiveness of our method, we implemented a copying collector that uses the results of the liveness analysis to preserve live objects, both evaluated values (i.e., in WHNF) and closures. Our experiments confirm that for programs running with a liveness-based garbage collector, there is a significant decrease in peak memory requirements. In addition, a sizable reduction in the number of collections ensures that, despite the more complex garbage collector, the execution times of programs running with liveness-based and reachability-based collectors remain comparable.
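    The liveness-versus-reachability distinction can be illustrated with a toy copying pass over a simulated heap. The sketch below is not the paper's collector: heap cells are ordinary objects, and the liveFields map stands in for the output of the static liveness analysis. A reachability-based collector would instead copy every field of every reachable cell.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy simulation of liveness-based copying collection; not the paper's collector.
// `liveFields` stands in for the result of the static liveness analysis.
final class LivenessCopyingDemo {
    static final class Cell {
        final String label;
        Cell head, tail;                       // two fields, cons-style
        Cell(String label) { this.label = label; }
    }

    // Copy only the fields the analysis marked live for each cell.
    static List<Cell> collect(Cell root, Map<Cell, Set<String>> liveFields) {
        List<Cell> toSpace = new ArrayList<>();
        IdentityHashMap<Cell, Cell> forwarded = new IdentityHashMap<>();
        copy(root, liveFields, toSpace, forwarded);
        return toSpace;                        // the surviving, copied cells
    }

    private static Cell copy(Cell c, Map<Cell, Set<String>> liveFields,
                             List<Cell> toSpace, IdentityHashMap<Cell, Cell> forwarded) {
        if (c == null) return null;
        Cell done = forwarded.get(c);
        if (done != null) return done;         // already evacuated
        Cell copy = new Cell(c.label);
        forwarded.put(c, copy);
        toSpace.add(copy);
        Set<String> live = liveFields.getOrDefault(c, Set.of());
        if (live.contains("head")) copy.head = copy(c.head, liveFields, toSpace, forwarded);
        if (live.contains("tail")) copy.tail = copy(c.tail, liveFields, toSpace, forwarded);
        return copy;                           // dead fields are dropped, so their targets are reclaimed
    }
}
```

    The paper's collector additionally has to handle closures and the absence of a statically known evaluation order, which this toy version ignores.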