
    Compiling generics through user-directed type specialization

    Compiling polymorphic code through type erasure yields compact code, but significantly hurts performance on primitive types. Full specialization gives good performance, at the cost of increased code size and compilation time. We instead propose a mixed approach that lets the programmer decide which code to specialize. Our approach supports separate compilation, allows mixing specialized and generic code, and gives very good results in practice.
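
    To make the user-directed approach concrete, here is a minimal sketch using Scala's @specialized annotation, which realizes exactly this kind of programmer-directed specialization (the Vec class and its members are our own illustration):

        import scala.reflect.ClassTag

        // The programmer opts in per type parameter; restricting the set of
        // specialized primitives keeps the code-size blowup under control.
        class Vec[@specialized(Int, Double) T](val data: Array[T]) {
          def apply(i: Int): T = data(i)   // specialized variants avoid boxing

          def map[@specialized(Int, Double) U: ClassTag](f: T => U): Vec[U] = {
            val out = new Array[U](data.length)
            var i = 0
            while (i < data.length) { out(i) = f(data(i)); i += 1 }
            new Vec(out)
          }
        }

        // new Vec(Array(1, 2, 3)).map(_ * 2) runs through the Int-specialized
        // variant; a Vec[String] falls back to the generic, erased code.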

    Using Reified Types for Specialization

    Generic code increases programmer productivity as it increases code reuse. For example, the LinkedList abstraction can be used in many contexts, from storing a list of numbers to storing representations of files on disk. Unfortunately, this comes at the expense of performance, as generic code needs a common representation for all types. The common representation is usually a pointer to heap data. But for value types, such as integers, bytes and even value classes (see SIP-15), this leads to significant overheads: they need to be allocated as objects on the heap and pointed to, breaking data locality and adding an extra indirection. To overcome this performance loss, many programming languages and virtual machines perform code specialization, which creates a copy of the generic class for each value type and adapts the data representation. Scala does not require type parameters to be known at compile time, thus allowing truly generic code to be generated. In this case, type information is not necessary and will not be available in the generated bytecode. But this prohibits programmers from calling specialized code from generic code, which might be desirable. Scala can overcome the loss of type information in generic classes by attaching types in so-called ClassTags (formerly known as Manifests). This information can be used to dispatch to the correct specialized class implementation, and the dispatch can be implemented either as a compiler plugin or as a macro. Either implementation is fine, as long as the syntactic overhead is reasonable. For this project, one should investigate which implementation is best and prepare a set of benchmarks that clearly show the performance cost of dispatching based on the ClassTag.
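
    A sketch of the ClassTag-based dispatch described above (the Buffer classes are hypothetical; only the ClassTag machinery is standard Scala):

        import scala.reflect.ClassTag

        // Hypothetical generic buffer plus a hand-specialized Int variant.
        class Buffer[T]                                 // erased, pointer-based storage
        class IntBuffer extends Buffer[Int]             // flat Array[Int] storage

        object Buffer {
          // The ClassTag reifies the erased type argument at the callsite and
          // lets generic code dispatch to the specialized representation.
          def apply[T](implicit tag: ClassTag[T]): Buffer[T] = tag.runtimeClass match {
            case java.lang.Integer.TYPE => new IntBuffer().asInstanceOf[Buffer[T]]
            case _                      => new Buffer[T]
          }
        }

        object BufferDemo {
          val fast = Buffer[Int]      // dispatches to IntBuffer
          val slow = Buffer[String]   // falls back to the generic representation
        }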

    Additional Material for "Unifying Data Representation Transformations"

    This report is an attempt to formalize the data representation transformation mechanism in the "Unifying Data Representation Transformations" paper. Since the mechanism described in the paper targets the Scala programming language and the specification is written against System F<: with local colored type inference, formally reasoning about the full calculus is a major undertaking. Instead, in this report we start from the simply typed lambda calculus with subtyping, natural numbers and unit. We add rewriting and adapt the calculus to propagate expected type information, in a mechanism inspired by local colored type inference. Finally, we show how the representation transformation mechanism (the convert phase) rewrites terms. We show that, given a series of assumptions about the inject phase, type-checking a term against the updated rules produces a correct and operationally equivalent term, with a minimum number of runtime coercions introduced for the given annotations. We finish the report with a series of examples that show how the code is transformed.
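
    As a taste of what the convert phase does, here is an illustrative rendering in Scala (the @boxed annotation and the box/unbox coercions are our own names, not the report's notation):

        import scala.annotation.StaticAnnotation

        object ConvertSketch {
          class boxed extends StaticAnnotation   // representation annotation (ours)
          def box(i: Int): Int @boxed = i        // coercion: value -> heap repr.
          def unbox(b: Int @boxed): Int = b      // coercion: heap repr. -> value

          // Before convert: val m: Int = n + 2, where n has the boxed
          // representation. The expected type Int, propagated as in colored
          // type inference, forces exactly one unbox coercion on n.
          val n: Int @boxed = box(1)
          val m: Int        = unbox(n) + 2
        }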

    On Lock-Free Work-stealing Iterators for Parallel Data Structures

    With the rise of multicores, there is a trend of supporting data-parallel collection operations in general purpose programming languages. These operations are highly parametric, incurring abstraction performance penalties. Furthermore, data-parallel operations must scale when applied to irregular workloads. Work-stealing is a proven technique for load balancing irregular workloads, but general purpose work-stealing also suffers abstraction penalties. We present a generic data-parallel collections design based on work-stealing for shared-memory architectures that overcomes abstraction penalties through callsite specialization of data-parallel operation instances. Moreover, we introduce work-stealing iterators, which allow fine-grained and efficient work-stealing for particular data structures. By eliminating abstraction penalties and making work-stealing data-structure-aware, we achieve up to 60x better performance compared to JVM-based approaches and 3x speedups compared to tools such as Intel TBB.
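
    The work-stealing iterator idea can be sketched as follows for arrays (names and the stealing protocol are our simplification; a real protocol must also arbitrate the owner/thief race on the final elements):

        import java.util.concurrent.atomic.AtomicInteger

        // Simplified lock-free work-stealing iterator over an array range.
        // The owner bumps `progress` as it traverses; a thief freezes the
        // iterator by CAS-ing `progress` to a negative marker and splits
        // the untraversed suffix into two fresh iterators.
        final class StealIterator[T](xs: Array[T], end: Int, from: Int = 0) {
          private val progress = new AtomicInteger(from)

          def hasNext: Boolean = { val p = progress.get; p >= 0 && p < end }
          def next(): T = xs(progress.getAndIncrement())   // owner's fast path

          def steal(): Option[(StealIterator[T], StealIterator[T])] = {
            val p = progress.get
            if (p < 0 || p >= end || !progress.compareAndSet(p, -1)) None
            else {
              val mid = p + (end - p) / 2
              Some((new StealIterator(xs, mid, p),    // victim's half: [p, mid)
                    new StealIterator(xs, end, mid))) // thief's half:  [mid, end)
            }
          }
        }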

    Improving the Interoperation between Generics Translations

    Generics on the Java platform are compiled using the erasure transformation, which only supports by-reference values. This causes slowdowns when generics operate on primitive types, such as integers, as they have to be transformed into reference-based objects. Project Valhalla is an effort to remedy this problem by specializing classes at load time so they can efficiently handle primitive values. In its current early prototype, the Valhalla compilation scheme limits the interaction between specialized and erased generics, thus preventing certain useful code patterns from being expressed. Scala has been using compile-time specialization for six years and has three generics compilation schemes working side by side. In Scala, programmers are allowed to write code that freely exercises the interaction between the different compilation schemes, at the expense of introducing subtle performance issues. Similar performance issues can affect Valhalla-enabled bytecode, whether the code was written in Java or translated from other JVM languages. In this context, we explain how we help programmers avoid these performance regressions in the miniboxing transformation: (1) by issuing actionable performance advisories that steer programmers away from regressions and (2) by providing alternatives to the standard library constructs that use the miniboxing encoding, thus avoiding the conversion overhead.
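
    The kind of interop point the advisories flag looks roughly like this (the Matrix class is our example; @miniboxed is the plugin's annotation, and the advisory behavior is paraphrased):

        // Compiled with the miniboxing plugin, T is stored in a long-based
        // encoding inside specialized variants of Matrix:
        class Matrix[@miniboxed T](val data: Array[T], val width: Int) {
          // Returning a standard List is a miniboxed/erased interop point:
          // every element is converted to its boxed representation here,
          // which is exactly the pattern the plugin's advisories warn about.
          def row(i: Int): List[T] =
            data.slice(i * width, (i + 1) * width).toList
        }

        // The advisory would suggest a miniboxed alternative from the
        // plugin's companion library (such as its MbArray) instead of the
        // erased standard-library collection, avoiding the conversions.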

    Achieving Efficient Work-Stealing for Data-Parallel Collections

    In modern programming, high-level data structures are an important foundation for most applications. With the rise of the multi-core era, there is a growing trend of supporting data-parallel collection operations in general purpose programming languages and platforms. To facilitate object-oriented reuse, these operations are highly parametric, incurring abstraction performance penalties. Furthermore, data-parallel operations must scale when used in problems with irregular workloads. Work-stealing is a proven load-balancing technique for irregular workloads, but general purpose work-stealing also suffers from abstraction penalties. In this paper we present a generic design of a data-parallel collections framework based on work-stealing for shared-memory architectures. We show how abstraction penalties can be overcome through callsite specialization of data-parallel operation instances. Moreover, we show how to make work-stealing fine-grained and efficient when specialized for particular data structures. We experimentally validate the performance of different data structures and data-parallel operations, achieving up to 60x better performance with abstraction penalties eliminated and 3x higher speedups by specializing work-stealing, compared to existing approaches.
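
    Callsite specialization can be pictured as follows (a sketch with our own names): rather than funneling every operation through a megamorphic generic function, each data-parallel callsite materializes its own kernel, whose loop the JIT can compile monomorphically:

        trait Kernel[@specialized(Int) T] {
          def zero: T
          def combine(acc: T, x: T): T
        }

        object FoldDemo {
          // Each worker would run this tight loop over its (stolen) chunk.
          def foldChunk(xs: Array[Int], from: Int, until: Int, k: Kernel[Int]): Int = {
            var acc = k.zero; var i = from
            while (i < until) { acc = k.combine(acc, xs(i)); i += 1 }
            acc
          }

          // One kernel instance per callsite: the JIT sees a single receiver
          // type in the loop above, so combine is inlined without boxing.
          val sum = foldChunk(Array(1, 2, 3, 4), 0, 4, new Kernel[Int] {
            def zero = 0
            def combine(acc: Int, x: Int) = acc + x
          })
        }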

    Call Graphs for Languages with Parametric Polymorphism

    The performance of contemporary object-oriented languages depends on optimizations such as devirtualization, inlining, and specialization, and these in turn depend on precise call graph analysis. Existing call graph analyses do not take advantage of the information provided by the rich type systems of contemporary languages, in particular generic type arguments. Many existing approaches analyze Java bytecode, in which generic types have been erased. This paper shows that this discarded information is actually very useful as the context in a context-sensitive analysis, where it significantly improves precision while keeping the running time small. Specifically, we propose and evaluate call graph construction algorithms in which the contexts of a method are (i) the type arguments passed to its type parameters, and (ii) the static types of the arguments passed to its term parameters. Using static types from the caller as context is effective because it allows more precise dispatch of call sites inside the callee. Our evaluation indicates that the average number of contexts required per method is small. We implement the analysis in the Dotty compiler for Scala, and evaluate it on programs that use the type-parametric Scala collections library and on the Dotty compiler itself. The context-sensitive analysis runs 1.4x faster than a context-insensitive one and discovers 20% more monomorphic call sites at the same time. When applied to method specialization, the imprecision of a context-insensitive call graph would require the average method to be cloned 22 times, whereas the context-sensitive call graph indicates a much more practical 1.00 to 1.50 clones per method.
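
    The gain from using type arguments as context fits in a few lines (our example):

        trait Shape { def draw(): Unit }
        final class Circle extends Shape { def draw() = println("circle") }
        final class Square extends Shape { def draw() = println("square") }

        class Box[T <: Shape](item: T) {
          def render(): Unit = item.draw()   // the interesting call site
        }

        object CallGraphDemo {
          new Box[Circle](new Circle).render()
          new Box[Square](new Square).render()
          // Context-insensitively, item.draw() may target Circle.draw or
          // Square.draw. With the type argument as context, Box[Circle] and
          // Box[Square] are analyzed separately, and each call site becomes
          // monomorphic, enabling devirtualization and inlining.
        }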

    Late Data Layout: Unifying Data Representation Transformations

    Values need to be represented differently when interacting with certain language features. For example, an integer has to take an object-based representation when interacting with erased generics, although, for performance reasons, the stack-based value representation is better. To abstract over these implementation details, some programming languages choose to expose a unified high-level concept (the integer) and let the compiler choose its exact representation, inserting coercions where necessary. This pattern appears in multiple language features, such as value classes, specialization and multi-stage programming: they all expose a unified concept which they later refine into multiple representations. Yet the underlying compiler implementations typically entangle the core mechanism with assumptions about the alternative representations and their interaction with other language features. In this paper we present the Late Data Layout mechanism, a simple but versatile type-driven generalization that subsumes and improves on the state-of-the-art representation transformations. In doing so, we make two key observations: (1) annotated types conveniently capture the semantics of using multiple representations, and (2) local type inference can be used to consistently and optimally introduce coercions. We validated our approach by implementing three language features as Scala compiler extensions: value classes, specialization (using the miniboxing representation) and a simplified multi-stage programming mechanism.
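
    The unified-concept, multiple-representation pattern is easy to see with Scala value classes (our example; the @boxed notation in the comments is illustrative, not the paper's syntax):

        // A value class: one concept, two representations (raw Int vs. heap box).
        class Meter(val v: Int) extends AnyVal

        object LdlDemo {
          def double(m: Meter): Meter = new Meter(m.v * 2) // compiles down to Int
          def first(xs: List[Meter]): Meter = xs.head      // erased generics force
                                                           // the boxed representation
          // In LDL terms, the representations are told apart by an annotated
          // type (say, Meter vs. Meter @boxed), and local type inference inserts
          // a single unbox coercion exactly where they meet: double(first(xs)).
        }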