5 research outputs found

    Instant Pickles: Generating Object-Oriented Pickler Combinators for Fast and Extensible Serialization

    As more applications migrate to the cloud, and as “big data” edges into even more production environments, the performance and simplicity of exchanging data between compute nodes and devices are increasing in importance. An issue central to distributed programming, yet often under-considered, is serialization or pickling, i.e., persisting runtime objects by converting them into a binary or text representation. Pickler combinators are a popular approach from functional programming; their composability alleviates some of the tedium of writing pickling code by hand, but they do not translate well to object-oriented programming due to qualities like open class hierarchies and subtyping polymorphism. Furthermore, both functional pickler combinators and popular, Java-based serialization frameworks tend to be tied to a specific pickle format, leaving programmers with no choice of how their data is persisted. In this paper, we present object-oriented pickler combinators and a framework for generating them at compile time, called scala/pickling, designed to be the default serialization mechanism of the Scala programming language. The static generation of OO picklers enables significant performance improvements, outperforming Java and Kryo in most of our benchmarks. In addition to high performance and requiring little to no boilerplate, our framework is extensible: using the type class pattern, users can provide both (1) custom, easily interchangeable pickle formats and (2) custom picklers, to override the default behavior of the pickling framework. In benchmarks, we compare scala/pickling with other popular industrial frameworks, and present results on time, memory usage, and size when pickling/unpickling a number of data types used in real-world, large-scale distributed applications and frameworks.
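
    To give a flavor of the combinator style this abstract builds on, the following is a minimal Scala sketch of functional pickler combinators. All names (Pickler, intP, pair, wrap) are illustrative and do not reflect the actual scala/pickling API; a simple byte-list representation is assumed.

        // Illustrative pickler combinators: compose small picklers into
        // bigger ones. Not the scala/pickling API; names are invented.
        trait Pickler[T] {
          def pickle(t: T): List[Byte]
          def unpickle(bytes: List[Byte]): (T, List[Byte])
        }

        object Picklers {
          // Primitive pickler for Int (fixed 4-byte big-endian encoding).
          val intP: Pickler[Int] = new Pickler[Int] {
            def pickle(t: Int): List[Byte] =
              List(24, 16, 8, 0).map(shift => ((t >> shift) & 0xff).toByte)
            def unpickle(bytes: List[Byte]): (Int, List[Byte]) = {
              val (hd, tl) = bytes.splitAt(4)
              (hd.foldLeft(0)((acc, b) => (acc << 8) | (b & 0xff)), tl)
            }
          }

          // Combinator: pickle a pair by pickling its components in order.
          def pair[A, B](pa: Pickler[A], pb: Pickler[B]): Pickler[(A, B)] =
            new Pickler[(A, B)] {
              def pickle(t: (A, B)): List[Byte] =
                pa.pickle(t._1) ++ pb.pickle(t._2)
              def unpickle(bytes: List[Byte]): ((A, B), List[Byte]) = {
                val (a, rest)  = pa.unpickle(bytes)
                val (b, rest2) = pb.unpickle(rest)
                ((a, b), rest2)
              }
            }

          // Combinator: derive a pickler for T from one for S plus an iso.
          def wrap[S, T](to: S => T, from: T => S)(ps: Pickler[S]): Pickler[T] =
            new Pickler[T] {
              def pickle(t: T): List[Byte] = ps.pickle(from(t))
              def unpickle(bytes: List[Byte]): (T, List[Byte]) = {
                val (s, rest) = ps.unpickle(bytes)
                (to(s), rest)
              }
            }
        }

        // Usage: a pickler for a simple case class, built by composition.
        case class Point(x: Int, y: Int)
        import Picklers._
        val pointP: Pickler[Point] =
          wrap[(Int, Int), Point]({ case (x, y) => Point(x, y) },
                                  p => (p.x, p.y))(pair(intP, intP))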

    A Programming Model and Foundation for Lineage-Based Distributed Computation

    The most successful systems for “big data” processing have all adopted functional APIs. We present a new programming model we call function passing, designed to provide a more principled substrate, or middleware, upon which to build data-centric distributed systems like Spark. A key idea is to build up a persistent functional data structure representing transformations on distributed immutable data by passing well-typed serializable functions over the wire and applying them to this distributed data. Thus, the function passing model can be thought of as a distributed persistent functional data structure in which the transformations performed on the data, rather than the data itself, are stored in its nodes. One advantage of this model is that failure recovery is simplified by design: data can be recovered by replaying function applications atop immutable data loaded from stable storage. Deferred evaluation is also central to our model; by incorporating deferred evaluation into our design only at the point of initiating network communication, the function passing model remains easy to reason about while remaining efficient in time and memory. Moreover, we provide a complete formalization of the programming model in order to study the foundations of lineage-based distributed computation. In particular, we develop a theory of safe, mobile lineages based on a subject reduction theorem for a typed core language. Furthermore, we formalize a progress theorem which guarantees the finite materialization of remote, lineage-based data. Thus, the formal model may serve as a basis for further developments of the theory of data-centric distributed programming, including aspects such as fault tolerance. We provide an open-source implementation of our model in and for the Scala programming language, along with a case study of several example frameworks and end-user programs written atop this model.
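
    The following toy Scala sketch illustrates the lineage idea described above: each node of the data structure stores a transformation and a reference to its input, and nothing is computed until materialization is requested, which in the real model is the point of network communication. The names (Lineage, Source, Derived, materialize) are invented for illustration and are not the paper's API.

        // A toy lineage: nodes store *functions over data*, not data.
        sealed trait Lineage[T] {
          // Deferred: nothing runs until materialize() is called.
          def materialize(): T
          def map[U](f: T => U): Lineage[U] = Derived(this, f)
        }

        // A leaf: immutable data loaded from stable storage.
        final case class Source[T](load: () => T) extends Lineage[T] {
          def materialize(): T = load()
        }

        // An interior node: records the function applied to the parent's
        // data. On failure, the result is recovered by replaying f atop
        // the parent's (re-)materialized data.
        final case class Derived[S, T](parent: Lineage[S], f: S => T)
            extends Lineage[T] {
          def materialize(): T = f(parent.materialize())
        }

        // Usage: build a lineage; only materialize() forces evaluation.
        val nums: Lineage[Vector[Int]] = Source(() => Vector(1, 2, 3, 4))
        val doubledSum: Lineage[Int] = nums.map(_.map(_ * 2)).map(_.sum)
        println(doubledSum.materialize()) // 20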

    Abstraction for web programming

    This thesis considers several instances of abstraction that arose in the design and implementation of the web programming language Links. The first concerns user interfaces, specified using HTML forms. We wish to construct forms from existing form fragments without introducing dependencies on the implementation details of those fragments. Surprisingly, many existing web systems do not support this simple scenario. We present a library which captures the essence of form abstraction, and extend it with more practical facilities, such as validation of the HTML a program produces and of the input a user submits. An important part of our library is a simple semantics, given as the composition of three primitive “idioms”, an interface to computation introduced by McBride and Paterson. In order to justify this approach we present a comparison of idioms with the related notions of monads and arrows, refining the informal claims in the literature. Our library forms part of the Links framework for stateless web interactions. We describe a related aspect of this system: a preprocessor that derives generic instances of functions, which we use to serialise server state between client requests. The abstraction in this case involves the shape of datatypes: the serialisation operation is specified independently of the particular types involved. Our final instance of abstraction involves abstract types. Functional programming languages typically offer one of two styles of abstract type: the abstraction boundary may be drawn using a private data constructor, or using a type signature. We show that there is a pair of semantics-preserving translations between these two styles. In the light of this, we revisit the decision of the Haskell designers to offer the constructor style, and define a library that supports signature-style definitions in Haskell by translation into the constructor style.
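
    The following Scala sketch gives a flavor of idiom-style form abstraction of the kind described above (the thesis itself works in Links and Haskell, not Scala): form fragments are composed without exposing their generated field names. All names here (Formlet, input, zip) are illustrative.

        // A formlet threads a counter so generated field names never
        // clash; composing fragments needs no knowledge of those names.
        final case class Formlet[A](
            run: Int => (String, Map[String, String] => A, Int)) {
          def map[B](f: A => B): Formlet[B] =
            Formlet { n =>
              val (html, collect, n2) = run(n)
              (html, env => f(collect(env)), n2)
            }
          // Idiom-style composition: concatenate the rendered HTML,
          // combine the collectors pointwise.
          def zip[B](that: Formlet[B]): Formlet[(A, B)] =
            Formlet { n =>
              val (h1, c1, n2) = run(n)
              val (h2, c2, n3) = that.run(n2)
              (h1 + h2, env => (c1(env), c2(env)), n3)
            }
        }

        object Formlet {
          // A text input; its generated name (input0, input1, ...) stays
          // hidden, which is the essence of form abstraction.
          def input: Formlet[String] =
            Formlet { n =>
              val name = s"input$n"
              (s"""<input name="$name"/>""",
               env => env.getOrElse(name, ""), n + 1)
            }
        }

        // Usage: compose two inputs without knowing their field names.
        val dateForm: Formlet[(String, String)] =
          Formlet.input.zip(Formlet.input)
        val (html, collect, _) = dateForm.run(0)
        // html == <input name="input0"/><input name="input1"/>
        val date = collect(Map("input0" -> "March", "input1" -> "14"))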

    Language Support for Distributed Functional Programming

    Software development has taken a fundamental turn. Software today has gone from simple, closed programs running on a single machine to massively open programs, patching together user experiences by way of responses received via hundreds of network requests spanning multiple machines. At the same time, as data continues to stockpile, systems for big data analytics are on the rise. Yet despite this trend towards distributing computation, issues at the level of the language and runtime abound. Serialization is still a costly runtime affair, crashing running systems and confounding developers. Function closures are being added to APIs for big data processing for use by end users, without any reliable way to transmit them over the network. And many of the frameworks developed for handling multiple concurrent requests by way of asynchronous programming facilities rely on blocking threads, causing serious scalability issues. This thesis describes a number of extensions and libraries for the Scala programming language that aim to address these issues and to provide a more reliable foundation on which to build distributed systems. First, this thesis presents a new approach to serialization called pickling, based on the idea of generating and composing functional pickler combinators statically. The approach shifts the burden of serialization to compile time as much as possible, enabling users to catch serialization errors at compile time rather than at runtime. Further, by virtue of serialization code being generated at compile time, our framework is shown to be significantly more performant than other state-of-the-art serialization frameworks. We also generalize our technique for generating serialization code to generic functions other than pickling. Second, in light of the trend of distributed data-parallel frameworks being designed around functional patterns where closures are transmitted across cluster nodes to large-scale persistent datasets, this thesis introduces a new closure-like abstraction and type system, called spores, that can guarantee closures to be serializable, thread-safe, or even to have custom user-defined properties. Crucially, our system is based on the principle of encoding type information corresponding to captured variables in the type of a spore. We prove our type system sound, implement our approach for Scala, evaluate its practicality through a small empirical study, and show the power of these guarantees through a case analysis of real-world distributed and concurrent frameworks that this safe foundation for closures facilitates. Finally, we bring together the above building blocks, pickling and spores, to form the basis of a new programming model called function-passing. Function-passing is based on the idea of a distributed persistent data structure which stores in its nodes transformations to data rather than the distributed data itself, simplifying fault recovery by design. Lazy evaluation is also central to our model; by incorporating laziness into our design only at the point of initiating network communication, our model remains easy to reason about while remaining efficient in time and memory. We formalize our programming model in the form of a small-step operational semantics which includes a precise specification of the semantics of functional fault recovery, and we provide an open-source implementation of our model in and for Scala.
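
    The following is a hand-rolled Scala sketch of the spore idea: captured variables are passed explicitly through a "spore header" and recorded in the closure's type, so constraints can be placed on them. It is an illustrative encoding only, not the actual spores library or its macro-based syntax.

        // A closure whose environment (captured variables) is explicit
        // and surfaced in its type. Illustrative encoding, not the
        // spores library.
        trait Spore[-A, +B] extends (A => B) with Serializable {
          // The type of the captured environment; constraints such as
          // Serializable can be demanded on it.
          type Captured
        }

        object Spore {
          // The "spore header" pattern: captured values are passed in
          // explicitly, so the body can only close over `env`, never
          // over arbitrary enclosing state (e.g. an outer `this`).
          def apply[E, A, B](env: E)(body: (E, A) => B)
              : Spore[A, B] { type Captured = E } =
            new Spore[A, B] {
              type Captured = E
              def apply(a: A): B = body(env, a)
            }
        }

        // Usage: the multiplier is captured explicitly; the body cannot
        // accidentally capture a non-serializable enclosing value.
        val factor = 3
        val triple: Spore[Int, Int] { type Captured = Int } =
          Spore[Int, Int, Int](factor)((f, x) => f * x)
        println(triple(14)) // 42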

    Type-specialized Serialization with Sharing

    In this paper we present an implementation of a Standard ML combinator library for serializing and deserializing data structures. The combinator library supports serialization of cyclic data structures and sharing. It generates compact serialized values, due both to sharing and to type specialization. The library is type safe in the sense that a type-specialized serializer can be applied only to values of the specialized type. In the paper, we demonstrate how the programmer control provided by the combinator library can lead to efficient serializers compared to the generic serializers supported by traditional language implementations.
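
    The sharing technique can be sketched as follows. The paper's library is in Standard ML, but for uniformity the sketch below is in Scala, with invented names (SharingPickler, Def, Ref): values already pickled are emitted as back-references rather than re-serialized, which is one source of the compactness the abstract mentions.

        // Sharing via back-references: the first occurrence of a value
        // is pickled in full; repeats are written as an index.
        import scala.collection.mutable

        sealed trait Token
        case class Def(payload: String) extends Token // full data
        case class Ref(index: Int)      extends Token // back-reference

        class SharingPickler[T](encode: T => String) {
          def pickle(values: List[T]): List[Token] = {
            // Equal values share a single definition.
            val seen = mutable.Map.empty[T, Int]
            values.map { v =>
              seen.get(v) match {
                case Some(i) => Ref(i)      // already pickled: refer back
                case None =>
                  seen(v) = seen.size       // assign the next index
                  Def(encode(v))            // pickle the value in full
              }
            }
          }
        }

        // Usage: the repeated string is written once, then referenced.
        val p = new SharingPickler[String](identity)
        println(p.pickle(List("alpha", "beta", "alpha")))
        // List(Def(alpha), Def(beta), Ref(0))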
