362 research outputs found

    DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines

    Get PDF
    Integrated data analysis (IDA) pipelines—that combine data management (DM) and query processing, high-performance computing (HPC), and machine learning (ML) training and scoring—become increasingly common in practice. Interestingly, systems of these areas share many compilation and runtime techniques, and the used—increasingly heterogeneous—hardware infrastructure converges as well. Yet, the programming paradigms, cluster resource management, data formats and representations, as well as execution strategies differ substantially. DAPHNE is an open and extensible system infrastructure for such IDA pipelines, including language abstractions, compilation and runtime techniques, multi-level scheduling, hardware (HW) accelerators, and computational storage for increasing productivity and eliminating unnecessary overheads. In this paper, we make a case for IDA pipelines, describe the overall DAPHNE system architecture, its key components, and the design of a vectorized execution engine for computational storage, HW accelerators, as well as local and distributed operations. Preliminary experiments that compare DAPHNE with MonetDB, Pandas, DuckDB, and TensorFlow show promising results

    Stream Fusion, to Completeness

    Full text link
    Stream processing is mainstream (again): Widely-used stream libraries are now available for virtually all modern OO and functional languages, from Java to C# to Scala to OCaml to Haskell. Yet expressivity and performance are still lacking. For instance, the popular, well-optimized Java 8 streams do not support the zip operator and are still an order of magnitude slower than hand-written loops. We present the first approach that represents the full generality of stream processing and eliminates overheads, via the use of staging. It is based on an unusually rich semantic model of stream interaction. We support any combination of zipping, nesting (or flat-mapping), sub-ranging, filtering, mapping-of finite or infinite streams. Our model captures idiosyncrasies that a programmer uses in optimizing stream pipelines, such as rate differences and the choice of a "for" vs. "while" loops. Our approach delivers hand-written-like code, but automatically. It explicitly avoids the reliance on black-box optimizers and sufficiently-smart compilers, offering highest, guaranteed and portable performance. Our approach relies on high-level concepts that are then readily mapped into an implementation. Accordingly, we have two distinct implementations: an OCaml stream library, staged via MetaOCaml, and a Scala library for the JVM, staged via LMS. In both cases, we derive libraries richer and simultaneously many tens of times faster than past work. We greatly exceed in performance the standard stream libraries available in Java, Scala and OCaml, including the well-optimized Java 8 streams

    MaxSAT Evaluation 2018 : Solver and Benchmark Descriptions

    Get PDF
    Non peer reviewe

    Enjoy the Joy of Copulas: With a Package copula

    Get PDF
    Copulas have become a popular tool in multivariate modeling successfully applied in many fields. A good open-source implementation of copulas is much needed for more practitioners to enjoy the joy of copulas. This article presents the design, features, and some implementation details of the R package copula. The package provides a carefully designed and easily extensible platform for multivariate modeling with copulas in R. S4 classes for most frequently used elliptical copulas and Archimedean copulas are implemented, with methods for density/distribution evaluation, random number generation, and graphical display. Fitting copula-based models with maximum likelihood method is provided as template examples. With the classes and methods in the package, the package can be easily extended by user-defined copulas and margins to solve problems

    A Survey on Array Storage, Query Languages, and Systems

    Full text link
    Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. Array storage discusses all the aspects related to array partitioning into chunks. The identification of a reduced set of array operators to form the foundation for an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey on array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete though. We greatly appreciate pointers towards any work we might have forgotten to mention.Comment: 44 page

    Industrial applications of ASF+SDF

    Get PDF
    In recent years, a number of Dutch companies have used the algebraic specification formalism ASF+SDF. Bank MeesPierson has specified a language for describing interest rate products, their translation into COBOL, and their generation from interactive questionnaires. A consultancy company has specified a language to represent the company's object-oriented models, and the compilation of this language into Access. Bank ABN-AMRO has started investigating the use of algebraic specifications for renovating legacy COBOL systems. We discuss the implications of such projects for teaching algebraic specifications and software engineering, and the role students have been playing in these projects

    Type Safe Extensible Programming

    Full text link
    Software products evolve over time. Sometimes they evolve by adding new features, and sometimes by either fixing bugs or replacing outdated implementations with new ones. When software engineers fail to anticipate such evolution during development, they will eventually be forced to re-architect or re-build from scratch. Therefore, it has been common practice to prepare for changes so that software products are extensible over their lifetimes. However, making software extensible is challenging because it is difficult to anticipate successive changes and to provide adequate abstraction mechanisms over potential changes. Such extensibility mechanisms, furthermore, should not compromise any existing functionality during extension. Software engineers would benefit from a tool that provides a way to add extensions in a reliable way. It is natural to expect programming languages to serve this role. Extensible programming is one effort to address these issues. In this thesis, we present type safe extensible programming using the MLPolyR language. MLPolyR is an ML-like functional language whose type system provides type-safe extensibility mechanisms at several levels. After presenting the language, we will show how these extensibility mechanisms can be put to good use in the context of product line engineering. Product line engineering is an emerging software engineering paradigm that aims to manage variations, which originate from successive changes in software.Comment: PhD Thesis submitted October, 200

    Auto-Pipe and the X Language: A Toolset and Language for the Simulation, Analysis, and Synthesis of Heterogeneous Pipelined Architectures, Master\u27s Thesis, August 2006

    Get PDF
    Pipelining an algorithmis a popularmethod of increasing the performance of many computation-intensive applications. Often, one wants to form pipelines composed mostly of commonly used simple building blocks such as DSP components, simple math operations, encryption, or pattern matching stages. Additionally, one may desire to map these processing tasks to different computational resources based on their relative performance attributes (e.g., DSP operations on an FPGA). Auto-Pipe is composed of the X Language, a flexible interface language that aids the description of complex dataflow topologies (including pipelines); X-Com, a compiler for the X Language; X-Sim, a tool for modeling pipelined architectures based on measured, simulated, or derived task and communications behavior; X-Opt, a tool to optimize X applications under various metrics; and X-Dep, a tool for the automatic deployment of X-Com- or X-Sim-generated applications to real or simulated devices. This thesis presents an overview of the Auto-Pipe system, the design and use of the X Language, and an implementation of X-Com. Applications developed using the X Language are presented which demonstrate the effectiveness of describing algorithms using X, and the effectiveness of the Auto-Pipe development flow in analyzing and improving the performance of an application
    • …
    corecore