362 research outputs found
DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines
Integrated data analysis (IDA) pipelines—that combine data management (DM) and query processing, high-performance computing
(HPC), and machine learning (ML) training and scoring—become
increasingly common in practice. Interestingly, systems of these
areas share many compilation and runtime techniques, and the
used—increasingly heterogeneous—hardware infrastructure converges as well. Yet, the programming paradigms, cluster resource
management, data formats and representations, as well as execution
strategies differ substantially. DAPHNE is an open and extensible
system infrastructure for such IDA pipelines, including language abstractions, compilation and runtime techniques, multi-level scheduling, hardware (HW) accelerators, and computational storage for
increasing productivity and eliminating unnecessary overheads. In
this paper, we make a case for IDA pipelines, describe the overall
DAPHNE system architecture, its key components, and the design
of a vectorized execution engine for computational storage, HW
accelerators, as well as local and distributed operations. Preliminary experiments that compare DAPHNE with MonetDB, Pandas,
DuckDB, and TensorFlow show promising results.
Stream Fusion, to Completeness
Stream processing is mainstream (again): Widely-used stream libraries are now
available for virtually all modern OO and functional languages, from Java to C#
to Scala to OCaml to Haskell. Yet expressivity and performance are still
lacking. For instance, the popular, well-optimized Java 8 streams do not
support the zip operator and are still an order of magnitude slower than
hand-written loops. We present the first approach that represents the full
generality of stream processing and eliminates overheads, via the use of
staging. It is based on an unusually rich semantic model of stream interaction.
We support any combination of zipping, nesting (or flat-mapping), sub-ranging,
filtering, and mapping of finite or infinite streams. Our model captures
idiosyncrasies that a programmer uses in optimizing stream pipelines, such as
rate differences and the choice of "for" vs. "while" loops. Our approach
delivers hand-written-like code, but automatically. It explicitly avoids the
reliance on black-box optimizers and sufficiently-smart compilers, offering
highest, guaranteed and portable performance. Our approach relies on high-level
concepts that are then readily mapped into an implementation. Accordingly, we
have two distinct implementations: an OCaml stream library, staged via
MetaOCaml, and a Scala library for the JVM, staged via LMS. In both cases, we
derive libraries richer and simultaneously many tens of times faster than past
work. We greatly exceed in performance the standard stream libraries available
in Java, Scala and OCaml, including the well-optimized Java 8 streams.
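As a rough illustration of the gap the abstract describes, the sketch below writes the same small pipeline twice in Python: once with composed stream combinators (zip with an infinite stream, filter, map, take) and once as the single hand-written loop that a fusing approach would aim to produce. This is not the paper's staging technique and not its OCaml or Scala API; the function names `pipeline` and `fused` are hypothetical and chosen only for this example.

```python
from itertools import count, islice


def pipeline(xs, n):
    """Combinator style: zip with an infinite stream, filter, map, take n."""
    zipped = zip(xs, count(1))  # count(1) is an infinite stream 1, 2, 3, ...
    kept = (x * i for x, i in zipped if x % 2 == 0)  # filter, then map
    return list(islice(kept, n))  # take at most n results


def fused(xs, n):
    """Hand-written equivalent: one loop, no intermediate iterators."""
    out = []
    i = 1  # plays the role of the infinite counter stream
    for x in xs:
        if len(out) == n:  # stop as soon as n results were produced
            break
        if x % 2 == 0:
            out.append(x * i)
        i += 1
    return out


if __name__ == "__main__":
    print(pipeline(range(10), 3))
    print(fused(range(10), 3))
```

Both functions compute the same result; the second avoids the per-element iterator overhead, which is the kind of code the abstract says should be derived automatically rather than written by hand.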
MaxSAT Evaluation 2018 : Solver and Benchmark Descriptions
Non peer reviewed
Enjoy the Joy of Copulas: With a Package copula
Copulas have become a popular tool in multivariate modeling successfully applied in many fields. A good open-source implementation of copulas is much needed for more practitioners to enjoy the joy of copulas. This article presents the design, features, and some implementation details of the R package copula. The package provides a carefully designed and easily extensible platform for multivariate modeling with copulas in R. S4 classes for the most frequently used elliptical copulas and Archimedean copulas are implemented, with methods for density/distribution evaluation, random number generation, and graphical display. Fitting copula-based models with the maximum likelihood method is provided as template examples. With the classes and methods in the package, the package can be easily extended by user-defined copulas and margins to solve problems.
A Survey on Array Storage, Query Languages, and Systems
Since scientific investigation is one of the most important providers of
massive amounts of ordered data, there is a renewed interest in array data
processing in the context of Big Data. To the best of our knowledge, a unified
resource that summarizes and analyzes array processing research over its long
existence is currently missing. In this survey, we provide a guide for past,
present, and future research in array processing. The survey is organized along
three main topics. Array storage discusses all the aspects related to array
partitioning into chunks. The identification of a reduced set of array
operators to form the foundation for an array query language is analyzed across
multiple such proposals. Lastly, we survey real systems for array processing.
The result is a thorough survey on array data storage and processing that
should be consulted by anyone interested in this research topic, independent of
experience level. The survey is not complete though. We greatly appreciate
pointers towards any work we might have forgotten to mention. Comment: 44 pages
Industrial applications of ASF+SDF
In recent years, a number of Dutch companies have used the algebraic specification formalism ASF+SDF. Bank MeesPierson has specified a language for describing interest rate products, their translation into COBOL, and their generation from interactive questionnaires. A consultancy company has specified a language to represent the company's object-oriented models, and the compilation of this language into Access. Bank ABN-AMRO has started investigating the use of algebraic specifications for renovating legacy COBOL systems. We discuss the implications of such projects for teaching algebraic specifications and software engineering, and the role students have been playing in these projects.
Type Safe Extensible Programming
Software products evolve over time. Sometimes they evolve by adding new
features, and sometimes by either fixing bugs or replacing outdated
implementations with new ones. When software engineers fail to anticipate such
evolution during development, they will eventually be forced to re-architect or
re-build from scratch. Therefore, it has been common practice to prepare for
changes so that software products are extensible over their lifetimes. However,
making software extensible is challenging because it is difficult to anticipate
successive changes and to provide adequate abstraction mechanisms over
potential changes. Such extensibility mechanisms, furthermore, should not
compromise any existing functionality during extension. Software engineers
would benefit from a tool that provides a way to add extensions in a reliable
way. It is natural to expect programming languages to serve this role.
Extensible programming is one effort to address these issues.
In this thesis, we present type safe extensible programming using the MLPolyR
language. MLPolyR is an ML-like functional language whose type system provides
type-safe extensibility mechanisms at several levels. After presenting the
language, we will show how these extensibility mechanisms can be put to good
use in the context of product line engineering. Product line engineering is an
emerging software engineering paradigm that aims to manage variations, which
originate from successive changes in software. Comment: PhD Thesis submitted October, 200
Auto-Pipe and the X Language: A Toolset and Language for the Simulation, Analysis, and Synthesis of Heterogeneous Pipelined Architectures, Master's Thesis, August 2006
Pipelining an algorithm is a popular method of increasing the performance of many computation-intensive applications. Often, one wants to form pipelines composed mostly of commonly used simple building blocks such as DSP components, simple math operations, encryption, or pattern matching stages. Additionally, one may desire to map these processing tasks to different computational resources based on their relative performance attributes (e.g., DSP operations on an FPGA). Auto-Pipe is composed of the X Language, a flexible interface language that aids the description of complex dataflow topologies (including pipelines); X-Com, a compiler for the X Language; X-Sim, a tool for modeling pipelined architectures based on measured, simulated, or derived task and communications behavior; X-Opt, a tool to optimize X applications under various metrics; and X-Dep, a tool for the automatic deployment of X-Com- or X-Sim-generated applications to real or simulated devices. This thesis presents an overview of the Auto-Pipe system, the design and use of the X Language, and an implementation of X-Com. Applications developed using the X Language are presented which demonstrate the effectiveness of describing algorithms using X, and the effectiveness of the Auto-Pipe development flow in analyzing and improving the performance of an application.