2,824 research outputs found
Implementing Joins using Extensible Pattern Matching
Join patterns are an attractive declarative way to synchronize both threads and asynchronous distributed computations. We explore joins in the context of extensible pattern matching that recently appeared in languages such as F# and Scala. Our implementation supports join patterns with multiple synchronous events, and guards. Furthermore, we integrated joins into an existing actor-based concurrency framework. It enables join patterns to be used in the context of more advanced synchronization modes, such as future-type message sending and token-passing continuations
CPL: A Core Language for Cloud Computing -- Technical Report
Running distributed applications in the cloud involves deployment. That is,
distribution and configuration of application services and middleware
infrastructure. The considerable complexity of these tasks resulted in the
emergence of declarative JSON-based domain-specific deployment languages to
develop deployment programs. However, existing deployment programs unsafely
compose artifacts written in different languages, leading to bugs that are hard
to detect before run time. Furthermore, deployment languages do not provide
extension points for custom implementations of existing cloud services such as
application-specific load balancing policies.
To address these shortcomings, we propose CPL (Cloud Platform Language), a
statically-typed core language for programming both distributed applications as
well as their deployment on a cloud platform. In CPL, application services and
deployment programs interact through statically typed, extensible interfaces,
and an application can trigger further deployment at run time. We provide a
formal semantics of CPL and demonstrate that it enables type-safe, composable
and extensible libraries of service combinators, such as load balancing and
fault tolerance.Comment: Technical report accompanying the MODULARITY '16 submissio
A Data Transformation System for Biological Data Sources
Scientific data of importance to biologists in the Human Genome Project resides not only in conventional databases, but in structured files maintained in a number of different formats (e.g. ASN.1 and ACE) as well a.s sequence analysis packages (e.g. BLAST and FASTA). These formats and packages contain a number of data types not found in conventional databases, such as lists and variants, and may be deeply nested. We present in this paper techniques for querying and transforming such data, and illustrate their use in a prototype system developed in conjunction with the Human Genome Center for Chromosome 22. We also describe optimizations performed by the system, a crucial issue for bulk data
R2O, an extensible and semantically based database-to-ontology mapping language
We present R2O, an extensible and declarative language to describe mappings between relational DB schemas and ontologies implemented in RDF(S) or OWL. R2O provides an extensible set of primitives with welldefined semantics. This language has been conceived expressive enough to cope with complex mapping cases arisen from situations of low similarity between the ontology and the DB models
Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources
Apache Calcite is a foundational software framework that provides query
processing, optimization, and query language support to many popular
open-source data processing systems such as Apache Hive, Apache Storm, Apache
Flink, Druid, and MapD. Calcite's architecture consists of a modular and
extensible query optimizer with hundreds of built-in optimization rules, a
query processor capable of processing a variety of query languages, an adapter
architecture designed for extensibility, and support for heterogeneous data
models and stores (relational, semi-structured, streaming, and geospatial).
This flexible, embeddable, and extensible architecture is what makes Calcite an
attractive choice for adoption in big-data frameworks. It is an active project
that continues to introduce support for the new types of data sources, query
languages, and approaches to query processing and optimization.Comment: SIGMOD'1
Non-linear Pattern Matching with Backtracking for Non-free Data Types
Non-free data types are data types whose data have no canonical forms. For
example, multisets are non-free data types because the multiset has
two other equivalent but literally different forms and .
Pattern matching is known to provide a handy tool set to treat such data types.
Although many studies on pattern matching and implementations for practical
programming languages have been proposed so far, we observe that none of these
studies satisfy all the criteria of practical pattern matching, which are as
follows: i) efficiency of the backtracking algorithm for non-linear patterns,
ii) extensibility of matching process, and iii) polymorphism in patterns.
This paper aims to design a new pattern-matching-oriented programming
language that satisfies all the above three criteria. The proposed language
features clean Scheme-like syntax and efficient and extensible pattern matching
semantics. This programming language is especially useful for the processing of
complex non-free data types that not only include multisets and sets but also
graphs and symbolic mathematical expressions. We discuss the importance of our
criteria of practical pattern matching and how our language design naturally
arises from the criteria. The proposed language has been already implemented
and open-sourced as the Egison programming language
Versatile event correlation with algebraic effects
We present the first language design to uniformly express variants of
n
-way joins over asynchronous event streams from different domains, e.g., stream-relational algebra, event processing, reactive and concurrent programming. We model asynchronous reactive programs and joins in direct style, on top of algebraic effects and handlers. Effect handlers act as modular interpreters of event notifications, enabling fine-grained control abstractions and customizable event matching. Join variants can be considered as cartesian product computations with ”degenerate” control flow, such that unnecessary tuples are not materialized a priori. Based on this computational interpretation, we decompose joins into a generic, naive enumeration procedure of the cartesian product, plus variant-specific extensions, represented in terms of user-supplied effect handlers. Our microbenchmarks validate that this extensible design avoids needless materialization. Alongside a formal semantics for joining and prototypes in Koka and multicore OCaml, we contribute a systematic comparison of the covered domains and features.
ERC, Advanced Grant No. 321217
ERC, Consolidator Grant No. 617805
DFG, SFB 1053
DFG, SA 2918/2-
MonetDB/XQuery: a fast XQuery processor powered by a relational engine
Relational XQuery systems try to re-use mature relational data management infrastructures to create fast and scalable XML database technology. This paper describes the main features, key contributions, and lessons learned while implementing such a system. Its architecture consists of (i) a range-based encoding of XML documents into relational tables, (ii) a compilation technique that translates XQuery into a basic relational algebra, (iii) a restricted (order) property-aware peephole relational query optimization strategy, and (iv) a mapping from XML update statements into relational updates. Thus, this system implements all essential XML database functionalities (rather than a single feature) such that we can learn from the full consequences of our architectural decisions. While implementing this system, we had to extend the state-of-the-art with a number of new technical contributions, such as loop-lifted staircase join and efficient relational query evaluation strategies for XQuery theta-joins with existential semantics. These contributions as well as the architectural lessons learned are also deemed valuable for other relational back-end engines. The performance and scalability of the resulting system is evaluated on the XMark benchmark up to data sizes of 11GB. The performance section also provides an extensive benchmark comparison of all major XMark results published previously, which confirm that the goal of purely relational XQuery processing, namely speed and scalability, was met
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
research efforts in several followup works after its introduction. This article
provides a comprehensive survey for a family of approaches and mechanisms of
large scale data processing mechanisms that have been implemented based on the
original idea of the MapReduce framework and are currently gaining a lot of
momentum in both research and industrial communities. We also cover a set of
introduced systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
- …