11,954 research outputs found
Deconstructing yield operator to enhance streams processing
Este trabalho foi financiado pelo Concurso Anual para Projetos de Investigação, Desenvolvimento, Inovação e Criação ArtÃstica (IDI&CA) 2020 do Instituto Politécnico de Lisboa. Código de referência IPL/2020/WebFluid/ISELCustomizing streams pipelines with new user-defined operations is a well-known pattern regarding streams processing. However, programming languages face two challenges when considering streams extensibility: 1) provide a compact and readable way to express new operations, and 2) keep streams’ laziness behavior. From here, we may find a consensus around the adoption of the generator operator, i.e. yield, as a means to fulfil both requirements, since most state-of-the-art programming languages provide this feature. Yet, what is the performance overhead of interleaving a yield-based operation in streams processing? In this work we present a benchmark based on realistic use cases of two different web APIs, namely: Last.fm and world weather on line, where custom yield-based operations may degrade the streams performance in twofold. We also propose a purely functional and minimalistic design, named tinyield, that can be easily adopted in any programming language and provides a concise way of chaining extension operations fluently, with low overhead in the eval uated benchmarks. The tinyield proposal was deployed in three different libraries, namely for Java (jayield), JavaScript (tinyield4ts) and .Net (tinyield4net).info:eu-repo/semantics/publishedVersio
Optimizing Abstract Abstract Machines
The technique of abstracting abstract machines (AAM) provides a systematic
approach for deriving computable approximations of evaluators that are easily
proved sound. This article contributes a complementary step-by-step process for
subsequently going from a naive analyzer derived under the AAM approach, to an
efficient and correct implementation. The end result of the process is a two to
three order-of-magnitude improvement over the systematically derived analyzer,
making it competitive with hand-optimized implementations that compute
fundamentally less precise results.Comment: Proceedings of the International Conference on Functional Programming
2013 (ICFP 2013). Boston, Massachusetts. September, 201
Stream Fusion, to Completeness
Stream processing is mainstream (again): Widely-used stream libraries are now
available for virtually all modern OO and functional languages, from Java to C#
to Scala to OCaml to Haskell. Yet expressivity and performance are still
lacking. For instance, the popular, well-optimized Java 8 streams do not
support the zip operator and are still an order of magnitude slower than
hand-written loops. We present the first approach that represents the full
generality of stream processing and eliminates overheads, via the use of
staging. It is based on an unusually rich semantic model of stream interaction.
We support any combination of zipping, nesting (or flat-mapping), sub-ranging,
filtering, mapping-of finite or infinite streams. Our model captures
idiosyncrasies that a programmer uses in optimizing stream pipelines, such as
rate differences and the choice of a "for" vs. "while" loops. Our approach
delivers hand-written-like code, but automatically. It explicitly avoids the
reliance on black-box optimizers and sufficiently-smart compilers, offering
highest, guaranteed and portable performance. Our approach relies on high-level
concepts that are then readily mapped into an implementation. Accordingly, we
have two distinct implementations: an OCaml stream library, staged via
MetaOCaml, and a Scala library for the JVM, staged via LMS. In both cases, we
derive libraries richer and simultaneously many tens of times faster than past
work. We greatly exceed in performance the standard stream libraries available
in Java, Scala and OCaml, including the well-optimized Java 8 streams
Measuring NUMA effects with the STREAM benchmark
Modern high-end machines feature multiple processor packages, each of which
contains multiple independent cores and integrated memory controllers connected
directly to dedicated physical RAM. These packages are connected via a shared
bus, creating a system with a heterogeneous memory hierarchy. Since this shared
bus has less bandwidth than the sum of the links to memory, aggregate memory
bandwidth is higher when parallel threads all access memory local to their
processor package than when they access memory attached to a remote package.
But, the impact of this heterogeneous memory architecture is not easily
understood from vendor benchmarks. Even where these measurements are available,
they provide only best-case memory throughput. This work presents a series of
modifications to the well-known STREAM benchmark to measure the effects of NUMA
on both a 48-core AMD Opteron machine and a 32-core Intel Xeon machine
06472 Abstracts Collection - XQuery Implementation Paradigms
From 19.11.2006 to 22.11.2006, the Dagstuhl Seminar 06472 ``XQuery Implementation Paradigms'' was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available
- …