9,924 research outputs found

    Stream Fusion, to Completeness

    Stream processing is mainstream (again): widely-used stream libraries are now available for virtually all modern OO and functional languages, from Java to C# to Scala to OCaml to Haskell. Yet expressivity and performance are still lacking. For instance, the popular, well-optimized Java 8 streams do not support the zip operator and are still an order of magnitude slower than hand-written loops. We present the first approach that represents the full generality of stream processing and eliminates overheads, via the use of staging. It is based on an unusually rich semantic model of stream interaction. We support any combination of zipping, nesting (or flat-mapping), sub-ranging, filtering, and mapping of finite or infinite streams. Our model captures idiosyncrasies that a programmer uses in optimizing stream pipelines, such as rate differences and the choice of "for" vs. "while" loops. Our approach delivers hand-written-like code, but automatically. It explicitly avoids reliance on black-box optimizers and sufficiently smart compilers, offering the highest, guaranteed, and portable performance. Our approach relies on high-level concepts that are then readily mapped into an implementation. Accordingly, we have two distinct implementations: an OCaml stream library, staged via MetaOCaml, and a Scala library for the JVM, staged via LMS. In both cases, we derive libraries that are richer and simultaneously many tens of times faster than past work. We greatly exceed in performance the standard stream libraries available in Java, Scala and OCaml, including the well-optimized Java 8 streams.
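
    To make the motivating claim concrete, here is a small, self-contained Java sketch (the data and the sum-of-squares task are invented for illustration): the stream pipeline expresses the computation declaratively, while the hand-written loop below it is the kind of fused code a staging-based library aims to generate automatically. Java's stream API also has no zip operation, one of the expressivity gaps mentioned above.

```java
import java.util.Arrays;

// Illustration only: a Java 8 stream pipeline vs. the hand-written loop it
// should ideally compile down to. Task and data are made up for this sketch.
public class StreamVsLoop {

    static long withStreams(int[] xs) {
        return Arrays.stream(xs)
                     .filter(x -> x % 2 == 0)        // per-element virtual calls and pipeline overhead
                     .mapToLong(x -> (long) x * x)
                     .sum();
    }

    static long handWritten(int[] xs) {
        long acc = 0;
        for (int x : xs) {                           // the fused loop a staged library can emit
            if (x % 2 == 0) {
                acc += (long) x * x;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] xs = {1, 2, 3, 4, 5, 6};
        System.out.println(withStreams(xs) + " == " + handWritten(xs));
        // Note: java.util.stream offers no zip(s1, s2) operation, one of the
        // expressivity gaps the paper points out.
    }
}
```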

    MOA: Massive Online Analysis, a framework for stream classification and clustering.

    Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA is designed to deal with the challenging problem of scaling up the implementation of state-of-the-art algorithms to real-world dataset sizes. It contains a collection of offline and online algorithms for both classification and clustering, as well as tools for evaluation. In particular, for classification it implements boosting, bagging, and Hoeffding Trees, all with and without Naive Bayes classifiers at the leaves. For clustering, it implements StreamKM++, CluStream, ClusTree, Den-Stream, D-Stream and CobWeb. Researchers benefit from MOA by gaining insight into the workings and problems of different approaches, while practitioners can easily apply and compare several algorithms on real-world data sets and settings. MOA supports bi-directional interaction with WEKA, the Waikato Environment for Knowledge Analysis, and is released under the GNU GPL license.
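
    The sketch below shows the prequential (test-then-train) loop that MOA experiments are typically built around, roughly following MOA's published tutorial API. Exact package paths and return types (for example, whether nextInstance() yields an Instance directly or wrapped in an Example) vary between MOA versions, so treat this as an approximation rather than copy-paste code.

```java
import com.yahoo.labs.samoa.instances.Instance;  // package differs in older MOA releases
import moa.classifiers.Classifier;
import moa.classifiers.trees.HoeffdingTree;
import moa.streams.generators.RandomRBFGenerator;

// Approximate MOA usage: prequential (test-then-train) evaluation of a
// Hoeffding Tree on a synthetic stream. Names follow the MOA tutorial;
// details may differ between MOA versions.
public class PrequentialSketch {
    public static void main(String[] args) {
        RandomRBFGenerator stream = new RandomRBFGenerator();
        stream.prepareForUse();

        Classifier learner = new HoeffdingTree();
        learner.setModelContext(stream.getHeader());
        learner.prepareForUse();

        int seen = 0, correct = 0;
        while (stream.hasMoreInstances() && seen < 100_000) {
            Instance inst = stream.nextInstance().getData();  // older MOA: nextInstance() is the Instance
            if (learner.correctlyClassifies(inst)) {          // test first ...
                correct++;
            }
            learner.trainOnInstance(inst);                    // ... then train (prequential evaluation)
            seen++;
        }
        System.out.printf("%d instances, %.2f%% accuracy%n", seen, 100.0 * correct / seen);
    }
}
```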

    Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams

    Streaming APIs are becoming more pervasive in mainstream Object-Oriented programming languages and platforms. For example, the Stream API introduced in Java 8 allows for functional-like, MapReduce-style operations in processing both finite data structures, e.g., collections, and infinite ones. However, using this API efficiently involves subtle considerations, such as determining when it is best for stream operations to run in parallel, when running operations in parallel can be less efficient, and when it is safe to run in parallel given possible lambda expression side effects. Also, streams may not run all operations in parallel depending on the particular collectors used in reductions. In this paper, we present an automated refactoring approach that assists developers in writing efficient stream code in a semantics-preserving fashion. The approach, based on a novel data ordering and typestate analysis, consists of preconditions and transformations for automatically determining when it is safe and possibly advantageous to convert sequential streams to parallel streams, to unorder or de-parallelize already parallel streams, and to optimize streams involving complex reductions. The approach was implemented as a plug-in to the popular Eclipse IDE, uses the WALA and SAFE analysis frameworks, and was evaluated on 11 Java projects consisting of ∼642K lines of code. We found that 57 of 157 candidate streams (36.31%) were refactorable, and an average speedup of 3.49 on performance tests was observed. The results indicate that the approach is useful in optimizing stream code to its full potential.
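
    As a hand-made illustration of the kind of rewrite such a refactoring performs (this is not the tool's output, and the Widget class is invented for the example), the first pipeline below satisfies the usual safety preconditions, side-effect-free lambdas and an order-insensitive reduction, and can be converted to a parallel, unordered stream; the last one cannot, because its lambda mutates shared state.

```java
import java.util.List;

// Illustrative sequential-to-parallel stream refactoring; Widget is a made-up type.
public class Widgets {

    static class Widget {
        final String name;
        final double weight;
        Widget(String name, double weight) { this.name = name; this.weight = weight; }
        double getWeight() { return weight; }
    }

    // Before: sequential stream.
    static double totalHeavyWeightSequential(List<Widget> ws) {
        return ws.stream()
                 .filter(w -> w.weight > 10.0)     // stateless, side-effect-free predicate
                 .mapToDouble(Widget::getWeight)
                 .sum();                           // associative, order-insensitive reduction
    }

    // After: the same pipeline, safely parallelized (and unordered, since order is irrelevant).
    static double totalHeavyWeightParallel(List<Widget> ws) {
        return ws.parallelStream()
                 .unordered()
                 .filter(w -> w.weight > 10.0)
                 .mapToDouble(Widget::getWeight)
                 .sum();
    }

    // Counterexample: must stay sequential (or be rewritten), because the lambda
    // mutates shared state, one of the safety preconditions such an analysis checks.
    static double logsWhileSumming(List<Widget> ws, List<String> log) {
        return ws.stream()
                 .peek(w -> log.add(w.name))       // side effect on a non-thread-safe list
                 .mapToDouble(Widget::getWeight)
                 .sum();
    }
}
```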

    Matrix Code

    Matrix Code gives imperative programming a mathematical semantics and heuristic power comparable in quality to functional and logic programming. A program in Matrix Code is developed incrementally from a specification in pre/post-condition form. The computations of a code matrix are characterized by powers of the matrix when it is interpreted as a transformation in a space of vectors of logical conditions. Correctness of a code matrix is expressed in terms of a fixpoint of the transformation. The abstract machine for Matrix Code is the dual-state machine, which we present as a variant of the classical finite-state machine. (Comment: 39 pages, 19 figures; extensions and minor corrections.)
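
    Read schematically (the notation below is illustrative shorthand, not the paper's own definitions), the abstract says that running a program corresponds to iterating its code matrix M on a vector v of logical conditions, and that correctness is a fixpoint property of that iteration:

```latex
% Schematic reading of the abstract; M and v are illustrative shorthand for the
% code matrix and the vector of logical conditions, not the paper's notation.
\[
  v^{(k)} \;=\; M\, v^{(k-1)} \;=\; M^{k}\, v^{(0)},
  \qquad
  \text{correctness:}\quad M\, v^{\ast} \;=\; v^{\ast}.
\]
```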

    Learning from medical data streams: an introduction

    Clinical practice and research are facing a new challenge created by the rapid growth of health information science and technology, and by the complexity and volume of biomedical data. Machine learning from medical data streams is a recent area of research that aims to provide better knowledge extraction and evidence-based clinical decision support in scenarios where data are produced as a continuous flow. This year's edition of AIME, the Conference on Artificial Intelligence in Medicine, enabled a sound discussion of this area of research, mainly through the inclusion of a dedicated workshop. This paper is an introduction to LEMEDS, the Learning from Medical Data Streams workshop, which highlights the contributed papers, the invited talk and expert panel discussion, as well as related papers accepted to the main conference.

    Enhancing Regular Corecursion

    Nowadays, data structures that are conceptually infinite, such as streams or infinite trees, are very common in computer science. When it comes to their manipulation, one major problem is how to finitely represent and work with them without incurring non-terminating behaviour. Regular corecursion is a solution relying on the finite representation of regular data structures and on the detection of cyclic calls. The topics in the thesis revolve around two enhancements of regular corecursion in different directions. In the first part, we present Corecursive Featherweight Java (coFJ), an object-oriented calculus which supports flexible regular corecursion, that is, it allows the programmer to specify the behaviour when a cyclic call is detected. In the second part, we extend regular corecursion beyond regular terms, focusing on the significant case of stream definitions.
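
    A minimal plain-Java sketch of the underlying idea (this is not the coFJ calculus itself, and all names are invented for illustration): a regular stream is stored as a finite cyclic structure, and an operation over it detects when a call revisits a node, cutting the cycle instead of diverging. In coFJ the programmer can choose what a detected cyclic call returns; here that choice is hard-coded to false.

```java
import java.util.HashSet;
import java.util.Set;

// Plain-Java sketch of regular corecursion (not coFJ itself): the conceptually
// infinite stream 1,2,1,2,... is represented by a finite cyclic structure, and
// operations detect cyclic calls instead of recursing forever.
public class RegularStream {

    static class Node {
        final int head;
        Node tail;                        // may point back into the structure, forming a cycle
        Node(int head) { this.head = head; }
    }

    // The regular stream 1,2,1,2,... : two nodes closed into a cycle.
    static Node oneTwoForever() {
        Node one = new Node(1);
        Node two = new Node(2);
        one.tail = two;
        two.tail = one;                   // back-edge: finite representation of an infinite stream
        return one;
    }

    // Membership test that terminates on regular streams: a call on an
    // already-visited node is a cyclic call and is cut off (here: answer false).
    static boolean contains(Node s, int x, Set<Node> visited) {
        if (!visited.add(s)) return false;    // cycle detected: stop instead of diverging
        return s.head == x || contains(s.tail, x, visited);
    }

    public static void main(String[] args) {
        Node s = oneTwoForever();
        System.out.println(contains(s, 2, new HashSet<>()));  // true
        System.out.println(contains(s, 7, new HashSet<>()));  // false, thanks to cycle detection
    }
}
```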

    Foundational Extensible Corecursion

    This paper presents a formalized framework for defining corecursive functions safely in a total setting, based on corecursion up-to and relational parametricity. The end product is a general corecursor that allows corecursive (and even recursive) calls under well-behaved operations, including constructors. Corecursive functions that are themselves well behaved can be registered as such, thereby increasing the corecursor's expressiveness. The metatheory is formalized in the Isabelle proof assistant and forms the core of a prototype tool. The corecursor is derived from first principles, without requiring new axioms or extensions of the logic.
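
    A standard flavor of example (not necessarily one from this paper; the notation, ⊲ for stream cons and ⊕ for pointwise addition, is illustrative) shows what "corecursive calls under well-behaved operations" buys: the definition of pow2 below is rejected by a primitive corecursor, because its corecursive calls sit under ⊕ rather than directly under a constructor, but it is accepted once ⊕ is registered as well behaved, and it produces the stream 1, 2, 4, 8, ...

```latex
% Pointwise addition of streams, itself definable corecursively and then
% registrable as a well-behaved operation, followed by a definition whose
% corecursive calls occur under that operation (illustrative notation).
\[
  (x \lhd \sigma) \oplus (y \lhd \tau) \;=\; (x + y) \lhd (\sigma \oplus \tau),
  \qquad
  \mathit{pow2} \;=\; 1 \lhd (\mathit{pow2} \oplus \mathit{pow2}).
\]
```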