9,924 research outputs found
Stream Fusion, to Completeness
Stream processing is mainstream (again): Widely-used stream libraries are now
available for virtually all modern OO and functional languages, from Java to C#
to Scala to OCaml to Haskell. Yet expressivity and performance are still
lacking. For instance, the popular, well-optimized Java 8 streams do not
support the zip operator and are still an order of magnitude slower than
hand-written loops. We present the first approach that represents the full
generality of stream processing and eliminates overheads, via the use of
staging. It is based on an unusually rich semantic model of stream interaction.
We support any combination of zipping, nesting (or flat-mapping), sub-ranging,
filtering, mapping-of finite or infinite streams. Our model captures
idiosyncrasies that a programmer uses in optimizing stream pipelines, such as
rate differences and the choice of a "for" vs. "while" loops. Our approach
delivers hand-written-like code, but automatically. It explicitly avoids the
reliance on black-box optimizers and sufficiently-smart compilers, offering
highest, guaranteed and portable performance. Our approach relies on high-level
concepts that are then readily mapped into an implementation. Accordingly, we
have two distinct implementations: an OCaml stream library, staged via
MetaOCaml, and a Scala library for the JVM, staged via LMS. In both cases, we
derive libraries richer and simultaneously many tens of times faster than past
work. We greatly exceed in performance the standard stream libraries available
in Java, Scala and OCaml, including the well-optimized Java 8 streams
MOA: Massive Online Analysis, a framework for stream classification and clustering.
Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA is designed to deal with the challenging problem of scaling up the implementation of state of the art algorithms to real world dataset sizes. It contains collection of offline and online for both classification and clustering as well as tools for evaluation. In particular, for classification it implements boosting, bagging, and Hoeffding Trees, all with and without Naive Bayes classifiers at the leaves. For clustering, it implements StreamKM++, CluStream, ClusTree, Den-Stream, D-Stream and CobWeb. Researchers benefit from MOA by getting insights into workings and problems of different approaches, practitioners can easily apply and compare several algorithms to real world data set and settings. MOA supports bi-directional interaction with WEKA, the Waikato Environment for Knowledge Analysis, and is released under the GNU GPL license
Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams
Streaming APIs are becoming more pervasive in mainstream Object-Oriented programming languages and platforms. For example, the Stream API introduced in Java 8 allows for functional-like, MapReduce-style operations in processing both finite, e.g., collections, and infinite data structures. However, using this API efficiently involves subtle considerations such as determining when it is best for stream operations to run in parallel, when running operations in parallel can be less efficient, and when it is safe to run in parallel due to possible lambda expression side-effects. Also, streams may not run all operations in parallel depending on particular collectors used in reductions. In this paper, we present an automated refactoring approach that assists developers in writing efficient stream code in a semantics-preserving fashion. The approach, based on a novel data ordering and typestate analysis, consists of preconditions and transformations for automatically determining when it is safe and possibly advantageous to convert sequential streams to parallel, unorder or de-parallelize already parallel streams, and optimize streams involving complex reductions. The approach was implemented as a plug-in to the popular Eclipse IDE, uses the WALA and SAFE analysis frameworks, and was evaluated on 11 Java projects consisting of ∼642K lines of code. We found that 57 of 157 candidate streams (36.31%) were refactorable, and an average speedup of 3.49 on performance tests was observed. The results indicate that the approach is useful in optimizing stream code to their full potential
Matrix Code
Matrix Code gives imperative programming a mathematical semantics and
heuristic power comparable in quality to functional and logic programming. A
program in Matrix Code is developed incrementally from a specification in
pre/post-condition form. The computations of a code matrix are characterized by
powers of the matrix when it is interpreted as a transformation in a space of
vectors of logical conditions. Correctness of a code matrix is expressed in
terms of a fixpoint of the transformation. The abstract machine for Matrix Code
is the dual-state machine, which we present as a variant of the classical
finite-state machine.Comment: 39 pages, 19 figures; extensions and minor correction
Learning from medical data streams: an introduction
Clinical practice and research are facing a new challenge created by the rapid growth of health information science and technology, and the complexity and volume of biomedical data. Machine learning from medical data streams is a recent area of research that aims to provide better knowledge extraction and evidence-based clinical decision support in scenarios where data are produced as a continuous flow. This year's edition of AIME, the Conference on Artificial Intelligence in Medicine, enabled the sound discussion of this area of research, mainly by the inclusion of a dedicated workshop. This paper is an introduction to LEMEDS, the Learning from Medical Data Streams workshop, which highlights the contributed papers, the invited talk and expert panel discussion, as well as related papers accepted to the main conference
Enhancing Regular Corecursion
Nowadays, data structures which are conceptually infinite, such as streams or infinite trees, are very common in computer science. When it comes to their manipulation, one major problem to face is how to finitely represent and deal with them without incurring in non-terminating behaviours. Regular corecursion is a solution relying on finite representation of regular data structures, and detection of cyclic calls. The topics in the thesis revolve around two enhancements of regular corecursion in different directions. In the first part, we present Corecursive Featherweight Java (coFJ), an object-oriented calculus which supports flexible regular corecursion, that is, allows the programmer to specify the behaviour when a cyclic call is found. In the second part, instead, we extend regular corecursion beyond regular terms, focusing on the significant case of stream definitions
Foundational Extensible Corecursion
This paper presents a formalized framework for defining corecursive functions
safely in a total setting, based on corecursion up-to and relational
parametricity. The end product is a general corecursor that allows corecursive
(and even recursive) calls under well-behaved operations, including
constructors. Corecursive functions that are well behaved can be registered as
such, thereby increasing the corecursor's expressiveness. The metatheory is
formalized in the Isabelle proof assistant and forms the core of a prototype
tool. The corecursor is derived from first principles, without requiring new
axioms or extensions of the logic
- …