Functional Collection Programming with Semi-Ring Dictionaries
This paper introduces semi-ring dictionaries, a powerful class of
compositional and purely functional collections that subsume other collection
types such as sets, multisets, arrays, vectors, and matrices. We developed
SDQL, a statically typed language that can express relational algebra with
aggregations, linear algebra, and functional collections over data such as
relations and matrices using semi-ring dictionaries. Furthermore, thanks to the
algebraic structure behind these dictionaries, SDQL unifies a wide range of
optimizations commonly used in databases (DB) and linear algebra (LA). As a
result, SDQL enables efficient processing of hybrid DB and LA workloads, by
putting together optimizations that are otherwise confined to either DB systems
or LA frameworks. We show experimentally that a handful of DB and LA workloads
can take advantage of the SDQL language and optimizations. Overall, we observe
that SDQL achieves competitive performance relative to Typer and Tectorwise,
which are state-of-the-art in-memory DB systems for (flat, not nested)
relational data, and achieves an average 2x speedup over SciPy for LA
workloads. For hybrid workloads involving LA processing, SDQL achieves up to
one order of magnitude speedup over Trance, a state-of-the-art nested
relational engine for nested biomedical data, and gives an average 40% speedup
over LMFAO, a state-of-the-art in-DB machine learning engine for two (flat)
relational real-world retail datasets.
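To make the idea concrete, here is a minimal sketch (in plain Python, not SDQL syntax; the names `dict_add` and `matvec` are illustrative) of how semi-ring dictionaries subsume both multisets and sparse matrices: a dictionary maps keys to values in a semi-ring, dictionary addition merges entries by adding values, and nested dictionaries yield linear-algebra objects.

```python
# Illustrative sketch of the semi-ring dictionary idea (not SDQL itself).

def dict_add(a, b):
    """Semi-ring dictionary addition: merge entries, summing shared keys."""
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0) + v
    return out

# A multiset is a dictionary into the counting semi-ring (integers):
bag = dict_add({"x": 2}, {"x": 1, "y": 4})   # {"x": 3, "y": 4}

# A sparse matrix is a dictionary of dictionaries (row -> col -> value),
# so matrix-vector multiplication is just a nested aggregation:
def matvec(m, v):
    out = {}
    for i, row in m.items():
        s = sum(val * v.get(j, 0) for j, val in row.items())
        if s:
            out[i] = s
    return out

A = {0: {0: 1.0, 1: 2.0}, 1: {1: 3.0}}
x = {0: 1.0, 1: 1.0}
matvec(A, x)   # {0: 3.0, 1: 3.0}
```

The same `dict_add` also expresses relational aggregation (grouping keys, summing payloads), which is what lets one optimizer treat DB and LA operations uniformly.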
A Language-Based Approach to Programming with Serialized Data
Thesis (Ph.D.) - Indiana University, School of Informatics, Computing, and Engineering, 2021. In a typical data-processing application, the representation of data in memory is distinct from its representation in a serialized form on disk. The former has pointers and an arbitrary, sparse layout, facilitating easier manipulation by a program, while the latter is packed contiguously, facilitating easier I/O. I propose a programming language, LoCal, that unifies the in-memory and on-disk representations of data. LoCal extends prior work on region calculi into a location calculus, employing a type system that tracks the byte-addressed layout of all heap values. I present the formal semantics of LoCal and prove type safety, and show how to infer LoCal programs from unannotated source terms. Then, I demonstrate how to efficiently implement LoCal in a practical compiler that produces code competitive with hand-written C.
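The core idea can be illustrated with a toy sketch (in Python, not LoCal; the tag bytes and function names are assumptions for illustration): a binary tree is packed contiguously in preorder, and computations traverse the packed bytes directly, with no pointers and no deserialization step.

```python
# Toy sketch of computing directly over a serialized representation
# (illustrative only; LoCal's actual layouts are richer).
LEAF, NODE = 0, 1

def pack(tree):
    """Serialize a tree like ('node', l, r) / ('leaf', n) in preorder."""
    if tree[0] == "leaf":
        return bytes([LEAF, tree[1]])
    return bytes([NODE]) + pack(tree[1]) + pack(tree[2])

def sum_leaves(buf, i=0):
    """Sum leaf values straight off the packed buffer.
    Returns (sum, offset just past this subtree)."""
    if buf[i] == LEAF:
        return buf[i + 1], i + 2
    left, j = sum_leaves(buf, i + 1)
    right, k = sum_leaves(buf, j)
    return left + right, k

t = ("node", ("leaf", 1), ("node", ("leaf", 2), ("leaf", 3)))
sum_leaves(pack(t))[0]   # 6
```

Because the traversal threads a byte offset instead of chasing pointers, the "in-memory" and "on-disk" forms coincide; LoCal's type system is what makes such offset-threading safe in general.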
Compilation and Code Optimization for Data Analytics
The trade-offs between the use of modern high-level and low-level programming languages in constructing complex software artifacts are well known. High-level languages allow for greater programmer productivity: abstraction and genericity allow for the same functionality to be implemented with significantly less code compared to low-level languages. Modularity, object-orientation, functional programming, and powerful type systems allow programmers not only to create clean abstractions and protect them from leaking, but also to define code units that are reusable and easily composable, and software architectures that are adaptable and extensible. The abstraction, succinctness, and modularity of high-level code help to avoid software bugs and facilitate debugging and maintenance.
The use of high-level languages comes at a performance cost: increased indirection due to abstraction, virtualization, and interpretation, and superfluous work, particularly in the form of temporary memory allocation and deallocation to support objects and encapsulation.
As a result, the cost of high-level languages for performance-critical systems may seem prohibitive.
The vision of abstraction without regret argues that it is possible to use high-level languages for building performance-critical systems that allow for both productivity and high performance, instead of trading off the former for the latter. In this thesis, we realize this vision for building different types of data analytics systems. Our means of achieving this is by employing compilation. The goal is to compile away expensive language features -- to compile high-level code down to efficient low-level code.
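The "compile away expensive language features" approach can be sketched in miniature (a toy in Python, not the thesis's actual compiler; `interpret` and `compile_pipeline` are illustrative names): instead of interpreting a pipeline of small closures with one indirect call per operation per element, generate a single fused loop as source code and execute that.

```python
# Toy sketch of removing abstraction overhead by code generation.

def interpret(ops, xs):
    """High-level but slow: one indirect call per op per element."""
    out = []
    for x in xs:
        for op in ops:
            x = op(x)
        out.append(x)
    return out

def compile_pipeline(exprs):
    """Fuse expression templates into one generated function."""
    body = "x"
    for e in exprs:
        body = e.format(body)
    src = f"def fused(xs):\n    return [{body} for x in xs]"
    ns = {}
    exec(src, ns)
    return ns["fused"]

ops = [lambda x: x + 1, lambda x: x * 2]
exprs = ["({} + 1)", "({} * 2)"]
fused = compile_pipeline(exprs)
interpret(ops, [1, 2, 3])   # [4, 6, 8]
fused([1, 2, 3])            # [4, 6, 8]
```

Both produce the same result, but the generated `fused` has no per-element closure dispatch -- the same shape of win, at a much larger scale, that compiling high-level analytics code down to low-level code provides.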