Search CORE

481 research outputs found

Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications

Author: Cheung Alvin
Kemper Alfons
Palkar Shoumik
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 19/06/2018
Field of study

MapReduce is a popular programming paradigm for developing large-scale, data-intensive computation. Many frameworks that implement this paradigm have recently been developed. To leverage these frameworks, however, developers must become familiar with their APIs and rewrite existing code. Casper is a new tool that automatically translates sequential Java programs into the MapReduce paradigm. Casper identifies potential code fragments to rewrite and translates them in two steps: (1) Casper uses program synthesis to search for a program summary (i.e., a functional specification) of each code fragment. The summary is expressed using a high-level intermediate language resembling the MapReduce paradigm and verified to be semantically equivalent to the original using a theorem prover. (2) Casper generates executable code from the summary, using either the Hadoop, Spark, or Flink API. We evaluated Casper by automatically converting real-world, sequential Java benchmarks to MapReduce. The resulting benchmarks perform up to 48.2x faster compared to the original.Comment: 12 pages, additional 4 pages of references and appendi

arXiv.org e-Print Archive

Crossref

Ultrasound Nerve Segmentation Using Deep Probabilistic Programming

Author: Meedeniya Dulani
Rubasinghe Iresha
Publication venue: LPPM ITBis Lembah Dempo
Publication date: 01/12/2019
Field of study

Deep probabilistic programming concatenates the strengths of deep learning to the context of probabilistic modeling for efficient and flexible computation in practice. Being an evolving field, there exist only a few expressive programming languages for uncertainty management. This paper discusses an application for analysis of ultrasound nerve segmentation-based biomedical images. Our method uses the probabilistic programming language Edward with the U-Net model and generative adversarial networks under different optimizers. The segmentation process showed the least Dice loss ("‘0.54) and the highest accuracy (0.99) with the Adam optimizer in the U-Net model with the least time consumption compared to other optimizers. The smallest amount of generative network loss in the generative adversarial network model gained was 0.69 for the Adam optimizer. The Dice loss, accuracy, time consumption and output image quality in the results show the applicability of deep probabilistic programming in the long run. Thus, we further propose a neuroscience decision support system based on the proposed approach

Journal of ICT Research and Applications

Directory of Open Access Journals

ITB Journal

Enabling Operator Reordering in Data Flow Programs Through Static Code Analysis

Author: Hueske Fabian
Krettek Aljoscha
Tzoumas Kostas
Publication venue
Publication date: 17/01/2013
Field of study

In many massively parallel data management platforms, programs are represented as small imperative pieces of code connected in a data flow. This popular abstraction makes it hard to apply algebraic reordering techniques employed by relational DBMSs and other systems that use an algebraic programming abstraction. We present a code analysis technique based on reverse data and control flow analysis that discovers a set of properties from user code, which can be used to emulate algebraic optimizations in this setting.Comment: 4 pages, accepted and presented at the First International Workshop on Cross-model Language Design and Implementation (XLDI), affiliated with ICFP 2012, Copenhage

arXiv.org e-Print Archive

CiteSeerX

AMaχoS—Abstract Machine for Xcerpt

Author: Bry François
Furche Tim
Linse Benedikt
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006
Field of study

Web query languages promise convenient and efficient access to Web data such as XML, RDF, or Topic Maps. Xcerpt is one such Web query language with strong emphasis on novel high-level constructs for effective and convenient query authoring, particularly tailored to versatile access to data in different Web formats such as XML or RDF. However, so far it lacks an efficient implementation to supplement the convenient language features. AMaχoS is an abstract machine implementation for Xcerpt that aims at efficiency and ease of deployment. It strictly separates compilation and execution of queries: Queries are compiled once to abstract machine code that consists in (1) a code segment with instructions for evaluating each rule and (2) a hint segment that provides the abstract machine with optimization hints derived by the query compilation. This article summarizes the motivation and principles behind AMaχoS and discusses how its current architecture realizes these principles

Open Access LMU

Declarative Data Analytics: a Survey

Author: Makrynioti Nantia
Vassalos Vasilis
Publication venue
Publication date: 01/01/2004
Field of study

The area of declarative data analytics explores the application of the declarative paradigm on data science and machine learning. It proposes declarative languages for expressing data analysis tasks and develops systems which optimize programs written in those languages. The execution engine can be either centralized or distributed, as the declarative paradigm advocates independence from particular physical implementations. The survey explores a wide range of declarative data analysis frameworks by examining both the programming model and the optimization techniques used, in order to provide conclusions on the current state of the art in the area and identify open challenges.Comment: 36 pages, 2 figure

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

Database-Inspired Optimizations for Statistical Analysis

Author: Alexander Bertram
Hannes Mühleisen
Maarten-Jan Kallen
Publication venue: 'Foundation for Open Access Statistic'
Publication date: 01/11/2018
Field of study

Computing complex statistics on large amounts of data is no longer a corner case, but a daily challenge. However, current tools such as GNU R were not built to efficiently handle large data sets. We propose to vastly improve the execution of R scripts by interpreting them as a declaration of intent rather than an imperative order set in stone. This allows us to apply optimization techniques from the columnar data management research field. We have implemented several of these optimizers in Renjin, an open-source execution environment for R scripts targeted at the Java virtual machine. The demonstration of our approach using a series of micro-benchmarks and experiments on complex survey analysis show orders-of-magnitude improvements in analysis cost

CWI's Institutional Repository

Directory of Open Access Journals

Journal of Statistical Software