481 research outputs found
Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications
MapReduce is a popular programming paradigm for developing large-scale,
data-intensive computation. Many frameworks that implement this paradigm have
recently been developed. To leverage these frameworks, however, developers must
become familiar with their APIs and rewrite existing code. Casper is a new tool
that automatically translates sequential Java programs into the MapReduce
paradigm. Casper identifies potential code fragments to rewrite and translates
them in two steps: (1) Casper uses program synthesis to search for a program
summary (i.e., a functional specification) of each code fragment. The summary
is expressed using a high-level intermediate language resembling the MapReduce
paradigm and verified to be semantically equivalent to the original using a
theorem prover. (2) Casper generates executable code from the summary, using
either the Hadoop, Spark, or Flink API. We evaluated Casper by automatically
converting real-world, sequential Java benchmarks to MapReduce. The resulting
benchmarks perform up to 48.2x faster compared to the original.Comment: 12 pages, additional 4 pages of references and appendi
Ultrasound Nerve Segmentation Using Deep Probabilistic Programming
Deep probabilistic programming concatenates the strengths of deep learning to the context of probabilistic modeling for efficient and flexible computation in practice. Being an evolving field, there exist only a few expressive programming languages for uncertainty management. This paper discusses an application for analysis of ultrasound nerve segmentation-based biomedical images. Our method uses the probabilistic programming language Edward with the U-Net model and generative adversarial networks under different optimizers. The segmentation process showed the least Dice loss ("‘0.54) and the highest accuracy (0.99) with the Adam optimizer in the U-Net model with the least time consumption compared to other optimizers. The smallest amount of generative network loss in the generative adversarial network model gained was 0.69 for the Adam optimizer. The Dice loss, accuracy, time consumption and output image quality in the results show the applicability of deep probabilistic programming in the long run. Thus, we further propose a neuroscience decision support system based on the proposed approach
Enabling Operator Reordering in Data Flow Programs Through Static Code Analysis
In many massively parallel data management platforms, programs are
represented as small imperative pieces of code connected in a data flow. This
popular abstraction makes it hard to apply algebraic reordering techniques
employed by relational DBMSs and other systems that use an algebraic
programming abstraction. We present a code analysis technique based on reverse
data and control flow analysis that discovers a set of properties from user
code, which can be used to emulate algebraic optimizations in this setting.Comment: 4 pages, accepted and presented at the First International Workshop
on Cross-model Language Design and Implementation (XLDI), affiliated with
ICFP 2012, Copenhage
AMaχoS—Abstract Machine for Xcerpt
Web query languages promise convenient and efficient access
to Web data such as XML, RDF, or Topic Maps. Xcerpt is one such Web
query language with strong emphasis on novel high-level constructs for
effective and convenient query authoring, particularly tailored to versatile
access to data in different Web formats such as XML or RDF.
However, so far it lacks an efficient implementation to supplement the
convenient language features. AMaχoS is an abstract machine implementation
for Xcerpt that aims at efficiency and ease of deployment. It
strictly separates compilation and execution of queries: Queries are compiled
once to abstract machine code that consists in (1) a code segment
with instructions for evaluating each rule and (2) a hint segment that
provides the abstract machine with optimization hints derived by the
query compilation. This article summarizes the motivation and principles
behind AMaχoS and discusses how its current architecture realizes
these principles
Declarative Data Analytics: a Survey
The area of declarative data analytics explores the application of the
declarative paradigm on data science and machine learning. It proposes
declarative languages for expressing data analysis tasks and develops systems
which optimize programs written in those languages. The execution engine can be
either centralized or distributed, as the declarative paradigm advocates
independence from particular physical implementations. The survey explores a
wide range of declarative data analysis frameworks by examining both the
programming model and the optimization techniques used, in order to provide
conclusions on the current state of the art in the area and identify open
challenges.Comment: 36 pages, 2 figure
Database-Inspired Optimizations for Statistical Analysis
Computing complex statistics on large amounts of data is no longer a corner case, but a daily challenge. However, current tools such as GNU R were not built to efficiently handle large data sets. We propose to vastly improve the execution of R scripts by interpreting them as a declaration of intent rather than an imperative order set in stone. This allows us to apply optimization techniques from the columnar data management research field. We have implemented several of these optimizers in Renjin, an open-source execution environment for R scripts targeted at the Java virtual machine. The demonstration of our approach using a series of micro-benchmarks and experiments on complex survey analysis show orders-of-magnitude improvements in analysis cost
- …