4,505 research outputs found
Scientific Computing Meets Big Data Technology: An Astronomy Use Case
Scientific analyses commonly compose multiple single-process programs into a
dataflow. An end-to-end dataflow of single-process programs is known as a
many-task application. Typically, tools from the HPC software stack are used to
parallelize these analyses. In this work, we investigate an alternate approach
that uses Apache Spark -- a modern big data platform -- to parallelize
many-task applications. We present Kira, a flexible and distributed astronomy
image processing toolkit using Apache Spark. We then use the Kira toolkit to
implement a Source Extractor application for astronomy images, called Kira SE.
With Kira SE as the use case, we study the programming flexibility, dataflow
richness, scheduling capacity and performance of Apache Spark running on the
EC2 cloud. By exploiting data locality, Kira SE achieves a 2.5x speedup over an
equivalent C program when analyzing a 1TB dataset using 512 cores on the Amazon
EC2 cloud. Furthermore, we show that by leveraging software originally designed
for big data infrastructure, Kira SE achieves competitive performance to the C
implementation running on the NERSC Edison supercomputer. Our experience with
Kira indicates that emerging Big Data platforms such as Apache Spark are a
performant alternative for many-task scientific applications
Graphulo Implementation of Server-Side Sparse Matrix Multiply in the Accumulo Database
The Apache Accumulo database excels at distributed storage and indexing and
is ideally suited for storing graph data. Many big data analytics compute on
graph data and persist their results back to the database. These graph
calculations are often best performed inside the database server. The GraphBLAS
standard provides a compact and efficient basis for a wide range of graph
applications through a small number of sparse matrix operations. In this
article, we implement GraphBLAS sparse matrix multiplication server-side by
leveraging Accumulo's native, high-performance iterators. We compare the
mathematics and performance of inner and outer product implementations, and
show how an outer product implementation achieves optimal performance near
Accumulo's peak write rate. We offer our work as a core component to the
Graphulo library that will deliver matrix math primitives for graph analytics
within Accumulo.Comment: To be presented at IEEE HPEC 2015: http://www.ieee-hpec.org
Human-Centric Program Synthesis
Program synthesis techniques offer significant new capabilities in searching for programs that satisfy high-level specifications. While synthesis has been thoroughly explored for input/output pair specifications (programming-by-example), this paper asks: what does program synthesis look like beyond examples? What actual issues in day-to-day development would stand to benefit the most from synthesis? How can a human-centric perspective inform the exploration of alternative specification languages for synthesis? I sketch a human-centric vision for program synthesis where programmers explore and learn languages and APIs aided by a synthesis tool
The role of concurrency in an evolutionary view of programming abstractions
In this paper we examine how concurrency has been embodied in mainstream
programming languages. In particular, we rely on the evolutionary talking
borrowed from biology to discuss major historical landmarks and crucial
concepts that shaped the development of programming languages. We examine the
general development process, occasionally deepening into some language, trying
to uncover evolutionary lineages related to specific programming traits. We
mainly focus on concurrency, discussing the different abstraction levels
involved in present-day concurrent programming and emphasizing the fact that
they correspond to different levels of explanation. We then comment on the role
of theoretical research on the quest for suitable programming abstractions,
recalling the importance of changing the working framework and the way of
looking every so often. This paper is not meant to be a survey of modern
mainstream programming languages: it would be very incomplete in that sense. It
aims instead at pointing out a number of remarks and connect them under an
evolutionary perspective, in order to grasp a unifying, but not simplistic,
view of the programming languages development process
Topicality and Social Impact: Diverse Messages but Focused Messengers
Are users who comment on a variety of matters more likely to achieve high
influence than those who delve into one focused field? Do general Twitter
hashtags, such as #lol, tend to be more popular than novel ones, such as
#instantlyinlove? Questions like these demand a way to detect topics hidden
behind messages associated with an individual or a hashtag, and a gauge of
similarity among these topics. Here we develop such an approach to identify
clusters of similar hashtags by detecting communities in the hashtag
co-occurrence network. Then the topical diversity of a user's interests is
quantified by the entropy of her hashtags across different topic clusters. A
similar measure is applied to hashtags, based on co-occurring tags. We find
that high topical diversity of early adopters or co-occurring tags implies high
future popularity of hashtags. In contrast, low diversity helps an individual
accumulate social influence. In short, diverse messages and focused messengers
are more likely to gain impact.Comment: 9 pages, 7 figures, 6 table
Recommended from our members
Accurate modeling of core and memory locality for proxy generation targeting emerging applications and architectures
Designing optimal computer systems for improved performance and energy efficiency requires architects and designers to have a deep understanding of the end-user workloads. However, many end-users (e.g., large corporations, banks, defense organizations, etc.) are apprehensive to share their applications with designers due to the confidential nature of software code and data. In addition, emerging applications pose significant challenges to early design space exploration due to their long-running nature and the highly complex nature of their software stack that cannot be supported on many early performance models.
The above challenges can be overcome by using a proxy benchmark. A miniaturized proxy benchmark can be used as a substitute of the original workload to perform early computer performance evaluation. The process of generating a proxy benchmark consists of extracting a set of key statistics to summarize the behavior of end-user applications through profiling and using the collected statistics to synthesize a representative proxy benchmark. Using such proxy benchmarks can help designers to understand the behavior of end-user’s workloads in a reasonable time without the users having to disclose sensitive information about their workloads.
Prior proxy benchmarking schemes leverage micro-architecture independent metrics, derived from detailed simulation tools, to generate proxy benchmarks. However, many emerging workloads do not work reliably with many profiling or simulation tools, in which case it becomes impossible to apply prior proxy generation techniques to generate proxy benchmarks for such complex applications. Furthermore, these techniques model instruction pipeline-level locality in great detail, but abstract out memory locality modeling using simple stride-based models. This results in poor cloning accuracy especially for emerging applications, which have larger memory footprints and complex access patterns. A few detailed cache and memory locality modeling techniques have also been proposed in literature. However, these techniques either model limited locality metrics and suffer from poor cloning accuracy or are fairly accurate, but at the expense of significant metadata overhead. Finally, none of the prior proxy benchmarking techniques model both core and memory locality with high accuracy. As a result, they are not useful for studying system-level performance behavior. Keeping the above key limitations and shortcomings of prior work in mind, this dissertation presents several techniques that expand the frontiers of workload proxy benchmarking, thereby enabling computer designers to gain a better and faster understanding of end-user application behavior without compromising the privileged nature of software or data.
This dissertation first presents a core-level proxy benchmark generation methodology that leverages performance metrics derived from hardware performance counter measurements to create miniature proxy benchmarks targeting emerging big-data applications. The presented performance counter based characterization and associated extrapolation into generic parameters for proxy generation enables faster analysis (runs almost at native hardware speeds, unlike prior workload cloning proposals) and proxy generation for emerging applications that do not work with simulators or profiling tools. The generated proxy benchmarks are representative of the performance of the real-world big-data applications, including operating system and run-time effects, and yet converge to results quickly without needing any complex software stack support.
Next, to improve upon the accuracy and efficiency of prior memory proxy benchmarking techniques, this dissertation presents a novel memory locality modeling technique that leverages localized pattern detection to create miniature memory proxy benchmarks. The presented technique models memory reference locality by decomposing an application’s memory accesses into a set of independent streams (localized by using address region based localization property), tracking fine-grained patterns within the localized streams and, finally, chaining or interleaving accesses from different localized memory streams to create an ordered proxy memory access sequence. This dissertation further extends the workload cloning approach to Graphics Processing Units (GPUs) and presents a novel proxy generation methodology to model the inherent memory access locality of GPU applications, while also accounting for the GPU’s parallel execution model. The generated memory proxy benchmarks help to enable fast and efficient design space exploration of futuristic memory hierarchies.
Finally, this dissertation presents a novel technique to integrate accurate core and memory locality models to create system-level proxy benchmarks targeting emerging applications. This is a new capability that can facilitate efficient overall system (core, cache and memory subsystem) design-space exploration. This dissertation further presents a novel methodology that exploits the synthetic benchmark generation framework to create hypothetical workloads with performance behavior that does not currently exist. Such proxies can be generated to cover anticipated code trends and can represent futuristic workloads before the workloads even exist.Electrical and Computer Engineerin
- …