Infrastructure for Usable Machine Learning: The Stanford DAWN Project
Despite incredible recent advances in machine learning, building machine
learning applications remains prohibitively time-consuming and expensive for
all but the best-trained, best-funded engineering organizations. This expense
comes not from a need for new and improved statistical models but instead from
a lack of systems and tools for supporting end-to-end machine learning
application development, from data preparation and labeling to
productionization and monitoring. In this document, we outline opportunities
for infrastructure supporting usable, end-to-end machine learning applications
in the context of the nascent DAWN (Data Analytics for What's Next) project at
Stanford
Study on Resource Efficiency of Distributed Graph Processing
Graphs may be used to represent many different problem domains -- a concrete
example is that of detecting communities in social networks, which are
represented as graphs. With big data and more sophisticated applications
becoming widespread in recent years, graph processing has seen an emergence of
requirements pertaining data volume and volatility. This multidisciplinary
study presents a review of relevant distributed graph processing systems.
Herein they are presented in groups defined by common traits (distributed
processing paradigm, type of graph operations, among others), with an overview
of each system's strengths and weaknesses. The set of systems is then narrowed
down to two, on which quantitative analysis was performed. This quantitative
comparison focuses on evaluating the performance of algorithms for the problem
of detecting communities. To help
further understand the evaluations performed, a background is provided on graph
clustering
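The community-detection task used for the quantitative comparison can be illustrated with a minimal, single-machine label-propagation sketch (a common community-detection heuristic; the distributed systems surveyed here implement far more scalable variants). All names and the toy graph are illustrative only:

```python
from collections import Counter

def label_propagation(adj, max_iters=100):
    # Each node repeatedly adopts the most frequent label among its
    # neighbours; densely connected groups converge to a shared label.
    labels = {v: v for v in adj}
    for _ in range(max_iters):
        changed = False
        for v in adj:
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            if counts.get(labels[v], 0) == best:
                continue  # current label is already maximal: keep it
            labels[v] = min(l for l, c in counts.items() if c == best)
            changed = True
        if not changed:
            break
    return labels

# Toy graph: two triangles, i.e. two clear-cut communities
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1},
       3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
labels = label_propagation(adj)
```

Real implementations randomize update order and tie-breaking; the deterministic rules above just keep the sketch reproducible.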
Hadoop on HPC: Integrating Hadoop and Pilot-based Dynamic Resource Management
High-performance computing platforms such as supercomputers have
traditionally been designed to meet the compute demands of scientific
applications. Consequently, they have been architected as producers and not
consumers of data. The Apache Hadoop ecosystem has evolved to meet the
requirements of data processing applications and has addressed many of the
limitations of HPC platforms. There exists, however, a class of scientific
applications that needs the collective capabilities of traditional high-performance
computing environments and the Apache Hadoop ecosystem. For example, the
scientific domains of bio-molecular dynamics, genomics and network science need
to couple traditional computing with Hadoop/Spark based analysis. We
investigate the critical question of how to present the capabilities of both
computing environments to such scientific applications. While this question
needs answers at multiple levels, we focus on the design of resource management
middleware that might support the needs of both. We propose extensions to the
Pilot-Abstraction to provide a unifying resource management layer. This is an
important step that allows applications to integrate HPC stages (e.g.
simulations) with data analytics. Many supercomputing centers have started to
officially support Hadoop environments, either in a dedicated environment or in
hybrid deployments using tools such as myHadoop. This typically involves many
intrinsic, environment-specific details that need to be mastered and that often
swamp conceptual issues such as: How best to couple HPC and Hadoop application
stages? How to explore runtime trade-offs (data localities vs. data movement)?
This paper provides both conceptual understanding and practical solutions to
the integrated use of HPC and Hadoop environments
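As a rough illustration of the pilot idea discussed above, acquiring a resource slice once and multiplexing both simulation and analysis stages onto it, here is a hypothetical single-node sketch. The `Pilot` class, its API, and the stage functions are inventions for illustration, not the actual Pilot-Abstraction interface:

```python
from concurrent.futures import ThreadPoolExecutor

class Pilot:
    """Hypothetical sketch: acquire a fixed pool of workers once, then
    schedule many heterogeneous application tasks into it, so HPC stages
    (e.g. simulations) and analytics stages share one allocation."""
    def __init__(self, cores):
        self._pool = ThreadPoolExecutor(max_workers=cores)

    def submit(self, fn, *args):
        return self._pool.submit(fn, *args)

    def shutdown(self):
        self._pool.shutdown(wait=True)

def simulate(step):   # stand-in for an HPC simulation stage
    return [step * i for i in range(4)]

def analyze(traj):    # stand-in for a Hadoop/Spark analysis stage
    return sum(traj)

pilot = Pilot(cores=2)
futures = [pilot.submit(simulate, s) for s in (1, 2)]
trajs = [f.result() for f in futures]
totals = [pilot.submit(analyze, t).result() for t in trajs]
pilot.shutdown()
```

The point of the sketch is only that both kinds of stage run inside one retained allocation, rather than each requesting resources separately from the batch scheduler.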
Architectural Impact on Performance of In-memory Data Analytics: Apache Spark Case Study
While cluster computing frameworks are continuously evolving to provide
real-time data analysis capabilities, Apache Spark has managed to be at the
forefront of big data analytics, being a unified framework for both batch
and stream data processing. However, recent studies on micro-architectural
characterization of in-memory data analytics are limited to only batch
processing workloads. We compare micro-architectural performance of batch
processing and stream processing workloads in Apache Spark using hardware
performance counters on a dual socket server. In our evaluation experiments, we
have found that batch processing and stream processing workloads have similar
micro-architectural characteristics and are bounded by the latency of frequent
data access to DRAM. For data accesses, we have found that simultaneous
multi-threading is effective in hiding the data latencies. We have also
observed that (i) data locality on NUMA nodes can improve the performance by
10% on average, (ii) disabling next-line L1-D prefetchers can reduce the
execution time by up to 14%, and (iii) multiple small executors can provide
up to 36% speedup over a single large executor
Pilot-Abstraction: A Valid Abstraction for Data-Intensive Applications on HPC, Hadoop and Cloud Infrastructures?
HPC environments have traditionally been designed to meet the compute demands
of scientific applications, with data only a second-order concern. With
science moving toward data-driven discoveries relying more on correlations in
data to form scientific hypotheses, the limitations of HPC approaches become
apparent: Architectural paradigms such as the separation of storage and compute
are not optimal for I/O intensive workloads (e.g. for data preparation,
transformation and SQL). While there are many powerful computational and
analytical libraries available on HPC (e.g. for scalable linear algebra), they
generally lack the usability and variety of analytical libraries found in other
environments (e.g. the Apache Hadoop ecosystem). Further, there is a lack of
abstractions that unify access to increasingly heterogeneous infrastructure
(HPC, Hadoop, clouds) and allow reasoning about performance trade-offs in this
complex environment. At the same time, the Hadoop ecosystem is evolving rapidly,
has established itself as the de facto standard for data-intensive workloads in
industry, and is increasingly used to tackle scientific problems. In this paper,
we explore paths to interoperability between Hadoop and HPC, examine the
differences and challenges, such as the different architectural paradigms and
abstractions, and investigate ways to address them. We propose extending the
Pilot-Abstraction to Hadoop to serve as an interoperability layer for
allocating and managing resources across different infrastructures. Further,
in-memory capabilities have been deployed to enhance the performance of
large-scale data analytics (e.g. iterative algorithms) for which the ability to
re-use data across iterations is critical. As memory naturally fits in with the
Pilot concept of retaining resources for a set of tasks, we propose the
extension of the Pilot-Abstraction to in-memory resources.
Comment: Submitted to HPDC 2015, 12 pages, 9 figures
A Survey on Geographically Distributed Big-Data Processing using MapReduce
Hadoop and Spark are widely used distributed processing frameworks for
large-scale data processing in an efficient and fault-tolerant manner on
private or public clouds. These big-data processing systems are extensively
used by many industries, e.g., Google, Facebook, and Amazon, for solving a
large class of problems, e.g., search, clustering, log analysis, different
types of join operations, matrix multiplication, pattern matching, and social
network analysis. However, all these popular systems have a major drawback in
terms of locally distributed computations, which prevents them from implementing
geographically distributed data processing. The increasing amount of
geographically distributed massive data is pushing industries and academia to
rethink the current big-data processing systems. Novel frameworks, going beyond
the state-of-the-art architectures and technologies of current systems, are
expected to process geographically distributed data at
their locations without moving entire raw datasets to a single location. In
this paper, we investigate and discuss challenges and requirements in designing
geographically distributed data processing frameworks and protocols. We
classify and study batch processing (MapReduce-based systems), stream
processing (Spark-based systems), and SQL-style processing geo-distributed
frameworks, models, and algorithms with their overhead issues.
Comment: IEEE Transactions on Big Data; accepted June 2017. 20 pages
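The MapReduce model that these geo-distributed frameworks generalize can be sketched in a few lines. This toy, single-process version (illustrative names only) makes the map, shuffle, and reduce phases explicit; in the geo-distributed setting it is the shuffle that would cross wide-area links:

```python
from itertools import groupby

def map_reduce(records, mapper, reducer):
    # Map phase: emit intermediate (key, value) pairs for each record
    pairs = [kv for rec in records for kv in mapper(rec)]
    # Shuffle phase: bring all values for the same key together
    pairs.sort(key=lambda kv: kv[0])
    # Reduce phase: apply the reducer once per key
    return {key: reducer(key, [v for _, v in group])
            for key, group in groupby(pairs, key=lambda kv: kv[0])}

# Word count, the canonical MapReduce example
docs = ["spark and hadoop", "hadoop on hpc", "spark streaming"]
mapper = lambda doc: [(word, 1) for word in doc.split()]
reducer = lambda word, counts: sum(counts)
counts = map_reduce(docs, mapper, reducer)
```

When the input records live in different data centers, shipping them all to one site for this shuffle is exactly the raw-data movement the surveyed frameworks try to avoid.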
Development details and computational benchmarking of DEPAM
In the big data era of observational oceanography, passive acoustics datasets
are becoming too voluminous to be processed on local computers, due to their
processor and memory limitations. As a result, our community currently needs to
turn to cloud-based distributed computing. We present a scalable
computing system for FFT (Fast Fourier Transform)-based features (e.g., Power
Spectral Density) based on the Apache distributed frameworks Hadoop and Spark.
These features are at the core of many different types of acoustic analysis
where the need to process data at scale and with speed is evident, e.g. serving
as long-term averaged learning representations of soundscapes to identify
periods of acoustic interest. In addition to providing a complete description of
our system implementation, we performed a computational benchmark comparing our
system to three other systems, based on Scala only, Matlab, and Python,
in standalone executions, and evaluated its scalability using the speed up
metric. Our current results are very promising in terms of computational
performance, as we show that our proposed Hadoop/Spark system performs
reasonably well on a single node setup comparatively to state-of-the-art
processing tools used by the PAM community, and that it could also fully
leverage more intensive cluster resources with an almost-linear scalability
behaviour above a certain dataset volume
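The core computation described above, FFT-based Power Spectral Density features, can be sketched as a Welch-style averaged periodogram. This single-process toy (naive DFT, no windowing, an illustrative normalization) only shows the structure of the feature; the actual DEPAM pipeline distributes this work with Hadoop/Spark:

```python
import cmath
import math

def dft(x):
    # Naive O(n^2) DFT; stands in for the FFT used in a real pipeline
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def welch_psd(signal, seg_len):
    """Split the signal into non-overlapping segments, take the squared
    magnitude spectrum of each, and average across segments."""
    segs = [signal[i:i + seg_len]
            for i in range(0, len(signal) - seg_len + 1, seg_len)]
    psd = [0.0] * seg_len
    for seg in segs:
        spec = dft(seg)
        for k in range(seg_len):
            psd[k] += abs(spec[k]) ** 2 / (seg_len * len(segs))
    return psd

# A pure tone whose period equals the segment length: energy lands in bin 1
sig = [math.cos(2 * math.pi * t / 8) for t in range(32)]
psd = welch_psd(sig, 8)
```

In a distributed setting, each segment's spectrum is an independent task, which is what makes this workload map naturally onto Spark.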
Snap ML: A Hierarchical Framework for Machine Learning
We describe a new software framework for fast training of generalized linear
models. The framework, named Snap Machine Learning (Snap ML), combines recent
advances in machine learning systems and algorithms in a nested manner to
reflect the hierarchical architecture of modern computing systems. We prove
theoretically that such a hierarchical system can accelerate training in
distributed environments where intra-node communication is cheaper than
inter-node communication. Additionally, we provide a review of the
implementation of Snap ML in terms of GPU acceleration, pipelining,
communication patterns and software architecture, highlighting aspects that
were critical for achieving high performance. We evaluate the performance of
Snap ML in both single-node and multi-node environments, quantifying the
benefit of the hierarchical scheme and the data streaming functionality, and
comparing with other widely-used machine learning software frameworks. Finally,
we present a logistic regression benchmark on the Criteo Terabyte Click Logs
dataset and show that Snap ML achieves the same test loss an order of magnitude
faster than any of the previously reported results, including those obtained
using TensorFlow and scikit-learn.
Comment: In Proceedings of the Thirty-Second Conference on Neural Information Processing Systems (NeurIPS 2018)
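For reference, the workload behind the benchmark above can be sketched as plain single-node SGD for logistic regression. This is a generic sketch with illustrative names, not Snap ML's hierarchical implementation:

```python
import math
import random

def train_logreg(data, lr=0.5, epochs=200, seed=0):
    # Plain SGD on the log-loss; y must be 0 or 1
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1 / (1 + math.exp(-z))     # predicted probability
            g = p - y                      # gradient factor of log-loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
    return w

# Linearly separable toy data: label is 1 iff x0 > x1
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0),
        ([2.0, 1.0], 1), ([1.0, 2.0], 0)]
w = train_logreg(list(data))
predict = lambda x: 1 / (1 + math.exp(
    -sum(wi * xi for wi, xi in zip(w, x)))) > 0.5
```

Systems like Snap ML speed up exactly this kind of loop by partitioning the data across nodes and GPUs and nesting the optimization to match the hardware hierarchy.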
Asynchronous Complex Analytics in a Distributed Dataflow Architecture
Scalable distributed dataflow systems have recently experienced widespread
adoption, with commodity dataflow engines such as Hadoop and Spark, and even
commodity SQL engines routinely supporting increasingly sophisticated analytics
tasks (e.g., support vector machines, logistic regression, collaborative
filtering). However, these systems' synchronous (often Bulk Synchronous
Parallel) dataflow execution model is at odds with an increasingly important
trend in the machine learning community: the use of asynchrony via shared,
mutable state (i.e., data races) in convex programming tasks, which has---in a
single-node context---delivered noteworthy empirical performance gains and
inspired new research into asynchronous algorithms. In this work, we attempt to
bridge this gap by evaluating the use of lightweight, asynchronous state
transfer within a commodity dataflow engine. Specifically, we investigate the
use of asynchronous sideways information passing (ASIP) that presents
single-stage parallel iterators with a Volcano-like intra-operator iterator
that can be used for asynchronous information passing. We port two synchronous
convex programming algorithms, stochastic gradient descent and the alternating
direction method of multipliers (ADMM), to use ASIPs. We evaluate an
implementation of ASIPs within Apache Spark that exhibits considerable
speedups as well as a rich set of performance trade-offs in the use of these
asynchronous algorithms
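The asynchronous, shared-mutable-state style of training that this work brings into a dataflow engine can be sketched with Hogwild-style SGD, where workers deliberately race on a shared weight vector. This thread-based toy on a least-squares objective is an illustration of the general pattern, not the paper's ASIP mechanism:

```python
import threading

def async_sgd(data, n_workers=4, epochs=50, lr=0.05):
    """Workers update a shared, unsynchronised weight (deliberate data
    races); for well-behaved convex problems the estimate still
    converges despite occasional lost updates."""
    w = [0.0]  # shared mutable state, intentionally unprotected by locks

    def worker(shard):
        for _ in range(epochs):
            for x, y in shard:
                g = (w[0] * x - y) * x   # least-squares gradient
                w[0] -= lr * g           # racy read-modify-write

    shards = [data[i::n_workers] for i in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w[0]

# Fit y = 2x; asynchrony adds noise, but the estimate converges anyway
data = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
w = async_sgd(data)
```

The single-node gains cited above come from removing exactly the synchronization barriers that this sketch omits.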
Project Beehive: A Hardware/Software Co-designed Stack for Runtime and Architectural Research
The end of Dennard scaling combined with stagnation in architectural and
compiler optimizations makes it challenging to achieve significant performance
deltas. Solutions based solely in hardware or software are no longer sufficient
to maintain the pace of improvements seen during the past few decades. In
hardware, the end of single-core scaling resulted in the proliferation of
multi-core system architectures; however, this has forced complex parallel
programming techniques into the mainstream. To further exploit physical
resources, systems are becoming increasingly heterogeneous with specialized
computing elements and accelerators. Programming across a range of disparate
architectures requires a new level of abstraction that programming languages
will have to adapt to. In software, emerging complex applications, from domains
such as Big Data and computer vision, run on multi-layered software stacks
targeting hardware with a variety of constraints and resources. Hence,
optimizing for the power-performance (and resiliency) space requires
experimentation platforms that offer quick and easy prototyping of
hardware/software co-designed techniques. To that end, we present Project
Beehive: A Hardware/Software co-designed stack for runtime and architectural
research. Project Beehive utilizes various state-of-the-art software and
hardware components along with novel and extensible co-design techniques. The
objective of Project Beehive is to provide a modern platform for
experimentation on emerging applications, programming languages, compilers,
runtimes, and low-power heterogeneous many-core architectures in a full-system
co-designed manner.
Comment: New version of this paper