54,362 research outputs found
Optimizing distributed data stream processing by tracing
Heterogeneous mobile, sensor, IoT, smart environment, and social networking applications have recently started to produce unbounded, fast, and massive-scale streams of data that have to be processed “on the fly”. Systems that process such data have to be enhanced with detection for operational exceptions and with triggers for both automated and manual operator actions. In this paper, we illustrate how tracing in distributed data processing systems can be applied to detecting changes in data and operational environment to maintain the efficiency of heterogeneous data stream processing systems under potentially changing data quality and distribution. By the tracing of individual input records, we can (1) identify outliers in a web crawling and document processing system and use the insights to define URL filtering rules; (2) identify heavy keys, such as NULL, that should be filtered before processing; (3) give hints to improve the key-based partitioning mechanisms; and (4) measure the limits of overpartitioning if heavy thread-unsafe libraries are imported.
By using Apache Spark as illustration, we show how various data stream processing efficiency issues can be mitigated or optimized by our distributed tracing engine. We describe and qualitatively compare two different designs, one based on reporting to a distributed database and another based on trace piggybacking. Our prototype implementation consists of wrappers suitable for JVM environments in general, with minimal impact on the source code of the core system. Our tracing framework is the first to solve tracing in multiple systems across boundaries and to provide detailed performance measurements suitable for automated optimization, not just debugging
Characterizing Deep-Learning I/O Workloads in TensorFlow
The performance of Deep-Learning (DL) computing frameworks rely on the
performance of data ingestion and checkpointing. In fact, during the training,
a considerable high number of relatively small files are first loaded and
pre-processed on CPUs and then moved to accelerator for computation. In
addition, checkpointing and restart operations are carried out to allow DL
computing frameworks to restart quickly from a checkpoint. Because of this, I/O
affects the performance of DL applications. In this work, we characterize the
I/O performance and scaling of TensorFlow, an open-source programming framework
developed by Google and specifically designed for solving DL problems. To
measure TensorFlow I/O performance, we first design a micro-benchmark to
measure TensorFlow reads, and then use a TensorFlow mini-application based on
AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow.
To improve the checkpointing performance, we design and implement a burst
buffer. We find that increasing the number of threads increases TensorFlow
bandwidth by a maximum of 2.3x and 7.8x on our benchmark environments. The use
of the tensorFlow prefetcher results in a complete overlap of computation on
accelerator and input pipeline on CPU eliminating the effective cost of I/O on
the overall performance. The use of a burst buffer to checkpoint to a fast
small capacity storage and copy asynchronously the checkpoints to a slower
large capacity storage resulted in a performance improvement of 2.6x with
respect to checkpointing directly to slower storage on our benchmark
environment.Comment: Accepted for publication at pdsw-DISCS 201
BSML: A Binding Schema Markup Language for Data Interchange in Problem Solving Environments (PSEs)
We describe a binding schema markup language (BSML) for describing data
interchange between scientific codes. Such a facility is an important
constituent of scientific problem solving environments (PSEs). BSML is designed
to integrate with a PSE or application composition system that views model
specification and execution as a problem of managing semistructured data. The
data interchange problem is addressed by three techniques for processing
semistructured data: validation, binding, and conversion. We present BSML and
describe its application to a PSE for wireless communications system design
- …