27,868 research outputs found
Technical Report: A Trace-Based Performance Study of Autoscaling Workloads of Workflows in Datacenters
To improve customer experience, datacenter operators offer support for
simplifying application and resource management. For example, running workloads
of workflows on behalf of customers is desirable, but requires increasingly
more sophisticated autoscaling policies, that is, policies that dynamically
provision resources for the customer. Although selecting and tuning autoscaling
policies is a challenging task for datacenter operators, so far relatively few
studies investigate the performance of autoscaling for workloads of workflows.
Complementing previous knowledge, in this work we propose the first
comprehensive performance study in the field. Using trace-based simulation, we
compare state-of-the-art autoscaling policies across multiple application
domains, workload arrival patterns (e.g., burstiness), and system utilization
levels. We further investigate the interplay between autoscaling and regular
allocation policies, and the complexity cost of autoscaling. Our quantitative
study focuses not only on traditional performance metrics and on
state-of-the-art elasticity metrics, but also on time- and memory-related
autoscaling-complexity metrics. Our main results give strong and quantitative
evidence about previously unreported operational behavior, for example, that
autoscaling policies perform differently across application domains and by how
much they differ.Comment: Technical Report for the CCGrid 2018 submission "A Trace-Based
Performance Study of Autoscaling Workloads of Workflows in Datacenters
Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications
Many scientific problems require multiple distinct computational tasks to be
executed in order to achieve a desired solution. We introduce the Ensemble
Toolkit (EnTK) to address the challenges of scale, diversity and reliability
they pose. We describe the design and implementation of EnTK, characterize its
performance and integrate it with two distinct exemplar use cases: seismic
inversion and adaptive analog ensembles. We perform nine experiments,
characterizing EnTK overheads, strong and weak scalability, and the performance
of two use case implementations, at scale and on production infrastructures. We
show how EnTK meets the following general requirements: (i) implementing
dedicated abstractions to support the description and execution of ensemble
applications; (ii) support for execution on heterogeneous computing
infrastructures; (iii) efficient scalability up to O(10^4) tasks; and (iv)
fault tolerance. We discuss novel computational capabilities that EnTK enables
and the scientific advantages arising thereof. We propose EnTK as an important
addition to the suite of tools in support of production scientific computing
Towards Exascale Scientific Metadata Management
Advances in technology and computing hardware are enabling scientists from
all areas of science to produce massive amounts of data using large-scale
simulations or observational facilities. In this era of data deluge, effective
coordination between the data production and the analysis phases hinges on the
availability of metadata that describe the scientific datasets. Existing
workflow engines have been capturing a limited form of metadata to provide
provenance information about the identity and lineage of the data. However,
much of the data produced by simulations, experiments, and analyses still need
to be annotated manually in an ad hoc manner by domain scientists. Systematic
and transparent acquisition of rich metadata becomes a crucial prerequisite to
sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and
domain-agnostic metadata management infrastructure that can meet the demands of
extreme-scale science is notable by its absence.
To address this gap in scientific data management research and practice, we
present our vision for an integrated approach that (1) automatically captures
and manipulates information-rich metadata while the data is being produced or
analyzed and (2) stores metadata within each dataset to permeate
metadata-oblivious processes and to query metadata through established and
standardized data access interfaces. We motivate the need for the proposed
integrated approach using applications from plasma physics, climate modeling
and neuroscience, and then discuss research challenges and possible solutions
- …