11,589 research outputs found
Report from GI-Dagstuhl Seminar 16394: Software Performance Engineering in the DevOps World
This report documents the program and the outcomes of GI-Dagstuhl Seminar
16394 "Software Performance Engineering in the DevOps World".
The seminar addressed the problem of performance-aware DevOps. Both, DevOps
and performance engineering have been growing trends over the past one to two
years, in no small part due to the rise in importance of identifying
performance anomalies in the operations (Ops) of cloud and big data systems and
feeding these back to the development (Dev). However, so far, the research
community has treated software engineering, performance engineering, and cloud
computing mostly as individual research areas. We aimed to identify
cross-community collaboration, and to set the path for long-lasting
collaborations towards performance-aware DevOps.
The main goal of the seminar was to bring together young researchers (PhD
students in a later stage of their PhD, as well as PostDocs or Junior
Professors) in the areas of (i) software engineering, (ii) performance
engineering, and (iii) cloud computing and big data to present their current
research projects, to exchange experience and expertise, to discuss research
challenges, and to develop ideas for future collaborations
Sig-Wasserstein GANs for Time Series Generation
Synthetic data is an emerging technology that can significantly accelerate the development and
deployment of AI machine learning pipelines. In this work, we develop high-fidelity time-series
generators, the SigWGAN, by combining continuous-time stochastic models with the newly proposed
signature W1 metric. The former are the Logsig-RNN models based on the stochastic differential
equations, whereas the latter originates from the universal and principled mathematical features
to characterize the measure induced by time series. SigWGAN allows turning computationally
challenging GAN min-max problem into supervised learning while generating high fidelity samples.
We validate the proposed model on both synthetic data generated by popular quantitative risk models
and empirical financial data. Codes are available at https://github.com/SigCGANs/Sig-WassersteinGANs.gi
Sig-Wasserstein GANs for Time Series Generation
Synthetic data is an emerging technology that can significantly accelerate the development and
deployment of AI machine learning pipelines. In this work, we develop high-fidelity time-series
generators, the SigWGAN, by combining continuous-time stochastic models with the newly proposed
signature W1 metric. The former are the Logsig-RNN models based on the stochastic differential
equations, whereas the latter originates from the universal and principled mathematical features
to characterize the measure induced by time series. SigWGAN allows turning computationally
challenging GAN min-max problem into supervised learning while generating high fidelity samples.
We validate the proposed model on both synthetic data generated by popular quantitative risk models
and empirical financial data. Codes are available at https://github.com/SigCGANs/Sig-WassersteinGANs.gi
DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge
The Data Activated Liu Graph Engine - DALiuGE - is an execution framework for
processing large astronomical datasets at a scale required by the Square
Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex
data reduction pipelines consisting of both data sets and algorithmic
components and an implementation run-time to execute such pipelines on
distributed resources. By mapping the logical view of a pipeline to its
physical realisation, DALiuGE separates the concerns of multiple stakeholders,
allowing them to collectively optimise large-scale data processing solutions in
a coherent manner. The execution in DALiuGE is data-activated, where each
individual data item autonomously triggers the processing on itself. Such
decentralisation also makes the execution framework very scalable and flexible,
supporting pipeline sizes ranging from less than ten tasks running on a laptop
to tens of millions of concurrent tasks on the second fastest supercomputer in
the world. DALiuGE has been used in production for reducing interferometry data
sets from the Karl E. Jansky Very Large Array and the Mingantu Ultrawide
Spectral Radioheliograph; and is being developed as the execution framework
prototype for the Science Data Processor (SDP) consortium of the Square
Kilometre Array (SKA) telescope. This paper presents a technical overview of
DALiuGE and discusses case studies from the CHILES and MUSER projects that use
DALiuGE to execute production pipelines. In a companion paper, we provide
in-depth analysis of DALiuGE's scalability to very large numbers of tasks on
two supercomputing facilities.Comment: 31 pages, 12 figures, currently under review by Astronomy and
Computin
An ADMM Based Framework for AutoML Pipeline Configuration
We study the AutoML problem of automatically configuring machine learning
pipelines by jointly selecting algorithms and their appropriate
hyper-parameters for all steps in supervised learning pipelines. This black-box
(gradient-free) optimization with mixed integer & continuous variables is a
challenging problem. We propose a novel AutoML scheme by leveraging the
alternating direction method of multipliers (ADMM). The proposed framework is
able to (i) decompose the optimization problem into easier sub-problems that
have a reduced number of variables and circumvent the challenge of mixed
variable categories, and (ii) incorporate black-box constraints along-side the
black-box optimization objective. We empirically evaluate the flexibility (in
utilizing existing AutoML techniques), effectiveness (against open source
AutoML toolkits),and unique capability (of executing AutoML with practically
motivated black-box constraints) of our proposed scheme on a collection of
binary classification data sets from UCI ML& OpenML repositories. We observe
that on an average our framework provides significant gains in comparison to
other AutoML frameworks (Auto-sklearn & TPOT), highlighting the practical
advantages of this framework
- …