25,577 research outputs found
Report from GI-Dagstuhl Seminar 16394: Software Performance Engineering in the DevOps World
This report documents the program and the outcomes of GI-Dagstuhl Seminar
16394 "Software Performance Engineering in the DevOps World".
The seminar addressed the problem of performance-aware DevOps. Both, DevOps
and performance engineering have been growing trends over the past one to two
years, in no small part due to the rise in importance of identifying
performance anomalies in the operations (Ops) of cloud and big data systems and
feeding these back to the development (Dev). However, so far, the research
community has treated software engineering, performance engineering, and cloud
computing mostly as individual research areas. We aimed to identify
cross-community collaboration, and to set the path for long-lasting
collaborations towards performance-aware DevOps.
The main goal of the seminar was to bring together young researchers (PhD
students in a later stage of their PhD, as well as PostDocs or Junior
Professors) in the areas of (i) software engineering, (ii) performance
engineering, and (iii) cloud computing and big data to present their current
research projects, to exchange experience and expertise, to discuss research
challenges, and to develop ideas for future collaborations
The Dark Energy Survey Data Management System
The Dark Energy Survey collaboration will study cosmic acceleration with a
5000 deg2 griZY survey in the southern sky over 525 nights from 2011-2016. The
DES data management (DESDM) system will be used to process and archive these
data and the resulting science ready data products. The DESDM system consists
of an integrated archive, a processing framework, an ensemble of astronomy
codes and a data access framework. We are developing the DESDM system for
operation in the high performance computing (HPC) environments at NCSA and
Fermilab. Operating the DESDM system in an HPC environment offers both speed
and flexibility. We will employ it for our regular nightly processing needs,
and for more compute-intensive tasks such as large scale image coaddition
campaigns, extraction of weak lensing shear from the full survey dataset, and
massive seasonal reprocessing of the DES data. Data products will be available
to the Collaboration and later to the public through a virtual-observatory
compatible web portal. Our approach leverages investments in publicly available
HPC systems, greatly reducing hardware and maintenance costs to the project,
which must deploy and maintain only the storage, database platforms and
orchestration and web portal nodes that are specific to DESDM. In Fall 2007, we
tested the current DESDM system on both simulated and real survey data. We used
Teragrid to process 10 simulated DES nights (3TB of raw data), ingesting and
calibrating approximately 250 million objects into the DES Archive database. We
also used DESDM to process and calibrate over 50 nights of survey data acquired
with the Mosaic2 camera. Comparison to truth tables in the case of the
simulated data and internal crosschecks in the case of the real data indicate
that astrometric and photometric data quality is excellent.Comment: To be published in the proceedings of the SPIE conference on
Astronomical Instrumentation (held in Marseille in June 2008). This preprint
is made available with the permission of SPIE. Further information together
with preprint containing full quality images is available at
http://desweb.cosmology.uiuc.edu/wik
OpenCL Actors - Adding Data Parallelism to Actor-based Programming with CAF
The actor model of computation has been designed for a seamless support of
concurrency and distribution. However, it remains unspecific about data
parallel program flows, while available processing power of modern many core
hardware such as graphics processing units (GPUs) or coprocessors increases the
relevance of data parallelism for general-purpose computation.
In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework
(CAF). This offers a high level interface for accessing any OpenCL device
without leaving the actor paradigm. The new type of actor is integrated into
the runtime environment of CAF and gives rise to transparent message passing in
distributed systems on heterogeneous hardware. Following the actor logic in
CAF, OpenCL kernels can be composed while encapsulated in C++ actors, hence
operate in a multi-stage fashion on data resident at the GPU. Developers are
thus enabled to build complex data parallel programs from primitives without
leaving the actor paradigm, nor sacrificing performance. Our evaluations on
commodity GPUs, an Nvidia TESLA, and an Intel PHI reveal the expected linear
scaling behavior when offloading larger workloads. For sub-second duties, the
efficiency of offloading was found to largely differ between devices. Moreover,
our findings indicate a negligible overhead over programming with the native
OpenCL API.Comment: 28 page
Optimizing Splicing Junction Detection in Next Generation Sequencing Data on a Virtual-GRID Infrastructure
The new protocol for sequencing the messenger RNA in a cell, named RNA-seq produce millions of short sequence fragments. Next Generation Sequencing technology allows more accurate analysis but increase needs in term of computational resources. This paper describes the optimization of a RNA-seq analysis pipeline devoted to splicing variants detection, aimed at reducing computation time and providing a multi-user/multisample environment. This work brings two main contributions. First, we optimized a well-known algorithm called TopHat by parallelizing some sequential mapping steps. Second, we designed and implemented a hybrid virtual GRID infrastructure allowing to efficiently execute multiple instances of TopHat running on different samples or on behalf of different users, thus optimizing the overall execution time and enabling a flexible multi-user environmen
- …