Predicting Intermediate Storage Performance for Workflow Applications
Configuring a storage system to better serve an application is a challenging
task, complicated by a multidimensional, discrete configuration space and the
high cost of exploring that space (e.g., by running the application with
different storage configurations). To enable selecting the best configuration
in a reasonable time, we design an end-to-end performance prediction mechanism
that estimates the turn-around time of an application using a storage system
under a given configuration. This approach focuses on a generic object-based
storage system design, supports exploring the impact of optimizations targeting
workflow applications (e.g., various data placement schemes) in addition to
other, more traditional, configuration knobs (e.g., stripe size or replication
level), and models the system operation at the data-chunk and control-message
level.
This paper presents our experience to date with designing and using this
prediction mechanism. We evaluate this mechanism using micro-benchmarks,
synthetic benchmarks mimicking real workflow applications, and a real
application. A preliminary evaluation shows that we are on track to meet our
objectives: the mechanism can scale to model a workflow application run on an
entire cluster while offering an over 200x speedup factor (normalized by
resource) compared to running the actual application, and can achieve, in the
limited number of scenarios we study, a prediction accuracy that enables
identifying the best storage system configuration.
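The abstract gives no code, but the selection loop it implies can be sketched. Below is a hedged illustration, with hypothetical knob names and a stand-in predictor function, of how a cheap per-configuration prediction lets one sweep a discrete configuration space that would be too expensive to explore with real application runs; the actual mechanism models data-chunk and control-message traffic rather than the toy formula used here.

```python
# Hypothetical sketch of prediction-driven configuration selection.
# Knob names, values, and the predictor formula are illustrative only.

from itertools import product

# Assumed discrete configuration knobs (illustrative values).
KNOBS = {
    "stripe_size_kb": [64, 256, 1024],
    "replication_level": [1, 2, 3],
    "data_placement": ["striped", "local", "collocated"],
}

def predict_turnaround(config, workload):
    """Stand-in for the chunk- and message-level simulator: returns an
    estimated application turn-around time (seconds) for one configuration."""
    # A real predictor would replay the workload's data-chunk and control
    # message traffic against a model of the storage system.
    base = workload["data_gb"] / (config["stripe_size_kb"] / 256.0 + 1.0)
    return base * config["replication_level"] * (
        0.8 if config["data_placement"] == "collocated" else 1.0
    )

def best_configuration(workload):
    """Score every point of the discrete configuration space, which is
    affordable once prediction replaces real application runs."""
    candidates = [
        dict(zip(KNOBS, values)) for values in product(*KNOBS.values())
    ]
    return min(candidates, key=lambda c: predict_turnaround(c, workload))

print(best_configuration({"data_gb": 500}))
```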
Workflow Partitioning and Deployment on the Cloud using Orchestra
Orchestrating service-oriented workflows is typically based on a design model
that routes both data and control through a single point - the centralised
workflow engine. This causes scalability problems that include unnecessary
consumption of network bandwidth, high latency in transmitting data between
services, and performance bottlenecks. These problems are especially prominent
when orchestrating workflows that are composed from services dispersed across
distant geographical locations. This paper presents a novel workflow
partitioning approach, which attempts to improve the scalability of
orchestrating large-scale workflows. It permits the workflow computation to be
moved towards the services providing the data in order to obtain better
performance. This is achieved by decomposing the workflow into smaller
sub-workflows for parallel execution, and determining the most appropriate
network locations to which these sub-workflows are transmitted and subsequently
executed. This paper demonstrates the efficiency of our approach using a set of
experimental workflows that are orchestrated over Amazon EC2 and across several
geographic network regions.
Comment: To appear in Proceedings of the IEEE/ACM 7th International Conference
on Utility and Cloud Computing (UCC 2014).
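As a hedged illustration of the partitioning idea (not the Orchestra implementation), the sketch below groups workflow tasks by the network region of the services they invoke, so each sub-workflow can be shipped to and executed near its data instead of routing everything through a central engine; all task and region names are hypothetical.

```python
# Toy illustration of region-based workflow partitioning.
# Task and region names are made up for the example.

from collections import defaultdict

# (task, service_region) pairs describing a small workflow.
TASKS = [
    ("fetch_orders", "us-east-1"),
    ("clean_orders", "us-east-1"),
    ("fetch_customers", "eu-west-1"),
    ("join_datasets", "eu-west-1"),
]

def partition_by_region(tasks):
    """Decompose the workflow into per-region sub-workflows that can run
    in parallel close to the services providing their data."""
    subworkflows = defaultdict(list)
    for task, region in tasks:
        subworkflows[region].append(task)
    return dict(subworkflows)

for region, sub in partition_by_region(TASKS).items():
    print(f"deploy to {region}: {sub}")
```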
A Pipeline for Volume Electron Microscopy of the Caenorhabditis elegans Nervous System.
The "connectome," a comprehensive wiring diagram of synaptic connectivity, is achieved through volume electron microscopy (vEM) analysis of an entire nervous system and all associated non-neuronal tissues. White et al. (1986) pioneered the fully manual reconstruction of a connectome using Caenorhabditis elegans. Recent advances in vEM allow mapping new C. elegans connectomes with increased throughput, and reduced subjectivity. Current vEM studies aim to not only fill the remaining gaps in the original connectome, but also address fundamental questions including how the connectome changes during development, the nature of individuality, sexual dimorphism, and how genetic and environmental factors regulate connectivity. Here we describe our current vEM pipeline and projected improvements for the study of the C. elegans nervous system and beyond
Multi-criteria scheduling of pipeline workflows
Mapping workflow applications onto parallel platforms is a challenging
problem, even for simple application patterns such as pipeline graphs. Several
antagonistic criteria must be optimized, such as throughput and latency (or a
combination of both). In this paper, we study the complexity of the bi-criteria mapping
problem for pipeline graphs on communication homogeneous platforms. In
particular, we assess the complexity of the well-known chains-to-chains problem
for different-speed processors, which turns out to be NP-hard. We provide
several efficient polynomial bi-criteria heuristics, and their relative
performance is evaluated through extensive simulations.
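The bi-criteria trade-off can be made concrete with a toy example (this is not one of the paper's heuristics): for an interval mapping of consecutive pipeline stages onto different-speed processors, the period (inverse throughput) is the time of the slowest interval and the latency is the sum over all intervals. The sketch below enumerates small mappings and keeps the Pareto-optimal ones; stage weights and processor speeds are made up.

```python
# Toy bi-criteria evaluation of interval mappings for a pipeline graph.
# Stage weights and processor speeds below are illustrative.

from itertools import combinations

def period_and_latency(stage_work, cuts, speeds):
    """Evaluate one interval mapping: `cuts` lists the interval boundaries."""
    bounds = [0] + list(cuts) + [len(stage_work)]
    times = [
        sum(stage_work[a:b]) / s
        for a, b, s in zip(bounds, bounds[1:], speeds)
    ]
    return max(times), sum(times)  # (period, latency)

stage_work = [4, 2, 7, 3, 5]   # work per pipeline stage
speeds = [2.0, 1.0, 1.5]       # heterogeneous processor speeds

# Enumerate all interval mappings onto 3 processors; keep Pareto optima.
pareto = []
for cuts in combinations(range(1, len(stage_work)), len(speeds) - 1):
    p, l = period_and_latency(stage_work, cuts, speeds)
    if not any(p2 <= p and l2 <= l for _, p2, l2 in pareto):
        # Drop mappings the new one dominates, then record it.
        pareto = [(c, p2, l2) for c, p2, l2 in pareto
                  if not (p <= p2 and l <= l2)]
        pareto.append((cuts, p, l))

for cuts, p, l in pareto:
    print(f"cuts={cuts}: period={p:.2f}, latency={l:.2f}")
```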
Adaptive Segmentation of Knee Radiographs for Selecting the Optimal ROI in Texture Analysis
The purposes of this study were to investigate: 1) the effect of
region-of-interest (ROI) placement on texture analysis of subchondral bone in
knee radiographs, and 2) the ability of several texture descriptors to distinguish
between the knees with and without radiographic osteoarthritis (OA). Bilateral
posterior-anterior knee radiographs were analyzed from the baselines of the OAI and
MOST datasets. A fully automatic method to locate the most informative region
from subchondral bone using adaptive segmentation was developed. We used an
oversegmentation strategy for partitioning knee images into compact regions
that follow natural texture boundaries. LBP, fractal dimension (FD), Haralick
features, Shannon entropy, and HOG descriptors were computed within the standard
ROI and within the proposed adaptive ROIs. Subsequently, we built logistic
regression models to identify and compare the performance of each texture
descriptor and each ROI placement method in a 5-fold cross-validation setting.
Importantly, we also investigated the generalizability of our approach by
training the models on the OAI dataset and testing them on the MOST dataset. We
used the area under the receiver operating characteristic (ROC) curve (AUC) and
the average precision (AP) obtained from the precision-recall (PR) curve to
compare the results. We found that the adaptive ROI improves the classification
performance (OA vs. non-OA) over the commonly used standard ROI (up to a 9%
increase in AUC).
We also observed that, among all texture descriptors, LBP yielded the best
performance in all settings, with the best AUC of 0.840 [0.825, 0.852] and an
associated AP of 0.804 [0.786, 0.820]. Compared to current state-of-the-art
approaches, our results suggest that the proposed adaptive ROI approach to
texture analysis of subchondral bone can increase the diagnostic performance
for detecting the presence of radiographic OA.
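A minimal sketch of the descriptor-plus-classifier stage is given below, assuming scikit-image and scikit-learn. The adaptive ROI segmentation itself, which is the paper's contribution, is not reproduced, and the ROI patches and labels here are random placeholders standing in for extracted subchondral-bone regions.

```python
# Sketch of the LBP + logistic regression stage with 5-fold CV.
# ROIs and labels are random placeholders, not real radiograph data.

import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def lbp_histogram(roi, points=8, radius=1):
    """Uniform LBP histogram of one ROI, a common texture descriptor."""
    codes = local_binary_pattern(roi, points, radius, method="uniform")
    hist, _ = np.histogram(codes, bins=points + 2, range=(0, points + 2),
                           density=True)
    return hist

# Placeholder data: 100 random 64x64 "ROIs" and binary OA labels.
rng = np.random.default_rng(0)
rois = rng.random((100, 64, 64))
labels = rng.integers(0, 2, size=100)

X = np.stack([lbp_histogram(r) for r in rois])
scores = cross_val_score(LogisticRegression(max_iter=1000), X, labels,
                         cv=5, scoring="roc_auc")
print("5-fold AUC:", scores.mean())
```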
DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge
The Data Activated Liu Graph Engine - DALiuGE - is an execution framework for
processing large astronomical datasets at a scale required by the Square
Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex
data reduction pipelines consisting of both data sets and algorithmic
components and an implementation run-time to execute such pipelines on
distributed resources. By mapping the logical view of a pipeline to its
physical realisation, DALiuGE separates the concerns of multiple stakeholders,
allowing them to collectively optimise large-scale data processing solutions in
a coherent manner. The execution in DALiuGE is data-activated, where each
individual data item autonomously triggers the processing on itself. Such
decentralisation also makes the execution framework very scalable and flexible,
supporting pipeline sizes ranging from less than ten tasks running on a laptop
to tens of millions of concurrent tasks on the second fastest supercomputer in
the world. DALiuGE has been used in production for reducing interferometry data
sets from the Karl G. Jansky Very Large Array and the Mingantu Ultrawide
Spectral Radioheliograph; and is being developed as the execution framework
prototype for the Science Data Processor (SDP) consortium of the Square
Kilometre Array (SKA) telescope. This paper presents a technical overview of
DALiuGE and discusses case studies from the CHILES and MUSER projects that use
DALiuGE to execute production pipelines. In a companion paper, we provide
in-depth analysis of DALiuGE's scalability to very large numbers of tasks on
two supercomputing facilities.
Comment: 31 pages, 12 figures, currently under review by Astronomy and
Computing.
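The data-activated pattern can be illustrated in a few lines (this is not DALiuGE's actual API): each data item keeps a list of consumers and autonomously fires them once it is written, so a pipeline advances through completion events rather than a central scheduler.

```python
# Toy illustration of data-activated execution; class and method names
# are hypothetical, not DALiuGE's API.

class DataDrop:
    """A data item that triggers its consumers when it becomes complete."""
    def __init__(self, name):
        self.name = name
        self.consumers = []   # application components to fire on completion
        self.payload = None

    def add_consumer(self, app):
        self.consumers.append(app)

    def write(self, payload):
        self.payload = payload
        for app in self.consumers:   # completion event activates consumers
            app.run(self)

class AppDrop:
    """An algorithmic component that transforms one data item into another."""
    def __init__(self, name, func, output):
        self.name, self.func, self.output = name, func, output

    def run(self, data):
        self.output.write(self.func(data.payload))

# Toy two-stage pipeline: raw -> calibrated -> imaged.
raw, calibrated, imaged = DataDrop("raw"), DataDrop("cal"), DataDrop("img")
raw.add_consumer(AppDrop("calibrate", lambda x: x * 2, calibrated))
calibrated.add_consumer(AppDrop("image", lambda x: x + 1, imaged))
raw.write(10)          # writing the raw data activates the whole chain
print(imaged.payload)  # -> 21
```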