
    Predicting Intermediate Storage Performance for Workflow Applications

    Configuring a storage system to better serve an application is a challenging task complicated by a multidimensional, discrete configuration space and the high cost of exploring that space (e.g., by running the application with different storage configurations). To enable selecting the best configuration in a reasonable time, we design an end-to-end performance prediction mechanism that estimates the turn-around time of an application using a storage system under a given configuration. This approach focuses on a generic object-based storage system design, supports exploring the impact of optimizations targeting workflow applications (e.g., various data placement schemes) in addition to other, more traditional, configuration knobs (e.g., stripe size or replication level), and models the system operation at the data-chunk and control-message level. This paper presents our experience to date with designing and using this prediction mechanism. We evaluate it using micro-benchmarks, synthetic benchmarks mimicking real workflow applications, and a real application. A preliminary evaluation shows that we are on a good track to meet our objectives: the mechanism can scale to model a workflow application run on an entire cluster while offering an over 200x speedup factor (normalized by resource) compared to running the actual application, and can achieve, in the limited number of scenarios we study, a prediction accuracy that enables identifying the best storage system configuration.
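    As a rough illustration of how such a predictor can drive configuration selection, the sketch below enumerates a small, hypothetical configuration space (stripe size, replication level, data placement scheme) and picks the configuration with the lowest predicted turn-around time. The predicted_runtime function and the listed knob values are toy stand-ins, not the paper's chunk- and message-level model.

```python
from itertools import product

# Hypothetical discrete configuration space; the actual knobs and values
# of the paper's storage system will differ.
STRIPE_SIZES = [256 * 1024, 1024 * 1024, 4 * 1024 * 1024]   # bytes
REPLICATION_LEVELS = [1, 2, 3]
PLACEMENT_SCHEMES = ["default", "local-write", "co-locate"]

def predicted_runtime(stripe_size, replication, placement):
    """Toy stand-in for the paper's prediction mechanism.

    It returns a made-up cost purely so the search is runnable; the real
    mechanism simulates data-chunk transfers and control messages.
    """
    transfer_cost = 1.0 / (stripe_size / (256 * 1024))
    replication_cost = 0.3 * (replication - 1)
    placement_bonus = {"default": 0.0, "local-write": -0.2, "co-locate": -0.35}[placement]
    return max(0.1, 1.0 + transfer_cost + replication_cost + placement_bonus)

def best_configuration():
    """Score every configuration with the predictor and keep the fastest one."""
    candidates = product(STRIPE_SIZES, REPLICATION_LEVELS, PLACEMENT_SCHEMES)
    return min(candidates, key=lambda cfg: predicted_runtime(*cfg))

if __name__ == "__main__":
    cfg = best_configuration()
    print("predicted best configuration:", cfg, "->", round(predicted_runtime(*cfg), 3))
```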

    Workflow Partitioning and Deployment on the Cloud using Orchestra

    Orchestrating service-oriented workflows is typically based on a design model that routes both data and control through a single point - the centralised workflow engine. This causes scalability problems that include unnecessary consumption of network bandwidth, high latency in transmitting data between the services, and performance bottlenecks. These problems are especially prominent when orchestrating workflows composed from services dispersed across distant geographical locations. This paper presents a novel workflow partitioning approach, which attempts to improve the scalability of orchestrating large-scale workflows. It permits the workflow computation to be moved towards the services providing the data in order to achieve better performance. This is achieved by decomposing the workflow into smaller sub-workflows for parallel execution, and determining the most appropriate network locations to which these sub-workflows are transmitted and subsequently executed. This paper demonstrates the efficiency of our approach using a set of experimental workflows that are orchestrated over Amazon EC2 and across several geographic network regions. Comment: To appear in Proceedings of the IEEE/ACM 7th International Conference on Utility and Cloud Computing (UCC 2014).
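    The sketch below illustrates the partitioning idea on a toy workflow: tasks are grouped into sub-workflows by the network region of the services they call, and the number of cross-region dependency edges approximates the data that would otherwise travel through a central engine. The task graph, region names, and grouping heuristic are illustrative assumptions, not Orchestra's actual algorithm.

```python
from collections import defaultdict

# Hypothetical workflow: each task names the region where its service (and
# its data) lives, plus its dependencies. Only meant to illustrate grouping
# computation near the data-providing services.
TASKS = {
    "fetch_orders":  {"region": "us-east-1", "deps": []},
    "fetch_catalog": {"region": "eu-west-1", "deps": []},
    "join_data":     {"region": "eu-west-1", "deps": ["fetch_orders", "fetch_catalog"]},
    "aggregate":     {"region": "us-east-1", "deps": ["join_data"]},
}

def partition_by_region(tasks):
    """Group tasks into sub-workflows keyed by the region of their services."""
    partitions = defaultdict(list)
    for name, spec in tasks.items():
        partitions[spec["region"]].append(name)
    return dict(partitions)

def cross_region_edges(tasks):
    """Count dependency edges that cross regions, i.e. data that must travel."""
    return sum(
        1
        for name, spec in tasks.items()
        for dep in spec["deps"]
        if tasks[dep]["region"] != spec["region"]
    )

if __name__ == "__main__":
    print("sub-workflows:", partition_by_region(TASKS))
    print("cross-region transfers:", cross_region_edges(TASKS))
```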

    A Pipeline for Volume Electron Microscopy of the Caenorhabditis elegans Nervous System.

    The "connectome," a comprehensive wiring diagram of synaptic connectivity, is achieved through volume electron microscopy (vEM) analysis of an entire nervous system and all associated non-neuronal tissues. White et al. (1986) pioneered the fully manual reconstruction of a connectome using Caenorhabditis elegans. Recent advances in vEM allow mapping new C. elegans connectomes with increased throughput, and reduced subjectivity. Current vEM studies aim to not only fill the remaining gaps in the original connectome, but also address fundamental questions including how the connectome changes during development, the nature of individuality, sexual dimorphism, and how genetic and environmental factors regulate connectivity. Here we describe our current vEM pipeline and projected improvements for the study of the C. elegans nervous system and beyond

    Multi-criteria scheduling of pipeline workflows

    Mapping workflow applications onto parallel platforms is a challenging problem, even for simple application patterns such as pipeline graphs. Several antagonistic criteria must be optimized, such as throughput and latency (or a combination of the two). In this paper, we study the complexity of the bi-criteria mapping problem for pipeline graphs on communication-homogeneous platforms. In particular, we assess the complexity of the well-known chains-to-chains problem for different-speed processors, which turns out to be NP-hard. We provide several efficient polynomial bi-criteria heuristics, and their relative performance is evaluated through extensive simulations.
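    A minimal sketch of the two criteria for an interval (chains-to-chains style) mapping is given below: the period (inverse throughput) is the time spent by the most loaded processor per data set, and the latency is the end-to-end time of one data set, ignoring communication costs. The stage weights, processor speeds, and the naive even-split heuristic are illustrative assumptions; the paper's heuristics are more refined.

```python
# Toy interval mapping of pipeline stages onto different-speed processors.
STAGE_WORK = [4.0, 2.0, 6.0, 3.0, 5.0]   # computation weight per pipeline stage
SPEEDS = [2.0, 1.0, 3.0]                 # processor speeds

def evaluate(mapping, speeds):
    """Return (period, latency) of an interval mapping.

    mapping: list of (start, end) stage-index intervals, one per processor.
    Period is the busiest processor's time per data set; latency is the
    end-to-end time of one data set. Communication costs are ignored.
    """
    times = [sum(STAGE_WORK[a:b]) / s for (a, b), s in zip(mapping, speeds)]
    return max(times), sum(times)

def even_split(n_stages, n_procs):
    """Naive heuristic: cut the chain into contiguous intervals of near-equal length."""
    bounds = [round(i * n_stages / n_procs) for i in range(n_procs + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(n_procs)]

if __name__ == "__main__":
    mapping = even_split(len(STAGE_WORK), len(SPEEDS))
    period, latency = evaluate(mapping, SPEEDS)
    print("mapping:", mapping, "period:", period, "latency:", latency)
```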

    Adaptive Segmentation of Knee Radiographs for Selecting the Optimal ROI in Texture Analysis

    The purposes of this study were to investigate: 1) the effect of the placement of the region of interest (ROI) for texture analysis of subchondral bone in knee radiographs, and 2) the ability of several texture descriptors to distinguish between knees with and without radiographic osteoarthritis (OA). Bilateral posterior-anterior knee radiographs were analyzed from the baseline of the OAI and MOST datasets. A fully automatic method to locate the most informative region of subchondral bone using adaptive segmentation was developed. We used an over-segmentation strategy to partition knee images into compact regions that follow natural texture boundaries. Local Binary Patterns (LBP), Fractal Dimension (FD), Haralick features, Shannon entropy, and Histogram of Oriented Gradients (HOG) features were computed within the standard ROI and within the proposed adaptive ROIs. Subsequently, we built logistic regression models to identify and compare the performance of each texture descriptor and each ROI placement method in a 5-fold cross-validation setting. Importantly, we also investigated the generalizability of our approach by training the models on the OAI dataset and testing them on the MOST dataset. We used the area under the receiver operating characteristic (ROC) curve (AUC) and the average precision (AP) obtained from the precision-recall (PR) curve to compare the results. We found that the adaptive ROI improves the classification performance (OA vs. non-OA) over the commonly used standard ROI (up to a 9% increase in AUC). We also observed that, of all the texture descriptors, LBP yielded the best performance in all settings, with the best AUC of 0.840 [0.825, 0.852] and an associated AP of 0.804 [0.786, 0.820]. Compared to the current state-of-the-art approaches, our results suggest that the proposed adaptive ROI approach to texture analysis of subchondral bone can increase the diagnostic performance for detecting the presence of radiographic OA.
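    The sketch below mirrors only the evaluation protocol described above: a logistic regression classifier scored by ROC AUC under 5-fold stratified cross-validation. The features are randomly generated stand-ins for the LBP histograms extracted from subchondral bone ROIs; it does not reproduce the adaptive ROI segmentation or use the OAI/MOST data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for texture descriptors (e.g. one LBP histogram per knee).
rng = np.random.default_rng(0)
n_knees, n_features = 400, 59          # 59 ~ a typical uniform-LBP histogram length
X = rng.normal(size=(n_knees, n_features))
y = rng.integers(0, 2, size=n_knees)   # 1 = radiographic OA, 0 = no OA (random labels here)

# Logistic regression scored by ROC AUC under 5-fold stratified cross-validation.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aucs = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print("fold AUCs:", np.round(aucs, 3), "mean:", round(aucs.mean(), 3))
```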

    DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge

    The Data Activated Liu Graph Engine - DALiuGE - is an execution framework for processing large astronomical datasets at the scale required by the Square Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex data reduction pipelines consisting of both data sets and algorithmic components, and a run-time implementation to execute such pipelines on distributed resources. By mapping the logical view of a pipeline to its physical realisation, DALiuGE separates the concerns of multiple stakeholders, allowing them to collectively optimise large-scale data processing solutions in a coherent manner. Execution in DALiuGE is data-activated: each individual data item autonomously triggers the processing on itself. This decentralisation also makes the execution framework very scalable and flexible, supporting pipeline sizes ranging from fewer than ten tasks running on a laptop to tens of millions of concurrent tasks on the second fastest supercomputer in the world. DALiuGE has been used in production for reducing interferometry data sets from the Karl G. Jansky Very Large Array and the Mingantu Ultrawide Spectral Radioheliograph, and is being developed as the execution framework prototype for the Science Data Processor (SDP) consortium of the Square Kilometre Array (SKA) telescope. This paper presents a technical overview of DALiuGE and discusses case studies from the CHILES and MUSER projects that use DALiuGE to execute production pipelines. In a companion paper, we provide an in-depth analysis of DALiuGE's scalability to very large numbers of tasks on two supercomputing facilities. Comment: 31 pages, 12 figures, currently under review by Astronomy and Computing.
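    The data-activated idea can be sketched in a few lines: a data object notifies its registered consumers when it completes, and a consumer application runs as soon as all of its inputs are complete. The class names and API below are illustrative assumptions and are not DALiuGE's actual interfaces.

```python
# Minimal sketch of data-activated execution: data "drops" trigger the
# applications that consume them, with no central scheduler involved.

class DataDrop:
    def __init__(self, name):
        self.name, self.value, self.consumers = name, None, []

    def complete(self, value):
        self.value = value
        for consumer in self.consumers:          # the data item triggers processing on itself
            consumer.input_ready()

class AppDrop:
    def __init__(self, name, func, inputs, output):
        self.name, self.func, self.inputs, self.output = name, func, inputs, output
        self._pending = len(inputs)
        for d in inputs:
            d.consumers.append(self)

    def input_ready(self):
        self._pending -= 1
        if self._pending == 0:                   # all inputs complete: execute autonomously
            result = self.func(*(d.value for d in self.inputs))
            self.output.complete(result)

if __name__ == "__main__":
    a, b, c = DataDrop("a"), DataDrop("b"), DataDrop("c")
    AppDrop("add", lambda x, y: x + y, [a, b], c)
    a.complete(2)
    b.complete(3)
    print(c.name, "=", c.value)                  # -> c = 5
```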