An empirical learning-based validation procedure for simulation workflow
A simulation workflow is a top-level model for the design and control of a
simulation process. It connects multiple simulation components with time and
interaction restrictions to form a complete simulation system. Before the
construction and evaluation of the component models, the validation of
upper-layer simulation workflow is of utmost importance in a simulation
system. However, methods specifically for validating simulation workflows are
very limited. Many existing validation techniques are domain-dependent,
relying on cumbersome questionnaire design and expert scoring. Therefore, this
paper presents an empirical learning-based validation procedure that implements
a semi-automated evaluation of simulation workflows. First, representative
features of general simulation workflow and their relations with validation
indices are proposed. The calculation of workflow credibility based on the
Analytic Hierarchy Process (AHP) is then introduced. To make full use of
historical data and implement more efficient validation, four learning
algorithms, including back propagation neural network (BPNN), extreme learning
machine (ELM), evolving neo-fuzzy neuron (eNFN), and fast incremental Gaussian
mixture model (FIGMN), are introduced to construct the empirical relation between
the workflow credibility and its features. A case study on a landing-process
simulation workflow is established to test the feasibility of the proposed
procedure. The experimental results also provide a useful overview of
state-of-the-art learning algorithms for the credibility evaluation of
simulation models.
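The AHP step mentioned above can be sketched in a few lines: a pairwise comparison matrix over validation indices is reduced to a priority (weight) vector by approximating its principal eigenvector. This is a minimal illustration only; the comparison values, the index count, and the `ahp_weights` helper are hypothetical and not taken from the paper.

```python
def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector of an AHP pairwise comparison
    matrix via power iteration, normalised so the weights sum to 1."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Multiply the matrix by the current weight vector.
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]  # renormalise each iteration
    return w

# Hypothetical comparisons among three validation indices: index 0 is judged
# 3x as important as index 1 and 5x as important as index 2, and so on.
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3.0, 1.0, 2.0],
    [1 / 5.0, 1 / 2.0, 1.0],
]
weights = ahp_weights(pairwise)
```

The resulting weight vector would then be combined with per-index scores to yield an overall credibility value, which is the quantity the learning algorithms are trained to predict from workflow features.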
A Taxonomy of Workflow Management Systems for Grid Computing
With the advent of Grid and application technologies, scientists and
engineers are building more and more complex applications to manage and process
large data sets, and execute scientific experiments on distributed resources.
Such application scenarios require means for composing and executing complex
workflows. Therefore, many efforts have been made towards the development of
workflow management systems for Grid computing. In this paper, we propose a
taxonomy that characterizes and classifies various approaches for building and
executing workflows on Grids. We also survey several representative Grid
workflow systems developed by various projects world-wide to demonstrate the
comprehensiveness of the taxonomy. The taxonomy not only highlights the design
and engineering similarities and differences of state-of-the-art in Grid
workflow systems, but also identifies areas that need further research.
Comment: 29 pages, 15 figures
Task Runtime Prediction in Scientific Workflows Using an Online Incremental Learning Approach
Many algorithms in workflow scheduling and resource provisioning rely on the
performance estimation of tasks to produce a scheduling plan. A profiler that
is capable of modeling the execution of tasks and predicting their runtime
accurately, therefore, becomes an essential part of any Workflow Management
System (WMS). With the emergence of multi-tenant Workflow as a Service (WaaS)
platforms that use clouds for deploying scientific workflows, task runtime
prediction becomes more challenging because it requires the processing of a
significant amount of data in a near real-time scenario while dealing with the
performance variability of cloud resources. Hence, relying on methods such as
profiling tasks' execution data using basic statistical description (e.g.,
mean, standard deviation) or batch offline regression techniques to estimate
the runtime may not be suitable for such environments. In this paper, we
propose an online incremental learning approach to predict the runtime of tasks
in scientific workflows in clouds. To improve the performance of the
predictions, we harness fine-grained resource monitoring data in the form of
time-series records of CPU utilization, memory usage, and I/O activity that
reflect the unique characteristics of a task's execution. We compare our
solution to a state-of-the-art approach that exploits the resource monitoring
data using a regression machine learning technique. In our experiments, the
proposed strategy improves performance, in terms of prediction error, by up to
29.89% compared to the state-of-the-art solution.
Comment: Accepted for presentation at the main conference track of the 11th
IEEE/ACM International Conference on Utility and Cloud Computing
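The core idea of the abstract above, updating a runtime model one task record at a time rather than retraining offline in batch, can be sketched as plain SGD on a linear model. The feature names, values, and the hidden generating rule below are hypothetical illustrations; the paper's actual method uses much richer time-series resource features.

```python
class OnlineRuntimePredictor:
    """Linear runtime model updated incrementally, one record at a time."""

    def __init__(self, n_features, lr=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def partial_fit(self, x, y):
        # One gradient step on squared error for a single (features, runtime) pair.
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def true_runtime(x):
    # Hidden linear rule used only to fabricate illustrative training targets.
    return 3.0 * x[0] + 2.0 * x[1] + 1.0

# Hypothetical normalised features per task: (cpu_util, mem_frac, io_frac).
records = [[0.9, 0.8, 0.1], [0.5, 0.2, 0.7], [0.1, 0.9, 0.4], [0.7, 0.3, 0.2]]
model = OnlineRuntimePredictor(n_features=3)
for _ in range(5000):            # replayed here to show convergence; in a WaaS
    for x in records:            # setting each record would arrive once, online
        model.partial_fit(x, true_runtime(x))
```

The appeal of this update style in a multi-tenant setting is that each arriving record costs O(features) to incorporate, so the model tracks cloud performance variability without periodic offline retraining.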
ICT in Czech companies: business efficiency potentials to be achieved.
The paper presents a business potential analysis based on data published by the Czech Statistical Office (ČSÚ). It shows that the state of the ICT infrastructure, even in small Czech companies, enables the expansion of ERP and CRM systems, trading over the Internet, supply chain management, and other new trends. Internet security is of the greatest importance here; however, it cannot be seen as a major obstacle to new trading methods. The greatest challenge identified is process and workflow optimization. To streamline workflows, document management supporting nearly seamless integration across functional areas is of the greatest importance. Moreover, process optimization can run into difficulties due to the cross-organizational functionality of new IT architecture concepts such as Service-Oriented Architecture, Web 2.0 concepts, and other methods and means. In this paper, the value-flow approach is briefly mentioned as an alternative to process modeling and the workflow approach. Value-oriented methods can overcome the limitations of the process-oriented approach.
Keywords: ICT infrastructure; Business processes; Process modeling; Document management; Value chains; Business semantics
Dividing the Ontology Alignment Task with Semantic Embeddings and Logic-based Modules
Large ontologies still pose serious challenges to state-of-the-art ontology alignment systems. In this paper we present an approach that combines a neural embedding model and logic-based modules to accurately divide an input ontology matching task into smaller and more tractable matching (sub)tasks. We have conducted a comprehensive evaluation using the datasets of the Ontology Alignment Evaluation Initiative. The results are encouraging and suggest that the proposed method is adequate in practice and can be integrated into the workflow of systems unable to cope with very large ontologies.
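The dividing step can be illustrated with a toy sketch: entities from the two input ontologies are grouped by embedding similarity, so only similar pairs ever reach the (expensive) matcher. The entity names, 2-D vectors, and threshold below are hypothetical, and the paper's logic-based modules are omitted entirely.

```python
def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def divide_matching_task(src, tgt, embed, threshold=0.9):
    """Split one large matching task into per-entity subtasks: for each
    source entity keep only target entities with similar embeddings."""
    subtasks = {}
    for s in src:
        candidates = [t for t in tgt if cosine(embed[s], embed[t]) >= threshold]
        if candidates:
            subtasks[s] = candidates
    return subtasks

# Toy 2-D "embeddings" for entities of two small ontologies (hypothetical).
embed = {
    "Heart": [0.90, 0.10], "Lung": [0.10, 0.90],        # source ontology
    "Cuore": [0.88, 0.12], "Polmone": [0.12, 0.88],     # target ontology
}
subtasks = divide_matching_task(["Heart", "Lung"], ["Cuore", "Polmone"], embed)
```

Each resulting subtask is small enough for an alignment system that cannot load the full ontologies, which is the integration scenario the abstract describes.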
Portability of Scientific Workflows in NGS Data Analysis: A Case Study
The analysis of next-generation sequencing (NGS) data requires complex
computational workflows consisting of dozens of autonomously developed yet
interdependent processing steps. Whenever large amounts of data need to be
processed, these workflows must be executed on parallel and/or distributed
systems to ensure reasonable runtime. Porting a workflow developed for a
particular system on a particular hardware infrastructure to another system or
to another infrastructure is non-trivial, which poses a major impediment to the
scientific necessities of workflow reproducibility and workflow reusability. In
this work, we describe our efforts to port a state-of-the-art workflow for the
detection of specific variants in whole-exome sequencing of mice. The workflow
was originally developed in the scientific workflow system Snakemake for
execution on a high-performance cluster controlled by Sun Grid Engine. In the
project, we ported it to the scientific workflow system SaasFee that can
execute workflows on (multi-core) stand-alone servers or on clusters of
arbitrary sizes using Hadoop. The purpose of this port was to enable owners
of low-cost hardware infrastructures, for which Hadoop was designed, to use
the workflow as well. Although both the source and the target system are
called scientific workflow systems, they differ in numerous aspects, ranging
from the workflow languages to the scheduling mechanisms and the file access
interfaces. These differences resulted in various problems, some expected and
others unexpected, that had to be resolved before the workflow could be run with
equal semantics. As a side-effect, we also report cost/runtime ratios for a
state-of-the-art NGS workflow on very different hardware platforms: A
comparably cheap stand-alone server (80 threads), a mid-cost, mid-sized cluster
(552 threads), and a high-end HPC system (3784 threads).
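A cost/runtime comparison of the kind mentioned at the end reduces to a small computation once price and runtime are known. The platform prices and runtimes below are hypothetical placeholders for illustration, not the paper's measurements.

```python
def cost_per_run(hourly_price, runtime_hours):
    """Total monetary cost of one complete workflow execution."""
    return hourly_price * runtime_hours

# Hypothetical platforms: (hourly price in EUR, measured runtime in hours).
platforms = {
    "stand-alone server (80 threads)": (2.0, 30.0),
    "mid-sized cluster (552 threads)": (12.0, 6.0),
    "HPC system (3784 threads)": (90.0, 1.5),
}

# Rank platforms by the cost of a single run, cheapest first; the trade-off
# is that the cheapest platform may also have the longest runtime.
ranking = sorted(platforms, key=lambda p: cost_per_run(*platforms[p]))
```

With such a table, a group can pick the platform whose cost/runtime trade-off fits its budget and deadline, which is the practical point of reporting these ratios.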