Towards workflow ecosystems through standard representations
Workflows are increasingly used to manage and share scientific
computations and methods. Workflow tools can be used to design,
validate, execute and visualize scientific workflows and their
execution results. Other tools manage workflow libraries or mine
their contents. There has been a lot of recent work on workflow
system integration as well as common workflow interlinguas, but
the interoperability among workflow systems remains a challenge.
Ideally, these tools would form a workflow ecosystem such that it
should be possible to create a workflow with a tool, execute it
with another, visualize it with another, and use yet another tool to
mine a repository of such workflows or their executions. In this
paper, we describe our approach to create a workflow ecosystem
through the use of standard models for provenance (OPM and
W3C PROV) and extensions (P-PLAN and OPMW) to represent
workflows. The ecosystem integrates different workflow tools
with diverse functions (workflow generation, execution,
browsing, mining, and visualization) created by a variety of
research groups. This is, to our knowledge, the first time that such
a variety of workflow systems and functions has been integrated.
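To make the idea of standard workflow representations concrete, here is a minimal sketch (not the paper's code) of describing one workflow step with P-PLAN-style triples and serializing them as Turtle. The `ex:` namespace, resource names, and the hand-rolled serializer are illustrative assumptions; a real system would use an RDF library and the full OPMW/P-PLAN vocabularies.

```python
# Illustrative sketch: a tiny workflow template as P-PLAN-style RDF triples.
# The ex: namespace and resource names are hypothetical; prefixes for the
# real PROV and P-PLAN vocabularies are shown for context.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "prov": "http://www.w3.org/ns/prov#",
    "p-plan": "http://purl.org/net/p-plan#",
    "ex": "http://example.org/wf/",  # hypothetical namespace
}

triples = [
    ("ex:SortWorkflow", "rdf:type", "p-plan:Plan"),
    ("ex:SortStep", "rdf:type", "p-plan:Step"),
    ("ex:SortStep", "p-plan:isStepOfPlan", "ex:SortWorkflow"),
    ("ex:InputFile", "rdf:type", "p-plan:Variable"),
    ("ex:SortStep", "p-plan:hasInputVar", "ex:InputFile"),
]

def to_turtle(triples, prefixes):
    # Emit prefix declarations, then one statement per triple.
    lines = [f"@prefix {p}: <{uri}> ." for p, uri in prefixes.items()]
    lines += [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(lines)

print(to_turtle(triples, PREFIXES))
```

Because the representation is plain RDF, any tool in the ecosystem (a miner, a browser, a visualizer) can consume the same serialization without knowing which workflow system produced it.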
Laminar: A New Serverless Stream-based Framework with Semantic Code Search and Code Completion
This paper introduces Laminar, a novel serverless framework based on
dispel4py, a parallel stream-based dataflow library. Laminar efficiently
manages streaming workflows and components through a dedicated registry,
offering a seamless serverless experience. Leveraging large language models,
Laminar enhances the framework with semantic code search, code summarization,
and code completion. This contribution enhances serverless computing by
simplifying the execution of streaming computations, managing data streams more
efficiently, and offering a valuable tool for both researchers and
practitioners. Comment: 13 pages, 10 figures, 6 tables
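The abstract does not detail how semantic code search is implemented, but the standard recipe it alludes to ranks registry components by embedding similarity to a query. The sketch below is a toy stand-in: Laminar uses large language models, which are replaced here with a bag-of-words vector so the example is self-contained; the registry contents and function names are hypothetical.

```python
# Illustrative sketch of embedding-based semantic code search over a component
# registry. The embed() below is a toy bag-of-words stand-in for an LLM
# embedding; the registry entries are invented for the example.
import math
from collections import Counter

def embed(text):
    # Toy "embedding": token-frequency vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

registry = {  # hypothetical registry: component name -> description
    "stream_filter": "filter items of a data stream by a predicate",
    "stream_merge": "merge two input streams into one output stream",
    "csv_reader": "read rows from a csv file as a stream",
}

def search(query, registry, top_k=1):
    q = embed(query)
    ranked = sorted(registry,
                    key=lambda name: cosine(q, embed(registry[name])),
                    reverse=True)
    return ranked[:top_k]

print(search("combine streams", registry))  # -> ['stream_merge']
```

Swapping `embed()` for a real model embedding leaves the ranking logic unchanged, which is the appeal of this architecture.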
A Query Integrator and Manager for the Query Web
We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web-accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions.
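The single interconnection condition above, that every query returns XML, is what lets queries chain off each other regardless of their source language. A minimal sketch of that chaining (hypothetical functions, not QI's actual API):

```python
# Illustrative sketch of query chaining under the "results are XML" contract
# described in the abstract. Both queries are invented for the example; in QI
# the first might wrap a SPARQL query against an ontology web service.
import xml.etree.ElementTree as ET

def term_query():
    # A saved "query" that returns canned results as XML, the only
    # contract required for interconnection.
    return "<results><term>hippocampus</term><term>amygdala</term></results>"

def annotate_images(xml_results):
    # A downstream query, possibly written in a different language in QI,
    # chaining off the first query's XML output.
    terms = [t.text for t in ET.fromstring(xml_results).findall("term")]
    return {t: f"images tagged '{t}'" for t in terms}

print(annotate_images(term_query()))
```

Because each link in the chain only sees XML, the image-annotation step never needs to know whether the terminology came from SPARQL, XQuery, or another QI installation.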
On the construction of decentralised service-oriented orchestration systems
Modern science relies on workflow technology to capture, process, and analyse data obtained from scientific instruments. Scientific workflows are precise descriptions of experiments in which multiple computational tasks are coordinated based on the dataflows between them. Orchestrating scientific workflows presents a significant research challenge: they are typically executed in a manner such that all data pass through a centralised computer server known as the engine, which causes unnecessary network traffic that leads to a performance bottleneck. These workflows are commonly composed of services that perform computation over geographically distributed resources, and involve the management of dataflows between them. Centralised orchestration is clearly not a scalable approach for coordinating services dispersed across distant geographical locations. This thesis presents a scalable decentralised service-oriented orchestration system that relies on a high-level data coordination language for the specification and execution of workflows. This system’s architecture consists of distributed engines, each of which is responsible for executing part of the overall workflow. It exploits parallelism in the workflow by decomposing it into smaller sub-workflows, and determines the most appropriate engines to execute them using computation placement analysis. This permits the workflow logic to be distributed closer to the services providing the data for execution, which reduces the overall data transfer in the workflow and improves its execution time. This thesis provides an evaluation of the presented system, which concludes that decentralised orchestration provides scalability benefits over centralised orchestration and improves the overall performance of executing a service-oriented workflow.
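The placement idea in this abstract, assigning each sub-workflow to an engine near the services that feed it, can be sketched with a toy greedy heuristic. Everything below (the sub-workflows, sites, and the majority-vote placement rule) is an invented illustration, not the thesis's actual computation placement analysis.

```python
# Illustrative sketch of computation placement: run each sub-workflow at the
# site hosting most of the services it consumes data from, so dataflow need
# not pass through one central engine. All names and data are hypothetical.
from collections import Counter

# sub-workflow -> services it reads from (repeats model heavier dataflows)
subworkflows = {
    "align": ["telescope_a", "telescope_a", "catalog"],
    "stack": ["telescope_b", "telescope_b"],
}
# service -> site where an orchestration engine is available
service_site = {
    "telescope_a": "edinburgh",
    "telescope_b": "amsterdam",
    "catalog": "edinburgh",
}

def place(subworkflows, service_site):
    placement = {}
    for sw, services in subworkflows.items():
        # Greedy rule: pick the site that serves the most dataflows.
        sites = Counter(service_site[s] for s in services)
        placement[sw] = sites.most_common(1)[0][0]
    return placement

print(place(subworkflows, service_site))
# -> {'align': 'edinburgh', 'stack': 'amsterdam'}
```

Under centralised orchestration both sub-workflows' data would transit one engine; here each runs where its data already lives, which is the intuition behind the reduced transfer and improved execution time the evaluation reports.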
- …