    Workflow Similarity Analysis

    As distributed systems have emerged, they have become widely adopted by scientific communities and various commercial vendors. Some of them have settled into architectural models that are today broadly known as grid computing and service-oriented architecture. Both are based on the service and workflow paradigms as a single shareable unit of work. Unfortunately, the growing number of published workflows has raised the problem of their distribution and management, e.g., searching repositories for workflows similar to a given one in order to increase the efficiency of calculations. In this paper, a similar-workflow search algorithm based on semantic type comparison is proposed. To evaluate the algorithm's usability and precision, an experiment was conducted on workflows extracted from the Feta repository. The entire process involved reasoning based on the myGrid ontologies. The results were compared to those obtained from another algorithm, based on an analysis of the names of workflow components using TF-IDF weights. The described experiment shows that semantics and ontology play a significant role in service and workflow representation.
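
    As a rough illustration of the TF-IDF baseline mentioned above, the sketch below scores workflow similarity by treating each workflow's list of component names as a document; the component lists and helper functions are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (not the paper's code): compare workflows by the
# TF-IDF-weighted cosine similarity of their component names.
import math
from collections import Counter

def tfidf_vectors(workflows):
    """workflows: dict mapping workflow id -> list of component-name tokens."""
    n = len(workflows)
    df = Counter()
    for tokens in workflows.values():
        df.update(set(tokens))                  # document frequency per token
    vectors = {}
    for wf_id, tokens in workflows.items():
        tf = Counter(tokens)
        vectors[wf_id] = {t: (c / len(tokens)) * math.log(n / df[t])
                          for t, c in tf.items()}
    return vectors

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical component-name lists for three workflows.
workflows = {
    "wf_blast":   ["fetch_sequence", "run_blast", "parse_report"],
    "wf_align":   ["fetch_sequence", "run_clustalw", "parse_report"],
    "wf_imaging": ["load_image", "segment_cells", "count_cells"],
}
vecs = tfidf_vectors(workflows)
print(cosine(vecs["wf_blast"], vecs["wf_align"]))    # shares two component names
print(cosine(vecs["wf_blast"], vecs["wf_imaging"]))  # no shared components -> 0.0
```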

    Effective and Efficient Similarity Search in Scientific Workflow Repositories

    Scientific workflows have become a valuable tool for large-scale data processing and analysis. This has led to the creation of specialized online repositories to facilitate workflow sharing and reuse. Over time, these repositories have grown to sizes that call for advanced methods to support workflow discovery, in particular for similarity search. Effective similarity search requires both high-quality algorithms for the comparison of scientific workflows and efficient strategies for indexing, searching, and ranking of search results. Yet, the graph structure of scientific workflows poses severe challenges to each of these steps. Here, we present a complete system for effective and efficient similarity search in scientific workflow repositories, based on the Layer Decomposition approach to scientific workflow comparison. Layer Decomposition specifically accounts for the directed dataflow underlying scientific workflows and, compared to other state-of-the-art methods, delivers the best results for similarity search at comparably low runtimes. Stacking Layer Decomposition with even faster, structure-agnostic approaches allows us to use proven, off-the-shelf tools for workflow indexing to further reduce runtimes and scale similarity search to the sizes of current repositories.
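
    The stacking described above can be read as a two-stage search: a cheap, structure-agnostic filter first narrows the repository to a small candidate set, which the more expensive structure-aware measure then re-ranks. Below is a minimal sketch of that pattern, assuming placeholder scoring functions rather than the system's actual components.

```python
# Illustrative two-stage ("stacked") similarity search; the scoring functions
# are placeholders, not the components of the system described above.
def stacked_search(query, repository, cheap_score, structural_score,
                   k_filter=100, k_final=10):
    """repository: dict of workflow id -> workflow (any representation)."""
    # Stage 1: structure-agnostic filter (e.g. bag of component names).
    candidates = sorted(repository,
                        key=lambda w: cheap_score(query, repository[w]),
                        reverse=True)[:k_filter]
    # Stage 2: structure-aware re-ranking of the surviving candidates only.
    return sorted(candidates,
                  key=lambda w: structural_score(query, repository[w]),
                  reverse=True)[:k_final]

# Toy usage: workflows reduced to sets of component names, with Jaccard overlap
# standing in for both stages.
repo = {"wf1": {"a", "b", "c"}, "wf2": {"a", "b"}, "wf3": {"x", "y"}}
jaccard = lambda q, wf: len(q & wf) / len(q | wf)
print(stacked_search({"a", "b"}, repo, jaccard, jaccard, k_filter=2, k_final=2))
```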

    Layer Decomposition: An Effective Structure-based Approach for Scientific Workflow Similarity

    Scientific workflows have become a valuable tool for large-scale data processing and analysis. This has led to the creation of specialized online repositories to facilitate workflow sharing and reuse. Over time, these repositories have grown to sizes that call for advanced methods to support workflow discovery, in particular for effective similarity search. Here, we present a novel and intuitive workflow similarity measure that is based on layer decomposition. Layer decomposition accounts for the directed dataflow underlying scientific workflows, a property which has not been adequately considered in previous methods. We comparatively evaluate our algorithm using a gold standard for 24 query workflows from a repository of almost 1500 scientific workflows, and show that it a) delivers the best results for similarity search, b) has a much lower runtime than other, often highly complex competitors in structure-aware workflow comparison, and c) can be stacked easily with even faster, structure-agnostic approaches to further reduce runtime while retaining result quality.
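
    The abstract does not spell out the algorithm, but layer decomposition of a dataflow graph can be pictured as a topological layering: every component is assigned to the earliest layer all of whose predecessors have already been placed, and workflows are then compared layer by layer. The following is a hedged sketch of that layering step over a toy DAG; it is not the authors' implementation.

```python
# Illustrative topological layering of a workflow DAG; the subsequent
# comparison of layer sequences used by the actual method is not shown.
from collections import defaultdict

def layer_decomposition(edges):
    """edges: iterable of (producer, consumer) pairs forming a DAG.
    Returns a list of layers, each a sorted list of node labels."""
    nodes = {n for edge in edges for n in edge}
    preds = defaultdict(set)
    for src, dst in edges:
        preds[dst].add(src)
    layers, placed = [], set()
    while placed != nodes:
        # Nodes whose predecessors are all placed form the next layer.
        layer = sorted(n for n in nodes - placed if preds[n] <= placed)
        if not layer:
            raise ValueError("workflow graph contains a cycle")
        layers.append(layer)
        placed.update(layer)
    return layers

# Toy workflow: fetch -> (align, filter) -> report
print(layer_decomposition([("fetch", "align"), ("fetch", "filter"),
                           ("align", "report"), ("filter", "report")]))
# [['fetch'], ['align', 'filter'], ['report']]
```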

    RDF Curator: A Novel Workflow that Generates Semantic Graph from Literature for Curation Using Text Mining

There exist few databases that enable cross-referencing among the various research fields related to bioenergy. Cross-referencing is highly desirable among bioinformatics databases related to environment, energy, and agriculture for better mutual cooperation. By uniting Semantic Graphs, we can economically construct a distributed database, regardless of the size of the research laboratories and research endeavors involved.

Our purpose is to design and develop a workflow based on RDF (Resource Description Framework) that generates a Semantic Graph for a set of technical terms extracted from documents of various formats, such as PDF, HTML, and plain text. We aim to generate the Semantic Graph as the result of text mining, including morphological analysis and syntax analysis.

We have developed a prototype workflow program named "RDF Curator". Using this system, various types of documents can be automatically converted into RDF. "RDF Curator" is composed of general tools and libraries, so no special environment is needed. Hence, "RDF Curator" can be used on many platforms, such as MacOSX, Linux, and Windows (Cygwin). We expect that our system can assist human curators in constructing Semantic Graphs. Although fast and high-throughput, the present version of "RDF Curator" is less accurate than human curators. As a future task, we have to improve the accuracy of the workflow. In addition, we also plan to apply our system to the analysis of network similarity.
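
As a rough sketch of the kind of output such a workflow produces, the snippet below builds a small RDF graph from term relations that a text-mining step might have extracted, using the rdflib library; the namespace, predicates, and terms are illustrative assumptions, not RDF Curator's actual vocabulary.

```python
# Illustrative only: turn (subject, relation, object) term triples extracted by
# text mining into an RDF graph and serialize it as Turtle.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/bioenergy/")   # hypothetical namespace

extracted = [                                     # hypothetical mining output
    ("Miscanthus", "isA", "energy_crop"),
    ("Miscanthus", "convertedTo", "bioethanol"),
]

g = Graph()
g.bind("ex", EX)
for subj, rel, obj in extracted:
    g.add((EX[subj], EX[rel], EX[obj]))
# Record the source document so a human curator can verify the triples.
g.add((EX["Miscanthus"], EX["extractedFrom"], Literal("paper_0042.pdf")))

print(g.serialize(format="turtle"))
```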

    Piloting an Empirical Study on Measures for Workflow Similarity

    Service discovery of state-dependent services has to take workflow aspects into account. To increase the usability of service discovery, the result list of services should be ordered with regard to the relevance of the services. Means of ordering a list of workflows according to their similarity to a query are, however, missing. This paper presents a pilot of an empirical study on the influence of different measures on workflow similarity. It turns out that, although the results are preliminary, relations between the different measures are indicated and that the definition of similarity depends on the application scenario in which service discovery is applied.
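
    For concreteness, a result list of the kind discussed above could be ordered by a weighted combination of several similarity measures; the sketch below shows that pattern with toy measures and weights chosen purely for illustration, not the measures examined in the study.

```python
# Illustrative ranking of discovered workflows by a weighted combination of
# similarity measures; the measures and weights are assumptions.
def rank_results(query, candidates, measures, weights):
    """measures: list of functions (query, candidate) -> score in [0, 1]."""
    combined = lambda cand: sum(w * m(query, cand)
                                for m, w in zip(measures, weights))
    return sorted(candidates, key=combined, reverse=True)

# Toy measures over workflows represented as sets of activity labels.
label_overlap = lambda q, c: len(q & c) / len(q | c)                   # Jaccard
size_ratio    = lambda q, c: min(len(q), len(c)) / max(len(q), len(c))

query = {"receive_order", "check_stock", "ship"}
candidates = [{"receive_order", "check_stock", "ship", "invoice"},
              {"receive_order", "reject"},
              {"scan_image", "classify"}]
print(rank_results(query, candidates, [label_overlap, size_ratio], [0.7, 0.3]))
```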

    Learning Heterogeneous Similarity Measures for Hybrid-Recommendations in Meta-Mining

    The notion of meta-mining has appeared recently and extends traditional meta-learning in two ways. First, it does not learn meta-models that provide support only for the learning-algorithm selection task, but ones that support the whole data-mining process. In addition, it abandons the so-called black-box approach to algorithm description followed in meta-learning. Now, in addition to the datasets, algorithms and workflows also have descriptors. For the latter two, these descriptions are semantic, describing properties of the algorithms. With descriptors available both for datasets and data-mining workflows, the traditional modelling techniques followed in meta-learning, typically based on classification and regression algorithms, are no longer appropriate. Instead, we are faced with a problem whose nature is much more similar to the problems that appear in recommendation systems. The most important meta-mining requirements are that suggestions should use only dataset and workflow descriptors and should handle the cold-start problem, i.e., providing workflow suggestions for new datasets. In this paper we take a different view on the meta-mining modelling problem and treat it as a recommender problem. To account for the specificities of meta-mining, we derive a novel metric-learning-based recommender approach. Our method learns two homogeneous metrics, one in the dataset space and one in the workflow space, and a heterogeneous one in the dataset-workflow space. All learned metrics reflect similarities established from the dataset-workflow preference matrix. We demonstrate our method on meta-mining over biological (microarray dataset) problems. The application of our method is not limited to the meta-mining problem; its formulation is general enough that it can be applied to problems with similar requirements.
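
    A toy version of the heterogeneous dataset-workflow similarity is sketched below: a bilinear map is fitted so that descriptor pairs reproduce a preference matrix, and is then used to score workflows for an unseen dataset. It is a deliberate simplification under assumed data shapes, not the authors' learning algorithm.

```python
# Toy sketch: fit a bilinear map M so that D @ M @ W.T approximates the
# dataset-workflow preference matrix P, then score new pairs with it.
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(20, 5))    # 20 datasets, 5 descriptor features each
W = rng.normal(size=(15, 4))    # 15 workflows, 4 descriptor features each
P = rng.random(size=(20, 15))   # hypothetical preference matrix

# Closed-form least-squares fit via pseudoinverses.
M = np.linalg.pinv(D) @ P @ np.linalg.pinv(W.T)

def heterogeneous_similarity(dataset_desc, workflow_desc):
    """Score a (dataset, workflow) pair in the learned joint space."""
    return float(dataset_desc @ M @ workflow_desc)

# Cold start: rank all known workflows for a previously unseen dataset.
new_dataset = rng.normal(size=5)
scores = [heterogeneous_similarity(new_dataset, w) for w in W]
print("top-ranked workflow index:", int(np.argmax(scores)))
```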

    BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments

    Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives, using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the case studies' execution time by up to 98%. We also show how the application of machine learning techniques can enrich the analysis process.
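
    As an illustration of the kind of provenance query such a framework exposes, the sketch below stores a few hypothetical task-level provenance records in SQLite and aggregates runtime per workflow; the schema, fields, and numbers are assumptions, not BioWorkbench's actual database.

```python
# Illustrative only: a minimal provenance store and an aggregate runtime query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE task_provenance (
                    workflow TEXT, task TEXT,
                    started REAL, finished REAL, status TEXT)""")
conn.executemany(
    "INSERT INTO task_provenance VALUES (?, ?, ?, ?, ?)",
    [("SwiftPhylo", "align",      0.0,  42.5, "ok"),   # hypothetical runs
     ("SwiftPhylo", "build_tree", 42.5, 90.0, "ok"),
     ("SwiftGECKO", "compare",    0.0,  30.0, "ok")])

# Example query: total runtime per workflow, slowest first.
for workflow, runtime in conn.execute(
        """SELECT workflow, SUM(finished - started) AS runtime
           FROM task_provenance GROUP BY workflow ORDER BY runtime DESC"""):
    print(workflow, runtime)
```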