From Design to Production Control Through the Integration of Engineering Data Management and Workflow Management Systems
At a time when many companies are under pressure to reduce "times-to-market", the management of product information from the early stages of design through assembly to manufacture and production has become increasingly important.
Similarly, in the construction of high-energy physics devices, the collection of (often evolving) engineering data is central to the subsequent physics analysis. Traditionally, design engineers in industry have employed Engineering Data Management Systems (also called Product Data Management Systems) to coordinate and control access to documented versions of product designs.
However, these systems provide control only at the collaborative design level
and are seldom used beyond design. Workflow management systems, on the other
hand, are employed in industry to coordinate and support the more complex and
repeatable work processes of the production environment. Commercial workflow
products cannot support the highly dynamic activities found both in the design
stages of product development and in rapidly evolving workflow definitions. The
integration of Product Data Management with Workflow Management can provide
support for product development from initial CAD/CAM collaborative design
through to the support and optimisation of production workflow activities. This
paper investigates this integration, proposes a philosophy for the support of product data throughout the full development and production lifecycle, and demonstrates its usefulness in the construction of CMS detectors.
Comment: 18 pages, 13 figures
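To make the proposed integration concrete, here is a minimal sketch pairing a PDM-style store of versioned design documents with a production workflow activity that only ever consumes released versions. This is an illustration of the philosophy, not the system described in the paper; all class and method names are invented.

```python
# Minimal sketch of the PDM/workflow integration idea; every name here
# is hypothetical, invented for illustration only.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DesignDocument:
    """A versioned product-design artifact, as a PDM system would track it."""
    part_id: str
    version: int
    status: str  # e.g. "draft" or "released"


@dataclass
class PDMStore:
    """Coordinates and controls access to documented design versions."""
    _docs: dict = field(default_factory=dict)

    def release(self, part_id: str, version: int) -> DesignDocument:
        doc = DesignDocument(part_id, version, "released")
        self._docs[(part_id, version)] = doc
        return doc

    def latest_released(self, part_id: str) -> DesignDocument:
        released = [d for d in self._docs.values()
                    if d.part_id == part_id and d.status == "released"]
        return max(released, key=lambda d: d.version)


def production_step(store: PDMStore, part_id: str) -> str:
    """A workflow activity bound to the controlled design version, so
    production work stays consistent with the design office."""
    doc = store.latest_released(part_id)
    return f"manufacture {doc.part_id} to design v{doc.version}"


store = PDMStore()
store.release("endcap-module", 1)
store.release("endcap-module", 2)
print(production_step(store, "endcap-module"))  # -> ... design v2
```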
Sciunits: Reusable Research Objects
Science is conducted collaboratively, often requiring knowledge sharing about
computational experiments. When experiments include only datasets, they can be
shared using Uniform Resource Identifiers (URIs) or Digital Object Identifiers
(DOIs). An experiment, however, seldom includes only datasets, but more often
includes software, its past execution, provenance, and associated
documentation. The Research Object has recently emerged as a comprehensive and
systematic method for aggregation and identification of diverse elements of
computational experiments. While a necessary method, mere aggregation is not
sufficient for the sharing of computational experiments. Other users must be
able to easily recompute on these shared research objects. In this paper, we
present the sciunit, a reusable research object in which aggregated content is
recomputable. We describe a Git-like client that efficiently creates, stores,
and repeats sciunits. We show through analysis that sciunits repeat
computational experiments with minimal storage and processing overhead.
Finally, we provide an overview of a sharing and reproducibility cyberinfrastructure, based on sciunits, that is gaining adoption in the domain of geosciences.
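The key idea behind the Git-like client (store each aggregated file once, keyed by content hash, and record the command needed to repeat the execution) can be sketched in a few lines. This is a toy illustration of the technique, not the sciunit tool's actual implementation; the class and its layout are invented.

```python
# Toy sketch of content-addressed storage plus command capture, the
# Git-like mechanism that keeps repetition overhead minimal. Not the
# sciunit client's real code; names are invented.
import hashlib
import subprocess


class ResearchObject:
    def __init__(self):
        self.objects = {}   # sha256 digest -> file bytes (deduplicated)
        self.manifest = {}  # path -> sha256 digest
        self.command = None

    def add_file(self, path: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        self.objects[digest] = data  # identical content is stored once
        self.manifest[path] = digest

    def record(self, command: list) -> None:
        self.command = command       # the execution to be repeated

    def repeat(self) -> int:
        """Re-materialize the captured files, then re-run the command."""
        for path, digest in self.manifest.items():
            with open(path, "wb") as f:
                f.write(self.objects[digest])
        return subprocess.run(self.command).returncode


unit = ResearchObject()
unit.add_file("analysis.py", b"print('result: 42')\n")
unit.record(["python", "analysis.py"])
unit.repeat()  # re-executes the captured experiment
```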
Kronos: a workflow assembler for genome analytics and informatics.
Background: The field of next-generation sequencing informatics has matured to a point where algorithmic advances in sequence alignment and individual feature detection methods have stabilized. Practical and robust implementation of complex analytical workflows (where such tools are structured into "best practices" for automated analysis of next-generation sequencing datasets) still requires significant programming investment and expertise.
Results: We present Kronos, a software platform for facilitating the development and execution of modular, auditable, and distributable bioinformatics workflows. Kronos obviates the need for explicit coding of workflows by compiling a text configuration file into executable Python applications; creating new analysis modules, however, still requires programming. The framework of each workflow includes a run manager to execute the encoded workflows locally (or on a cluster or cloud), parallelize tasks, and log all runtime events. The resulting workflows are highly modular and configurable by construction, facilitating flexible and extensible meta-applications that can be modified easily by editing the configuration file. The workflows are fully encoded for ease of distribution and can be instantiated on external systems, a step toward reproducible research and comparative analyses. We introduce a framework for building Kronos components that function as shareable, modular nodes in Kronos workflows.
Conclusions: The Kronos platform provides a standard framework for developers to implement custom tools, reuse existing tools, and contribute to the community at large. Kronos is shipped with both Docker and Amazon Web Services Machine Images. It is free, open source, and available through the Python Package Index and at https://github.com/jtaghiyar/kronos
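As an illustration of compiling a text configuration into an executable workflow, the sketch below turns a small YAML task description into an ordered, runnable pipeline. The schema is invented for this example and is not Kronos's actual configuration format; it assumes the PyYAML package is installed.

```python
# Sketch of the configuration-compiled-to-workflow idea. The schema is
# invented, not Kronos's format; requires PyYAML (pip install pyyaml).
import subprocess
import yaml

CONFIG = """
tasks:
  align:
    run: echo aligning reads
  call_variants:
    run: echo calling variants
    after: [align]
"""


def compile_workflow(text: str) -> list:
    """Turn the text config into a dependency-ordered list of steps."""
    spec = yaml.safe_load(text)["tasks"]
    done, order = set(), []

    def visit(name: str) -> None:
        if name in done:
            return
        for dep in spec[name].get("after", []):  # run dependencies first
            visit(dep)
        done.add(name)
        order.append((name, spec[name]["run"]))

    for name in spec:
        visit(name)
    return order


for name, cmd in compile_workflow(CONFIG):
    print(f"[task {name}]")
    subprocess.run(cmd, shell=True, check=True)
```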
FAST: FAST Analysis of Sequences Toolbox.
FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R, and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought.
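The Textutils analogy is easy to picture: a fasgrep-style tool is grep applied to whole sequence records instead of lines. FAST itself is written in Perl on BioPerl; the sketch below is a minimal Python analogue of that behaviour, filtering Multi-FastA records whose headers match a pattern.

```python
# Minimal Python analogue of a fasgrep-style record filter, shown only
# to illustrate the grep-for-sequence-records idea (FAST is Perl-based).
import re
import sys


def read_fasta(handle):
    """Yield (header, sequence) records from a Multi-FastA stream."""
    header, seq = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def fasgrep_like(pattern: str, handle, out=sys.stdout):
    """Print only the records whose header matches the pattern."""
    rx = re.compile(pattern)
    for header, seq in read_fasta(handle):
        if rx.search(header):
            out.write(f">{header}\n{seq}\n")


if __name__ == "__main__":
    # usage: python fasgrep_like.py 'Homo sapiens' < input.fasta
    fasgrep_like(sys.argv[1], sys.stdin)
```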
Model Exploration Using OpenMOLE - a workflow engine for large scale distributed design of experiments and parameter tuning
OpenMOLE is a scientific workflow engine with a strong emphasis on workload
distribution. Workflows are designed using a high level Domain Specific
Language (DSL) built on top of Scala. It exposes natural parallelism constructs
to easily delegate the workload resulting from a workflow to a wide range of
distributed computing environments. In this work, we briefly present the strengths of OpenMOLE and demonstrate its efficiency at exploring the parameter
set of an agent simulation model. We perform a multi-objective optimisation on
this model using computationally expensive Genetic Algorithms (GA). OpenMOLE
hides the complexity of designing such an experiment thanks to its DSL, and
transparently distributes the optimisation process. The example shows how an
initialisation of the GA with a population of 200,000 individuals can be
evaluated in one hour on the European Grid Infrastructure.
Comment: IEEE High Performance Computing and Simulation Conference 2015, June 2015, Amsterdam, Netherlands
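The workload-distribution pattern OpenMOLE automates, mapping a costly model over a large population of candidate parameter sets, can be sketched generically. OpenMOLE's own DSL is Scala; this Python sketch uses a local process pool where OpenMOLE would transparently delegate to a cluster or grid, and the model and its two objectives are placeholders.

```python
# Generic sketch of distributing GA population evaluation; not
# OpenMOLE's Scala DSL. The model and objectives are stand-ins.
import random
from concurrent.futures import ProcessPoolExecutor


def simulate(params):
    """Placeholder for a costly agent-based model run; returns two
    objectives for a multi-objective optimisation."""
    x, y = params
    return (x - 1.0) ** 2 + y ** 2, x ** 2 + (y + 1.0) ** 2


if __name__ == "__main__":
    population = [(random.uniform(-5, 5), random.uniform(-5, 5))
                  for _ in range(1000)]
    # OpenMOLE would delegate this map to a grid or cluster; locally,
    # a process pool plays the same role.
    with ProcessPoolExecutor() as pool:
        objectives = list(pool.map(simulate, population))
    # Scalarized here for brevity; a real GA would keep a Pareto front.
    best = min(zip(population, objectives), key=lambda t: sum(t[1]))
    print("best individual:", best[0], "objectives:", best[1])
```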
A posteriori metadata from automated provenance tracking: Integration of AiiDA and TCOD
In order to make results of computational scientific research findable,
accessible, interoperable and re-usable, it is necessary to decorate them with
standardised metadata. However, there are a number of technical and practical
challenges that make this process difficult to achieve in practice. Here we present the implementation of a protocol to tag crystal structures with their computed properties, without the need for human intervention to curate the data.
This protocol leverages the capabilities of AiiDA, an open-source platform to
manage and automate scientific computational workflows, and TCOD, an
open-access database storing computed materials properties using a well-defined
and exhaustive ontology. Based on these, the complete procedure to deposit
computed data in the TCOD database is automated. All relevant metadata are
extracted from the full provenance information that AiiDA tracks and stores
automatically while managing the calculations. Such a protocol also enables
reproducibility of scientific data in the field of computational materials
science. As a proof of concept, the AiiDA-TCOD interface is used to deposit 170
theoretical structures together with their computed properties and their full
provenance graphs, consisting of over 4,600 AiiDA nodes.
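The heart of such a protocol, deriving deposition metadata by walking upstream through the stored provenance graph, can be sketched generically. The graph model and attribute names below are invented for illustration; they are not AiiDA's actual API nor the TCOD ontology.

```python
# Generic sketch of harvesting metadata from a provenance graph; the
# Node model and field names are hypothetical, not AiiDA's API.
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    attributes: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)  # upstream provenance


def collect_metadata(result: Node) -> dict:
    """Walk upstream from a computed result, gathering every recorded
    attribute, so the deposited entry needs no manual curation."""
    metadata, stack, seen = {}, [result], set()
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        for key, value in node.attributes.items():
            metadata.setdefault(f"{node.label}.{key}", value)
        stack.extend(node.inputs)
    return metadata


structure = Node("structure", {"formula": "SiO2"})
calc = Node("calculation", {"code": "quantum-espresso", "cutoff_eV": 400},
            inputs=[structure])
result = Node("result", {"total_energy_eV": -123.4}, inputs=[calc])
print(collect_metadata(result))
```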