Enabling FAIR Research in Earth Science through Research Objects
Data-intensive science communities are progressively adopting FAIR practices
that enhance the visibility of scientific breakthroughs and enable reuse. At
the core of this movement, research objects contain and describe scientific
information and resources in a way compliant with the FAIR principles and
sustain the development of key infrastructure and tools. This paper provides an
account of the challenges, experiences and solutions involved in the adoption
of FAIR around research objects over several Earth Science disciplines. During
this journey, our work has been comprehensive, with outcomes including: an
extended research object model adapted to the needs of earth scientists; the
provisioning of digital object identifiers (DOI) to enable persistent
identification and to give due credit to authors; the generation of
content-based, semantically rich, research object metadata through natural
language processing, enhancing visibility and reuse through recommendation
systems and third-party search engines; and various types of checklists that
provide a compact representation of research object quality as a key enabler of
scientific reuse. All these results have been integrated in ROHub, a platform
that provides research object management functionality to a wealth of
applications and interfaces across different scientific communities. To monitor
and quantify the community uptake of research objects, we have defined
indicators and obtained measures via ROHub that are also discussed herein.
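As a rough illustration of what such a research object aggregates, the sketch below models a minimal metadata record with a DOI, a set of aggregated resources, and content-derived keywords. All field names and values are hypothetical and do not reflect the actual extended research object model or the ROHub API.

```python
# Hypothetical sketch of research-object metadata of the kind described above:
# aggregated resources, a persistent identifier, and NLP-derived keywords.
# Field names are illustrative only; they are not the extended RO model or ROHub API.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ResearchObject:
    title: str
    creators: List[str]
    doi: str                       # persistent identifier for citation and credit
    resources: List[str]           # datasets, workflows, papers, etc.
    keywords: List[str] = field(default_factory=list)  # e.g. extracted by NLP

ro = ResearchObject(
    title="Sea surface temperature anomaly study",
    creators=["A. Researcher"],
    doi="10.0000/example-doi",     # placeholder DOI
    resources=["data/sst_2020.nc", "workflow/analysis.kar"],
    keywords=["oceanography", "sea surface temperature"],
)
print(ro.doi, len(ro.resources))
```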
Workflows and extensions to the Kepler scientific workflow system to support environmental sensor data access and analysis
Scientific Workflow Tools
Although an increasing number of cyberinfrastructure technologies have emerged in the last few years to achieve remote data access, distributed job execution, and data management, orchestrating these components with minimal overhead remains a difficult task for scientists. Scientific workflow systems improve this situation by creating interfaces to a variety of technologies and automating the execution and monitoring of the workflows.
A scientific workflow is the process of combining data and processes into a structured set of steps that implement semi-automated computational solutions to a scientific problem. Kepler is a cross-project collaboration whose purpose is to develop a domain-independent scientific workflow system. It provides an environment in which scientists can design and execute scientific workflows by specifying the desired sequence of computational actions and the appropriate dataflow. Currently deployed workflows range from local analytical pipelines to distributed, high-performance applications that can run in cluster, grid, or cloud computing environments.
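The dataflow idea can be illustrated with a minimal sketch: a workflow is an ordered set of steps in which the output of each step flows into the next. The Python below is purely illustrative and is not the Kepler API (Kepler actors are composed graphically and executed by directors); the step names and data are made up.

```python
# Illustrative sketch (not the Kepler API): a workflow as an ordered set of
# steps where the output of each step flows into the next step's input.
from typing import Callable, Iterable, List

def run_workflow(steps: List[Callable], data):
    """Execute steps in sequence, passing each result to the next step."""
    for step in steps:
        data = step(data)
    return data

# Hypothetical steps for a small sensor-data pipeline.
def load(readings: Iterable[float]) -> List[float]:
    return list(readings)

def clean(readings: List[float]) -> List[float]:
    return [r for r in readings if r >= 0]       # drop invalid sensor values

def summarize(readings: List[float]) -> float:
    return sum(readings) / len(readings)

print(run_workflow([load, clean, summarize], [12.1, -999.0, 11.8, 12.4]))
```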
The scientific workflow approach offers a number of advantages over traditional scripting-based approaches, including simplified configuration; improved reusability, maintenance and sharing; automated provenance management to capture and browse the lineage of data products; and support for fault-tolerance.
This talk presents an overview of common scientific workflow requirements and illustrates these features using the Kepler scientific workflow system. We highlight the features of Kepler in several scientific applications, as well as describe upcoming extensions and improvements.
A Framework for Distributed Data-Parallel Execution in the Kepler Scientific Workflow System
Distributed Data-Parallel (DDP) patterns such as MapReduce have become increasingly popular as solutions to facilitate data-intensive applications, resulting in a number of systems supporting DDP workflows. Yet, applications or workflows built using these patterns are usually tightly coupled with the underlying DDP execution engine they select. We present a framework for distributed data-parallel execution in the Kepler scientific workflow system that enables users to easily switch between different DDP execution engines. We describe a set of DDP actors based on DDP patterns and directors for DDP workflow executions within the presented framework. We demonstrate how DDP workflows can be easily composed in the Kepler graphical user interface through the reuse of these DDP actors and directors and how the generated DDP workflows can be executed in different distributed environments. Via a bioinformatics use case, we discuss the usability of the proposed framework and validate its execution scalability.
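The core idea, writing the map/reduce logic once and choosing the execution engine separately, can be sketched as follows. This is not Kepler's DDP actor/director API; the `run_ddp` function and engine names are hypothetical, with a multiprocessing pool standing in for a distributed engine.

```python
# Sketch of the decoupling idea only (not Kepler's DDP actors/directors):
# the map and reduce logic is written once, and an interchangeable "engine"
# decides how it is executed.
from collections import defaultdict
from multiprocessing import Pool
from typing import List, Tuple

def word_count_map(line: str) -> List[Tuple[str, int]]:
    return [(w, 1) for w in line.split()]

def word_count_reduce(key: str, values: List[int]) -> Tuple[str, int]:
    return key, sum(values)

def run_ddp(map_fn, reduce_fn, records, engine="local"):
    """Run the same map/reduce logic on a chosen execution engine."""
    if engine == "local":
        mapped = [pair for rec in records for pair in map_fn(rec)]
    elif engine == "pool":                       # stand-in for a distributed engine
        with Pool() as pool:
            mapped = [pair for pairs in pool.map(map_fn, records) for pair in pairs]
    else:
        raise ValueError(f"unknown engine: {engine}")
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

if __name__ == "__main__":
    lines = ["kepler workflow", "workflow engine"]
    print(run_ddp(word_count_map, word_count_reduce, lines, engine="local"))
```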
Kepler WebView: A Lightweight, Portable Framework for Constructing Real-time Web Interfaces of Scientific Workflows
Modern web technologies facilitate the creation of high-quality data visualizations and rich, interactive components across a wide variety of devices. Scientific workflow systems can greatly benefit from these technologies by giving scientists a better understanding of their data or model, leading to new insights. While several projects have enabled web access to scientific workflow systems, they are primarily organized as a large portal server encapsulating the workflow engine. In this vision paper, we propose the design for Kepler WebView, a lightweight framework that integrates web technologies with the Kepler Scientific Workflow System. By embedding a web server in the Kepler process, Kepler WebView enables a wide variety of usage scenarios that would be difficult or impossible using the portal model.
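The embedded-server idea can be sketched outside of Kepler as follows: a lightweight HTTP server runs in a background thread of the same process and exposes the state of a running computation. Kepler WebView itself is a Java framework inside the Kepler process; the Python below, including the state fields and port, is purely illustrative.

```python
# Sketch of the "embedded web server" idea only: a background HTTP server in
# the same process exposes the state of a running computation as JSON.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

state = {"step": 0, "status": "running"}         # hypothetical workflow state

class StateHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps(state).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

def serve_in_background(port=8765):
    """Start the state server in a daemon thread and return it."""
    server = HTTPServer(("localhost", port), StateHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

if __name__ == "__main__":
    serve_in_background()
    for step in range(1, 4):                     # stand-in for workflow execution
        state["step"] = step
    state["status"] = "done"
```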
Provenance for MapReduce-based data-intensive workflows
SC '11: International Conference for High Performance Computing, Networking, Storage and Analysis, Seattle, Washington, USA, 14 November 2011.
MapReduce has been widely adopted by many business and scientific applications for data-intensive processing of large datasets. There are increasing efforts for workflows and systems to work with the MapReduce programming model and the Hadoop environment, including our work on a higher-level programming model for MapReduce within the Kepler Scientific Workflow System. However, to date, provenance of MapReduce-based workflows and its effects on workflow execution performance have not been studied in depth. In this paper, we present an extension to our earlier work on MapReduce in Kepler to record the provenance of MapReduce workflows created using the Kepler+Hadoop framework. In particular, we present: (i) a data model that is able to capture provenance inside a MapReduce job as well as the provenance for the workflow that submitted it; (ii) an extension to the Kepler+Hadoop architecture to record provenance using this data model on MySQL Cluster; (iii) a programming interface to query the collected information; and (iv) an evaluation of the scalability of collecting and querying this provenance information using two scenarios with different characteristics.
The authors would like to thank the rest of the Kepler team for their collaboration. This work was supported by NSF SDCI Award OCI-0722079 for Kepler/CORE and ABI Award DBI-1062565 for bioKepler, DOE SciDAC Award DE-FC02-07ER25811 for SDM Center, the UCGRID Project, and an SDSC Triton Research Opportunities grant.
https://dl.acm.org/doi/10.1145/2110497.211050
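As a rough sketch of the kind of lineage such a data model has to capture, the example below links workflow-level tasks to the map and reduce executions they spawn. The record fields and query are hypothetical and do not reflect the paper's actual data model or the Kepler+Hadoop/MySQL Cluster schema.

```python
# Hypothetical sketch of provenance records linking a workflow-level task to
# the map/reduce executions inside it; illustrative only, not the paper's
# data model or the Kepler+Hadoop/MySQL Cluster schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class TaskProvenance:
    task_id: str
    inputs: List[str]
    outputs: List[str]
    parent_id: str = ""            # workflow-level task that spawned this one

@dataclass
class WorkflowProvenance:
    workflow_id: str
    tasks: List[TaskProvenance] = field(default_factory=list)

    def lineage(self, output: str) -> List[str]:
        """Return the ids of tasks that produced a given output."""
        return [t.task_id for t in self.tasks if output in t.outputs]

prov = WorkflowProvenance("wf-1", [
    TaskProvenance("map-0", ["chunk0"], ["kv0"], parent_id="mapreduce-1"),
    TaskProvenance("reduce-0", ["kv0"], ["result"], parent_id="mapreduce-1"),
])
print(prov.lineage("result"))      # -> ['reduce-0']
```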
