1,223 research outputs found
Polyflow: a Polystore-compliant mechanism to provide interoperability to heterogeneous provenance graphs
Many scientific experiments are modeled as workflows. Workflows usually output massive amounts of data. To guarantee the reproducibility of workflows, they are usually orchestrated by Workflow Management Systems (WfMS), that capture provenance data. Provenance represents the lineage of a data fragment throughout its transformations by activities in a workflow. Provenance traces are usually represented as graphs. These
graphs allows scientists to analyze and evaluate results produced by a workflow. However, each WfMS has a proprietary format for provenance and do it in different granularity levels. Therefore, in more complex scenarios in which the scientist needs to interpret provenance graphs generated by multiple WfMSs and workflows, a challenge arises. To first understand the research landscape, we conduct a Systematic Literature Mapping,
assessing existing solutions under several different lenses. With a clearer understanding of the state of the art, we propose a tool called Polyflow, which is based on the concept of Polystore systems, integrating several databases of heterogeneous origin by adopting a global ProvONE schema. Polyflow allows scientists to query multiple provenance graphs in an integrated way. Polyflow was evaluated by experts using provenance data collected from real experiments that generate phylogenetic trees through workflows. The experiment results suggest that Polyflow is a viable solution for interoperating heterogeneous provenance data generated by different WfMSs, from both a usability and performance standpoint.Muitos experimentos científicos são modelados como workflows (fluxos de trabalho). Workflows produzem comumente um grande volume de dados. De forma a garantir a reprodutibilidade desses workflows, estes geralmente são orquestrados por Sistemas de Gerência de Workflows (SGWfs), garantindo que dados de proveniência sejam capturados. Dados de proveniência representam o histórico de derivação de um dado ao longo da execução do workflow. Assim, o histórico de derivação dos dados pode ser representado
por meio de um grafo de proveniência. Este grafo possibilita aos cientistas analisarem e avaliarem resultados produzidos por um workflow. Todavia, cada SGWf tem seu formato proprietário de representação para dados de proveniência, e os armazenam em diferentes granularidades. Consequentemente, em cenários mais complexos em que um cientista precisa analisar de forma integrada grafos de proveniência gerados por múltiplos workflows, isso se torna desafiador. Primeiramente, para entender o campo de pesquisa, realizamos um Mapeamento Sistemático da Literatura, avaliando soluções existentes sob diferentes lentes. Com uma compreensão mais clara do atual estado da arte, propomos uma ferramenta chamada Polyflow, inspirada em conceitos de sistemas Polystore, possibilitando a integração de várias bases de dados heterogêneas por meio de uma interface de consulta única que utiliza o ProvONE como schema global. Polyflow permite que cientistas
submetam consultas em múltiplos grafos de proveniência de maneira integrada. Polyflow foi avaliado em conjunto com especialistas usando dados de proveniência coletados de workflows reais que apoiam o estudo de geração de árvores filogenéticas. O resultado da avaliação mostrou a viabilidade do Polyflow para interoperar semanticamente dados de proveniência gerado por distintos SGWfs, tanto do ponto de vista de desempenho quanto de usabilidade
Neuroimaging study designs, computational analyses and data provenance using the LONI pipeline.
Modern computational neuroscience employs diverse software tools and multidisciplinary expertise to analyze heterogeneous brain data. The classical problems of gathering meaningful data, fitting specific models, and discovering appropriate analysis and visualization tools give way to a new class of computational challenges--management of large and incongruous data, integration and interoperability of computational resources, and data provenance. We designed, implemented and validated a new paradigm for addressing these challenges in the neuroimaging field. Our solution is based on the LONI Pipeline environment [3], [4], a graphical workflow environment for constructing and executing complex data processing protocols. We developed study-design, database and visual language programming functionalities within the LONI Pipeline that enable the construction of complete, elaborate and robust graphical workflows for analyzing neuroimaging and other data. These workflows facilitate open sharing and communication of data and metadata, concrete processing protocols, result validation, and study replication among different investigators and research groups. The LONI Pipeline features include distributed grid-enabled infrastructure, virtualized execution environment, efficient integration, data provenance, validation and distribution of new computational tools, automated data format conversion, and an intuitive graphical user interface. We demonstrate the new LONI Pipeline features using large scale neuroimaging studies based on data from the International Consortium for Brain Mapping [5] and the Alzheimer's Disease Neuroimaging Initiative [6]. User guides, forums, instructions and downloads of the LONI Pipeline environment are available at http://pipeline.loni.ucla.edu
The Semantic Web Revisited
The original Scientific American article on the Semantic Web appeared in 2001. It described the evolution of a Web that consisted largely of documents for humans to read to one that included data and information for computers to manipulate. The Semantic Web is a Web of actionable information--information derived from data through a semantic theory for interpreting the symbols.This simple idea, however, remains largely unrealized. Shopbots and auction bots abound on the Web, but these are essentially handcrafted for particular tasks; they have little ability to interact with heterogeneous data and information types. Because we haven't yet delivered large-scale, agent-based mediation, some commentators argue that the Semantic Web has failed to deliver. We argue that agents can only flourish when standards are well established and that the Web standards for expressing shared meaning have progressed steadily over the past five years. Furthermore, we see the use of ontologies in the e-science community presaging ultimate success for the Semantic Web--just as the use of HTTP within the CERN particle physics community led to the revolutionary success of the original Web. This article is part of a special issue on the Future of AI
User model interoperability in education: sharing learner data using the experience API and distributed ledger technology
Learning analytics and data mining require gathering and exchanging learner data for further processing and designing of activities tailored to learner’s characteristics, context, and needs. Currently, systems that store learners’ attributes should, ideally, be operated and controlled by responsible and trustworthy authorities that guarantee the protection and sovereignty of data and use objective criteria to protect and represent all parties’ interests. This chapter introduces a peer-to-peer method for storing and exchanging learner data with minimal trust. The proposed approach, underpinned by the Experience API standard, eliminates the need of a mediator authority by using distributed ledger technology
Towards structured sharing of raw and derived neuroimaging data across existing resources
Data sharing efforts increasingly contribute to the acceleration of
scientific discovery. Neuroimaging data is accumulating in distributed
domain-specific databases and there is currently no integrated access mechanism
nor an accepted format for the critically important meta-data that is necessary
for making use of the combined, available neuroimaging data. In this
manuscript, we present work from the Derived Data Working Group, an open-access
group sponsored by the Biomedical Informatics Research Network (BIRN) and the
International Neuroimaging Coordinating Facility (INCF) focused on practical
tools for distributed access to neuroimaging data. The working group develops
models and tools facilitating the structured interchange of neuroimaging
meta-data and is making progress towards a unified set of tools for such data
and meta-data exchange. We report on the key components required for integrated
access to raw and derived neuroimaging data as well as associated meta-data and
provenance across neuroimaging resources. The components include (1) a
structured terminology that provides semantic context to data, (2) a formal
data model for neuroimaging with robust tracking of data provenance, (3) a web
service-based application programming interface (API) that provides a
consistent mechanism to access and query the data model, and (4) a provenance
library that can be used for the extraction of provenance data by image
analysts and imaging software developers. We believe that the framework and set
of tools outlined in this manuscript have great potential for solving many of
the issues the neuroimaging community faces when sharing raw and derived
neuroimaging data across the various existing database systems for the purpose
of accelerating scientific discovery
Toward cyberinfrastructure to facilitate collaboration and reproducibility for marine integrated ecosystem assessments
Author Posting. © The Author(s), 2016. This is the author's version of the work. It is posted here under a nonexclusive, irrevocable, paid-up, worldwide license granted to WHOI. It is made available for personal use, not for redistribution. The definitive version was published in Earth Science Informatics 10 (2017): 85-97, doi:10.1007/s12145-016-0280-4.There is a growing need for cyberinfrastructure to support science-based decision making in management of natural resources. In particular, our motivation was to aid the development of cyberinfrastructure for Integrated Ecosystem Assessments (IEAs) for marine ecosystems. The IEA process involves analysis of natural and socio-economic information based on diverse and disparate sources of data, requiring collaboration among scientists of many disciplines and communication with other stakeholders. Here we describe our bottom-up approach to developing cyberinfrastructure through a collaborative process engaging a small group of domain and computer scientists and software engineers. We report on a use case evaluated for an Ecosystem Status Report, a multi-disciplinary report inclusive of Earth, life, and social sciences, for the Northeast U.S. Continental Shelf Large Marine Ecosystem. Ultimately, we focused on sharing workflows as a component of the cyberinfrastructure to facilitate collaboration and reproducibility. We developed and deployed a software environment to generate a portion of the Report, retaining traceability of derived datasets including indicators of climate forcing, physical pressures, and ecosystem states. Our solution for sharing workflows and delivering reproducible documents includes IPython (now Jupyter) Notebooks. We describe technical and social challenges that we encountered in the use case and the importance of training to aid the adoption of best practices and new technologies by domain scientists. We consider the larger challenges for developing end-to-end cyberinfrastructure that engages other participants and stakeholders in the IEA process.Support for this research was provided by the U. S. National Science Foundation #0955649 with additional support to SB by the Investment in Science Fund at Woods Hole Oceanographic Institution
A Semantic Grid Oriented to E-Tourism
With increasing complexity of tourism business models and tasks, there is a
clear need of the next generation e-Tourism infrastructure to support flexible
automation, integration, computation, storage, and collaboration. Currently
several enabling technologies such as semantic Web, Web service, agent and grid
computing have been applied in the different e-Tourism applications, however
there is no a unified framework to be able to integrate all of them. So this
paper presents a promising e-Tourism framework based on emerging semantic grid,
in which a number of key design issues are discussed including architecture,
ontologies structure, semantic reconciliation, service and resource discovery,
role based authorization and intelligent agent. The paper finally provides the
implementation of the framework.Comment: 12 PAGES, 7 Figure
Smart Environmental Data Infrastructures: Bridging the Gap between Earth Sciences and Citizens
The monitoring and forecasting of environmental conditions is a task to which much effort and resources are devoted by the scientific community and relevant authorities. Representative examples arise in meteorology, oceanography, and environmental engineering. As a consequence, high volumes of data are generated, which include data generated by earth observation systems and different kinds of models. Specific data models, formats, vocabularies and data access infrastructures have been developed and are currently being used by the scientific community. Due to this, discovering, accessing and analyzing environmental datasets requires very specific skills, which is an important barrier for their reuse in many other application domains. This paper reviews earth science data representation and access standards and technologies, and identifies the main challenges to overcome in order to enable their integration in semantic open data infrastructures. This would allow non-scientific information technology practitioners to devise new end-user solutions for citizen problems in new application domainsThis research was co-funded by (i) the TRAFAIR project (2017-EU-IA-0167), co-financed by the Connecting Europe Facility of the European Union, (ii) the RADAR-ON-RAIA project (0461_RADAR_ON_RAIA_1_E) co-financed by the European Regional Development Fund (ERDF) through the Iterreg V-A Spain-Portugal program (POCTEP) 2014-2020, and (iii) the Consellería de Educación, Universidade e Formación Profesional of the regional government of Galicia (Spain), through the support for research groups with growth potential (ED431B 2018/28)S
- …