Reproducibility and reuse of experiments in eScience: workflows, ontologies and scripts
Advisors: Claudia Maria Bauzer Medeiros, Yolanda Gil. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação.
Abstract: Scripts and Scientific Workflow Management Systems (SWfMSs) are common approaches used to automate the execution flow of processes and data analysis in scientific (computational) experiments. Although widely used in many disciplines, scripts are hard to understand, adapt, reuse, and reproduce. For this reason, several solutions have been proposed to aid experiment reproducibility in script-based environments. However, these solutions neither allow the experiment to be fully documented nor help when third parties want to reuse just part of the code. SWfMSs, on the other hand, help documentation and reuse by supporting scientists in the design and execution of their experiments, which are specified and run as interconnected (reusable) workflow components (a.k.a. building blocks). While workflows are better than scripts for understandability and reuse, they still require additional documentation. During experiment design, scientists frequently create workflow variants, e.g., by changing workflow components. Reuse and reproducibility require understanding and tracking variant provenance, a time-consuming task. This thesis aims to support reproducibility and reuse of computational experiments. To meet these challenges, we address two research problems: (1) understanding a computational experiment, and (2) extending a computational experiment. Our work towards solving these problems led us to choose workflows and ontologies as the answer to both. The main contributions of this thesis are thus: (i) to present the requirements for the conversion of script-based experiments into reproducible research; (ii) to propose a methodology that guides scientists through the process of converting script-based experiments into reproducible workflow research objects; (iii) to design and implement features for quality assessment of computational experiments; (iv) to design and implement W2Share, a framework to support the conversion methodology, which exploits tools and standards that have been developed by the scientific community to promote reuse and reproducibility; (v) to design and implement OntoSoft-VFF, a framework for capturing information about software and workflow components to help scientists manage workflow exploration and evolution. Our work is showcased via use cases in Molecular Dynamics, Bioinformatics and Weather Forecasting.
Doctorate. Computer Science. Doctor in Computer Science. 2013/08293-7, 2014/23861-4, 2017/03570-3. FAPES
Towards a system of concepts for Family Medicine. Multilingual indexing in General Practice/ Family Medicine in the era of Semantic Web
UNIVERSITY OF LIÈGE, BELGIUM
Executive Summary
Faculty of Medicine
Département Universitaire de Médecine Générale.
Unité de recherche Soins Primaires et Santé
Doctor in biomedical sciences
Towards a system of concepts for Family Medicine.
Multilingual indexing in General Practice/ Family Medicine in the era
of Semantic Web
by Dr. Marc JAMOULLE
Introduction
This thesis is about giving visibility to the often overlooked work of family
physicians and consequently, is about grey literature in General Practice
and Family Medicine (GP/FM). It often seems that conference organizers
do not think of GP/FM as a knowledge-producing discipline that deserves
active dissemination. A conference is organized, but not much is done with
the knowledge shared at these meetings. In turn, the knowledge cannot be
reused or reapplied. This thesis is also about indexing. To retrieve knowledge,
indexing is mandatory. We must prepare tools that will automatically
index the thousands of abstracts that family doctors produce each year in
various languages. And finally, this work is about semantics[1]. It is an introduction
to health terminologies, ontologies, semantic data, and linked
open data. All are expressions of the next step: Semantic Web for health
care data. Concepts, units of thought expressed by terms, will be our target
and must have the ability to be expressed in multiple languages. In turn,
three areas of knowledge are at stake in this study: (i) Family Medicine as a
pillar of primary health care, (ii) computational linguistics, and (iii) health
information systems.
Aim
• To identify knowledge produced by general practitioners (GPs) by
improving annotation of grey literature in Primary Health Care
• To propose an experimental indexing system, acting as a draft for a
standardized table of contents of GP/FM
• To improve the searchability of repositories for grey literature in GP/FM.
[1] For specific terms, see the Glossary, page 257.
Methods
The first step aimed to design the taxonomy by identifying relevant concepts
in a compiled corpus of GP/FM texts. We studied the concepts
identified in nearly two thousand communications presented by GPs at
conferences. The relevant concepts belong to fields that bear
on GP/FM activities (e.g. teaching, ethics, management or environmental
hazard issues).
The second step was the development of an on-line, multilingual, terminological
resource for each category of the resulting taxonomy, named
Q-Codes. We have designed this terminology in the form of a lightweight
ontology, accessible on-line for readers and ready for use by computers of
the semantic web. It is also fit for the Linked Open Data universe.
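As a rough illustration of what one entry in such a lightweight, multilingual ontology might look like, the Python sketch below models a single concept with language-tagged labels and renders it as an OWL class in Turtle syntax. It is not the thesis's implementation: the namespace `http://example.org/q-codes/`, the code `QX0` and its labels are hypothetical placeholders, and only the `owl:Class` and `rdfs:label` terms come from the published OWL and RDFS vocabularies.

```python
from dataclasses import dataclass, field

# Hypothetical namespace for illustration only; not the real Q-Codes URI base.
BASE_URI = "http://example.org/q-codes/"

PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


@dataclass
class Concept:
    """One concept: a code plus one preferred term per language tag."""
    code: str
    labels: dict[str, str] = field(default_factory=dict)  # e.g. {"en": "..."}

    @property
    def uri(self) -> str:
        return BASE_URI + self.code

    def to_turtle(self) -> str:
        """Render the concept as an owl:Class with language-tagged labels."""
        parts = ["a owl:Class"]
        for lang, term in sorted(self.labels.items()):
            parts.append(f'rdfs:label "{escape_literal(term)}"@{lang}')
        return f"<{self.uri}> " + " ;\n    ".join(parts) + " ."


# Hypothetical concept; the code and labels are placeholders, not real Q-Codes.
concept = Concept("QX0", {"en": "example concept", "fr": "concept exemple"})
print(PREFIXES + concept.to_turtle())
```

Keeping the labels in a mapping keyed by language tag means a new translation only adds one entry, while the concept's URI stays the same across languages.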
Results
We propose 182 Q-Codes in an on-line multilingual database (10 languages)
(www.hetop.eu/Q), each acting as a filter for Medline. Q-Codes are also available
in the form of Uniform Resource Identifiers (URIs) and are exportable
in Web Ontology Language (OWL). The International Classification of Primary
Care (ICPC) is linked to Q-Codes in order to form the Core Content
Classification in General Practice/Family Medicine (3CGP). So far, 3CGP is
in use by humans in pedagogy, in bibliographic studies, in indexing congresses,
master’s theses and other forms of grey literature in GP/FM. Use by
computers is being tested in automatic classifiers, annotators and natural
language processing.
Discussion
To the best of our knowledge, this is the first attempt to expand the ICPC
coding system with an extension for family physician contextual issues,
thus covering non-clinical content of practice. It remains to be proven that
our proposed terminology will help in dealing with more complex systems,
such as MeSH, to support information storage and retrieval activities.
However, this exercise is proposed as a first step in the creation of an ontology
of GP/FM and as an opening to the complex world of Semantic Web
technologies.
Conclusion
We expect that the creation of this terminological resource for indexing abstracts
and for facilitating Medline searches for general practitioners, researchers
and students in medicine will reduce loss of knowledge in the
domain of GP/FM. In addition, through better indexing of the grey literature
(congress abstracts, master’s and doctoral theses), we hope to enhance
the accessibility of research results and give visibility to the invisible work
of family physicians.