Managing FAIR Tribological Data Using Kadi4Mat
Generating new knowledge from the ever-increasing amount of data produced by experiments and simulations in the engineering sciences relies more and more on data science applications. Comprehensive metadata descriptions and a suitable research data infrastructure are essential prerequisites for these tasks. Experimental tribology, in particular, presents some unique challenges in this regard due to the interdisciplinary nature of the field and the lack of existing standards. In this work, we demonstrate the versatility of the open-source research data infrastructure Kadi4Mat by managing and producing FAIR tribological data. As a showcase example, a tribological experiment is conducted by an experimental group with a focus on comprehensiveness. The result is a FAIR data package containing all produced data as well as machine- and user-readable metadata. The close collaboration between tribologists and software developers demonstrates a practical bottom-up approach and shows how such infrastructures are an essential part of our FAIR digital future.
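A FAIR data package pairs the produced data with metadata that both humans and machines can read. As a minimal sketch (the field names below are illustrative assumptions, not Kadi4Mat's actual schema), such a record might be built, serialized, and checked like this:

```python
import json

# Hypothetical metadata record for a tribological experiment; the field
# names are illustrative assumptions, not Kadi4Mat's actual schema.
record = {
    "identifier": "tribo-exp-001",
    "title": "Pin-on-disc friction test",
    "license": "CC-BY-4.0",
    "instrument": {"type": "pin-on-disc tribometer"},
    "parameters": {"normal_load_N": 5.0, "sliding_speed_mm_s": 50.0},
}

def validate(record, required=("identifier", "title", "license")):
    """Return the list of required metadata keys that are missing."""
    return [key for key in required if key not in record]

serialized = json.dumps(record, indent=2)  # machine-readable form
missing = validate(record)                 # [] when the record is complete
```

The same JSON document remains human-readable while staying trivially parseable, which is the core of "machine- and user-readable metadata".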
Service composition for biomedical applications
Doctorate in Informatics Engineering.

The demand for innovation in the biomedical software domain has been a driver of information technology evolution over the last decades. The challenges associated with the effective management, integration, analysis and
interpretation of the wealth of life sciences information stemming from modern
hardware and software technologies require concerted efforts. From gene
sequencing hardware to pharmacology research up to patient electronic health
records, the ability to accurately explore data from these environments is vital
to further improve our understanding of human health. This thesis encloses the
discussion on building better informatics strategies to address these
challenges, primarily in the context of service composition, including
warehousing and federation strategies for resource integration, as well as web
services or LinkedData for software interoperability.
Service composition is introduced as a general principle, geared towards data
integration and software interoperability. Concerning the latter, this research
covers the service composition requirements within the pharmacovigilance
field, namely on the European EU-ADR project. The contributions to this area,
the definition of a new interoperability standard and the creation of a new
workflow-wrapping engine, are behind the successful construction of the EU-ADR
Web Platform, a workspace for delivering advanced pharmacovigilance
studies. In the context of the European GEN2PHEN project, this research
tackles the challenges associated with the integration of heterogeneous and
distributed data in the human variome field. For this matter, a new lightweight
solution was created: WAVe, Web Analysis of the Variome, provides a rich
collection of genetic variation data through an innovative portal and an
advanced API. The development of the strategies underlying these products
highlighted clear opportunities in the biomedical software field: enhancing the
software implementation process with rapid application development
approaches and improving the quality and availability of data with the adoption
of the Semantic Web paradigm.
COEUS crosses the boundaries of integration and interoperability as it provides
a framework for the flexible acquisition and translation of data into a semantic
knowledge base, as well as a comprehensive set of interoperability services,
from REST to LinkedData, to fully exploit gathered data semantically. By
combining the lightness of rapid application development strategies with the
richness of its "Semantic Web in a box" approach, COEUS is a pioneering
framework to enhance the development of the next generation of biomedical
applications
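The translation of acquired data into a semantic knowledge base can be illustrated with a toy sketch; the record fields and predicate names below are illustrative assumptions, not COEUS's actual data model:

```python
# Minimal sketch of translating tabular records into subject-predicate-object
# triples, the core idea behind a semantic knowledge base such as the one
# COEUS builds. Fields and predicates here are illustrative assumptions.
rows = [
    {"gene": "BRCA2", "variant": "c.68-7T>A", "disease": "breast cancer"},
]

def to_triples(rows):
    """Translate each record into triples about a typed subject."""
    triples = []
    for row in rows:
        gene = ("gene", row["gene"])
        triples.append((gene, "hasVariant", row["variant"]))
        triples.append((gene, "associatedWith", row["disease"]))
    return triples

def query(triples, predicate):
    """Return (subject, object) pairs matching a predicate."""
    return [(s, o) for s, p, o in triples if p == predicate]

kb = to_triples(rows)
```

Once data lives as triples, an interoperability layer (REST, SPARQL, Linked Data) can expose the same knowledge base through several access paths.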
Towards Reproducible and Privacy-preserving Analyses Across Federated Repositories for Omics data
Even when duly anonymized, health research data has the potential to be disclosive and therefore requires special safeguards according to the European General Data Protection Regulation (GDPR). Furthermore, the incorporation of the FAIR principles (Findable, Accessible, Interoperable, Reusable) for a more favorable reuse of existing data calls for an approach where sensitive data is kept locally and only metadata and aggregated results are shared. Additionally, since central pooling is discouraged by ethical, legal, and societal issues, maturing data management frameworks and platforms increasingly adopt the federated approach.
Current implementations of privacy-preserving analysis frameworks seem to be limited when data becomes very large (millions of rows, hundreds of variables). Biological sample data collected by high-throughput technologies such as Next Generation Sequencing (NGS), which allows entire genomes to be sequenced, are examples of this kind of data.
The term "genomics" refers to the field of science that studies genomes. The Omics technologies aim to produce a systematic identification of all mRNA (transcriptomics), proteins (proteomics), and metabolites (metabolomics) present in a given biological sample.
In the particular case of Omics data, these data are produced by computational workflows known as bioinformatics pipelines. Reproducing these pipelines is hard, and the difficulty is often underestimated. Nevertheless, reproducibility is important to generate trust in scientific results, and it is therefore fundamental to know how these Omics data were generated or obtained.
This work leverages the promising results of current open-source implementations for distributed privacy-preserving analyses, while aiming at generalizing the approach and addressing some of their shortcomings.
To enable the privacy-preserving analysis of Omics data, we introduced the "resource" concept, implemented in one of the studied solutions.
The results were promising: the privacy-preserving analysis was effective when using the DataSHIELD framework in conjunction with the "resource" R package. We also concluded that the adoption of specialized DataSHIELD packages for Omics analyses is a viable pathway to leverage privacy-preserving analysis of Omics data.
To address the reproducibility challenges, we defined a database model to represent the steps, commands, and operations executed by the bioinformatics pipelines.
The database model is promising, but to accomplish all reproducibility requirements, including container support and integration with code sharing platforms, it is necessary to use other tools, such as Nextflow or Snakemake, with dozens of other tested and mature functions.
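The aggregate-only idea behind such federated analyses can be sketched as follows; this is a toy illustration of the DataSHIELD principle (sites return only non-disclosive aggregates, never row-level data), not DataSHIELD's actual API:

```python
# Sketch of federated, privacy-preserving analysis: each site returns only
# aggregates (sum and count) and refuses disclosive small cells; the client
# pools aggregates into a global mean. Illustrative only.
site_data = {
    "site_a": [1.0, 2.0, 3.0],
    "site_b": [4.0, 5.0],
}

def local_aggregate(values, min_rows=2):
    """Return (sum, count), refusing disclosive small cells."""
    if len(values) < min_rows:
        raise ValueError("cell too small to disclose")
    return sum(values), len(values)

def pooled_mean(site_data):
    """Combine per-site aggregates into a global mean."""
    total, count = 0.0, 0
    for values in site_data.values():
        s, n = local_aggregate(values)
        total += s
        count += n
    return total / count
```

The pooled mean equals the mean over all rows, yet no individual record ever leaves its site, which is the property that makes central pooling unnecessary.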
Linked Data based Health Information Representation, Visualization and Retrieval System on the Semantic Web
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.

To better facilitate health information dissemination, flexible ways to represent, query and visualize health data become increasingly important. Semantic Web technologies, which provide a common framework that allows data to be shared and reused between applications, can be applied to the management of health data. Linked open data, a Semantic Web standard for publishing and linking heterogeneous data, allows not only humans but also machines to browse data in an unrestricted way.
Through a use case on World Health Organization HIV data for sub-Saharan Africa, a region severely affected by the HIV epidemic, this thesis built a linked-data-based health information representation, querying and visualization system. All the data was represented in RDF and interlinked with other related datasets already on the linked data cloud. Overall, the system has more than 21,000 triples, a SPARQL endpoint where users can download and use the data, and a SPARQL query interface where users can pose different types of queries and retrieve the results. Additionally, it has a visualization interface where users can visualize the SPARQL results with a tool of their preference. Users who are not familiar with SPARQL queries can use the linked data search engine interface to search and browse the data.
From this system we can see that current linked open data technologies have great potential to represent heterogeneous health data in a flexible and reusable manner, and that they can serve intelligent queries which can support decision-making. However, to get the best from these technologies, improvements are needed both in triple store performance and in domain-specific ontological vocabularies.
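The kind of pattern-based querying a SPARQL endpoint offers can be illustrated with a toy matcher; the triples and prevalence figures below are placeholders, and this is a sketch of basic graph pattern matching, not a real SPARQL engine:

```python
# Toy illustration of how a SPARQL basic graph pattern matches triples:
# terms starting with "?" are variables and bind to any value. The data
# below is placeholder data, not actual WHO figures.
triples = [
    ("Ethiopia", "hivPrevalence", 1.2),
    ("Kenya", "hivPrevalence", 4.5),
    ("Kenya", "region", "sub-Saharan Africa"),
]

def match(pattern, triple):
    """Return variable bindings if the pattern matches, else None."""
    binding = {}
    for p_term, t_term in zip(pattern, triple):
        if isinstance(p_term, str) and p_term.startswith("?"):
            binding[p_term] = t_term
        elif p_term != t_term:
            return None
    return binding

def select(triples, pattern):
    """Return one binding dict per matching triple, SPARQL SELECT style."""
    return [b for t in triples if (b := match(pattern, t)) is not None]

rows = select(triples, ("?country", "hivPrevalence", "?rate"))
```

A real endpoint evaluates such patterns over the full triple store and returns the bindings as the query result set.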
Mission planning framework for fleets of interconnected drones
The usage of aerial drones has become more popular as they also become
more accessible, both in economic and usability terms. Nowadays, these
vehicles can present reduced dimensions and a good cost-benefit ratio, which
makes it possible for several services and applications supported by aerial
drone networks to emerge. Some scenarios that benefit from the use of aerial
drones are the monitoring of emergency situations and natural disasters, the
patrolling of urban areas and support to police forces, and tourist applications
such as the real-time video transmission of points of interest. It is common
for the control of the drone to be dependent on human intervention in these
situations, which requires professionals specialized in its control. However,
in recent years, several solutions have emerged that enable the autonomous
flight of these vehicles, minimizing manual interference.
Taking into account the enormous diversity of use cases, many of the
existing solutions for autonomous control focus on specific scenarios. Generic
mission planning platforms also exist, but most of them only allow missions
consisting of linear sequences of waypoints to be traversed. This results in
mission support that is not very flexible.
In this dissertation, we propose a modular infrastructure that can be
used in various scenarios, enabling the autonomous control and monitoring
of a fleet of aerial drones in a mission context. This platform has two main
components, one integrated into the onboard computer of the vehicle, and the
other one in the ground control. The former allows the communication with
the flight controller so that it can collect telemetry data and send movement
instructions to the drone. The latter allows this data to be monitored and
commands to be sent remotely, also enabling robust mission planning with multiple
drones. A mission can be described in a script that the ground module
interprets, sending the commands to the assigned vehicles. These missions
can describe different paths, modifying the behaviour of the drones according
to external factors, such as a sensor reading. It is also possible to define
plugins to be reused in various missions, for example, by integrating an
algorithm that ensures that all drones maintain connectivity.
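The mission-script idea can be sketched as follows; the command names and script fields are assumptions for illustration, not the platform's actual syntax:

```python
# Sketch of the kind of mission script the ground module could interpret:
# each step names a command, and a branch step picks its target based on an
# external factor such as a sensor reading. Fields are illustrative.
mission = [
    {"cmd": "goto", "waypoint": (10, 20)},
    {"cmd": "branch", "sensor": "battery", "below": 30,
     "then": {"cmd": "land"}, "else": {"cmd": "goto", "waypoint": (30, 40)}},
]

def interpret(mission, sensors):
    """Resolve branches against sensor readings into a flat command plan."""
    plan = []
    for step in mission:
        if step["cmd"] == "branch":
            taken = step["then"] if sensors[step["sensor"]] < step["below"] else step["else"]
            plan.append(taken)
        else:
            plan.append(step)
    return plan

plan = interpret(mission, {"battery": 25})  # low battery triggers landing
```

The resolved plan is what the ground module would then dispatch, command by command, to the assigned vehicles.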
The solution was evaluated in scenarios with a single drone and with
the collaboration of multiple drones. The tests were performed in a simulated
environment and also in an environment with real drones. The observed
behaviour is similar in both scenarios.

Master's in Computer and Telematics Engineering
Analysis Of Aircraft Arrival Delay And Airport On-time Performance
While existing grid environments cater to the specific needs of particular user communities, we need to go beyond them and consider general-purpose large-scale distributed systems consisting of large collections of heterogeneous computers and communication systems shared by a large user population with very diverse requirements. Coordination, matchmaking, and resource allocation are among the essential functions of large-scale distributed systems. Although deterministic approaches for coordination, matchmaking, and resource allocation have been well studied, they are not suitable for large-scale distributed systems due to the scale, autonomy, and dynamics of these systems. We have to seek nondeterministic solutions for large-scale distributed systems. In this dissertation we describe our work on a coordination service, a matchmaking service, and a macro-economic resource allocation model for large-scale distributed systems. The coordination service coordinates the execution of complex tasks in a dynamic environment, the matchmaking service supports finding the appropriate resources for users, and the macro-economic resource allocation model allows a broker to mediate between resource providers, who want to maximize their revenues, and resource consumers, who want to get the best resources at the lowest possible price, subject to some global objectives, e.g., maximizing the resource utilization of the system.
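The broker-mediated allocation idea can be sketched as follows; the matching rule here (cheapest affordable provider with remaining capacity) is an illustrative simplification, not the dissertation's actual model:

```python
# Toy sketch of macro-economic resource allocation: providers post prices
# and capacities, consumers post budgets, and a broker matches each request
# to the cheapest affordable provider that still has capacity.
providers = [
    {"name": "p1", "price": 2.0, "capacity": 1},
    {"name": "p2", "price": 3.0, "capacity": 2},
]

def broker_match(providers, requests):
    """Greedily allocate each request to the cheapest affordable provider."""
    allocation = {}
    for req in requests:
        candidates = [p for p in providers
                      if p["capacity"] > 0 and p["price"] <= req["budget"]]
        if not candidates:
            continue  # no affordable capacity: request stays unallocated
        best = min(candidates, key=lambda p: p["price"])
        best["capacity"] -= 1
        allocation[req["name"]] = best["name"]
    return allocation

alloc = broker_match(providers, [{"name": "r1", "budget": 2.5},
                                 {"name": "r2", "budget": 2.5}])
```

Even this greedy toy shows the tension the broker manages: once cheap capacity is exhausted, a consumer whose budget falls below the remaining prices goes unserved, which is why global objectives such as utilization enter the model.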
Distributed Management of Grid-based Scientific Workflows
Grids and service-oriented technologies are emerging as dominant approaches for distributed systems. With the evolution of these technologies, scientific workflows have been introduced as a tool for scientists to assemble highly specialized applications and to exchange large heterogeneous datasets in order to automate and accelerate the accomplishment of complex scientific tasks. Several Scientific Workflow Management Systems (SWfMS) have already been designed to support the specification, execution, and monitoring of scientific workflows. Meanwhile, they still face key challenges from two different perspectives: system usability and system efficiency. From the system usability perspective, current SWfMS are not designed to be simple enough for scientists who have quite limited IT knowledge. Moreover, there is no easy mechanism by which scientists can share and re-use scientific experiments that have already been designed and proven by others. From the perspective of system efficiency, existing SWfMS coordinate and execute workflows in a centralized fashion using a single scheduler and/or workflow enactor. This creates a single point of failure, forms a scalability bottleneck, and enforces centralized fault handling. In addition, they do not consider load balancing when mapping abstract jobs onto several computational nodes. Another important challenge arises from the common nature of scientific workflow applications, which need to exchange a huge amount of data during the execution process. Some available SWfMS use a mediator-based approach for data transfer, where data must first be transferred to a centralized data manager, which is highly inefficient. Other SWfMS apply a peer-to-peer approach via data references. Even this approach is not sufficient for scientific workflows, as a single complex scientific activity can produce an extensive amount of data.
In this thesis, we introduce SWIMS (Scientific Workflow Integration and Management System) framework. It employs the Web Services technology to originate a distributed management system for data-intensive scientific workflows. The purpose of SWIMS is to overcome the previously mentioned challenges through a set of salient features: i) Support for distributed execution and management of workflows, ii) diminution of communication traffic, iii) support for smart re-run, iv) distributed fault handling and load balancing, v) ease of use, and vi) extensive sharing of scientific workflows. We discuss the motivation, design, and implementation of the SWIMS framework. Then, we evaluate it through the Montage application from the astronomy domain
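The load-balancing concern raised above can be illustrated with a least-loaded mapping sketch; this is a generic greedy heuristic, not SWIMS's actual scheduler:

```python
# Sketch of load-balanced mapping of abstract jobs onto computational
# nodes: largest jobs first, each assigned to the node with the least
# accumulated work. A generic greedy heuristic for illustration.
def map_jobs(jobs, nodes):
    """Return (placement, per-node load) for jobs given as {name: cost}."""
    load = {node: 0.0 for node in nodes}
    placement = {}
    for name, cost in sorted(jobs.items(), key=lambda kv: -kv[1]):
        node = min(load, key=load.get)  # current least-loaded node
        placement[name] = node
        load[node] += cost
    return placement, load

placement, load = map_jobs({"j1": 4.0, "j2": 3.0, "j3": 2.0}, ["n1", "n2"])
```

A centralized enactor computes such a mapping once and becomes a bottleneck; a distributed design lets each participating node run the same cheap heuristic over shared load information.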
Remote service provision for connected homes.
This research study proposed to view a remote service delivery system from three distinct perspectives: connected home environments (user perspective), the remote service delivery platform (service enabler), and remote service providers (service provider perspective), to establish a holistic view of the requirements of remote service provision to connected home environments. A reference architecture for remote service provision based on the proposed views has been devised, which provides built-in support for an “On-Demand” operating model and facilitates “Freedom of Choice” via different levels of interoperability.
Continuous integration in 5GASP
The wide adoption of an NFV-oriented paradigm by network operators proves
the importance of NFV in the future of communication networks. This paradigm
allows network operators to speed up the development process of their services,
decoupling hardware from the functionalities provided by these services. However,
since NFV has only been recently globally adopted, several questions and difficulties
arose. Network operators need to ensure the reliability and the correct behavior
of their Virtualized Network Functions, which poses severe challenges. Hence
the need to develop new validation tools capable of validating network
functions that live in an NFV ecosystem. 5GASP is a European project which
aims to shorten the idea-to-market process by creating a fully automated and
self-service 5G testbed and providing support tools for Continuous Integration in a
secure and trusted environment, addressing the DevOps paradigm. Being aligned
with 5GASP’s goals, this dissertation mainly addresses the development of tools
to validate NetApps. To accomplish this, this document introduces two different
mechanisms for validating NetApps. The first tool, the NetApp Package
Validator, statically validates the NetApps before they are deployed in
5GASP's testbeds. Regarding this tool, this document focuses on its
Descriptors Validator Module, which validates the NetApp descriptors
through syntactic, semantic, and reference validation and supports NetApps
developed according to different Information Models. The second tool comprises
an automated validation pipeline. This pipeline validates the functionality and the
behavior of the NetApps once they are deployed in a 5G testbed. In addition, it collects
several metrics to enable a better understanding of the NetApp’s behavior.
Both tools are expected to be integrated with the 5GASP ecosystem. This document
presents the requirements definition, architecture, and implementation of
these tools and presents their results and outputs.

Master's in Informatics Engineering
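The two validation stages described for the NetApp Package Validator can be sketched as follows; the descriptor layout and field names are illustrative assumptions, not a 5GASP information model:

```python
# Sketch of two descriptor validation stages: a syntactic check for
# required fields, then a reference check that every connection point a
# link names actually exists. The descriptor layout is illustrative.
descriptor = {
    "name": "demo-netapp",
    "connection_points": ["cp-mgmt", "cp-data"],
    "links": [{"from": "cp-mgmt", "to": "cp-data"}],
}

def validate_descriptor(desc):
    """Return a list of validation errors; empty means the descriptor passes."""
    errors = []
    for field in ("name", "connection_points", "links"):  # syntactic stage
        if field not in desc:
            errors.append(f"missing field: {field}")
            return errors  # cannot continue without required structure
    cps = set(desc["connection_points"])  # reference stage
    for link in desc["links"]:
        for end in (link["from"], link["to"]):
            if end not in cps:
                errors.append(f"unknown connection point: {end}")
    return errors
```

Catching such errors statically, before deployment to a testbed, is what lets the functional pipeline assume a structurally sound descriptor.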