
    Managing FAIR Tribological Data Using Kadi4Mat

    The ever-increasing amount of data generated from experiments and simulations in engineering sciences means that generating new knowledge relies more and more on data science applications. Comprehensive metadata descriptions and a suitable research data infrastructure are essential prerequisites for these tasks. Experimental tribology, in particular, presents some unique challenges in this regard due to the interdisciplinary nature of the field and the lack of existing standards. In this work, we demonstrate the versatility of the open-source research data infrastructure Kadi4Mat by managing and producing FAIR tribological data. As a showcase example, a tribological experiment is conducted by an experimental group with a focus on comprehensiveness. The result is a FAIR data package containing all produced data as well as machine- and user-readable metadata. The close collaboration between tribologists and software developers shows a practical bottom-up approach and how such infrastructures are an essential part of our FAIR digital future.
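    A FAIR data package of this kind pairs the raw measurement files with a machine-readable metadata record. As a minimal sketch, assuming a hypothetical schema (the keys below are illustrative and do not reproduce Kadi4Mat's actual record format):

```python
import json

# Hypothetical metadata record for a tribological experiment; the keys are
# illustrative only, not Kadi4Mat's actual schema.
record = {
    "title": "Pin-on-disc experiment",
    "type": "tribological_experiment",
    "creators": [{"name": "Experimental Group A"}],
    "instrument": {"name": "pin-on-disc tribometer"},
    "parameters": {
        "normal_load_N": 5.0,
        "sliding_speed_mm_s": 10.0,
        "duration_s": 3600,
    },
    "files": ["friction_curve.csv", "wear_profile.tif"],
    "license": "CC-BY-4.0",
}

# Serialize to a machine-readable file that travels with the raw data.
with open("metadata.json", "w") as fh:
    json.dump(record, fh, indent=2)
```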

    Service composition for biomedical applications

    Doctorate in Informatics Engineering. The demand for innovation in the biomedical software domain has driven the evolution of information technologies over the last decades. The challenges associated with the effective management, integration, analysis, and interpretation of the wealth of life-sciences information stemming from modern hardware and software technologies require concerted efforts. From gene sequencing hardware to pharmacology research up to patient electronic health records, the ability to accurately explore data from these environments is vital to further improve our understanding of human health. This thesis discusses building better informatics strategies to address these challenges, primarily in the context of service composition, including warehousing and federation strategies for resource integration, as well as web services or LinkedData for software interoperability.
    Service composition is introduced as a general principle, geared towards data integration and software interoperability. Concerning the latter, this research covers the service composition requirements within the pharmacovigilance field, namely in the European EU-ADR project. The contributions to this area, the definition of a new interoperability standard and the creation of a new workflow-wrapping engine, are behind the successful construction of the EU-ADR Web Platform, a workspace for delivering advanced pharmacovigilance studies. In the context of the European GEN2PHEN project, this research tackles the challenges associated with the integration of heterogeneous and distributed data in the human variome field. For this matter, a new lightweight solution was created: WAVe, Web Analysis of the Variome, provides a rich collection of genetic variation data through an innovative portal and an advanced API. The development of the strategies underlying these products highlighted clear opportunities in the biomedical software field: enhancing the software implementation process with rapid application development approaches and improving the quality and availability of data with the adoption of the Semantic Web paradigm. COEUS crosses the boundaries of integration and interoperability as it provides a framework for the flexible acquisition and translation of data into a semantic knowledge base, as well as a comprehensive set of interoperability services, from REST to LinkedData, to fully exploit gathered data semantically. By combining the lightness of rapid application development strategies with the richness of its "Semantic Web in a box" approach, COEUS is a pioneering framework to enhance the development of the next generation of biomedical applications.
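    The "Semantic Web in a box" idea is easiest to see on a small aggregated knowledge base. Below is a minimal sketch using the rdflib Python package with an invented file name and vocabulary; it does not reproduce COEUS's actual data model or service layer.

```python
from rdflib import Graph

# Load an aggregated knowledge base; file name and vocabulary are illustrative.
g = Graph()
g.parse("variome.ttl", format="turtle")

# A SPARQL query lets the semantic layer answer cross-source questions
# uniformly, regardless of where each triple was originally acquired.
query = """
PREFIX ex: <http://example.org/variome#>
SELECT ?gene ?variant WHERE {
    ?variant a ex:Variant ;
             ex:locatedIn ?gene .
}
"""
for gene, variant in g.query(query):
    print(gene, variant)
```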

    Towards Reproducible and Privacy-preserving Analyses Across Federated Repositories for Omics data

    Even when duly anonymized, health research data has the potential to be disclosive and therefore requires special safeguards under the European General Data Protection Regulation (GDPR). Furthermore, the incorporation of the FAIR principles (Findable, Accessible, Interoperable, Reusable) for a more favorable reuse of existing data calls for an approach where sensitive data is kept locally and only metadata and aggregated results are shared. Additionally, since central pooling is discouraged by ethical, legal, and societal issues, maturing data management frameworks and platforms increasingly adopt the federated approach.
    Current implementations of privacy-preserving analysis frameworks seem to be limited when data becomes very large (millions of rows, hundreds of variables). Data from biological samples collected by high-throughput technologies such as Next Generation Sequencing (NGS), which can sequence entire genomes, are an example of this kind of data. The term "genomics" refers to the field of science that studies genomes. Omics technologies aim at the systematic identification of all mRNA (transcriptomics), proteins (proteomics), and metabolites (metabolomics) present in a given biological sample. In the particular case of Omics data, these data are produced by computational workflows known as bioinformatics pipelines. Reproducing these pipelines is hard, and the difficulty is often underestimated. Nevertheless, reproducibility is important to generate trust in scientific results, and it is therefore fundamental to know how these Omics data were generated or obtained. This work leverages the promising results of current open-source implementations for distributed privacy-preserving analyses, while aiming at generalizing the approach and addressing some of their shortcomings. To enable the privacy-preserving analysis of Omics data, we introduced the "resource" concept, implemented in one of the studied solutions. The results were promising: the privacy-preserving analysis was effective when using the DataSHIELD framework in conjunction with the "resource" R package. We also concluded that the adoption of specialized DataSHIELD packages for Omics analyses is a viable pathway to privacy-preserving analysis of Omics data. To address the reproducibility challenges, we defined a database model to represent the steps, commands, and operations executed by bioinformatics pipelines. The database model is promising, but to meet all reproducibility requirements, including container support and integration with code-sharing platforms, it is necessary to use other tools, such as Nextflow or Snakemake, which offer dozens of tested and mature functions.
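    The federated pattern behind DataSHIELD-style analysis can be illustrated in a few lines. The sketch below is conceptual only, not DataSHIELD's actual API: each site computes local aggregates, and only those aggregates (never row-level data) reach the coordinator.

```python
from dataclasses import dataclass

@dataclass
class SiteAggregate:
    n: int          # number of local observations
    total: float    # local sum of the variable of interest

def local_aggregate(values: list[float]) -> SiteAggregate:
    """Runs inside each repository; raw values never leave the site."""
    return SiteAggregate(n=len(values), total=sum(values))

def federated_mean(aggregates: list[SiteAggregate]) -> float:
    """Runs at the coordinator; sees only per-site counts and sums."""
    n = sum(a.n for a in aggregates)
    return sum(a.total for a in aggregates) / n

# Example: three sites holding private local data.
sites = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]
print(federated_mean([local_aggregate(v) for v in sites]))  # 3.5
```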

    Linked Data based Health Information Representation, Visualization and Retrieval System on the Semantic Web

    Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies. To better facilitate health information dissemination, flexible ways to represent, query, and visualize health data become increasingly important. Semantic Web technologies, which provide a common framework allowing data to be shared and reused between applications, can be applied to the management of health data. Linked open data, a Semantic Web standard for publishing and linking heterogeneous data, allows not only humans but also machines to browse data without predefined limits. Through a use case based on World Health Organization HIV data for sub-Saharan Africa, a region severely affected by the HIV epidemic, this thesis built a Linked Data based health information representation, querying, and visualization system. All the data was represented in RDF and interlinked with related datasets already published on the Linked Data cloud. Overall, the system holds more than 21,000 triples and provides a SPARQL endpoint, from which users can download and reuse the data, and a SPARQL query interface, where users can pose different types of queries and retrieve the results; an example query is sketched below. It also has a visualization interface where users can visualize SPARQL results with a tool of their preference. Users who are not familiar with SPARQL can use the Linked Data search engine interface to search and browse the data. This system shows that current linked open data technologies have great potential to represent heterogeneous health data in a flexible and reusable manner, and that they can support intelligent queries to aid decision-making. However, to get the best from these technologies, improvements are needed both in triple-store performance and in domain-specific ontological vocabularies.
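    As an example of the kind of query such an endpoint supports, here is a sketch using the SPARQLWrapper Python package; the endpoint URL and vocabulary are hypothetical placeholders, not the thesis's actual schema.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoint and vocabulary, for illustration only.
sparql = SPARQLWrapper("http://example.org/hiv-data/sparql")
sparql.setQuery("""
    PREFIX ex: <http://example.org/health#>
    SELECT ?country ?prevalence WHERE {
        ?obs a ex:HIVObservation ;
             ex:country ?country ;
             ex:prevalence ?prevalence .
    }
    ORDER BY DESC(?prevalence)
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["country"]["value"], row["prevalence"]["value"])
```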

    Mission planning framework for fleets of interconnected drones

    The usage of aerial drones has become more popular as they become more accessible, both in economic and usability terms. Nowadays, these vehicles can present reduced dimensions and a good cost-benefit ratio, which makes it possible for several services and applications supported by aerial drone networks to emerge. Some scenarios that benefit from the use of aerial drones are the monitoring of emergency situations and natural disasters, the patrolling of urban areas and support to police forces, and tourist applications such as the real-time video transmission of points of interest. In these situations, control of the drone commonly depends on human intervention, which requires professionals specialized in its control. However, in recent years, several solutions have emerged that enable the autonomous flight of these vehicles, minimizing manual interference.
    Given the enormous diversity of use cases, many of the existing solutions for autonomous control focus on specific scenarios. Generic mission planning platforms also exist, but most of them only allow missions consisting of linear sequences of waypoints to be traversed. This makes mission support inflexible. In this dissertation, we propose a modular infrastructure that can be used in various scenarios, enabling the autonomous control and monitoring of a fleet of aerial drones in a mission context. This platform has two main components, one integrated into the onboard computer of the vehicle and the other in the ground control. The former communicates with the flight controller to collect telemetry data and send movement instructions to the drone. The latter allows operators to monitor this data and send commands remotely, also enabling robust mission planning with multiple drones. A mission can be described in a script that the ground module interprets, sending the commands to the assigned vehicles (a sketch of this idea follows below). These missions can describe different paths, modifying the behaviour of the drones according to external factors, such as a sensor reading. It is also possible to define plugins to be reused across missions, for example an algorithm that ensures that all drones maintain connectivity. The solution was evaluated in scenarios with a single drone and with the collaboration of multiple drones. The tests were performed in a simulated environment and also in an environment with real drones; the observed behaviour is similar in both.
    Master's in Computer and Telematics Engineering.
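    The mission-script idea can be sketched as follows; all class and method names are hypothetical, invented for illustration rather than taken from the dissertation's API. The ground module would interpret such a script and relay the movement commands to the assigned vehicles.

```python
class Drone:
    """Stand-in for a vehicle proxy exposed by the ground-control module."""

    def __init__(self, name: str):
        self.name = name

    def goto(self, lat: float, lon: float, alt: float) -> None:
        print(f"{self.name}: flying to ({lat}, {lon}) at {alt} m")

    def read_sensor(self, sensor: str) -> float:
        return 42.0  # placeholder telemetry reading

def connectivity_plugin(fleet: list[Drone]) -> None:
    """Reusable plugin hook, e.g. keep all drones within radio range."""
    print(f"checking connectivity for {len(fleet)} drones")

def mission(fleet: list[Drone]) -> None:
    for drone in fleet:
        drone.goto(40.63, -8.66, 30.0)
        # Branch on an external factor, as the text describes.
        if drone.read_sensor("temperature") > 40.0:
            drone.goto(40.64, -8.65, 50.0)
    connectivity_plugin(fleet)

mission([Drone("uav-1"), Drone("uav-2")])
```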

    Analysis Of Aircraft Arrival Delay And Airport On-time Performance

    While existing grid environments cater to the specific needs of a particular user community, we need to go beyond them and consider general-purpose large-scale distributed systems consisting of large collections of heterogeneous computers and communication systems shared by a large user population with very diverse requirements. Coordination, matchmaking, and resource allocation are among the essential functions of large-scale distributed systems. Although deterministic approaches for coordination, matchmaking, and resource allocation have been well studied, they are not suitable for large-scale distributed systems due to the scale, autonomy, and dynamics of such systems; we must therefore seek nondeterministic solutions. In this dissertation we describe our work on a coordination service, a matchmaking service, and a macro-economic resource allocation model for large-scale distributed systems. The coordination service coordinates the execution of complex tasks in a dynamic environment, the matchmaking service supports finding the appropriate resources for users, and the macro-economic resource allocation model allows a broker to mediate between resource providers, who want to maximize their revenues, and resource consumers, who want to get the best resources at the lowest possible price, subject to global objectives such as maximizing the resource utilization of the system.
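    The broker-mediated model can be illustrated with a toy matching rule; the greedy highest-bid-to-cheapest-provider rule below is an invented simplification, not the dissertation's actual macro-economic model.

```python
providers = [("cluster-a", 0.8), ("cluster-b", 1.2)]  # (name, ask per unit)
consumers = [("job-1", 1.5), ("job-2", 0.9), ("job-3", 0.5)]  # (name, bid)

def broker_match(providers, consumers):
    """Greedy matching: serve the highest bidder from the cheapest provider."""
    asks = sorted(providers, key=lambda p: p[1])
    bids = sorted(consumers, key=lambda c: c[1], reverse=True)
    matches = []
    for (consumer, bid), (provider, ask) in zip(bids, asks):
        if bid >= ask:  # trade only if the bid covers the asking price
            matches.append((consumer, provider, (bid + ask) / 2))
    return matches

for consumer, provider, price in broker_match(providers, consumers):
    print(f"{consumer} -> {provider} at {price:.2f}/unit")
```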

    Distributed Management of Grid-based Scientific Workflows

    Grids and service-oriented technologies are emerging as dominant approaches for distributed systems. With the evolution of these technologies, scientific workflows have been introduced as a tool for scientists to assemble highly specialized applications and to exchange large heterogeneous datasets in order to automate and accelerate the accomplishment of complex scientific tasks. Several Scientific Workflow Management Systems (SWfMS) have already been designed to support the specification, execution, and monitoring of scientific workflows. Nevertheless, they still face key challenges from two perspectives: system usability and system efficiency. From the usability perspective, current SWfMS are not designed to be simple enough for scientists with limited IT knowledge. Moreover, there is no easy mechanism by which scientists can share and re-use scientific experiments that have already been designed and proved by others. From the efficiency perspective, existing SWfMS coordinate and execute workflows in a centralized fashion using a single scheduler and/or workflow enactor. This creates a single point of failure, forms a scalability bottleneck, and enforces centralized fault handling. In addition, they do not consider load balancing while mapping abstract jobs onto computational nodes. Another important challenge stems from the common nature of scientific workflow applications, which need to exchange huge amounts of data during execution. Some available SWfMS use a mediator-based approach for data transfer, where data must first be transferred to a centralized data manager, which is inefficient. Other SWfMS apply a peer-to-peer approach via data references; even this approach is not sufficient for scientific workflows, as a single complex scientific activity can produce an extensive amount of data (a sketch of the reference-passing pattern follows below). In this thesis, we introduce the SWIMS (Scientific Workflow Integration and Management System) framework. It employs Web Services technology to provide a distributed management system for data-intensive scientific workflows. The purpose of SWIMS is to overcome the previously mentioned challenges through a set of salient features: i) support for distributed execution and management of workflows, ii) reduction of communication traffic, iii) support for smart re-run, iv) distributed fault handling and load balancing, v) ease of use, and vi) extensive sharing of scientific workflows. We discuss the motivation, design, and implementation of the SWIMS framework, and evaluate it through the Montage application from the astronomy domain.
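    The pass-by-reference pattern contrasted above with mediator-based transfer can be sketched as follows; all names are hypothetical illustrations, not SWIMS's API.

```python
from dataclasses import dataclass

@dataclass
class DataRef:
    """A reference to data held at the producing node, not the data itself."""
    host: str
    path: str

def produce(host: str) -> DataRef:
    # The producing service stores its (possibly huge) output locally and
    # hands downstream services only this lightweight reference.
    return DataRef(host=host, path="/scratch/mosaic-tile-0001.fits")

def consume(ref: DataRef) -> None:
    # The consuming service fetches directly from the producer, so the bulk
    # data never passes through a central mediator.
    print(f"fetching {ref.path} directly from {ref.host}")

consume(produce("node-07.grid.example.org"))
```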

    Remote service provision for connected homes.

    This research study proposed to view a remote service delivery system from three distinct perspectives: connected home environments (user perspective), remote service delivery platform (service enabler), and remote service providers (service provider perspective), to establish a holistic view of the requirements of remote service provision to connected home environments. A reference architecture for remote service provision based on the proposed views has been devised, which provides built-in support for an “On-Demand” operating model and facilitates “Freedom of Choice” via different levels of interoperability.

    Continuous integration in 5GASP

    The wide adoption of an NFV-oriented paradigm by network operators proves the importance of NFV in the future of communication networks. This paradigm allows network operators to speed up the development process of their services, decoupling hardware from the functionalities provided by these services. However, since NFV has only recently been globally adopted, several questions and difficulties have arisen. Network operators need to ensure the reliability and correct behavior of their Virtualized Network Functions, which poses severe challenges. Hence the need for new validation tools capable of validating network functions that live in an NFV ecosystem. 5GASP is a European project which aims to shorten the idea-to-market process by creating a fully automated and self-service 5G testbed and providing support tools for Continuous Integration in a secure and trusted environment, addressing the DevOps paradigm. Aligned with 5GASP's goals, this dissertation mainly addresses the development of tools to validate NetApps. To this end, this document introduces two different mechanisms for validating NetApps. The first tool, the NetApp Package Validator, statically validates NetApps before they are deployed in 5GASP's testbeds.
    Within this tool, the document focuses on its Descriptors Validator Module, which validates the NetApp descriptors through syntactic, semantic, and reference validation and supports NetApps developed according to different Information Models; a sketch of this kind of validation follows below. The second tool comprises an automated validation pipeline. This pipeline validates the functionality and behavior of the NetApps once they are deployed in a 5G testbed. In addition, it collects several metrics to enable a better understanding of the NetApps' behavior. Both tools are expected to be integrated with the 5GASP ecosystem. This document presents the requirements definition, architecture, and implementation of these tools, as well as their results and outputs.
    Master's in Informatics Engineering.
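    Descriptor validation of the syntactic/semantic kind can be sketched with a JSON-Schema check; the schema fragment below is invented for illustration and does not reproduce any actual NFV information model.

```python
import yaml  # PyYAML
from jsonschema import ValidationError, validate

# Invented schema fragment, not an actual OSM/ETSI information-model schema.
SCHEMA = {
    "type": "object",
    "required": ["vnfd-id", "df"],
    "properties": {
        "vnfd-id": {"type": "string"},
        "df": {"type": "array", "items": {"type": "object"}},
    },
}

descriptor = yaml.safe_load("""
vnfd-id: demo-netapp
df:
  - id: default-df
""")

# Syntactic/semantic check against the schema; reference validation (do all
# referenced ids exist?) would be a further pass over the parsed document.
try:
    validate(instance=descriptor, schema=SCHEMA)
    print("descriptor is valid")
except ValidationError as err:
    print(f"invalid descriptor: {err.message}")
```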