17 research outputs found

    Covid-on-the-Web: Knowledge Graph and Services to Advance COVID-19 Research

    Scientists are harnessing their multidisciplinary expertise and resources to fight the COVID-19 pandemic. Aligned with this mindset, the Covid-on-the-Web project aims to allow biomedical researchers to access, query and make sense of COVID-19-related literature. To do so, it adapts, combines and extends tools to process, analyze and enrich the "COVID-19 Open Research Dataset" (CORD-19), which gathers 50,000+ full-text scientific articles related to coronaviruses. We report on the RDF dataset and software resources produced in this project by leveraging skills in knowledge representation; text, data and argument mining; and data visualization and exploration. The dataset comprises two main knowledge graphs describing (1) named entities mentioned in the CORD-19 corpus and linked to DBpedia, Wikidata and other BioPortal vocabularies, and (2) arguments extracted using ACTA, a tool automating the extraction and visualization of argumentative graphs, meant to help clinicians analyze clinical trials and make decisions. On top of this dataset, we provide several visualization and exploration tools based on the Corese Semantic Web platform, the MGExplorer visualization library and the Jupyter Notebook technology. Throughout this initiative, we have been engaged in discussions with healthcare and medical research institutes to align our approach with the actual needs of the biomedical community, and we have paid particular attention to complying with open and reproducible science goals and the FAIR principles.
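    As a rough illustration of the kind of named-entity graph the abstract describes, entity mentions in an article can be serialized as RDF triples linking the article to DBpedia resources. The predicate URI, example paper URI and helper below are illustrative placeholders, not the project's actual schema.

```python
# Minimal sketch: serializing named-entity annotations from a paper as
# N-Triples linking mentions to DBpedia/Wikidata resources.
# All URIs except the DBpedia one are invented placeholders.

def to_ntriples(annotations):
    """Render (paper_uri, mention_text, entity_uri) tuples as N-Triples."""
    lines = []
    for paper, mention, entity in annotations:
        # Link the paper to the entity it mentions.
        lines.append(f"<{paper}> <http://example.org/mentions> <{entity}> .")
        # Record the surface form of the mention as an rdfs:label literal.
        lines.append(
            f'<{entity}> <http://www.w3.org/2000/01/rdf-schema#label> '
            f'"{mention}" .'
        )
    return "\n".join(lines)

annotations = [
    ("http://example.org/paper/123", "coronavirus",
     "http://dbpedia.org/resource/Coronavirus"),
]
print(to_ntriples(annotations))
```

    Triples in this shape can then be loaded into a SPARQL-capable store (the project uses the Corese platform) and queried alongside DBpedia or Wikidata.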

    Artificial Intelligence methodologies to early predict student outcome and enrich learning material

    The abstract is in the attachment.

    Sistemas interativos e distribuídos para telemedicina

    During the last decades, healthcare organizations have increasingly relied on information technologies to improve their services. More recently, partly driven by the financial crisis, reforms in the health sector have encouraged the emergence of new telemedicine solutions that optimize the use of human resources and equipment. Technologies such as cloud computing, mobile computing and web systems have been important to the success of these new telemedicine applications: emerging distributed computing capabilities facilitate the creation of medical communities and promote telemedicine services and real-time collaboration, while mobile devices make remote work possible anytime, anywhere. In addition, many features that have become commonplace in social networks, such as data sharing, message exchange, discussion forums and videoconferencing, have the potential to foster collaboration in the health sector. The main objective of this research work was to investigate more agile computational solutions that promote the sharing of clinical data and facilitate the creation of collaborative workflows in radiology. By exploring current web and mobile computing technologies, we designed a ubiquitous solution for medical imaging visualization and developed a collaborative system for radiology based on cloud computing technology. Along the way, we investigated methodologies for text mining, semantic representation and content-based image retrieval. Finally, to ensure patient privacy and streamline data sharing in collaborative environments, we propose a machine learning methodology to anonymize medical images.
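    The final anonymization step the abstract mentions can be pictured as masking the image regions where a model has detected burned-in patient text. The sketch below uses fixed bounding boxes in place of the learned detector the thesis proposes; the function name and box format are assumptions for illustration only.

```python
# Sketch of the anonymization step: given bounding boxes where a detector
# flagged identifying text, black out those pixels. The detector itself is
# replaced here by a hard-coded box; the thesis uses machine learning.

def mask_regions(image, boxes, fill=0):
    """Return a copy of a 2D image with each (row0, col0, row1, col1)
    half-open box filled with `fill` (black)."""
    out = [row[:] for row in image]  # copy so the original stays intact
    for r0, c0, r1, c1 in boxes:
        for r in range(r0, r1):
            for c in range(c0, c1):
                out[r][c] = fill
    return out

image = [[255] * 4 for _ in range(4)]        # tiny all-white "scan"
anonymized = mask_regions(image, [(0, 0, 1, 2)])  # fictitious detection
```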

    Taming web data : exploiting linked data for integrating medical educational content

    Open data are playing a vital role in different communities, including government, business and education. This revolution has had a high impact on the education field: new practices, known as "Linked Data", are being adopted for publishing and connecting data on the web, exposing and connecting data that were not previously linked. In the context of education, applying Linked Data practices to the growing amount of open data used for learning is potentially highly beneficial. The work presented in this thesis tackles the challenges of data acquisition and integration from distributed web data sources into one linked dataset. The application domain is medical education, and the focus is on bridging the gap between articles published in online educational libraries and content published on Web 2.0 platforms that can be used for education. Integrating such a collection of heterogeneous resources requires creating links between data collected from distributed web data sources. To address these challenges, a system is proposed that exploits Linked Data practices to build a metadata schema in XML/RDF format for describing resources, enriching it with external datasets that add semantics to its metadata. The proposed system collects resources from distributed data sources on the web and enriches their metadata with concepts from biomedical ontologies, such as SNOMED CT, that enable their linking. The final result of building this system is a linked dataset of more than 10,000 resources collected from the PubMed library, YouTube channels and blogging platforms. The effectiveness of the proposed system is evaluated by validating the content of the linked dataset when accessed and retrieved. Ontology-based techniques have been developed for browsing and querying the resulting linked dataset, and experiments have been conducted to simulate users' access to it and validate its content. The results were promising and have shown the effectiveness of using SNOMED CT for integrating distributed resources from diverse web data sources.
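    The core idea of this enrichment-then-linking step can be sketched in a few lines: attach ontology concept identifiers to each resource's metadata, then group resources from heterogeneous sources that share a concept. The concept table below is a tiny hand-made stand-in for a SNOMED CT lookup, and its codes are placeholders, not real SNOMED CT identifiers.

```python
# Sketch: enrich resource metadata with ontology concept codes, then link
# resources from different sources via shared concepts. The lexicon and the
# "SCT:PLACEHOLDER-*" codes are invented for illustration.

CONCEPTS = {
    "asthma": "SCT:PLACEHOLDER-1",
    "inhaler": "SCT:PLACEHOLDER-2",
}

def enrich(resource):
    """Add a 'concepts' field listing codes for keywords found in the title."""
    found = sorted(
        {code for term, code in CONCEPTS.items()
         if term in resource["title"].lower()}
    )
    return {**resource, "concepts": found}

def link(resources):
    """Build an index: concept code -> ids of resources sharing it."""
    index = {}
    for res in resources:
        for code in res["concepts"]:
            index.setdefault(code, []).append(res["id"])
    return index
```

    With this index, a PubMed article and a YouTube video about asthma end up linked under the same concept even though they come from unrelated platforms.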

    Mineração de informação biomédica a partir de literatura científica

    The rapid evolution and proliferation of a world-wide computerized network, the Internet, resulted in an overwhelming and constantly growing amount of publicly available data and information, a fact that has also been verified in biomedicine. However, the lack of structure of textual data inhibits its direct processing by computational solutions. Information extraction is the text mining task that aims to automatically collect information from unstructured text data sources. The goal of the work described in this thesis was to build innovative solutions for biomedical information extraction from scientific literature, through the development of simple software artifacts for developers and biocurators, delivering more accurate, usable and faster results. We started by tackling named entity recognition - a crucial initial task - with the development of Gimli, a machine-learning-based solution that follows an incremental approach to optimize the extracted linguistic characteristics for each concept type. Afterwards, Totum was built to harmonize concept names provided by heterogeneous systems, delivering a robust solution with improved performance. This approach takes advantage of heterogeneous corpora to deliver cross-corpus harmonization that is not constrained to the characteristics of a single corpus. Since the previous solutions do not provide links to knowledge bases, Neji was built to streamline the development of complex and custom solutions for biomedical concept name recognition and normalization. This was achieved through a modular and flexible framework focused on speed and performance, integrating a large number of processing modules optimized for the biomedical domain. To offer on-demand heterogeneous biomedical concept identification, we developed BeCAS, a web application, service and widget. We also tackled relation mining by developing TrigNER, a machine-learning-based solution for biomedical event trigger recognition, which applies an automatic algorithm to obtain the best linguistic features and model parameters for each event type. Finally, in order to assist biocurators, Egas was developed to support rapid, interactive and real-time collaborative curation of biomedical documents, through manual and automatic in-line annotation of concepts and relations. Overall, the research work presented in this thesis contributes to a more accurate update of current biomedical knowledge bases, towards improved hypothesis generation and knowledge discovery.
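    The recognition-plus-normalization pipeline described above (the problem Neji addresses) can be sketched as three small modules: a tokenizer, a dictionary matcher, and a normalizer that maps each hit to a knowledge-base identifier. Neji itself is a much richer framework; the lexicon and the "HGNC:PLACEHOLDER" identifier here are invented for illustration.

```python
# Sketch of a modular concept recognition + normalization pipeline:
# tokenize -> dictionary match -> attach a normalized identifier.
# The one-entry lexicon and its identifier are placeholders.

LEXICON = {"brca1": ("GENE", "HGNC:PLACEHOLDER")}  # surface form -> (type, id)

def tokenize(text):
    """Crude whitespace tokenizer (real pipelines use biomedical tokenizers)."""
    return text.lower().replace(",", " ").split()

def recognize(tokens):
    """Return (token, concept_type, normalized_id) for every lexicon hit."""
    return [(t, *LEXICON[t]) for t in tokens if t in LEXICON]

def annotate(text):
    """Run the whole pipeline on raw text."""
    return recognize(tokenize(text))
```

    The modular design is the point: each stage can be swapped (e.g., the dictionary matcher for a machine-learned recognizer) without touching the others.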

    Contributions to information extraction for spanish written biomedical text

    Healthcare practice and clinical research produce vast amounts of digitised, unstructured data in multiple languages that are currently underexploited, despite their potential applications in improving healthcare experiences, supporting trainee education, or enabling biomedical research, for example. To automatically transform those contents into relevant, structured information, advanced Natural Language Processing (NLP) mechanisms are required. In NLP, this task is known as Information Extraction. Our work takes place within this growing field of clinical NLP for the Spanish language, as we tackle three distinct problems. First, we compare several supervised machine learning approaches to the problem of sensitive data detection and classification; specifically, we study the different approaches and their transferability on two corpora, one synthetic and the other authentic. Second, we present and evaluate UMLSmapper, a knowledge-intensive system for biomedical term identification based on the UMLS Metathesaurus. This system recognises and codifies terms without relying on annotated data or external Named Entity Recognition tools. Although technically naive, it performs on par with more evolved systems and does not deviate considerably from approaches that rely on oracle terms. Finally, we present and exploit NUBes, a new corpus of real health records manually annotated with negation and uncertainty information. This corpus is the basis for two sets of experiments, one on cue and scope detection and the other on assertion classification. Throughout the thesis, we apply and compare techniques of varying levels of sophistication and novelty, which reflects the rapid advancement of the field.
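    To make the cue-and-scope task concrete: a negation cue is a trigger word ("no", "denies") and its scope is the stretch of text it negates. The thesis trains models for this; the sketch below is only a NegEx-style rule with a fixed cue list and a fixed-size scope window, both of which are simplifying assumptions.

```python
# Sketch of rule-based negation cue and scope detection. A cue is a token
# from a fixed list; its scope is taken to be the next `window` tokens.
# Real systems (and the NUBes annotations) handle scope far more precisely.

CUES = {"no", "without", "denies"}

def detect(tokens, window=3):
    """Return (cue_index, scope_indices) pairs for each cue found."""
    results = []
    for i, tok in enumerate(tokens):
        if tok.lower() in CUES:
            scope = list(range(i + 1, min(i + 1 + window, len(tokens))))
            results.append((i, scope))
    return results

print(detect("patient denies chest pain".split()))
```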

    User-centered semantic dataset retrieval

    Finding relevant research data is an increasingly important but time-consuming task in daily research practice. Several studies report difficulties in dataset search: for example, scholars retrieve only partially pertinent data, and important information cannot be displayed in the user interface. Overcoming these problems has motivated a number of research efforts in computer science, such as text mining and semantic search. In particular, the emergence of the Semantic Web opens a variety of novel research perspectives. Motivated by these challenges, the overall aim of this work is to analyze the current obstacles in dataset search and to propose and develop a novel semantic dataset search. The studied domain is biodiversity research, a domain that explores the diversity of life, habitats and ecosystems. This thesis makes three main contributions: (1) We evaluate the current situation in dataset search in a user study, and we compare a semantic search with a classical keyword search to explore the suitability of Semantic Web technologies for dataset search. (2) We generate a question corpus and develop an information model to determine which scientific topics scholars in biodiversity research are interested in; moreover, we analyze the gap between current metadata and scholarly search interests, and explore whether metadata and user interests match. (3) We propose and develop an improved dataset search based on three components: (A) a text mining pipeline that enriches metadata and queries with semantic categories and URIs, (B) a retrieval component with a semantic index over categories and URIs, and (C) a user interface that enables search within categories as well as search including further hierarchical relations. Following user-centered design principles, we ensured user involvement in various user studies during the development process.
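    Components (A) and (B) above can be sketched together: both dataset metadata and user queries are mapped to semantic categories, and retrieval matches on categories instead of raw keywords, so a query term can find a dataset that never mentions it. The category table and dataset records below are invented for illustration.

```python
# Sketch of semantic dataset search: metadata and queries are enriched with
# semantic categories, and a category index drives retrieval. The tiny
# category table stands in for the thesis's text mining pipeline.

CATEGORY_OF = {
    "beech": "TAXON", "fagus": "TAXON",
    "soil": "ENVIRONMENT", "habitat": "ENVIRONMENT",
}

def categories(text):
    """Map a text to the set of semantic categories its words belong to."""
    return {CATEGORY_OF[w] for w in text.lower().split() if w in CATEGORY_OF}

def build_index(datasets):
    """Semantic index: category -> ids of datasets whose metadata hit it."""
    index = {}
    for ds in datasets:
        for cat in categories(ds["metadata"]):
            index.setdefault(cat, set()).add(ds["id"])
    return index

def search(query, index):
    """Retrieve every dataset sharing a category with the query."""
    hits = set()
    for cat in categories(query):
        hits |= index.get(cat, set())
    return hits
```

    Note the payoff: a query for "beech" retrieves a dataset described only by its Latin genus "Fagus", which a plain keyword search would miss.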

    Development and Evaluation of a Holistic, Cloud-driven and Microservices-based Architecture for Automated Semantic Annotation of Web Documents

    The Semantic Web is based on the concept of representing information on the web such that computers can both understand and process it. This implies defining context for web information to give it a well-defined meaning. Semantic Annotation is the process of adding annotation data to web information to provide this much-needed context. However, despite several solutions and techniques for semantic annotation, the field still faces challenges that have hindered the growth of the Semantic Web. Although recent technological innovations such as Cloud Computing, the Internet of Things and Mobile Computing have been integrated with semantic technologies in various IT solutions, little has been done to leverage these technologies to address semantic annotation challenges. Hence, this research investigates leveraging the cloud computing paradigm to address some of these challenges, with a focus on an automated system for providing semantic annotation as a service. Firstly, considering the disparate nature of most current semantic annotation solutions, a holistic perspective on semantic annotation is proposed based on a set of requirements. Then, a capability assessment of the feasibility of leveraging cloud computing is conducted, which produces a Cloud Computing Capability Model for Holistic Semantic Annotation. Furthermore, an investigation into application deployment patterns in the cloud and how they relate to holistic semantic annotation is carried out. A set of determinant factors that define different patterns for application deployment in the cloud is identified, which results in the development of a Cloud Computing Maturity Model and the conceptualisation of a “Cloud-Driven” development methodology for holistic semantic annotation in the cloud. Key components of the “Cloud-Driven” concept include Microservices, Operating System-Level Virtualisation and Orchestration.
    Given the role Microservices Software Architectural Patterns play in developing solutions that can fully exploit cloud computing benefits, CloudSea, a holistic, cloud-driven and microservices-based architecture for automated semantic annotation of web documents, is proposed as a novel approach to semantic annotation. The architecture draws on the theory of “Design Patterns” in Software Engineering, which subsequently resulted in twelve Design Patterns and a Pattern Language for Holistic Semantic Annotation based on the CloudSea architectural design. As proof of concept, a prototype implementation of CloudSea was developed and deployed in the cloud following the “Cloud-Driven” methodology, and a functionality evaluation was carried out on it. A comparative evaluation of the CloudSea architecture was also conducted against current semantic annotation solutions, both those proposed in the academic literature and existing industry solutions. In addition, to evaluate the proposed Cloud Computing Maturity Model for Holistic Semantic Annotation, an experimental evaluation was conducted by deploying six instances of the prototype in the different configurations described by the model. This empirical investigation tested the instances for performance through a series of API load tests, and the results confirmed the validity of both the “Cloud-Driven” methodology and the entire model.
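    The "annotation as a service" idea at the heart of such an architecture amounts to a stateless service boundary: JSON document in, JSON annotations out, so each recognizer can be containerized and scaled independently. The handler below is a generic sketch of that boundary; the toy vocabulary, function names and response shape are assumptions, not the thesis's actual API.

```python
# Sketch of one microservice concern in a microservices-based semantic
# annotation architecture: a stateless handler that wraps a recognizer and
# speaks JSON. The one-entry vocabulary is an illustrative placeholder.
import json

VOCAB = {"semantic web": "http://dbpedia.org/resource/Semantic_Web"}

def annotate_document(text):
    """Return annotations for vocabulary phrases found in the text."""
    lowered = text.lower()
    return [
        {"phrase": phrase, "uri": uri, "offset": lowered.index(phrase)}
        for phrase, uri in VOCAB.items() if phrase in lowered
    ]

def handle_request(body):
    """The service boundary: JSON request body in, JSON response out."""
    doc = json.loads(body)
    return json.dumps({"annotations": annotate_document(doc["text"])})
```

    Because the handler holds no state between calls, an orchestrator can run any number of identical instances behind a load balancer, which is what the maturity-model experiments with six differently deployed prototype instances exercise.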
