
    An ontology to standardize research output of nutritional epidemiology: from paper-based standards to linked content

    Background: The use of linked data in the Semantic Web is a promising approach to add value to nutrition research. An ontology, which defines the logical relationships between well-defined taxonomic terms, enables linking and harmonizing research output. To enable the description of domain-specific output in nutritional epidemiology, we propose the Ontology for Nutritional Epidemiology (ONE) according to authoritative guidance for nutritional epidemiology. Methods: Firstly, a scoping review was conducted to identify existing ontology terms for reuse in ONE. Secondly, existing data standards and reporting guidelines for nutritional epidemiology were converted into an ontology. The terms used in the standards were summarized and listed separately in a taxonomic hierarchy. Thirdly, the ontologies of the nutritional epidemiologic standards, reporting guidelines, and the core concepts were gathered in ONE. Three case studies were included to illustrate potential applications: (i) annotation of existing manuscripts and data, (ii) ontology-based inference, and (iii) estimation of reporting completeness in a sample of nine manuscripts. Results: Ontologies for food and nutrition (n = 37), disease and specific population (n = 100), data description (n = 21), research description (n = 35), and supplementary (meta) data description (n = 44) were reviewed and listed. ONE consists of 339 classes: 79 new classes to describe data and 24 new classes to describe the content of manuscripts. Conclusion: ONE is a resource to automate data integration, searching, and browsing, and can be used to assess reporting completeness in nutritional epidemiology.
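
Case study (ii), ontology-based inference, rests on subclass reasoning: a manuscript annotated with a specific class is implicitly an instance of every superclass of that class. A minimal Python sketch of this inference, using an invented toy hierarchy rather than ONE's actual classes:

```python
# Toy subclass hierarchy (child -> parent); the class names are invented
# placeholders, not actual ONE ontology classes.
ONE_SUBCLASS = {
    "food frequency questionnaire": "dietary assessment instrument",
    "dietary assessment instrument": "research instrument",
    "cohort study": "observational study",
}

def ancestors(term: str) -> list[str]:
    """Collect all inferred superclasses by walking the subclass axis."""
    out = []
    while term in ONE_SUBCLASS:
        term = ONE_SUBCLASS[term]
        out.append(term)
    return out

# An annotation with the specific class also answers queries for the
# broader classes, which is what enables harmonized searching and browsing.
print(ancestors("food frequency questionnaire"))
```

A real ontology allows multiple parents per class, so a production system would delegate this to an OWL reasoner rather than a hand-rolled walk.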

    Building a semantically annotated corpus of clinical texts

    In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains.
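
Corpora of this kind are often encoded with stand-off annotation, where (start, end, semantic class) spans are stored separately from the text they annotate. A minimal sketch of the idea; the sentence and semantic classes below are invented, not the paper's actual annotation scheme:

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    """A stand-off annotation: a character span plus a semantic class."""
    start: int
    end: int
    cls: str

text = "Patient presents with small-cell carcinoma of the left lung."
anns = [
    Annotation(22, 42, "Condition"),  # "small-cell carcinoma"
    Annotation(50, 59, "Locus"),      # "left lung"
]

# Stand-off spans leave the source text untouched, so several annotation
# layers can coexist over the same document.
for a in anns:
    print(a.cls, "->", text[a.start:a.end])
```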

    AGUIA: autonomous graphical user interface assembly for clinical trials semantic data services

    Background: AGUIA is a front-end web application originally developed to manage clinical, demographic and biomolecular patient data collected during clinical trials at MD Anderson Cancer Center. The diversity of methods involved in patient screening and sample processing generates a variety of data types that require a resource-oriented architecture to capture the associations between the heterogeneous data elements. AGUIA uses a semantic web formalism, the Resource Description Framework (RDF), and a bottom-up design of knowledge bases that employs the S3DB tool as the starting point for the client's interface assembly. Methods: The data web service, S3DB, meets the necessary requirements of generating the RDF and of explicitly distinguishing the description of the domain from its instantiation, while allowing continuous editing of both. Furthermore, it uses an HTTP-REST protocol, has a SPARQL endpoint, and is openly available as open source, which facilitates the development and dissemination of this application. However, S3DB alone does not address the issue of representing content in a form that makes sense for domain experts. Results: We identified an autonomous set of descriptors, the GBox, that provides user and domain specifications for the graphical user interface. This was achieved by identifying a formalism that uses an RDF schema to enable the automatic assembly of graphical user interfaces in a meaningful manner while using only resources native to the client web browser (JavaScript interpreter, Document Object Model). We defined a generalized RDF model such that changes in the graphic descriptors are automatically and immediately (locally) reflected in the configuration of the client's interface application. Conclusions: The design patterns identified for the GBox benefit from and reflect the specific requirements of interacting with data generated by clinical trials, and they contain clues for a general-purpose solution to the challenge of having interfaces automatically assembled for multiple and volatile views of a domain. By coding AGUIA in JavaScript, for which all browsers include a native interpreter, a solution was found that assembles interfaces that are meaningful to the particular user, and that are also ubiquitous and lightweight, allowing the computational load to be carried by the client's machine.
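
The GBox described above is a declarative descriptor that drives automatic interface assembly. A rough Python sketch of the principle; a plain dict stands in for the RDF schema, and the property names and widget types are invented for illustration:

```python
# A declarative schema descriptor standing in for an RDF schema; the
# entity, properties, and ranges are hypothetical examples.
GBOX = {
    "Patient": [
        {"property": "name", "range": "string"},
        {"property": "age", "range": "integer"},
        {"property": "consented", "range": "boolean"},
    ]
}

# Map each data range to a browser widget kind (invented names).
WIDGET_FOR_RANGE = {
    "string": "text_input",
    "integer": "number_input",
    "boolean": "checkbox",
}

def assemble(entity: str) -> list[dict]:
    """Assemble a UI spec for an entity directly from its schema descriptor."""
    return [
        {"label": p["property"], "widget": WIDGET_FOR_RANGE[p["range"]]}
        for p in GBOX[entity]
    ]

print(assemble("Patient"))
```

Changing the descriptor (say, adding a property) changes the assembled interface with no new client code, which is the behavior the Results describe for the RDF-driven case.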

    Conceptual graph-based knowledge representation for supporting reasoning in African traditional medicine

    Although African patients use conventional (modern) and traditional healthcare simultaneously, it has been shown that 80% of people rely on African traditional medicine (ATM). ATM includes medical activities stemming from practices, customs and traditions which were integral to the distinctive African cultures. It is based mainly on the oral transfer of knowledge, with the risk of losing critical knowledge. Moreover, practices differ according to the regions and the availability of medicinal plants. Therefore, it is necessary to compile the tacit, disseminated and complex knowledge of various Tradi-Practitioners (TP) in order to determine interesting patterns for treating a given disease. Knowledge engineering methods for traditional medicine are useful to suitably model complex information needs, formalize the knowledge of domain experts, and highlight effective practices for their integration into conventional medicine. The work described in this paper presents an approach which addresses two issues. First, it proposes a formal representation model of ATM knowledge and practices to facilitate their sharing and reuse. Second, it provides a visual reasoning mechanism for selecting the best available procedures and medicinal plants to treat diseases. The approach uses the Delphi method to capture knowledge from various experts, which necessitates reaching a consensus. Conceptual graph formalism is used to model ATM knowledge with visual reasoning capabilities and processes. Nested conceptual graphs are used to visually express the semantic meaning of Computational Tree Logic (CTL) constructs that are useful for the formal specification of temporal properties of ATM domain knowledge. Our approach has the advantage of mitigating knowledge loss, with conceptual development assistance to improve both the quality of ATM care (medical diagnosis and therapeutics) and patient safety (drug monitoring).
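
The knowledge representation can be pictured as concept-relation-concept assertions that are queried to select treatments. The sketch below deliberately flattens the formalism to plain triples (real conceptual graphs, and the nested graphs carrying CTL semantics, are far richer); all plant, disease and region names are invented placeholders:

```python
# ATM knowledge flattened to (concept, relation, concept) triples;
# every identifier here is a hypothetical placeholder, not real ATM data.
TRIPLES = [
    ("plant_A", "treats", "malaria"),
    ("plant_B", "treats", "malaria"),
    ("plant_B", "grows_in", "region_west"),
]

def plants_treating(disease: str) -> set[str]:
    """Project the knowledge base onto the 'treats' relation for a disease."""
    return {s for s, r, o in TRIPLES if r == "treats" and o == disease}

print(plants_treating("malaria"))
```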

    Advanced Methods for Entity Linking in the Life Sciences

    The amount of knowledge increases rapidly due to the increasing number of available data sources. However, the autonomy of data sources and the resulting heterogeneity prevent comprehensive data analysis and applications. Data integration aims to overcome heterogeneity by unifying different data sources and enriching unstructured data. The enrichment of data consists of different subtasks, among them the annotation process. The annotation process links document phrases to terms of a standardized vocabulary. Annotated documents enable effective retrieval methods, comparability of different documents, and comprehensive data analysis, such as finding adverse drug effects based on patient data. A vocabulary allows comparability through standardized terms. An ontology can also serve as a vocabulary, but is additionally defined by concepts, relationships, and logical constraints. The annotation process is applicable in different domains. Nevertheless, the annotation process differs between generic and specialized domains. This thesis emphasizes these differences and addresses the identified challenges. The majority of annotation approaches focuses on the evaluation of general domains, such as Wikipedia. This thesis evaluates the developed annotation approaches on case report forms, which are medical documents for examining clinical trials. Natural language poses various challenges, such as similar meanings expressed by different phrases. The proposed annotation method, AnnoMap, accounts for this fuzziness of natural language. A further challenge is the reuse of verified annotations. Existing annotations represent knowledge that can be reused for further annotation processes. AnnoMap includes a reuse strategy that utilizes verified annotations to link new documents to appropriate concepts. Due to the broad spectrum of areas in the biomedical domain, different annotation tools exist.
    The tools perform differently depending on the domain. This thesis proposes a combination approach to unify results from different tools. The method utilizes existing tool results to build a classification model that can classify new annotations as correct or incorrect. The results show that the reuse strategy and the machine learning-based combination improve the annotation quality compared to existing approaches focusing on the biomedical domain. A further part of data integration is entity resolution, which builds unified knowledge bases from different data sources. A data source consists of a set of records characterized by attributes. The goal of entity resolution is to identify records representing the same real-world entity. Many methods focus on linking data sources whose records are characterized by attributes alone. Nevertheless, only a few methods can handle graph-structured knowledge bases or consider temporal aspects. Temporal aspects are essential for identifying the same entities over different time intervals, since entity descriptions change over time. Moreover, records can be related to other records, so that a small graph structure exists for each record. These small graphs can be linked to each other if they represent the same real-world entity. This thesis proposes an entity resolution approach for census data consisting of person records for different time intervals. The approach also considers the graph structure of persons given by family relationships. To achieve high-quality results, current methods apply machine-learning techniques to classify record pairs as representing the same entity or not. The classification task uses a model generated from training data; in this case, the training data is a set of record pairs labeled as duplicates or non-duplicates. Nevertheless, the generation of training data is a time-consuming task, so active learning techniques are relevant for reducing the number of training examples.
    The entity resolution method for temporal graph-structured data shows an improvement compared to previous collective entity resolution approaches. The developed active learning approach achieves results comparable to supervised learning methods and outperforms other limited-budget active learning methods. Besides the entity resolution approach, the thesis introduces the concept of evolution operators for communities. These operators can express the dynamics of communities and individuals. For instance, we can formulate that two communities merged or split over time. Moreover, the operators allow observing the history of individuals. Overall, the presented annotation approaches generate high-quality annotations for medical forms. The annotations enable comprehensive analysis across different data sources as well as accurate queries. The proposed entity resolution approaches improve on existing ones, contributing to the generation of high-quality knowledge graphs and to data analysis tasks.

    Semantic annotation of clinical questionnaires to support personalized medicine

    Tese de Mestrado, Bioinformática e Biologia Computacional, 2022, Universidade de Lisboa, Faculdade de Ciências. We live in a global era of constant technological evolution, and medicine is one of the areas that has benefited most: with the integration of technology, medicine plays an increasingly important role from the point of view of both physicians and patients. Better tools for clinical practice create the conditions for patients to have better follow-up, understanding, and real-time updates on their clinical condition. The healthcare sector produces innovations almost daily that improve the patient experience and the way physicians can exploit the information contained in data for faster and more effective assessment. The sector generates an increasingly massive volume of data, including medical reports, inertial sensor records, recordings of consultations, images, videos, and clinical assessments such as the questionnaires and clinical scales that promise patients better monitoring of their health status; however, the enormous volume, distribution, and strong heterogeneity of these data hinder their processing and analysis. Integrating this kind of data is a challenge, since the data originate from diverse sources and exhibit considerable semantic heterogeneity. The semantic integration of biomedical data results in a biomedical semantic network that relates concepts across sources, facilitating the translation of scientific discoveries and supporting more complex analyses and conclusions; for this, achieving semantic interoperability of the data is crucial. This is a very important step that allows different clinical datasets to interact within the same information system or between different systems. Such integration lets analysis and data-interface tools work over an integrated, holistic view of the data, which ultimately allows clinicians more detailed and personalized follow-up of their patients.
    This dissertation was developed at LASIGE in collaboration with the Campus Neurológico Sénior (CNS) and is part of a larger project that explores providing more and better data to both clinicians and patients. The project is built on a web application, DataPark, whose platform lets the user navigate clinical areas including nutrition, physiotherapy, occupational therapy, speech therapy and neuropsychology, each hosting batteries of tests with several questionnaires and clinical rating scales. This type of clinical assessment greatly eases the physician's work, since it can be administered at a distance: the patient can answer remotely, the answers are stored in DataPark, and the physician can track the patient's status over time on a given scale. However, the way DataPark was built limits the physician to a questionnaire-oriented view: a physician who wants to see the patient as a whole finds the information scattered across the different questionnaires and has to inspect them one by one. This dissertation addresses this challenge by building an algorithm that decomposes all the questions of the different questionnaires and enables their semantic integration, so that the physician can have a holistic view oriented by clinical concept.
    The entire DataPark database was extracted as the data source for this work; notably, much of the data is in Portuguese and had to be translated automatically. After an initial high-level analysis of the questionnaires, a semantic model describing the data in the questionnaires and scales was constructed. All identifiable clinical concepts were manually collected from a subset of 15 questionnaires: the 5 most answered for Parkinson's disease, the 5 most answered for stroke, and the 5 most answered not associated with a single specific pathology. This model was improved and evolved together with a team of 12 CNS physicians and therapists over 7 meetings, during which a validation workshop endowed the model with high reliability. In parallel, two studies were carried out: (i) a study evaluating which ontology or ontologies achieve the greatest coverage of the 15-questionnaire subset, which concluded that the most reliable set of ontologies comprises LOINC, NCIT, SNOMED and OCHV; this set was used from then on; (ii) a study assessing which machine translation tool (Google Translator or Microsoft Translator) is more reliable, by fully translating 3 questionnaires that are stored in Portuguese but whose original versions are in English, and evaluating which tool performed better. Microsoft Translator performed slightly better and was therefore chosen as the machine translation tool integrated into our algorithm.
    With these two studies concluded, the dataset was standardized into a single language and the set of ontologies for semantic annotation was chosen. Ontologies are powerful computational artifacts consisting of a set of concepts or terms that name and define the entities of a given domain of interest; in biomedicine they are called biomedical ontologies. Biomedical ontologies are highly useful for sharing, retrieving and extracting biomedical information, and play a crucial role in semantic interoperability, which is exactly our final goal. The questions of the 15-questionnaire subset were therefore semantically annotated. A semantic annotation is a process that formally associates a text target with a concept/term of a given ontology, thereby building bridges between different documents or text targets that address the same concept. With the target text consisting of questions from several questionnaires, it is natural to find questions from different diagnostic areas connected by shared ontology terms. After annotation, the semantic model was integrated with the developed algorithm, the chosen ontologies, and the patient data. In this way we know that a given patient answered several questions addressing the same concept, and those questions are semantically interlinked because the same concept is mapped to them.
    In terms of overall performance, both the translation and the annotation processes performed acceptably: translation reached 78% accuracy, 76% recall and an F-measure of 0.77, and 87% of annotations were successful. Overall, the main objective was achieved: a holistic view integrating the semantic model with the DataPark data (questionnaires and patients).
    Healthcare is a multi-domain area, with professionals from different areas often collaborating to provide patients with the best possible care. Neurological and neurodegenerative diseases are especially so, with multiple areas, including neurology, psychology, nursing, physical therapy, speech therapy and others coming together to support these patients. The DataPark application allows healthcare providers to store, manage and analyse information about patients with neurological disorders from different perspectives, including evaluation scales and questionnaires. However, the application does not provide a holistic view of the patient status, because it is split across different domains and clinical scales. This work proposes a methodology for the semantic integration of this data. It developed the data scaffolding to afford a holistic view of the patient status that is concept-oriented rather than scale or test battery oriented. A semantic model was developed in collaboration with healthcare providers from different areas, which was subsequently aligned with existing biomedical ontologies. The questionnaire and scale data was semantically annotated to this semantic model, with a translation step when the original data was in Portuguese. The process was applied to a subset of 15 scales with a manual evaluation of each process. The semantic model includes 204 concepts and 436 links to external ontologies. Translation achieved an accuracy of 78%, whereas the semantic annotation achieved 87%.
    The final integrated dataset covers 443 patients. Finally, applying the process of semantic annotation to the whole dataset creates the conditions for semantic integration to occur: this process cross-references all questions from the different questionnaires and establishes a connection between those that carry the same annotation. This work allows healthcare providers to assess patients in a more global fashion, integrating data collected from different scales and test batteries that evaluate the same or similar parameters.
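
The integration step just described, linking questions from different scales that carry the same annotated concept, amounts to a group-by over annotations. A minimal sketch; the scale names, question texts, and concept IDs below are invented placeholders, not the actual DataPark or ontology content:

```python
from collections import defaultdict

# (questionnaire, question) -> annotated ontology concept IDs (hypothetical)
annotations = {
    ("PDQ-39", "Do you have difficulty walking?"): {"SNOMED:gait"},
    ("Berg", "Stand unsupported for 2 minutes"): {"SNOMED:gait", "SNOMED:balance"},
    ("MoCA", "Repeat the following words"): {"SNOMED:memory"},
}

# Invert the mapping: each concept gathers every question annotated with it,
# regardless of which scale the question comes from.
by_concept = defaultdict(list)
for (scale, question), concepts in annotations.items():
    for c in concepts:
        by_concept[c].append((scale, question))

# A concept-oriented, holistic view: all gait-related questions at once.
print(by_concept["SNOMED:gait"])
```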

    Analysis of the suitability of existing medical ontologies for building a scalable semantic interoperability solution supporting multi-site collaboration in oncology

    Semantic interoperability is essential to facilitate efficient collaboration in heterogeneous multi-site healthcare environments. The deployment of a semantic interoperability solution has the potential to enable a wide range of informatics-supported applications in clinical care and research, both within a single healthcare organization and in a network of organizations. At the same time, building and deploying a semantic interoperability solution may require significant effort to carry out data transformation and to harmonize the semantics of the information in the different systems. Our approach to semantic interoperability leverages existing healthcare standards and ontologies, focusing first on specific clinical domains and key applications, and gradually expanding the solution when needed. An important objective of this work is to create a semantic link between clinical research and care environments to enable applications such as streamlining the execution of multi-centric clinical trials, including the identification of eligible patients for the trials. This paper presents an analysis of the suitability of several widely used medical ontologies in the clinical domain (SNOMED-CT, LOINC, MedDRA) to capture the semantics of clinical trial eligibility criteria, of clinical trial data (e.g., Case Report Forms), and of the corresponding patient record data that would enable the automatic identification of eligible patients. Besides the coverage provided by the ontologies, we evaluate and compare the sizes of the sets of relevant concepts and their relative frequency to estimate the cost of data transformation, of building the necessary semantic mappings, and of extending the solution to new domains. This analysis shows that our approach is both feasible and scalable.
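
The coverage part of such an analysis reduces to a set-overlap computation: the fraction of criteria concepts an ontology's vocabulary can capture. A minimal sketch with toy term lists (not actual SNOMED-CT or LOINC content):

```python
def coverage(criteria_terms: set[str], ontology_terms: set[str]) -> float:
    """Fraction of criteria concepts captured by an ontology's vocabulary."""
    return len(criteria_terms & ontology_terms) / len(criteria_terms)

# Hypothetical eligibility-criteria concepts and ontology vocabularies.
criteria = {"metastatic melanoma", "serum creatinine",
            "prior chemotherapy", "ecog status"}
snomed_like = {"metastatic melanoma", "prior chemotherapy", "ecog status"}
loinc_like = {"serum creatinine"}

print(coverage(criteria, snomed_like))  # 0.75
print(coverage(criteria, loinc_like))   # 0.25
```

Comparing such ratios across ontologies, together with the sizes of the concept sets involved, gives the cost estimate the abstract describes.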

    A Knowledge-based Integrative Modeling Approach for In-Silico Identification of Mechanistic Targets in Neurodegeneration with Focus on Alzheimer’s Disease

    Dementia is the progressive decline in cognitive function due to damage or disease in the body beyond what might be expected from normal aging. Based on neuropathological and clinical criteria, dementia includes a spectrum of diseases, namely Alzheimer’s dementia, Parkinson’s dementia, Lewy body disease, Alzheimer’s dementia with Parkinson’s, Pick’s disease, semantic dementia, and large and small vessel disease. It is thought that these disorders result from a combination of genetic and environmental risk factors. Despite the knowledge that has accumulated about the pathophysiological and clinical characteristics of the disease, no coherent and integrative picture of the molecular mechanisms underlying neurodegeneration in Alzheimer’s disease is available. Existing drugs only offer symptomatic relief to patients and lack any efficient disease-modifying effects. The present research proposes a knowledge-based rationale towards integrative modeling of the disease mechanism for identifying potential candidate targets and biomarkers in Alzheimer’s disease. Integrative disease modeling is an emerging knowledge-based paradigm in translational research that exploits the power of computational methods to collect, store, integrate, model and interpret accumulated disease information across different biological scales, from molecules to phenotypes. It prepares the ground for transitioning from “descriptive” to “mechanistic” representation of disease processes. The proposed approach was used to introduce an integrative framework which integrates, on the one hand, knowledge extracted from the literature using semantically supported text-mining technologies and, on the other hand, primary experimental data such as gene/protein expression or imaging readouts.
    The aim of such a hybrid integrative modeling approach was not only to provide a consolidated systems view of the disease mechanism as a whole but also to increase the specificity and sensitivity of the mechanistic model by providing disease-specific context. This approach was successfully used to correlate clinical manifestations of the disease with their corresponding molecular events, and led to the identification and modeling of three important mechanistic components underlying Alzheimer’s dementia, namely the CNS, immune system and endocrine components. These models were validated using a novel in-silico validation method, namely biomarker-guided pathway analysis, and a pathway-based target identification approach was introduced, which resulted in the identification of the MAPK signaling pathway as a potential candidate target at the crossroads of the three components underlying the disease mechanism in Alzheimer’s dementia.
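
Pathway-based target identification of this kind typically relies on an overlap statistic between a biomarker list and a pathway's gene set; a hypergeometric tail probability is one common choice. The sketch below is illustrative only (not the thesis's actual method), and all counts are made up:

```python
from math import comb

def hypergeom_p(N: int, K: int, n: int, k: int) -> float:
    """P(overlap >= k) when drawing n genes from N, of which K are pathway members."""
    return sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    ) / comb(N, n)

# Hypothetical counts: 20000 genes total, 150 in the MAPK pathway,
# 40 biomarker genes, 8 of them falling inside the pathway.
p = hypergeom_p(20000, 150, 40, 8)

# An overlap this large is far beyond chance, flagging the pathway.
print(p < 0.001)
```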