28 research outputs found

    Using data-driven sublanguage pattern mining to induce knowledge models: application in medical image reports knowledge representation

    Background: The use of knowledge models facilitates information retrieval and knowledge base development, and therefore supports new knowledge discovery that ultimately enables decision support applications. Most existing works have employed machine learning techniques to construct a knowledge base; however, they often suffer from low precision in extracting entities and relationships. In this paper, we describe a data-driven sublanguage pattern mining method that can be used to create a knowledge model. We combined natural language processing (NLP) and semantic network analysis in our model generation pipeline. Methods: As a use case of our pipeline, we used data from an open-source imaging case repository, Radiopaedia.org, to generate a knowledge model that represents the contents of medical imaging reports. We extracted entities and relationships using the Stanford part-of-speech parser and the “Subject:Relationship:Object” syntactic data schema. The identified noun phrases were tagged with Unified Medical Language System (UMLS) semantic types. An evaluation was performed on a dataset of 83 image notes from four data sources. Results: A semantic type network was built based on the co-occurrence of 135 UMLS semantic types in 23,410 medical image reports. By regrouping the semantic types and generalizing the semantic network, we created a knowledge model that contains 14 semantic categories. Our knowledge model covered 98% of the content in the evaluation corpus and revealed 97% of the relationships. Machine annotation achieved a precision of 87%, recall of 79%, and F-score of 82%. Conclusion: The results indicate that our pipeline was able to produce a comprehensive content-based knowledge model that could represent context from various sources in the same domain.
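The network-building and evaluation steps can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes each report has already been reduced to “Subject:Relationship:Object” triples whose noun phrases carry UMLS semantic types, and it assumes "co-occurrence" means two semantic types linked within one triple, counted at most once per report. The helper names `build_type_network` and `precision_recall_f1` are hypothetical.

```python
from collections import Counter

# Hypothetical input format: each report is a list of
# (subject_semantic_type, relationship, object_semantic_type) triples.
Triple = tuple[str, str, str]


def build_type_network(reports: list[list[Triple]]) -> Counter:
    """Weight each undirected semantic-type edge by the number of reports linking the pair."""
    edges: Counter = Counter()
    for triples in reports:
        # A set makes each pair count at most once per report;
        # sorting makes (A, B) and (B, A) the same undirected edge.
        pairs = {tuple(sorted((subj, obj))) for subj, _, obj in triples if subj != obj}
        edges.update(pairs)
    return edges


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard precision, recall, and F1 from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    reports = [
        [("Body Part, Organ, or Organ Component", "shows", "Finding"),
         ("Finding", "suggests", "Disease or Syndrome")],
        [("Body Part, Organ, or Organ Component", "contains", "Finding")],
    ]
    for (a, b), weight in build_type_network(reports).most_common():
        print(f"{a} -- {b}: {weight}")
```

Counting each pair once per report keeps a single verbose report from dominating an edge weight; a per-triple count would be an equally plausible reading of the abstract.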

    Evaluation of pediatric cancer patients: recognizing hereditary predisposition syndromes

    Background: It is currently estimated that about 5% to 10% of all tumors are associated with hereditary predisposition and are directly caused by germline pathogenic variants in moderate- and high-penetrance cancer predisposition genes. Childhood cancer has a strong and previously underestimated genetic component, and many pediatric Hereditary Cancer Predisposition Syndromes (HCPS) have already been identified and characterized. Identifying cancer predisposition in children is relevant to the patient and their family. For some patients, this information can guide the use of modified treatment strategies in case of increased expected toxicity or resistant disease, and can also lead to surveillance measures for the early detection of an independent malignancy. Objective: To characterize a series of pediatric cancer patients according to their characteristics at diagnosis, based on the modified “Jongmans criteria”, thereby defining the prevalence of cases that should be referred for genetic evaluation. Methods: This is a retrospective descriptive observational study of all pediatric oncology patients admitted in 2017 and 2018 to the Pediatric Oncology Service of the Hospital de Clínicas de Porto Alegre. Data were collected from the electronic medical record, seeking to identify the “Jongmans criteria” (modified questionnaire). Results: Of the 149 patients included in the study, 86 (57.7%) were male, and the median age was 6 years (1.5-11.5 years). Of these, 77 (51.7%) met at least one modified “Jongmans criterion” and therefore had an indication for referral for genetic evaluation. Notably, in 148 cases (99.3%), the medical record lacked complete data to assess all six criteria. Of the 77 patients meeting at least one criterion, only 36 (46.7%) were referred to the genetics clinic or evaluated by genetic consultation within 90 days of diagnosis. Discussion: The distribution of patients in the sample by sex and age is similar to that reported in publications on pediatric cancer. The evaluation of each criterion reveals an important lack of detailed description in the electronic medical record, with much relevant information missing. A substantial proportion of patients had at least one criterion for referral to genetic evaluation. However, the proportion of patients who met criteria but were not referred within 90 days (58%; 45/77) suggests that health professionals still lack awareness of the indications for oncogenetic evaluation of HCPS in the pediatric population. Conclusion: The results of this study, conducted at an academic tertiary-care institution, show that there are opportunities to improve the recording, in the electronic medical record, of data relevant to assessing hereditary cancer, and to train care teams on the indications for genetic evaluation in pediatric cancer patients.
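The screening logic behind "met at least one modified Jongmans criterion" and "incomplete data to assess all six criteria" can be sketched as below. This is a hypothetical encoding, not the study's questionnaire: each patient is assumed to have six criterion results stored as `True` (met), `False` (not met), or `None` (not assessable from the record); the criteria themselves are not named here, and `summarize_screening` is an illustrative helper.

```python
from typing import Optional

# Hypothetical encoding: one entry per modified Jongmans criterion (six in the study);
# True = met, False = not met, None = not assessable from the medical record.
CriterionResults = list[Optional[bool]]


def summarize_screening(patients: list[CriterionResults]) -> dict[str, int]:
    """Count patients eligible for genetic referral and patients with incomplete records."""
    # Eligible: at least one criterion positively documented as met.
    eligible = sum(1 for crit in patients if any(v is True for v in crit))
    # Incomplete: at least one criterion that could not be assessed.
    incomplete = sum(1 for crit in patients if any(v is None for v in crit))
    return {"total": len(patients), "eligible": eligible, "incomplete": incomplete}


if __name__ == "__main__":
    demo = [
        [True, None, False, False, False, False],  # eligible, incomplete record
        [False] * 6,                               # not eligible, complete record
        [None] * 6,                                # nothing assessable
    ]
    print(summarize_screening(demo))
```

Under this encoding an unassessable criterion can never make a patient eligible, so missing documentation can only undercount referral indications, which is why incomplete records matter for this kind of audit.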

    Repeatable and reusable research - Exploring the needs of users for a Data Portal for Disease Phenotyping

    Background: Big data research in the health sciences is hindered by a lack of agreement on how to identify and define different conditions and their medications. As a result, researchers and health professionals often use different phenotype definitions for the same condition. This lack of agreement makes it hard to compare study findings and hinders the ability to conduct repeatable and reusable research. Objective: This thesis aims to examine the requirements of various users, such as researchers, clinicians, machine learning experts, and managers, for both new and existing data portals for phenotypes (concept libraries). Methods: An exploratory sequential mixed-methods design was used to examine which concept libraries are available, how they are used, what their characteristics are, where gaps exist, and what needs to be done in the future from the point of view of their users. This thesis consists of three phases: 1) two qualitative studies, including one-to-one interviews with researchers, clinicians, machine learning experts, and senior research managers in health data science, as well as focus group discussions with researchers working with the Secure Anonymised Information Linkage (SAIL) Databank; 2) the creation of an email survey (i.e., the Concept Library Usability Scale); and 3) a quantitative study with researchers, health professionals, and clinicians. Results: Most participants thought that the prototype concept library would be a very helpful resource for conducting repeatable research, but they specified many requirements that must be met before its development. Although all participants stated that they were aware of some existing concept libraries, most expressed negative perceptions of them. The participants mentioned several facilitators that would encourage them to: 1) share their work, such as receiving citations from other researchers; and 2) reuse the work of others, such as saving the considerable time and effort they frequently spend creating new code lists from scratch. They also pointed out several barriers that could inhibit them from: 1) sharing their work, such as concerns about intellectual property (e.g., that if they shared their methods before publication, other researchers would present them as their own); and 2) reusing others' work, such as a lack of confidence in the quality and validity of others' code lists. Participants suggested developments that would make research using routine data more reproducible, such as a drive for greater transparency in documenting research methods, including publishing complete phenotype definitions and clear code lists. Conclusions: The findings of this thesis indicate that most participants valued a concept library for phenotypes. However, only half of the participants felt that they would contribute definitions to the concept library, and they reported many barriers to sharing their work on a publicly accessible platform such as the CALIBER research platform. Analysis of the interviews, focus group discussions, and qualitative studies revealed that different users have different requirements, facilitators, barriers, and concerns regarding concept libraries. This work aimed to investigate whether concept libraries should be developed in Kuwait to facilitate improved data sharing. However, the thesis concludes that such a library would be unlikely to be cost-effective or highly valued by users, and that investment in open-access research publications may be of more value to the Kuwaiti research and academic community.
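To make "complete phenotype definitions and clear code lists" concrete, the sketch below models one concept-library entry as a versioned, citable definition backed by an explicit code list. It is an illustration only: the `PhenotypeDefinition` class, its fields, and the ICD-10 codes in the example are hypothetical choices, not the schema of CALIBER or any other existing concept library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhenotypeDefinition:
    """A versioned, citable phenotype definition backed by an explicit code list (hypothetical schema)."""
    name: str
    version: int
    coding_system: str       # e.g. "ICD-10"; which systems a library supports is an assumption
    codes: frozenset[str]    # exact codes that define the phenotype
    authors: tuple[str, ...] = ()

    def matches(self, record_codes: set[str]) -> bool:
        """True if a patient's coded record contains any code in the list."""
        return not self.codes.isdisjoint(record_codes)

    def citation(self) -> str:
        """A citation string, so reusers can credit the definition's authors."""
        who = ", ".join(self.authors) or "Anonymous"
        return f"{who}. {self.name}, version {self.version} ({self.coding_system})."


if __name__ == "__main__":
    # Codes chosen for illustration only, not a validated definition.
    t2dm = PhenotypeDefinition("Type 2 diabetes", 1, "ICD-10",
                               frozenset({"E11.8", "E11.9"}), ("A. Researcher",))
    print(t2dm.matches({"I10", "E11.9"}), t2dm.citation())
```

Freezing the dataclass means a published version cannot be edited in place; changes produce a new version, which is one way to address both the citation facilitator and the quality-and-validity barrier the participants described.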

    Contributions of Higher Resolution Observational Evidence from Electronic Health Records to Understand the Causal Relevance of Blood Lipids to Heart Failure and Atrial Fibrillation

    Heart failure (HF) and atrial fibrillation (AF) are increasingly prevalent owing to population aging, and both diseases impose a large economic and healthcare burden globally. To date, there is no primary prevention strategy specific to healthy populations. Blood lipids (i.e., LDL-C, HDL-C, and TG), which are involved in the pathophysiological mechanisms of HF and AF, might play a role in the origin of both diseases. Therefore, the potential causal relevance of blood lipids to HF and AF should be investigated. Linked electronic health records (EHRs) provide an opportunity to investigate the association between blood lipids and the incidence of HF and AF, as these records contain large sample sizes (e.g., n>1 million) with a wide range of diseases and biomarkers routinely recorded in clinical practice. Challenges include structuring the data into a research-ready format, accurately defining outcomes, and handling missing data. The data used in this thesis are from the CALIBER platform, which links routinely collected EHRs from general practices, hospital admissions, and death registries for 3 million patients in England from 1997 to 2016. In this thesis, I (1) constructed cohorts from EHRs and ensured their validity; (2) examined the association between blood lipids and the incidence of HF and AF using an EHR population-based cohort design, comparing the observed findings with results from a meta-regression of trials of lipid-lowering drugs and from a Mendelian randomisation approach; and (3) assessed the predictive value of adding blood lipids to the risk prediction of incident HF and AF. Additionally, I developed a model to predict the 10-year risk of newly occurring HF and AF. Taken together, these findings have valuable implications. For future research, they can serve as a basis for developing new drugs against HF and AF. For clinical practice, they can inform clinicians whether blood lipids should be targeted and what levels are needed to protect people from HF and AF, and can prompt clinicians to monitor their patients for the development of HF and AF.
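The abstract does not say which Mendelian randomisation estimator was used. As an illustration only, the sketch below implements the widely used fixed-effect inverse-variance weighted (IVW) estimator from per-variant summary statistics: `beta_x` (variant effect on the lipid exposure), `beta_y` (variant effect on HF or AF), and `se_y` (standard error of `beta_y`). The numbers in the example are hypothetical, not data from the thesis.

```python
import math


def wald_ratio(beta_x: float, beta_y: float) -> float:
    """Single-variant causal estimate: variant-outcome effect over variant-exposure effect."""
    return beta_y / beta_x


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW MR estimate and its standard error.

    Equivalent to a weighted mean of per-variant Wald ratios with weights
    beta_x**2 / se_y**2 (first-order weights that ignore uncertainty in beta_x).
    """
    num = sum(bx * by / s**2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx**2 / s**2 for bx, s in zip(beta_x, se_y))
    return num / den, 1.0 / math.sqrt(den)


if __name__ == "__main__":
    # Hypothetical summary statistics for three lipid-associated variants.
    est, se = ivw_estimate([0.10, 0.20, 0.30], [0.05, 0.10, 0.15], [0.01, 0.02, 0.01])
    print(f"IVW estimate {est:.3f} (SE {se:.4f})")
```

The IVW estimate assumes every variant is a valid instrument (no horizontal pleiotropy); robust alternatives such as MR-Egger or the weighted median relax this, and which of these the thesis applied is not stated in the abstract.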

    Developing artificial intelligence and machine learning to support primary care research and practice

    This thesis was motivated by the potential to use everyday data, especially data collected in electronic health records (EHRs) as part of healthcare delivery, to improve primary care for clients facing complex clinical and/or social situations. Artificial intelligence (AI) techniques can identify patterns or make predictions with these data, producing information to learn about and inform care delivery. Our first objective was to understand and critique the body of literature on AI and primary care. This was achieved through a scoping review, in which we found that the field was at an early stage of maturity, primarily focused on clinical decision support for chronic conditions in high-income countries, with low levels of primary care involvement and of model evaluation in real-world settings. Our second objective was to demonstrate how AI methods can be applied to problems in descriptive epidemiology. To achieve this, we collaborated with the Alliance for Healthier Communities, which provides team-based primary health care through Community Health Centres (CHCs) across Ontario to clients who experience barriers to regular care. We described sociodemographic, clinical, and healthcare use characteristics of their adult primary care population using EHR data from 2009-2019. We used both simple statistical and unsupervised learning techniques, applied with an epidemiological lens. In addition to substantive findings, we identified potential avenues for future learning initiatives, including the development of decision support tools, and methodological considerations therein. Our third objective was to advance interpretable AI methodology that is well suited to heterogeneous data and applicable in clinical epidemiology as well as other settings. To achieve this, we developed a new hybrid feature- and similarity-based model for supervised learning. The model has two versions, both fit by convex optimization with a sparsity-inducing penalty on the kernel (similarity) portion of the model. We compared our hybrid models with purely feature-based and purely similarity-based approaches, using synthetic data and using CHC data to predict future loneliness or social isolation. We also proposed a new strategy for kernel construction with indicator-coded data. Altogether, this thesis advanced AI for primary care, both in general and for a particular health care organization, while making research contributions to epidemiology and computer science.
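The abstract names the ingredients of the hybrid model (a feature-based part, a similarity-based kernel part, convex fitting, and a sparsity-inducing penalty on the kernel portion) but not its exact form, so the following is only a sketch under stated assumptions. It assumes a logistic model f(x) = b + w·x + Σⱼ αⱼ k(x, zⱼ) over hypothetical anchor points zⱼ, a ridge penalty on `w`, and an L1 penalty on `alpha`, fit by proximal gradient descent (ISTA). The Jaccard similarity stands in for a kernel on indicator-coded data; the thesis's own kernel construction is not described in the abstract, and all function names are hypothetical.

```python
import numpy as np


def jaccard_kernel(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Jaccard similarity between rows of two 0/1 indicator matrices (a stand-in kernel)."""
    A = A.astype(float)
    B = B.astype(float)
    inter = A @ B.T                                  # shared 1s for each pair of rows
    union = A.sum(axis=1)[:, None] + B.sum(axis=1)[None, :] - inter
    # Two all-zero rows are treated as identical (similarity 1).
    return np.where(union > 0, inter / np.maximum(union, 1.0), 1.0)


def soft_threshold(v: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||v||_1: shrinks entries toward zero, zeroing small ones."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Numerically stable logistic function via tanh."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def fit_hybrid(X, y, anchors, lam_w=1e-2, lam_a=1e-2, n_iter=2000):
    """Fit f(x) = b + w.x + sum_j alpha_j k(x, anchor_j) by proximal gradient descent.

    Convex objective: mean logistic loss + lam_w/2 * ||w||^2 + lam_a * ||alpha||_1.
    """
    K = jaccard_kernel(X, anchors)                   # n x m similarity features
    n = X.shape[0]
    design = np.hstack([np.ones((n, 1)), X, K])
    # Step size 1/L, where L bounds the Lipschitz constant of the smooth part's gradient.
    lr = 1.0 / (np.linalg.norm(design, 2) ** 2 / (4 * n) + lam_w)
    w, alpha, b = np.zeros(X.shape[1]), np.zeros(K.shape[1]), 0.0
    for _ in range(n_iter):
        r = (sigmoid(b + X @ w + K @ alpha) - y) / n  # d(mean loss)/d(score)
        w = w - lr * (X.T @ r + lam_w * w)
        b = b - lr * r.sum()
        # Gradient step on alpha, then the L1 proximal step that induces sparsity.
        alpha = soft_threshold(alpha - lr * (K.T @ r), lr * lam_a)
    return w, alpha, b


def predict_proba(X, anchors, w, alpha, b):
    """Predicted probability of the positive class under the fitted hybrid model."""
    return sigmoid(b + X @ w + jaccard_kernel(X, anchors) @ alpha)
```

Because the score is linear in (b, w, alpha), the penalized logistic objective stays convex, and the L1 proximal step lets the fit drop anchors whose similarity terms add little, which is one way the similarity portion can remain sparse and interpretable.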

    Healthy Living: The European Congress of Epidemiology, 2015
