3,020 research outputs found

    Knowledge-based Biomedical Data Science 2019

    Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey the progress in the last year in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Chinese Traditional Medicine and biodiversity. Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages with 3 tables

    Use of data mining and artificial intelligence to derive public health evidence from large datasets

    This thesis explores the use of data mining and AI-tailored frameworks for extracting public health evidence from large health datasets. The research presented in this thesis demonstrates the potential of these tools for automating and simplifying the data mining process, and for providing valuable insights into various public health issues. In Paper I, we used data mining and natural language processing to analyze the characteristics of genomic research on non-communicable diseases (NCDs) from the GWAS Catalog (2005 to 2022). We found that the majority of research institutions leading the work are US-based and that the majority of first, senior and all authors were male. The vast majority of complex trait GWAS have been performed in European ancestry populations, with cohorts and scientists predominantly located in medium-to-high socioeconomically ranked countries. This lack of diversity in both the data and the authorship of GWAS research has potential implications for the generalizability of genetic discoveries and the development of future interventions. In Paper II, we analyzed data collected through the app-based COVID Symptom Study in Sweden. We then created a symptom-based model to estimate the individual probability of symptomatic COVID-19 and employed this to estimate daily regional COVID-19 prevalence. We also used these data to predict next-week COVID-19 hospital admissions and compared the result to a model based on case notifications. We found that the symptom-based model had a lower median absolute percentage error during the first wave of the pandemic and that the model was transferable to an English dataset.
The findings of this study demonstrate the feasibility of large-scale syndromic surveillance and the potential for population-based participatory surveillance initiatives in future pandemics and epidemics. In Paper III, we used data from over 500,000 participants in the COVID Symptom Study to investigate the impact of obesity and diabetes on the symptoms and duration of long-COVID. Using advanced data mining techniques, we found that individuals with higher BMI and diabetes had a higher burden of symptoms during the initial COVID-19 infection and a prolonged duration of long-COVID symptoms. We also found that vaccination had a protective effect against both COVID-19 symptoms and long-COVID symptoms in these at-risk groups. Our results demonstrate the disproportionate impact of COVID-19 on certain populations and the utility of app-based syndromic surveillance in providing timely and accurate information on the spread and impact of the virus.
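The abstract above compares forecasting models by median absolute percentage error. As a minimal sketch of how that metric is computed, assuming plain Python lists of observed and predicted weekly admissions (the function name and the numbers are illustrative and are not taken from the thesis):

```python
from statistics import median


def median_absolute_percentage_error(actual, predicted):
    """Median of |actual - predicted| / |actual|, expressed in percent.

    Pairs whose actual value is zero are skipped, because the percentage
    error is undefined there (an assumption of this sketch).
    """
    errors = [
        100 * abs(a - p) / abs(a)
        for a, p in zip(actual, predicted)
        if a != 0
    ]
    if not errors:
        raise ValueError("no pairs with a non-zero actual value")
    return median(errors)


# Illustrative (made-up) weekly admissions and two competing forecasts.
observed = [100, 200, 400]
forecast_a = [110, 180, 400]   # hypothetical symptom-based forecast
forecast_b = [150, 100, 300]   # hypothetical case-notification forecast

mdape_a = median_absolute_percentage_error(observed, forecast_a)
mdape_b = median_absolute_percentage_error(observed, forecast_b)
print(mdape_a, mdape_b)
```

Because the median is used rather than the mean, a single week with a very large miss does not dominate the comparison between models.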

    Knowledge Base for MENTAL AI, in Data Science Context

    Globally, 1 in 7 people has some kind of mental or substance use disorder that affects their thinking, feelings, and behaviour in everyday life. Mental well-being is vital for physical health. No Health Without Mental Health! People with mental health disorders can carry on with normal life if they get the proper treatment and support. Mental disorders are complex to diagnose because many types of mental illness share similar symptoms, with only minute differences among them. In the era of big data, the challenge remains to make sense of the huge amount of health research and care data. Computational methods hold significant potential to enable patient stratification approaches superior to established clinical practice, which in turn are a prerequisite for the development of effective personalized medicine approaches. Personalized psychiatry also plays a vital role in predicting mental disorders, improving diagnosis, and optimizing treatment. The use of intelligent systems is expected to grow in the medical field, and it will continue to offer abundant opportunities for solutions that can help save patients’ lives. As they do in many industries, Artificial Intelligence (AI) systems can support mental health specialists in their jobs. Machine learning algorithms can be applied to find different patterns in the most diverse sets of data. This work aims to examine and compare different machine learning classification methodologies to predict different mental disorders and, from that, extract knowledge that can help mental health professionals in their tasks. Our algorithms were trained using a total dataset of 3353 patients from different hospital units. These data are divided into three subsets, mainly according to the characteristics that the pathologies present. We evaluate the performance of the algorithms using different metrics.
Among the metrics applied, we chose the F1 score to compare and analyze the algorithms, as it is the most suitable for our data, which are imbalanced. In the first evaluation, we trained our models using all of the patients’ symptoms and diagnoses. In the second evaluation, we trained our models using only the symptoms that were in some way related to each other and that influenced the other pathologies.
Millions of people worldwide are affected by mental disorders that influence their thinking, feelings, or behaviour. Mental health is an essential prerequisite for physical and general health. People with mental disorders generally need proper treatment and support to lead a normal life. Mental health is a state of well-being in which an individual recognizes their abilities, can cope with the everyday stresses of life, can work productively, and can contribute to their community. Mental health affects the lives of people with mental disorders, their occupations, and the productivity of the community. Good mental health and resilience are essential to our biological health, human connections, education, work, and reaching our potential. The COVID-19 pandemic significantly affected people’s mental health, particularly groups such as health and other frontline workers, students, people living alone, and people with pre-existing mental health conditions. In addition, services for mental, neurological, and substance use disorders were significantly disrupted. Mental disorders are considered complex to diagnose because of the similarity of their symptoms. Regular health check-ups for people with severe mental disorders can prevent premature death. The difficulty specialists face in diagnosis is generally caused by the similarity of symptoms across mental disorders, for example borderline and bipolar disorder.
Machine learning algorithms can be applied to find different patterns in the most diverse sets of data. This work aims to examine and compare different machine learning classification methodologies to predict different mental disorders and, from that, extract knowledge that can help mental health professionals in their tasks. Our algorithms were trained using a total dataset of 3353 patients from different hospital units. These data are divided into three subsets, mainly according to the characteristics that the pathologies present. We evaluate the performance of the algorithms using different metrics. Among the metrics applied, we chose the F1 score to compare and analyze the algorithms, as it is the most suitable for our data, which are imbalanced. In the first evaluation, we trained our models using all of the patients’ symptoms and diagnoses. In the second evaluation, we trained our models using only the symptoms that were in some way related to each other and that influenced the other pathologies.
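The abstract above selects the F1 score because the classes are imbalanced. A minimal sketch of why that choice matters, assuming a binary labelling with 1 as the rare class (the helper names and the toy labels are hypothetical and are not the authors' data or code):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class.

    Returns 0.0 when there are no true positives, a common convention
    that avoids dividing by zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy imbalanced data: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5

# A degenerate classifier that always predicts the majority class.
always_negative = [0] * 100

# A classifier that finds 4 of the 5 positives with 2 false alarms.
useful = [0] * 93 + [1] * 2 + [1] * 4 + [0] * 1

acc_degenerate = accuracy(y_true, always_negative)
f1_degenerate = f1_score(y_true, always_negative)
acc_useful = accuracy(y_true, useful)
f1_useful = f1_score(y_true, useful)
print(acc_degenerate, f1_degenerate, acc_useful, f1_useful)
```

In this toy case the always-negative classifier reaches high accuracy while its F1 score is zero, which is the kind of distortion that motivates using F1 on imbalanced data.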

    A framework to extract biomedical knowledge from gluten-related tweets: the case of dietary concerns in digital era

    Journal pre-proof. The importance and potential of big data are becoming more and more relevant, driven by the explosive growth in the volume of information generated on the Internet in recent years. Many experts agree that social media networks are one of the areas of the internet with the highest growth in recent years and one of the fields expected to grow most significantly in the coming years. Similarly, social media sites are quickly becoming one of the most popular platforms to discuss health issues and exchange social support with others. In this context, this work presents a new methodology to process, classify, visualise and analyse the big data knowledge produced by the sociome on social media platforms. The methodology combines natural language processing techniques, ontology-based named entity recognition methods, machine learning algorithms and graph mining techniques to: (i) reduce the irrelevant messages by identifying and focusing the analysis only on individuals and patient experiences from the public discussion; (ii) reduce the lexical noise produced by the different ways in which users express themselves through the use of domain ontologies; (iii) infer the demographic data of the individuals through the combined analysis of textual, geographical and visual profile information; (iv) perform community detection and evaluate the health topic study by combining the semantic processing of the public discourse with knowledge graph representation techniques; and (v) gain information about the shared resources by combining the social media statistics with the semantic analysis of the web contents. The practical relevance of the proposed methodology has been demonstrated in the study of 1.1 million unique messages from more than 400,000 distinct users related to one of the most popular dietary fads, which has evolved into a multibillion-dollar industry, i.e., gluten-free food.
In addition, this work analysed one of the least studied research fields on Twitter concerning public health (i.e., allergies or immunological diseases such as celiac disease), reaching a wide range of health-related conclusions. SING group thanks CITI (Centro de Investigacion, Transferencia e Innovacion) from the University of Vigo for hosting its IT infrastructure. This work was supported by: the Associate Laboratory for Green Chemistry-LAQV, which is financed by national funds from and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of [UIDB/50006/2020] and [UIDB/04469/2020] units, and BioTecNorte operation [NORTE010145FEDER000004] funded by the European Regional Development Fund under the scope of Norte2020 Programa Operacional Regional do Norte, the Xunta de Galicia (Centro singular de investigacion de Galicia accreditation 2019-2022) and the European Union (European Regional Development Fund - ERDF) - Ref. [ED431G2019/06], and Conselleria de Educacion, Universidades e Formacion Profesional (Xunta de Galicia) under the scope of the strategic funding of [ED431C2018/55GRC] Competitive Reference Group. The authors also acknowledge the post-doctoral fellowship [ED481B2019032] of Martin PerezPerez, funded by the Xunta de Galicia. Funding for open access charge: Universidade de Vigo/CISUG.

    A Systematic Approach to Big Data Analysis in Cataract Patients In Telangana State, India

    Big data is the new gold, especially in healthcare. Advances in collecting and processing Electronic Medical Records (EMRs), coupled with increasing computer capabilities, have resulted in an increased interest in the use of big data in healthcare. Big data requires the collection and analysis of data at an unprecedented scale and represents a paradigm shift in healthcare, offering on one hand the capacity to generate new knowledge more quickly than traditional scientific approaches, and, on the other hand, a holistic understanding of specific illnesses when socio-demographics are incorporated in the analysis. Big data promises more personalized and precise medicine for patients, with improved accuracy, earlier diagnosis, and therapy geared to an individual’s unique combination of genes, environmental risk, and precise disease phenotype. Ophthalmology has been an area of focus where results have been promising. The objective of this study was to determine whether the EMR records at LV Prasad Eye Institute (LVPEI), based in Hyderabad, India, can contribute to the management of patient care, by studying how climatic and socio-demographic factors relate to cataracts (clouding of the lens, which turns it from clear to yellow, brown or even milky white), which cause visual impairment and blindness if left untreated. The study was designed by merging a dataset obtained from the Telangana State Development Society with an existing EMR of approximately 1 million patients who presented with different eye symptoms and were diagnosed with several ocular diseases between 2011 and 2019, a timeframe of 8 years. The dataset obtained included climatic variables to be tested alongside the development of cataracts in patients.
Microsoft Power BI was used to analyze the data through prescriptive and descriptive data analysis techniques, in order to identify patterns among the high-risk climatic and socio-demographic factors that correlate with the development of cataract. Our findings revealed that there is a high presence of cataract in the state of Telangana, mostly in rural areas and throughout the different weather seasons in India. Women tend to be the most affected, as per the number of visits to the clinic, while homemakers make the most visits to the hospital, in addition to employees, students, and laborers. While cataract is most dominant in the older population, diseases such as astigmatism and conjunctivitis are more present in the younger population. The study appeared useful for taking preventive measures in the future to manage the treatment of patients who present with cataracts in Telangana. In addition, this research created a pathway for new methods in the study of how EMRs contribute to new knowledge in ophthalmology. Results indicated that cultural upbringing, climatic factors, and proximity to the state-run thermal plant play a significant role in the presence of cataracts. Testing of the methodology indicated that the AI technique used is only effective when variables are minimized. Reflections suggest that studying patients through a more holistic and systematic approach can reveal new insights that can help bridge the gap between existing knowledge and practice, with the aim of providing enhanced ophthalmic care in India.

    A survey on artificial intelligence based techniques for diagnosis of hepatitis variants

    Hepatitis is a dreaded disease that has taken many lives in recent years. The research survey shows that viral hepatitis has five major variants, referred to as Hepatitis A, B, C, D, and E. Over the years, scholars have tried to find alternative means of diagnosing hepatitis using artificial intelligence (AI) techniques in order to save lives. This study extensively reviewed 37 papers on AI-based techniques for diagnosing viral hepatitis. Results showed that, of the five major types, Hepatitis B (30%) and C (3%) were the only types that the AI-based techniques were used to diagnose and properly classify, while 67% of the papers reviewed diagnosed hepatitis using various AI-based approaches without classifying it into any of the five major types. Results from the study also revealed that 18 of the 37 papers reviewed used a hybrid approach, while the remaining 19 used a single AI-based approach. This indicates no significant difference in technique usage when modeling intelligence into applications. This study furthermore reveals a serious gap in knowledge regarding single hepatitis type prediction or diagnosis in all the papers considered, and recommends that future work integrate the major hepatitis variants into a single predictive model using effective intelligent machine learning techniques, in order to reduce the cost of diagnosis and speed up the treatment of patients.