Cancer Health Disparities Drivers with BERTopic Modelling and Pycaret Evaluation
The complex interplay of social, behavioural, lifestyle, environmental, health system, and natural health variables contributes to disparities in cancer treatment across racial and ethnic groups. Consequently, it is necessary to identify the variables contributing to cancer health inequalities and to develop strategies to achieve health equality. PubMed abstracts on cancer health disparities were scraped with the Bio.Entrez module of the Biopython package. The data were preprocessed with regular expressions (regex) and the Natural Language Toolkit (NLTK), and topics were modelled with BERTopic embeddings and c-TF-IDF to construct dense clusters and analyse the top topics linked with cancer health disparities. The model was evaluated with PyCaret coherence scores and deployed as a web app with Streamlit. The results showed that Topic 32, with the terms obese, female, male, school, survey, student, post, and discrepancy, had the best coherence score of 0.3687. In contrast, Topic 8, with the terms prevalence, adult, income, high, usage, diabetes, education, elderly, change, and low, received the lowest coherence score of 0.3255. The model scores the words of each topic, reveals granular topic concerns and trends related to cancer health disparities, investigates the connections between drivers of cancer health disparities, and evaluates the topics by their coherence score values.
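The preprocessing and c-TF-IDF steps can be illustrated with a minimal pure-Python sketch. It assumes a tiny hand-written stopword list in place of NLTK's stopword corpus, and it implements the class-based TF-IDF weighting described in the BERTopic documentation, W(t, c) = tf(t, c) · log(1 + A / f(t)), where tf is the L1-normalized frequency of term t in topic c, f(t) is the frequency of t across all topics, and A is the average number of words per topic. It is not BERTopic's own code, and it skips the embedding and clustering stages entirely; retrieving the abstracts (for example with Biopython's `Entrez.esearch` and `Entrez.efetch`) needs network access and is not shown. The names `preprocess`, `c_tf_idf`, and `top_terms` are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative stopword list; a stand-in for NLTK's English stopword corpus.
STOPWORDS = {"the", "of", "and", "in", "a", "to", "with", "for", "on", "is", "are", "among", "by"}


def preprocess(text):
    """Lowercase, replace non-letters via a regex, split, and drop stopwords and tokens under 3 chars."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS and len(tok) > 2]


def c_tf_idf(docs_by_topic):
    """Class-based TF-IDF: W[t, c] = tf[t, c] * log(1 + A / f[t]).

    All documents of a topic are treated as one concatenated document.
    tf[t, c] is L1-normalized within the topic, f[t] is the total count of t
    across all topics, and A is the average number of words per topic.
    """
    counts = {
        topic: Counter(tok for doc in docs for tok in preprocess(doc))
        for topic, docs in docs_by_topic.items()
    }
    total_freq = Counter()
    for cnt in counts.values():
        total_freq.update(cnt)
    avg_words = sum(total_freq.values()) / len(counts)

    weights = {}
    for topic, cnt in counts.items():
        n_words = sum(cnt.values())
        weights[topic] = {
            term: (k / n_words) * math.log(1 + avg_words / total_freq[term])
            for term, k in cnt.items()
        }
    return weights


def top_terms(weights, topic, k=5):
    """Return the k highest-weighted terms of a topic (ties broken alphabetically)."""
    ranked = sorted(weights[topic].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Because f(t) is counted across topics, a word that appears in every topic (such as "cancer" in this corpus) is down-weighted relative to words that distinguish one topic from the others.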
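The abstract does not name the coherence measure behind its scores. As an assumption, the sketch below uses UMass coherence, chosen only because it is short to implement; its values are on a different scale from the 0.3255 to 0.3687 range reported above. For each pair of top words where w_j is ranked above w_i, it computes log((D(w_i, w_j) + 1) / D(w_j)), with D counting documents that contain the given word(s), and averages over the pairs. The function name `umass_coherence` is hypothetical.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """Average UMass coherence of a topic's ranked top words.

    `documents` is a list of token lists. Higher (less negative) values mean
    the top words co-occur more often, i.e. the topic is more coherent.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_count(*words):
        # Number of documents containing every word in `words`.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    scores = []
    # combinations yields (j, i) with j < i, so w_j is the higher-ranked word.
    for j, i in combinations(range(len(top_words)), 2):
        w_j, w_i = top_words[j], top_words[i]
        d_j = doc_count(w_j)
        if d_j == 0:
            continue  # skip words absent from the reference corpus
        scores.append(math.log((doc_count(w_i, w_j) + 1) / d_j))
    return sum(scores) / len(scores) if scores else 0.0
```

Comparing a topic whose top words co-occur (for example obese, student, school) with one that mixes unrelated words shows the expected ordering: the co-occurring set scores higher.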
Detecting user demographics in twitter to inform health trends in social media
The widespread and popular use of social media and social networking applications offers a promising opportunity to gain knowledge and insights about population health conditions, thanks to the diversity and abundance of online user-generated health information (UGHI) relating to healthcare and well-being. However, users on social media and social networking sites often do not supply complete demographic information, which greatly undermines the value of this information for health 2.0 research, e.g., for discerning disparities in certain health conditions across population groups. To recover the missing demographic information, existing methods observe a limited scope of user behaviors, such as the word frequencies in a user’s messages, leading to sub-optimal results.
To address this limitation and improve the inference of missing user demographic information for health 2.0 research, this work proposes a new algorithmic method for inferring a social media user’s gender by exploiting a comprehensive set of the user’s behaviors on Twitter, including conversational topic choices, account profile information, and personal information. In addition, this work explores the use of synonym expansion for detecting social media users’ ethnicities. To better capture a user’s conversational topic choices, this work also introduces a new method that automatically generates standardized hashtags for tweets, so that topics can be compared consistently. Although Twitter is selected as the experimental platform in this study because of its leading position among today’s social networking sites, the proposed method is in principle applicable to other social media sites and applications, provided there is a way to access user-generated content on those platforms.
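The abstract does not describe how the different behavioral views are combined. One common approach, shown here purely as an assumption, is to merge each view into a single sparse feature dictionary with a per-view key prefix, so that a downstream classifier sees topic, profile, and personal features side by side. The function name `fuse_features` and the example field names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def fuse_features(hashtags, profile, personal):
    """Merge three views of one user's behavior into a single sparse feature vector.

    Each view gets its own key prefix so that the same token coming from
    different views remains a distinct feature.
    """
    features = Counter()
    # View 1: conversational topics, represented by (standardized) hashtags.
    for tag in hashtags:
        features["topic:" + tag.lower().lstrip("#")] += 1
    # View 2: free-text account profile fields, tokenized on whitespace.
    for field, value in profile.items():
        for token in value.lower().split():
            features[f"profile:{field}:{token}"] += 1
    # View 3: categorical personal attributes, one-hot encoded.
    for field, value in personal.items():
        features[f"personal:{field}={value}"] = 1
    return features
```

A dictionary like this can be turned into a feature matrix with a tool such as scikit-learn's `DictVectorizer` and passed to any sparse classifier.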
Compared with state-of-the-art approaches for gender classification, the proposed multi-perspective learning method achieves an accuracy of 88.6%, versus 63.4% for bag-of-words and 61.4% for the peer method. Additionally, the topical approach introduced in this work outperforms the vocabulary-based approach, reaching 69.4% accuracy with a smaller dimensionality.
Furthermore, usage patterns of cancer terms are analyzed across the ethnic groups inferred by the proposed algorithmic approaches. The frequency of term usage varies among demographic groups during designated cancer awareness months. The methods introduced in this work could serve as a powerful tool for disseminating critical prevention, screening, and treatment messages to the community in real time. The findings highlight the potential of social media as a tool for detecting demographic differences in cancer-related discussions.
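Comparing term usage across groups and months reduces to a grouped rate computation. The sketch below assumes each tweet has already been assigned an inferred demographic group and a month, and it uses a small illustrative set of cancer terms; the term list, the tuple layout, and the name `monthly_term_rate` are hypothetical rather than taken from the study.

```python
from collections import defaultdict

# Illustrative term list; the study's actual cancer vocabulary is not given here.
CANCER_TERMS = {"cancer", "mammogram", "screening"}


def monthly_term_rate(tweets):
    """Share of each (group, month) bucket's tweets that mention a cancer term.

    `tweets` is an iterable of (group, month, text) tuples, where `group` is an
    inferred demographic label and `month` is 1-12.
    """
    total = defaultdict(int)
    hits = defaultdict(int)
    for group, month, text in tweets:
        key = (group, month)
        total[key] += 1
        # A tweet counts once if any whitespace-delimited token is a cancer term.
        if CANCER_TERMS & set(text.lower().split()):
            hits[key] += 1
    return {key: hits[key] / total[key] for key in total}
```

Normalizing by each bucket's tweet volume keeps groups with very different activity levels comparable when checking whether usage rises during an awareness month.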
Data mining Twitter for cancer, diabetes, and asthma insights
Twitter may be a data resource to support healthcare research, but the literature on the potential of Twitter data for healthcare is still limited. The purpose of this study was to contrast the processes by which a large collection of unstructured disease-related tweets could be converted into structured data for further analysis, with the objective of gaining insights into the content and behavioral patterns associated with disease-specific communications on Twitter. Twelve months of Twitter data related to cancer, diabetes, and asthma were collected to form a baseline dataset of over 34 million tweets. Because raw Twitter data would have been difficult to manage, three separate data reduction methods were contrasted to identify a method for generating analysis files that maximizes classification precision and data retention. Each disease file was then run through a CHAID (chi-square automatic interaction detector) analysis to demonstrate how user behavior insights vary by disease. CHAID, a technique introduced by Gordon V. Kass in 1980, is used to discover relationships between variables. The study followed the standard CRISP-DM data mining approach and demonstrates how mining Twitter data fits into this six-stage iterative framework. The study produced insights that offer a new lens on the potential of Twitter as a valuable healthcare data source, as well as on the nuances involved in working with the data.
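At each node, CHAID picks the predictor whose categories are most significantly associated with the target according to a chi-square test of independence. The sketch below shows only that split-selection step, in pure Python: it builds a contingency table, computes the Pearson chi-square statistic, and converts it to a p-value through the series expansion of the regularized incomplete gamma function. Full CHAID also merges similar predictor categories and applies Bonferroni-adjusted p-values, which are omitted here. The function names and the example fields (`has_link`, `weekday`, `retweeted`) are hypothetical.

```python
import math
from collections import Counter


def chi_square(pairs):
    """Pearson chi-square statistic and degrees of freedom for (predictor, target) pairs."""
    cells = Counter(pairs)
    rows = Counter(p for p, _ in pairs)
    cols = Counter(t for _, t in pairs)
    n = len(pairs)
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)


def chi2_sf(stat, dof):
    """Upper-tail chi-square probability via the regularized lower incomplete gamma series."""
    if dof <= 0 or stat <= 0:
        return 1.0
    s, x = dof / 2.0, stat / 2.0
    term = total = 1.0 / s
    n = 1
    while term > 1e-15 * total:
        term *= x / (s + n)
        total += term
        n += 1
    lower = math.exp(s * math.log(x) - x - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


def best_split(records, predictors, target):
    """Return (predictor, p_value) for the predictor most associated with the target."""
    results = []
    for name in predictors:
        stat, dof = chi_square([(r[name], r[target]) for r in records])
        if dof > 0:
            results.append((chi2_sf(stat, dof), name))
    p_value, name = min(results)
    return name, p_value
```

Using p-values rather than raw statistics lets predictors with different numbers of categories (and therefore different degrees of freedom) be compared on the same footing.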
A Learning Health System for Radiation Oncology
The proposed research aims to address the challenges that clinical data science researchers in radiation oncology face in accessing, integrating, and analyzing heterogeneous data from various sources. The research presents a scalable intelligent infrastructure, called the Health Information Gateway and Exchange (HINGE), which captures and structures data from multiple sources into a knowledge base with semantically interlinked entities. This infrastructure enables researchers to mine novel associations and gather knowledge relevant to personalized clinical outcomes.
The dissertation discusses the design framework and implementation of HINGE, which abstracts structured data from treatment planning systems, treatment management systems, and electronic health records. It uses disease-specific smart templates to capture clinical information as discrete data. HINGE performs data extraction, aggregation, and quality and outcome assessment automatically, and it connects seamlessly with the local IT and medical infrastructure.
Furthermore, the research presents a knowledge graph-based approach to mapping radiotherapy data to an ontology-based data repository following the FAIR (Findable, Accessible, Interoperable, Reusable) principles. This approach makes the data easily discoverable and accessible to clinical decision support systems. The dissertation examines the ETL (Extract, Transform, Load) process, data model frameworks, and ontologies, and presents a real-world clinical use case for this data mapping.
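The transform step of such an ETL pipeline turns flat records into subject-predicate-object triples. The sketch below is a minimal illustration under loud assumptions: the `http://example.org/rt#` namespace and every class and property under it are hypothetical placeholders rather than terms from the ontology used in the dissertation, the triples are held in a Python set instead of a triple store, and only `rdf:type` is a real vocabulary term. A production pipeline would more likely use a library such as rdflib.

```python
# Hypothetical namespace; NOT a real radiotherapy ontology.
EX = "http://example.org/rt#"
# The standard RDF type predicate.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def record_to_triples(record):
    """Transform one flat treatment-plan record into (subject, predicate, object) triples."""
    patient = f"{EX}patient/{record['patient_id']}"
    plan = f"{EX}plan/{record['plan_id']}"
    return [
        (patient, RDF_TYPE, f"{EX}Patient"),
        (plan, RDF_TYPE, f"{EX}TreatmentPlan"),
        (patient, f"{EX}hasTreatmentPlan", plan),
        # Literal values are cast to typed Python values during the transform.
        (plan, f"{EX}prescribedDoseGy", float(record["dose_gy"])),
        (plan, f"{EX}fractions", int(record["fractions"])),
    ]


def etl(rows):
    """Extract rows, transform each into triples, and load them into an in-memory store."""
    store = set()
    for row in rows:
        store.update(record_to_triples(row))
    return store
```

Storing triples in a set makes loading idempotent: re-running the pipeline on the same source rows does not create duplicate facts.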
To improve the efficiency of retrieving information from large clinical datasets, a search engine was developed that combines ontology-based keyword searching with a synonym-based term-matching tool. The hierarchical nature of ontologies is leveraged to retrieve patient records based on parent and child classes. Additionally, patient similarity analysis is conducted using vector embedding models (Word2Vec, Doc2Vec, GloVe, and FastText) to identify similar patients, with the text corpus built using different creation methods. Results from the analysis using these models are presented.
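Hierarchy-aware retrieval means that a query for a parent class also returns records coded with any of its descendant classes. The sketch below assumes a tiny hypothetical disease hierarchy and synonym table (not the ontology used in the dissertation): a query term is resolved to a class by exact or synonym match, the class is expanded to all its descendants with a breadth-first search, and matching patient IDs are returned.

```python
from collections import deque

# Hypothetical class hierarchy as child -> parent edges.
SUBCLASS_OF = {
    "LungCancer": "Cancer",
    "NonSmallCellLungCancer": "LungCancer",
    "SmallCellLungCancer": "LungCancer",
    "ProstateCancer": "Cancer",
}
# Hypothetical synonym table mapping free-text terms to classes.
SYNONYMS = {"nsclc": "NonSmallCellLungCancer", "lung carcinoma": "LungCancer"}


def expand(cls, subclass_of):
    """Return cls plus all of its descendant classes (breadth-first)."""
    children = {}
    for child, parent in subclass_of.items():
        children.setdefault(parent, []).append(child)
    seen, queue = {cls}, deque([cls])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def resolve(term):
    """Map a query term to a class by case-insensitive exact match, else by synonym."""
    key = term.strip().lower()
    for cls in set(SUBCLASS_OF) | set(SUBCLASS_OF.values()):
        if cls.lower() == key:
            return cls
    return SYNONYMS.get(key)


def search(term, patients):
    """Return sorted IDs of patients whose diagnosis class falls under the query class."""
    cls = resolve(term)
    if cls is None:
        return []
    wanted = expand(cls, SUBCLASS_OF)
    return sorted(pid for pid, dx in patients.items() if dx in wanted)
```

With this setup, a query for "lung carcinoma" returns patients coded with either lung cancer subtype, while "NSCLC" returns only the non-small-cell cases.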
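With word-level models such as Word2Vec, GloVe, or FastText, a simple way to compare patients is to average the word vectors of each patient's text and rank other patients by cosine similarity. The sketch below assumes toy three-dimensional vectors in place of a trained model, and averaging is only one possible choice; Doc2Vec, by contrast, learns a document vector directly. `WORD_VECTORS` and the function names are hypothetical.

```python
import math

# Toy 3-dimensional vectors standing in for a trained word-embedding model.
WORD_VECTORS = {
    "lung": [1.0, 0.1, 0.0],
    "sbrt": [0.9, 0.2, 0.1],
    "pneumonitis": [0.8, 0.0, 0.2],
    "prostate": [0.0, 1.0, 0.1],
    "brachytherapy": [0.1, 0.9, 0.0],
}


def doc_vector(tokens, vectors):
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    dims = len(next(iter(vectors.values())))
    hits = [vectors[t] for t in tokens if t in vectors]
    if not hits:
        return [0.0] * dims
    return [sum(v[i] for v in hits) / len(hits) for i in range(dims)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query_id, corpus, vectors):
    """Return the ID of the patient whose averaged text vector is closest to the query's."""
    query = doc_vector(corpus[query_id], vectors)
    scores = {
        pid: cosine(query, doc_vector(tokens, vectors))
        for pid, tokens in corpus.items()
        if pid != query_id
    }
    return max(scores, key=scores.get)
```

How the per-patient token lists are built (which notes, which fields) changes the averaged vectors, which is one reason the choice of corpus creation method matters for the similarity results.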
The implementation of a learning health system for predicting radiation pneumonitis following stereotactic body radiotherapy is also discussed. 3D convolutional neural networks (CNNs) are applied to radiographic and dosimetric datasets to predict the likelihood of radiation pneumonitis. DenseNet-121 and ResNet-50 models are employed for this study, along with the integrated gradients technique to identify salient regions within the input 3D image dataset. The predictive performance of the 3D CNN models is evaluated against clinical outcomes.
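Integrated gradients attributes a model's output to its inputs by integrating the gradient along a straight path from a baseline x' to the input x: IG_i = (x_i - x'_i) · ∫₀¹ ∂F/∂x_i(x' + α(x - x')) dα. In the study, F would be a 3D CNN's predicted pneumonitis probability and x an input volume, with gradients from automatic differentiation. The sketch below is a simplified stand-in under that assumption: it uses a toy scalar function, a central-difference gradient, and a midpoint Riemann sum over the path.

```python
def numerical_grad(f, x, eps=1e-6):
    """Central-difference gradient of scalar function f at point x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(f, x, baseline, steps=50):
    """Midpoint Riemann approximation of integrated gradients.

    IG_i = (x_i - x'_i) * integral over a in [0, 1] of dF/dx_i(x' + a * (x - x')).
    """
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(numerical_grad(f, point)):
            total[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]
```

A useful sanity check is the completeness property: the attributions should sum (approximately) to F(x) - F(x'), so for F(v) = 3·v₀ + v₁², input (1, 2), and a zero baseline, the attributions are about 3 and 4.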
Overall, the proposed Learning Health System (LHS) provides a comprehensive solution for capturing, integrating, and analyzing heterogeneous data in a knowledge base. It enables researchers to extract valuable insights and associations from diverse sources, with the aim of improving clinical outcomes. This work can serve as a model for implementing an LHS in other medical specialties, advancing personalized and data-driven medicine.
- …