
    Mining heterogeneous information graph for health status classification

    In the medical domain, there exists a large volume of data from multiple sources such as electronic health records, general health examination results, and surveys. The data contain useful information reflecting people’s health and provide great opportunities for studies to improve the quality of healthcare. However, how to mine these data effectively and efficiently remains a critical challenge. In this paper, we propose an innovative classification model for knowledge discovery from patients’ personal health repositories. Based on analytics of massive data in the National Health and Nutrition Examination Survey, the study builds a classification model to classify patients’ health status and reveal the specific disease potentially suffered by the patient. This paper makes significant contributions to the advancement of knowledge in data mining with an innovative classification model specifically crafted for domain-based data. Moreover, this research contributes to the healthcare community by providing a deep understanding of people’s health with accessibility to the patterns in various observations.

    Mining health knowledge graph for health risk prediction

    Nowadays, classification models have been widely adopted in healthcare, aiming at supporting practitioners in disease diagnosis and human error reduction. The challenge is utilising effective methods to mine real-world data in the medical domain, as many different models have been proposed with varying results. A large number of researchers focus on the diversity problem of real-time data sets in classification models. Some previous works developed methods comprising homogeneous graphs for knowledge representation and then knowledge discovery. However, such approaches are weak in discovering different relationships among elements. In this paper, we propose an innovative classification model for knowledge discovery from patients’ personal health repositories. The model discovers medical domain knowledge from the massive data in the National Health and Nutrition Examination Survey (NHANES). The knowledge is conceptualised in a heterogeneous knowledge graph. On the basis of the model, an innovative method is developed to help uncover potential diseases suffered by people and, furthermore, to classify patients’ health risk. The proposed model is evaluated by comparison to a baseline model also built on the NHANES data set in an empirical experiment. The performance of the proposed model is promising. The paper makes significant contributions to the advancement of knowledge in data mining with an innovative classification model specifically crafted for domain-based data. In addition, by accessing the patterns of various observations, the research contributes to the work of practitioners by providing a multifaceted understanding of individual and public health.

    Process Mining of Disease Trajectories: A Feasibility Study

    Modelling patient disease trajectories from evidence in electronic health records could help clinicians and medical researchers develop a better understanding of the progression of diseases within target populations. Process mining provides a set of well-established tools and techniques that have been used to mine electronic health record data to understand care pathways in healthcare. In this paper we explore the feasibility of using a process mining methodology and toolset to automate the identification of disease trajectory models. We created synthetic electronic health record data based on a published disease trajectory model and developed a series of event log transformations to reproduce the disease trajectory model using standard process mining tools. Our approach will make it easier to produce disease trajectory models from routine health data.

    Feature Selection with Integrated Gaussian Seahorse Optimization Data Mining for Cross-border Business Cooperation between the Malaysian Medical Industry and Tourism Industry

    The cross-border collaboration between the medical industry and the tourism industry has gained significant attention as a promising avenue for economic growth and development. Data mining techniques are employed to extract valuable patterns and insights from large-scale datasets, shedding light on the opportunities and challenges associated with this collaborative effort. This study proposes an integrated approach that combines feature selection with Gaussian Seahorse Optimization Data Mining (GSH-DM) to identify the most relevant features and optimize the data mining process. The GSH-DM approach assembles comprehensive datasets encompassing information from both the Malaysian medical industry and tourism industry. The integrated GSH-DM model then applies the Gaussian Seahorse Optimization algorithm to optimize the data mining process, enhancing the accuracy and efficiency of pattern discovery. Using the GSH-DM model, this study aims to uncover hidden patterns, relationships, and predictive models that can guide decision-making and strategy development for cross-border business cooperation. The findings of this study contribute to a deeper understanding of the factors that influence cross-border business cooperation between the Malaysian medical industry and the tourism industry. The integrated GSH-DM approach showcases the potential of combining feature selection techniques with advanced optimization algorithms in data mining applications. The results of GSH-DM provide actionable insights for stakeholders, enabling them to make informed decisions and foster successful cross-border collaborations between the Malaysian medical industry and the tourism industry. The analysis of the results demonstrated that GSH-DM exhibits improved performance for feature selection and classification.

    Implementation of an interactive pattern mining framework on electronic health record datasets

    Large collections of electronic patient records contain a broad range of clinical information highly relevant for data analysis. However, they are maintained primarily for patient administration, and automated methods are required to extract valuable knowledge for predictive, preventive, personalized and participatory medicine. Sequential pattern mining is a fundamental task in data mining which can be used to find statistically relevant, non-trivial temporal dependencies of events such as disease comorbidities. The objective of this work is to use this mining technique to identify disease associations based on ICD-9-CM code data of the entire Taiwanese population obtained from Taiwan’s National Health Insurance Research Database. This thesis reports the development and implementation of the Disease Pattern Miner – a pattern mining framework in a medical domain. The framework was designed as a Web application which can be used to run several state-of-the-art sequence mining algorithms on electronic health records, collect and filter the results to reduce the number of patterns to a meaningful size, and visualize the disease associations as an interactive model in a specific population group. This may be crucial to discover new disease associations and offer novel insights to explain disease pathogenesis. A structured evaluation of the data and models is required before medical data scientists may use this application as a tool for further research to get a better understanding of disease comorbidities.

    Comparison of Data Mining Classification Algorithms for Stroke Disease Prediction Using the SMOTE Upsampling Method

    Stroke is a circulation disorder in the brain that can cause symptoms and signs related to the affected part of the brain and is the leading cause of death and disability in Indonesia. Everyone is at risk of experiencing a stroke, and it is important to recognize and manage risk factors. Data mining techniques can help in the extraction and prediction of information, as well as in finding hidden patterns in stroke medical data. The dataset used in this research comes from Kaggle and is imbalanced, so the SMOTE upsampling technique is used to address this imbalance. The results of the study conclude that using SMOTE with the C4.5, NB, and KNN algorithms can increase precision, recall, and AUC. The C4.5 algorithm combined with SMOTE, the best-performing configuration, was selected for testing on new data, and the results show that the resulting model can predict stroke risk more accurately than the C4.5 model without SMOTE. However, it should be noted that, based on the author's interview with a medical practitioner, the model cannot be used directly in medical practice because the observations needed in the medical field to determine factors related to stroke are highly complex. Thus, a new understanding emerged: predicting stroke in a practical setting is highly complex. While data mining can be used as a predictive tool in the initial stage for predictions in the general population, it is strongly recommended to undergo direct examination by doctors in a hospital to obtain more accurate and comprehensive medical evaluations.
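
As a rough illustration of the upsampling idea mentioned in this abstract, the sketch below shows the core SMOTE step: each synthetic minority sample is placed at a random position on the line segment between a minority point and one of its nearest minority neighbours. This is a minimal sketch, not the authors' pipeline. The function name `smote`, the parameters `n_new`, `k`, and `seed`, and the toy feature values are all assumptions made for illustration, and the function assumes at least two minority samples.

```python
import random
from math import dist


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical toy data: (age, glucose level) for the rare "stroke" class.
minority = [(67.0, 228.0), (80.0, 105.0), (74.0, 171.0), (61.0, 202.0)]
majority_count = 10  # assumed size of the majority class

# Generate enough synthetic samples to match the majority class size.
new_points = smote(minority, n_new=majority_count - len(minority))
balanced_minority = minority + new_points
```

Because every synthetic point is an interpolation between two real minority samples, it always lies within the range of the observed minority data. In practice, synthetic samples would be generated from the training split only, so that evaluation data stays untouched.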

    PathologyBERT -- Pre-trained Vs. A New Transformer Language Model for Pathology Domain

    Pathology text mining is a challenging task given the reporting variability and constant new findings in cancer sub-type definitions. However, successful text mining of a large pathology database can play a critical role in advancing 'big data' cancer research such as similarity-based treatment selection, case identification, prognostication, surveillance, clinical trial screening, risk stratification, and many others. While there is a growing interest in developing language models for more specific clinical domains, no pathology-specific language space exists to support rapid data-mining development in the pathology space. In the literature, a few approaches fine-tuned general transformer models on specialized corpora while maintaining the original tokenizer, but in fields requiring specialized terminology, these models often fail to perform adequately. We propose PathologyBERT - a pre-trained masked language model which was trained on 347,173 histopathology specimen reports and publicly released in the Huggingface repository. Our comprehensive experiments demonstrate that pre-training a transformer model on pathology corpora yields performance improvements on Natural Language Understanding (NLU) and Breast Cancer Diagnose Classification when compared to nonspecific language models. Comment: submitted to "American Medical Informatics Association (AMIA)" 2022 Annual Symposium

    Machine learning paradigms for modeling spatial and temporal information in multimedia data mining

    Multimedia data mining and knowledge discovery is a fast emerging interdisciplinary applied research area. There is tremendous potential for effective use of multimedia data mining (MDM) through intelligent analysis. Diverse application areas are increasingly relying on multimedia understanding systems. Advances in multimedia understanding are related directly to advances in signal processing, computer vision, machine learning, pattern recognition, multimedia databases, and smart sensors. The main mission of this special issue is to identify state-of-the-art machine learning paradigms that are particularly powerful and effective for modeling and combining temporal and spatial media cues such as audio, visual, and face information and for accomplishing tasks of multimedia data mining and knowledge discovery. These models should be able to bridge the gap between low-level audiovisual features, which require signal processing, and high-level semantics. A number of papers have been submitted to the special issue in the areas of imaging, artificial intelligence, and pattern recognition, and five contributions have been selected covering state-of-the-art algorithms and advanced related topics. The first contribution, by D. Xiang et al., “Evaluation of data quality and drought monitoring capability of FY-3A MERSI data”, describes some basic parameters and major technical indicators of the FY-3A, and evaluates data quality and drought monitoring capability of the Medium-Resolution Imager (MERSI) onboard the FY-3A. The second contribution, by A. Belatreche et al., “Computing with biologically inspired neural oscillators: application to color image segmentation”, investigates the computing capabilities and potential applications of neural oscillators, a biologically inspired neural model, to gray scale and color image segmentation, an important task in image understanding and object recognition. The major contribution of this paper is the ability to use neural oscillators as a learning scheme for solving real world engineering problems. The third paper, by A. Dargazany et al., entitled “Multibandwidth Kernel-based object tracking”, explores new methods for object tracking using the mean shift (MS). A bandwidth-handling MS technique is deployed in which the tracker reaches the global mode of the density function without requiring a specific starting point. It has been shown via experiments that the Gradual Multibandwidth Mean Shift tracking algorithm can converge faster than conventional kernel-based object tracking (known as the mean shift). The fourth contribution, by S. Alzu’bi et al., entitled “3D medical volume segmentation using hybrid multi-resolution statistical approaches”, studies new 3D volume segmentation using multiresolution statistical approaches based on discrete wavelet transform and hidden Markov models. This system commonly reduced the percentage error achieved using the traditional 2D segmentation techniques by several percent. Furthermore, a contribution by G. Cabanes et al., entitled “Unsupervised topographic learning for spatiotemporal data mining”, proposes a new unsupervised algorithm, suitable for the analysis of noisy spatiotemporal Radio Frequency Identification (RFID) data. The new unsupervised algorithm depicted in this article is an efficient data mining tool for behavioral studies based on RFID technology. It has the ability to discover and compare stable patterns in an RFID signal, and is appropriate for continuous learning. Finally, we would like to thank all those who helped to make this special issue possible, especially the authors and the reviewers of the articles. Our thanks go to the Hindawi staff and personnel and the journal manager for bringing about the issue and giving us the opportunity to edit this special issue.

    Context-Specific Target Definition in Influenza A Virus Hemagglutinin-Glycan Receptor Interactions

    Protein-glycan interactions are important regulators of a variety of biological processes, ranging from immune recognition to anticoagulation. An important area of active research is directed toward understanding the role of host cell surface glycans as recognition sites for pathogen protein receptors. Recognition of cell surface glycans is a widely employed strategy for a variety of pathogens, including bacteria, parasites, and viruses. We present here a representative example of such an interaction: the binding of influenza A hemagglutinin (HA) to specific sialylated glycans on the cell surface of human upper airway epithelial cells, which initiates the infection cycle. We detail a generalizable strategy to understand the nature of protein-glycan interactions both structurally and biochemically, using HA as a model system. This strategy combines a top-down approach using available structural information to define important contacts between glycans and HA, with a bottom-up approach using data-mining and informatics approaches to identify the common motifs that distinguish glycan binders from nonbinders. By probing protein-glycan interactions simultaneously through top-down and bottom-up approaches, we can scientifically validate a series of observations. This in turn provides additional confidence and surmounts known challenges in the study of protein-glycan interactions, such as accounting for multivalency, and thus truly defines concepts such as specificity, affinity, and avidity. With the advent of new technologies for glycomics, including glycan arrays, data-mining solutions, and robust algorithms to model protein-glycan interactions, we anticipate that such combination approaches will become tractable for a wide variety of protein-glycan interactions. Funding: National Institute of General Medical Sciences (U.S.) (GM 57073); National Institute of General Medical Sciences (U.S.) (U54 GM62116); Singapore-MIT Alliance for Research and Technology