54 research outputs found

    Predictive Learning from Real-World Medical Data: Overcoming Quality Challenges

    Randomized controlled trials (RCTs) are the gold standard in medical research, but they face challenges with specific groups such as pregnant women and newborns. Real-world data (RWD), from sources like electronic medical records and insurance claims, complements RCTs in areas like disease risk prediction and diagnosis. However, RWD's retrospective nature leads to issues such as missing values and data imbalance, requiring intensive data preprocessing. This thesis introduces a suite of algorithms developed to automatically resolve RWD's low-quality issues for predictive modeling. The AMI-Net method is introduced first; it treats samples as bags of feature-value pairs and unifies them in an embedding space using a multi-instance neural network. It excels in handling incomplete datasets, a frequent issue in real-world scenarios, and shows resilience to noise and class imbalances. AMI-Net's capability to discern informative instances minimizes the effects of low-quality data. The enhanced version, AMI-Net+, improves instance selection, boosting performance and generalization. However, the AMI-Net series initially processes only binary input features, a constraint overcome by AMI-Net3, which supports binary, nominal, ordinal, and continuous features. Despite these advancements, challenges like missing values, data inconsistencies, and labeling errors persist in real-world data. The AMI-Net series also shows promise for regression and multi-task learning, potentially mitigating low-quality data issues. Tested on various hospital datasets, these methods prove effective, though risks of overfitting and bias remain, necessitating further research. Overall, while the methods are promising for clinical studies and other applications, ensuring data quality and reliability is crucial for their success.

    MISSING VALUE IMPUTATION AND NORMALIZATION TECHNIQUES IN MYOCARDIAL INFARCTION

    Missing data imputation is an important research topic in data mining. In general, real data contain missing values, and their presence in a data set is a major obstacle to precise prediction. The objective of this paper is to highlight possible improvements to existing algorithms for medical data. KNBP, an imputation method based on KNN and BPCA, is proposed and evaluated using MSE and RMSE estimates. Normalization is assessed by comparing three algorithms: min-max normalization, Z-score, and decimal scaling. The experiments use standard benchmark data and data collected in real time. The KNBP imputation method and the decimal scaling algorithm for normalization achieved the lowest error rates.
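
    The three normalization schemes compared in the abstract above, together with the RMSE metric it reports, can be sketched as follows. This is a minimal illustration using the textbook definitions, not the paper's implementation; the function names and the sample values are assumptions made for the example, and the KNBP imputation step itself is not reproduced here.

```python
import math

def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: map everything to the lower bound.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer giving max(|v'|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]

def rmse(true_values, imputed_values):
    """Root mean squared error between true and imputed entries."""
    n = len(true_values)
    mse = sum((t - p) ** 2 for t, p in zip(true_values, imputed_values)) / n
    return math.sqrt(mse)

# Hypothetical readings, used only to exercise the functions.
readings = [120.0, 80.0, 200.0]
scaled_min_max = min_max(readings)
scaled_z = z_score(readings)
scaled_decimal = decimal_scaling(readings)
```

    In an evaluation of this kind, known values are typically hidden, imputed, and then compared with the originals via `rmse`; whether the paper followed exactly that protocol is not stated in the abstract.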

    Integrating heterogeneous data into electronic medical record analysis

    Electronic medical records (EMRs) are the digital equivalent of paper records at a clinician's office. They contain patient information such as treatment and medical history, and have been shown to have a wide variety of benefits. However, EMRs typically contain a multitude of diverse data, including images, doctor notes, medical test results, and genomic data. This heterogeneity generates high dimensionality and data sparsity, which are two of the most prevalent culprits that exacerbate already difficult computational problems. Additionally, domain-specific characteristics, such as the existence of synonyms in the medical vocabulary, introduce ambiguity. This can further reduce the data mining potential of EMRs. This thesis is a systematic study that addresses these issues associated with EMRs. In particular, I utilized heterogeneous data sources that are typically incompatible, and then developed frameworks in which these data sources complement one another. As a result, these methods have the potential for direct clinical translation, paving the way for improving healthcare from a data-driven perspective. To improve a variety of downstream healthcare applications, such as patient subcategorization, survival analysis, and visualization, I used external networks of domain knowledge consisting of drug-symptom relationships, protein-protein interactions, and genetic information to enhance patient records. I found that this enhancement process increased the data mining capabilities as well as the interpretability of the EMRs. To improve EMR retrieval systems, I developed a query expansion method that frames symptoms and treatments as two different languages. I found that a topic modeling method that follows this dual-language framework yielded the highest performance. 
    Lastly, I showed that due to pathological similarities, jointly studying Alzheimer's disease and Parkinson's disease resulted in higher computational power by effectively increasing the size of the training datasets. This allowed for the accurate prediction of the onset of dementia in both diseases. Each of these results can lay the groundwork for applications that have the potential to be implemented directly in clinical practice, improving the safety and quality of patient care.

    Ayurveda in Knee Osteoarthritis—Secondary Analyses of a Randomized Controlled Trial

    Background: Ayurveda is widely practiced in South Asia in the treatment of osteoarthritis (OA). The aim of these secondary data analyses was to identify the most relevant variables for treatment response and group differences between Ayurvedic therapy and conventional therapy in knee OA patients. Methods: A total of 151 patients (Ayurveda n = 77, conventional care n = 74) were analyzed according to the intention-to-treat principle in a randomized controlled trial. Different statistical approaches including generalized linear models, a radial basis function (RBF) network, exhausted CHAID, classification and regression trees (CART), and C5.0 with adaptive boosting were applied. Results: The RBF network indicated that the therapy arm and the baseline values of the WOMAC Index subscales might be the most important variables for the significant between-group differences of the WOMAC Index from baseline to 12 weeks in favor of Ayurveda. The intake of nutritional supplements in the Ayurveda group did not seem to be a significant factor in changes in the WOMAC Index. Ayurveda patients with functional limitations > 60 points and pain > 25 points at baseline showed the greatest improvements in the WOMAC Index from baseline to 12 weeks (mean value 107.8 +/- 27.4). A C5.0 model with nine predictors had a predictive accuracy of 89.4% for a change in the WOMAC Index after 12 weeks > 10. With adaptive boosting, the accuracy rose to 98%. Conclusions: These secondary analyses suggested that therapeutic effects cannot be explained by the therapies themselves alone, although they were the most important factors in the applied models.

    Computational Intelligence in Healthcare

    This book is a printed edition of the Special Issue Computational Intelligence in Healthcare that was published in Electronic

    Computational Intelligence in Healthcare

    The volume of patient health data is estimated to have reached 2314 exabytes by 2020. Traditional data analysis techniques are unsuitable for extracting useful information from such a vast quantity of data. Thus, intelligent data analysis methods combining human expertise and computational models for accurate and in-depth data analysis are necessary. The technological revolution and medical advances made by combining vast quantities of available data, cloud computing services, and AI-based solutions can provide expert insight and analysis on a mass scale and at a relatively low cost. Computational intelligence (CI) methods, such as fuzzy models, artificial neural networks, evolutionary algorithms, and probabilistic methods, have recently emerged as promising tools for the development and application of intelligent systems in healthcare practice. CI-based systems can learn from data and evolve according to changes in the environment by taking into account the uncertainty characterizing health data, including omics data, clinical data, sensor data, and imaging data. The use of CI in healthcare can improve the processing of such data to develop intelligent solutions for prevention, diagnosis, treatment, and follow-up, as well as for the analysis of administrative processes. The present Special Issue on computational intelligence for healthcare is intended to show the potential and the practical impacts of CI techniques in challenging healthcare applications.

    Transcriptional and immunohistological assessment of immune infiltration in pancreatic cancer.

    Pancreatic adenocarcinoma is characterized by a complex tumor environment with a wide diversity of infiltrating stromal and immune cell types that impact the tumor response to conventional treatments. However, even in this poorly responsive tumor the extent of T cell infiltration as determined by quantitative immunohistology is a candidate prognostic factor for patient outcome. As such, even more comprehensive immunophenotyping of the tumor environment, such as immune cell type deconvolution via inference models based on gene expression profiling, holds significant promise. We hypothesized that RNA-Seq can provide a comprehensive alternative to quantitative immunohistology for immunophenotyping pancreatic cancer. We performed RNA-Seq on a prospective cohort of pancreatic tumor specimens and compared multiple approaches for gene expression-based immunophenotyping against quantitative immunohistology. Our analyses demonstrated that while gene expression analyses provide additional information on the complexity of the tumor immune environment, they are limited in sensitivity by the low overall immune infiltrate in pancreatic cancer. As an alternative approach, we identified a set of genes that were enriched in highly T cell infiltrated pancreatic tumors, and demonstrated that these can identify patients with improved outcome in a reference population. These data demonstrate that the poor immune infiltrate in pancreatic cancer can present problems for analyses that use gene expression-based tools; however, there remains enormous potential in using these approaches to understand the relationships between diverse patterns of infiltrating cells and their impact on patient treatment outcomes.

    Improved Prediction of Bacterial Genotype-Phenotype Associations Using Interpretable Pangenome-Spanning Regressions

    Discovery of genetic variants underlying bacterial phenotypes and the prediction of phenotypes such as antibiotic resistance are fundamental tasks in bacterial genomics. Genome-wide association study (GWAS) methods have been applied to study these relations, but the plastic nature of bacterial genomes and the clonal structure of bacterial populations creates challenges. We introduce an alignment-free method which finds sets of loci associated with bacterial phenotypes, quantifies the total effect of genetics on the phenotype, and allows accurate phenotype prediction, all within a single computationally scalable joint modeling framework. Genetic variants covering the entire pangenome are compactly represented by extended DNA sequence words known as unitigs, and model fitting is achieved using elastic net penalization, an extension of standard multiple regression. Using an extensive set of state-of-the-art bacterial population genomic data sets, we demonstrate that our approach performs accurate phenotype prediction, comparable to popular machine learning methods, while retaining both interpretability and computational efficiency. Compared to those of previous approaches, which test each genotype-phenotype association separately for each variant and apply a significance threshold, the variants selected by our joint modeling approach overlap substantially. IMPORTANCE Being able to identify the genetic variants responsible for specific bacterial phenotypes has been the goal of bacterial genetics since its inception and is fundamental to our current level of understanding of bacteria. This identification has been based primarily on painstaking experimentation, but the availability of large data sets of whole genomes with associated phenotype metadata promises to revolutionize this approach, not least for important clinical phenotypes that are not amenable to laboratory analysis. 
    These models of phenotype-genotype association can in the future be used for rapid prediction of clinically important phenotypes such as antibiotic resistance and virulence by rapid-turnaround or point-of-care tests. However, despite much effort being put into adapting genome-wide association study (GWAS) approaches to cope with bacterium-specific problems, such as strong population structure and horizontal gene exchange, current approaches are not yet optimal. We describe a method that advances methodology for both association and generation of portable prediction models.
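
    For orientation, the elastic net penalization mentioned in the abstract above is commonly written in the following standard form. The abstract does not state the exact parameterization the authors used, so the notation here is an assumption: $x_i$ is taken to be the vector of unitig presence/absence indicators for isolate $i$, $y_i$ its phenotype, and $\lambda$ and $\alpha$ the overall penalty strength and the L1/L2 mixing weight.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\, \beta}\;
    \frac{1}{2n} \sum_{i=1}^{n} \bigl( y_i - \beta_0 - x_i^{\top} \beta \bigr)^2
    + \lambda \Bigl( \alpha \lVert \beta \rVert_1
    + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \Bigr),
  \qquad 0 \le \alpha \le 1
```

    With $\alpha = 1$ this reduces to the lasso and with $\alpha = 0$ to ridge regression. The L1 term sets many coefficients exactly to zero, which is what lets a jointly fitted model return a sparse, interpretable set of associated variants.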

    Front-Line Physicians' Satisfaction with Information Systems in Hospitals

    Day-to-day operations management in hospital units is difficult due to continuously varying situations, several actors involved and a vast number of information systems in use. The aim of this study was to describe front-line physicians' satisfaction with existing information systems needed to support the day-to-day operations management in hospitals. A cross-sectional survey was used and data chosen with stratified random sampling were collected in nine hospitals. Data were analyzed with descriptive and inferential statistical methods. The response rate was 65 % (n = 111). The physicians reported that information systems support their decision making to some extent, but they do not improve access to information nor are they tailored for physicians. The respondents also reported that they need to use several information systems to support decision making and that they would prefer one information system to access important information. Improved information access would better support physicians' decision making and has the potential to improve the quality of decisions and speed up the decision making process.