
    IEEE J Biomed Health Inform

    Recent applications of deep learning have shown promising results for classifying unstructured text in the healthcare domain. However, the reliability of models in production settings has been hindered by imbalanced data sets in which a small subset of the classes dominates. In the absence of adequate training data, rare classes necessitate additional model constraints for robust performance. Here, we present a strategy for incorporating short sequences of text (i.e., keywords) into training to boost model accuracy on rare classes. In our approach, we assemble a set of keywords, including short phrases, associated with each class. The keywords are then used as additional data during each batch of model training, resulting in a training loss that has contributions from both raw data and keywords. We evaluate our approach on the classification of cancer pathology reports and observe a substantial increase in model performance for rare classes. Furthermore, we analyze the impact of keywords on model output probabilities for bigrams, providing a straightforward method to identify where the model struggles when training data are limited.
    Funding: P30 CA177558/CA/NCI NIH HHS (United States); U58 DP003907/DP/NCCDPHP CDC HHS (United States). Date: 2022-10-05. PMID: 35020599. PMCID: PMC9533247.
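The abstract describes a per-batch training loss with contributions from both raw reports and class keywords. A minimal sketch of such a combined loss follows, assuming a simple weighted sum of two cross-entropy terms. The `keyword_weight` parameter and all function names are hypothetical; the abstract does not say how the two terms are balanced or which model produces the logits.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits_batch, labels):
    """Mean negative log-likelihood of the true class over a batch."""
    losses = [-math.log(softmax(logits)[y])
              for logits, y in zip(logits_batch, labels)]
    return sum(losses) / len(losses)

def combined_loss(doc_logits, doc_labels, kw_logits, kw_labels,
                  keyword_weight=1.0):
    """Loss on raw documents plus a weighted loss on keyword inputs.

    keyword_weight is an assumed knob, not taken from the paper.
    """
    return (cross_entropy(doc_logits, doc_labels)
            + keyword_weight * cross_entropy(kw_logits, kw_labels))
```

In a real training loop, `doc_logits` and `kw_logits` would come from the same classifier applied to a batch of reports and to a batch of keyword strings for the corresponding classes, so that gradients from both terms update the same weights.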

    Using Big Data Analytics and Statistical Methods for Improving Drug Safety

    This dissertation includes three studies, all focusing on the use of Big Data and statistical methods to improve one of the most important aspects of health care, namely drug safety. In these studies we develop data analytics methodologies to inspect, clean, and model data with the aim of fulfilling the three main goals of drug safety: detection, understanding, and prediction of adverse drug effects. In the first study, we develop a methodology that combines analytics and statistical methods to detect associations between drugs and adverse events in historical patient records. In particular, we show the applicability of the methodology by investigating the potential confounding role of common diabetes drugs in the development of acute renal failure in diabetic patients. While traditional methods of signal detection mostly consider one drug and one adverse event at a time, our proposed methodology takes into account the effect of drug-drug interactions by identifying groups of drugs that are frequently prescribed together. In the second study, two independent methodologies are developed to investigate the role of prescription sequence in the likelihood of developing adverse events; this study focuses on using data analytics to understand drug-event associations. Our analyses of the historical medication records of a group of diabetic patients using the proposed approaches revealed that the sequence in which drugs are prescribed and administered matters significantly in the development of adverse events associated with those drugs. The third study uses a chronological approach to develop a network of approved drugs and their known adverse events. It then uses a set of network metrics, both similarity- and centrality-based, to build and train machine learning predictive models that predict the likely adverse events of newly discovered drugs before their approval and introduction to the market. For this purpose, data on known drug-event associations from a large biomedical publication database (i.e., PubMed) are employed to construct the network. The results indicate significant improvements in the accuracy of predicting drug-event associations compared with similar approaches.
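The third study derives similarity- and centrality-based metrics from a drug-event network and feeds them to predictive models. The sketch below illustrates what two such features might look like for one candidate drug-event pair, assuming Jaccard similarity between drugs and a normalized event degree. The specific metrics, the function names, and the toy data are assumptions for illustration; the abstract does not list the metrics actually used.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pair_features(drug, event, drug_events):
    """Return (similarity_score, event_degree) for a candidate pair.

    drug_events maps each drug to the set of its known adverse events.
    """
    known = drug_events.get(drug, set())
    others = [d for d in drug_events if d != drug]
    linked = [d for d in others if event in drug_events[d]]
    # Similarity-based: total Jaccard similarity between the candidate
    # drug and the other drugs already linked to this event.
    similarity = sum(jaccard(known, drug_events[d]) for d in linked)
    # Centrality-based: fraction of the other drugs linked to this event.
    degree = len(linked) / len(others) if others else 0.0
    return similarity, degree

# Hypothetical toy network; identifiers are placeholders, not real data.
toy_network = {
    "drug_a": {"e1", "e2"},
    "drug_b": {"e1", "e3"},
    "drug_c": {"e3"},
}
features = pair_features("drug_a", "e3", toy_network)
```

Feature tuples like `features` could then serve as inputs to a classifier trained on drug-event pairs known at an earlier point in time, in line with the chronological approach the abstract mentions.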