256 research outputs found

    Searching For Phenotypes Of Sepsis: An Application Of Machine Learning To Electronic Health Records

    Michael J. Boyle (Sponsored by R. Andrew Taylor). Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT. Sepsis has historically been categorized into discrete subsets based on expert consensus-driven definitions, but there is evidence to suggest it would be better described as a continuum. The goal of this study was to perform an exhaustive search for distinct phenotypes of sepsis using various unsupervised machine learning techniques applied to the electronic health record (EHR) data of 41,843 Yale New Haven Health System emergency department patients with infection between 2013 and 2016. Specifically, the aims were to develop an autoencoder to reduce the high-dimensional EHR data to a latent representation amenable to clustering, and then to search for and assess the quality of clusters within that representation using various clustering methods (partitional, hierarchical, and density-based) and standard evaluation metrics. Autoencoder training was performed by minimizing the mean squared error of the reconstruction. With this exhaustive search, no convincing, consistent clusters were found. The different methods produced various clustering patterns, but all had poor quality metrics, and the evaluation metrics meant to find the ideal number of clusters did not agree on a consistent number but seemed to suggest fewer than two clusters. Inspection of one promising arrangement with eight clusters did not reveal a statistically significant difference in admission rate. While it is impossible to prove a negative, these results suggest there are not distinct phenotypic clusters of sepsis.
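
The abstract above mentions assessing cluster quality with "standard evaluation metrics" but does not name them. As a rough illustration only, the sketch below computes the mean silhouette coefficient, which is one commonly used metric and is an assumption here, not necessarily what the study used. The function name `silhouette_score` and the toy 2-D points are hypothetical; in the study's setting the points would be latent vectors produced by the autoencoder.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    Assumed metric for illustration; the source abstract does not name its metrics.
    """
    # Group points by cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # By convention a singleton cluster contributes a score of 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(dist(point, other) for other in members)
            for key, members in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(scores)


# Toy latent points: two tight, well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = silhouette_score(toy_points, [0, 0, 1, 1])  # labels match the groups
bad_score = silhouette_score(toy_points, [0, 1, 0, 1])   # labels cut across the groups
print(good_score, bad_score)
```

Scores near 1 indicate compact, well-separated clusters, while scores near 0 or below indicate overlapping or mislabeled clusters. The coefficient is undefined for a single cluster, so on its own it cannot support a "fewer than two clusters" conclusion.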

    Exploiting generative self-supervised learning for the assessment of biological images with lack of annotations

    Computer-aided analysis of biological images typically requires extensive training on large-scale annotated datasets, which is not viable in many situations. In this paper, we present Generative Adversarial Network Discriminator Learner (GAN-DL), a novel self-supervised learning paradigm based on the StyleGAN2 architecture, which we employ for self-supervised image representation learning in the case of fluorescent biological images.

    Explainable Machine Learning Techniques in Medical Image Analysis Based on Classification with Feature Extraction

    COVID-19 is a rapidly spreading viral disease that infects both humans and animals. This often fatal disease affects people's daily lives and health as well as national economies. Deep learning, one of the most effective machine learning methods, supports the analysis of large numbers of chest X-ray images, which play an important role in COVID-19 screening. This research proposes a novel technique for lung image analysis that detects COVID-related lung infection using explainable machine learning techniques. The input is a dataset of COVID patients' lung images, which is processed for noise removal and smoothing. Features are extracted from the processed images using a spatio transfer neural network integrated with the DenseNet+ architecture. The extracted features are classified using a stacked auto Boltzmann encoder machine with VGG-19Net+. With transfer learning integrated into the binary classification process, the proposed algorithm achieves good classification accuracy. The experimental analysis was carried out on various COVID datasets in terms of accuracy, precision, recall, F1 score, RMSE, and MAP. The proposed technique attained an accuracy of 95%, precision of 91%, recall of 85%, F1 score of 80%, RMSE of 61%, and MAP of 51%.

    Towards Phytoplankton Parasite Detection Using Autoencoders

    Phytoplankton parasites are largely understudied microbial components with a potentially significant ecological impact on phytoplankton bloom dynamics. To better understand their impact, we need improved detection methods to integrate phytoplankton parasite interactions in monitoring aquatic ecosystems. Automated imaging devices usually produce large amounts of phytoplankton image data, while anomalous phytoplankton data occur rarely. Thus, we propose an unsupervised anomaly detection system based on the similarity of the original and autoencoder-reconstructed samples. With this approach, we were able to reach an overall F1 score of 0.75 across nine phytoplankton species, which could be further improved by species-specific fine-tuning. The proposed unsupervised approach was further compared with a supervised Faster R-CNN based object detector. With this supervised approach and the model trained on plankton species and anomalies, we were able to reach the highest F1 score of 0.86. However, the unsupervised approach is expected to be more universal, as it can also detect unknown anomalies and does not require annotated anomalous data, which may not always be available in sufficient quantities. Although other studies have dealt with plankton anomaly detection in terms of non-plankton particles or air bubble detection, our paper is, to the best of our knowledge, the first to focus on automated anomaly detection considering putative phytoplankton parasites or infections.
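
The reconstruction-based detection described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the autoencoder itself is omitted and its reconstructions are supplied as toy data, mean squared error stands in for the unspecified similarity measure, and the threshold value and the names `reconstruction_error`, `flag_anomalies`, and `f1_score` are hypothetical.

```python
from statistics import mean


def reconstruction_error(original, reconstructed):
    """Mean squared error between a flattened sample and its reconstruction."""
    return mean((o - r) ** 2 for o, r in zip(original, reconstructed))


def flag_anomalies(errors, threshold):
    """Flag a sample as anomalous when its reconstruction error exceeds the threshold."""
    return [error > threshold for error in errors]


def f1_score(y_true, y_pred):
    """F1 score (harmonic mean of precision and recall) for boolean labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy flattened "images" and the reconstructions a trained autoencoder might return.
# An autoencoder trained on normal samples tends to reconstruct anomalies poorly.
originals = [[0.1, 0.2, 0.3], [0.5, 0.5, 0.5], [0.9, 0.1, 0.9]]
reconstructions = [[0.1, 0.2, 0.3], [0.5, 0.4, 0.5], [0.5, 0.5, 0.5]]
is_anomaly = [False, False, True]  # hypothetical ground-truth labels

errors = [reconstruction_error(o, r) for o, r in zip(originals, reconstructions)]
flags = flag_anomalies(errors, threshold=0.05)  # threshold is an assumed value
print(flags, f1_score(is_anomaly, flags))
```

In practice the threshold would be chosen from the error distribution of normal samples, for example a high percentile, which is why no annotated anomalies are needed to set it.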

    Pragmatic Evaluation of Health Monitoring & Analysis Models from an Empirical Perspective

    Designing health monitoring and analysis models requires implementing and deploying several linked modules that can conduct real-time analysis and recommendation on patient datasets. These datasets include, but are not limited to, blood test results, computed tomography (CT) scans, MRI scans, PET scans, and other imaging tests. A combination of signal processing and image processing methods is used to process them. These methods include data collection, pre-processing, feature extraction and selection, classification, and context-specific post-processing. Researchers have put forward a variety of machine learning (ML) and deep learning (DL) techniques to carry out these tasks, which help with the high-accuracy categorization of these datasets. However, the internal operational features and the quantitative and qualitative performance indicators of each of these models differ. These models also demonstrate various functional subtleties, contextual benefits, application-specific constraints, and deployment-specific future research directions. It is difficult for researchers to pinpoint models that perform well for their application-specific use cases because of the vast range of performance. To reduce this uncertainty, this paper reviews several Health Monitoring & Analysis Models in terms of their internal operational features and performance measurements. Based on this discussion, readers will be able to recognise models that are appropriate for their application-specific use cases. Compared with other models, Convolutional Neural Networks (CNNs), Masked Region CNN (MRCNN), Recurrent NN (RNN), Q-Learning, and Reinforcement learning models showed greater analytical performance and are hence suitable for clinical use cases. However, these models scale worse because of their increased complexity and higher implementation costs. To analyse such scenarios, this paper compares the evaluated models in terms of accuracy, computational latency, deployment complexity, scalability, and deployment cost metrics. This comparison will help users choose the best models for their performance-specific use cases. To make model selection even easier for real-time scenarios, this article also reviews a new Health Monitoring Metric (HMM), which integrates many performance indicators to identify the best-performing models under various real-time patient settings.

    Detecting of a Patient's Condition From Clinical Narratives Using Natural Language Representation

    The rapid progress in clinical data management systems and artificial intelligence approaches enables the era of personalized medicine. Intensive care units (ICUs) are the ideal clinical research environment for such development because they collect large amounts of clinical data and are highly computerized environments. We designed a retrospective clinical study on a prospective ICU database using clinical natural language to help in the early diagnosis of heart failure in critically ill children. The methodology consisted of empirical experiments of a learning algorithm to learn the hidden interpretation and presentation of the French clinical note data. This study included 1386 patients' clinical notes with 5444 single lines of notes. There were 1941 positive cases (36% of total) and 3503 negative cases classified by two independent physicians using a standardized approach. The multilayer perceptron neural network outperformed other discriminative and generative classifiers. Consequently, the proposed framework yields an overall classification performance of 89% accuracy, 88% recall, and 89% precision. Furthermore, a generative autoencoder learning algorithm was proposed to leverage sparsity reduction, achieving 91% accuracy, 91% recall, and 91% precision. This study successfully applied learning representation and machine learning algorithms to detect heart failure from clinical natural language in a single French institution. Further work is needed to use the same methodology in other institutions and other languages. Comment: Submitting to IEEE Transactions on Biomedical Engineering. arXiv admin note: text overlap with arXiv:2104.0393

    Exploiting generative self-supervised learning for the assessment of biological images with lack of annotations: a COVID-19 case-study

    Computer-aided analysis of biological images typically requires extensive training on large-scale annotated datasets, which is not viable in many situations. In this paper we present GAN-DL, a Discriminator Learner based on the StyleGAN2 architecture, which we employ for self-supervised image representation learning in the case of fluorescent biological images. We show that Wasserstein Generative Adversarial Networks combined with linear Support Vector Machines enable high-throughput compound screening based on raw images. We demonstrate this by classifying active and inactive compounds tested for the inhibition of SARS-CoV-2 infection in VERO and HRCE cell lines. In contrast to previous methods, our deep learning based approach does not require any annotation besides the one that is normally collected during the sample preparation process. We test our technique on the RxRx19a SARS-CoV-2 image collection. The dataset consists of fluorescent images that were generated to assess the ability of compounds that are regulatory-approved or in late-stage clinical trials to modulate the in vitro infection from SARS-CoV-2 in both VERO and HRCE cell lines. We show that our technique can be exploited not only for classification tasks, but also to effectively derive a dose response curve for the tested treatments, in a self-supervised manner. Lastly, we demonstrate its generalization capabilities by successfully addressing a zero-shot learning task, consisting of the categorization of four different cell types of the RxRx1 fluorescent images collection.

    GUIDE FOR THE COLLECTION OF INSTRUSION DATA FOR MALWARE ANALYSIS AND DETECTION IN THE BUILD AND DEPLOYMENT PHASE

    During the COVID-19 pandemic, when most businesses were not equipped for remote work and cloud computing, we saw a significant surge in ransomware attacks. This study aims to utilize machine learning and artificial intelligence to prevent known and unknown malware threats from being exploited by threat actors when developers build and deploy applications to the cloud. This study used an experimental quantitative research design with Aqua. The experiment's sample is a Docker image. Aqua checked the Docker image for malware, sensitive data, Critical/High vulnerabilities, misconfiguration, and OSS license. The data collection approach is experimental. Our analysis of the experiment demonstrated how unapproved images were prevented from running anywhere in our environment based on known vulnerabilities, embedded secrets, OSS licensing, dynamic threat analysis, and secure image configuration. In addition to the experiment, the forensic data collected in the build and deployment phase are exploitable vulnerability, Critical/High Vulnerability Score, Misconfiguration, Sensitive Data, and Root User (Super User). Since Aqua generates a detailed audit record for every event during risk assessment and runtime, we viewed two events on the Audit page for our experiment. One of the events caused an alert due to two failed controls (Vulnerability Score, Super User), and the other was a successful event, meaning that the image is secure to deploy in the production environment. The primary finding of our study is the forensic data associated with the two events on the Audit page in Aqua. In addition, Aqua validated our security controls and runtime policies based on the forensic data for both events on the Audit page. Finally, the study's conclusions will reduce the likelihood that organizations fall victim to ransomware by mitigating and preventing the total damage caused by a malware attack.