Searching For Phenotypes Of Sepsis: An Application Of Machine Learning To Electronic Health Records
Michael J. Boyle (Sponsored by R. Andrew Taylor). Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT.
Sepsis has historically been categorized into discrete subsets based on expert consensus-driven definitions, but there is evidence to suggest it would be better described as a continuum. The goal of this study was to perform an exhaustive search for distinct phenotypes of sepsis using various unsupervised machine learning techniques applied to the electronic health record (EHR) data of 41,843 Yale New Haven Health System emergency department patients with infection between 2013 and 2016. Specifically, the aims were to develop an autoencoder to reduce the high-dimensional EHR data to a latent representation amenable to clustering, and then to search for and assess the quality of clusters within that representation using various clustering methods (partitional, hierarchical, and density-based) and standard evaluation metrics. Autoencoder training was performed by minimizing the mean squared error of the reconstruction. With this exhaustive search, no convincing, consistent clusters were found. The different methods produced various clustering patterns, but all had poor quality metrics, and evaluation metrics meant to find the ideal number of clusters did not agree on a consistent number but seemed to suggest fewer than two clusters. Inspection of one promising arrangement with eight clusters did not reveal a statistically significant difference in admission rate. While it is impossible to prove a negative, these results suggest there are not distinct phenotypic clusters of sepsis.
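The pipeline this abstract describes (autoencoder trained on reconstruction MSE, clustering of the latent space, cluster-quality scoring) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the study's actual model: the single-hidden-layer MLP trained to reproduce its own input stands in for the autoencoder, and all dimensions and hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in for high-dimensional EHR features: 500 patients, 20 variables.
X = StandardScaler().fit_transform(rng.normal(size=(500, 20)))

# "Autoencoder": an MLP trained to reconstruct its own input, i.e. fit(X, X)
# minimizes the mean squared reconstruction error.
ae = MLPRegressor(hidden_layer_sizes=(8,), activation="relu",
                  max_iter=2000, random_state=0)
ae.fit(X, X)

# Latent representation = activations of the bottleneck layer.
Z = np.maximum(0, X @ ae.coefs_[0] + ae.intercepts_[0])

# Partitional clustering of the latent space, scored with silhouette
# (one of the standard evaluation metrics mentioned in the abstract).
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(Z)
score = silhouette_score(Z, labels)
print(round(score, 3))
```

On structureless data like this, the silhouette score stays close to zero, which is exactly the kind of "poor quality metric" outcome the study reports.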
Exploiting generative self-supervised learning for the assessment of biological images with lack of annotations
Computer-aided analysis of biological images typically requires extensive training on large-scale annotated datasets, which is not viable in many situations. In this paper, we present Generative Adversarial Network Discriminator Learner (GAN-DL), a novel self-supervised learning paradigm based on the StyleGAN2 architecture, which we employ for self-supervised image representation learning in the case of fluorescent biological images.
Explainable Machine Learning Techniques in Medical Image Analysis Based on Classification with Feature Extraction
COVID-19 is a rapidly spreading virus that infects both humans and animals. This fatal viral disease affects people's daily lives, health, and national economies. The most effective machine learning method is deep learning, which offers insightful analysis for examining the large numbers of chest X-ray images that bear directly on COVID-19 screening. This research proposes a novel lung-image-analysis technique for detecting COVID-related lung infection using explainable machine learning. The input, a dataset of COVID patients' lung images, is processed for noise removal and smoothing. Features of the processed images are extracted using a spatio-transfer neural network integrated with a DenseNet+ architecture, and the extracted features are classified using a stacked auto-Boltzmann encoder machine with VGG-19Net+. With transfer learning integrated into the binary classification process, the proposed algorithm achieves good classification accuracy. Experimental analysis was carried out on various COVID datasets in terms of accuracy, precision, recall, F1-score, RMSE, and MAP. The proposed technique attained an accuracy of 95%, precision of 91%, recall of 85%, F1-score of 80%, RMSE of 61%, and MAP of 51%.
Towards Phytoplankton Parasite Detection Using Autoencoders
Phytoplankton parasites are largely understudied microbial components with a potentially significant ecological impact on phytoplankton bloom dynamics. To better understand their impact, we need improved detection methods to integrate phytoplankton parasite interactions into the monitoring of aquatic ecosystems. Automated imaging devices usually produce large amounts of phytoplankton image data, while anomalous phytoplankton samples occur rarely. We therefore propose an unsupervised anomaly detection system based on the similarity between the original and autoencoder-reconstructed samples. With this approach, we reached an overall F1 score of 0.75 across nine phytoplankton species, which could be further improved by species-specific fine-tuning. The proposed unsupervised approach was further compared with a supervised Faster R-CNN based object detector. With this supervised approach and a model trained on plankton species and anomalies, we reached the highest F1 score of 0.86. However, the unsupervised approach is expected to be more universal, as it can also detect unknown anomalies and does not require annotated anomalous data, which may not always be available in sufficient quantities. Although other studies have dealt with plankton anomaly detection in terms of non-plankton particles or air-bubble detection, our paper is, to the best of our knowledge, the first to focus on automated anomaly detection considering putative phytoplankton parasites or infections.
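The reconstruction-similarity idea above can be sketched in a few lines: train an autoencoder only on normal samples, then flag any input whose reconstruction error exceeds a threshold derived from the training-error distribution. This is a toy illustration on synthetic vectors (the paper works on plankton images); the shifted-distribution "anomalies", the 99th-percentile threshold, and all dimensions are assumptions for demonstration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
# "Normal" samples (e.g. flattened plankton images); anomalies come
# from a clearly shifted distribution.
normal = rng.normal(0.0, 1.0, size=(400, 64))
anomal = rng.normal(3.0, 1.0, size=(40, 64))

# Autoencoder trained to reconstruct only normal samples.
ae = MLPRegressor(hidden_layer_sizes=(16,), max_iter=1500, random_state=0)
ae.fit(normal, normal)

def recon_error(X):
    # Per-sample mean squared reconstruction error.
    return ((ae.predict(X) - X) ** 2).mean(axis=1)

# Threshold from the training-set error distribution (99th percentile here).
thr = np.quantile(recon_error(normal), 0.99)

X_test = np.vstack([normal[:100], anomal])
y_true = np.r_[np.zeros(100), np.ones(40)]
y_pred = (recon_error(X_test) > thr).astype(int)
f1 = f1_score(y_true, y_pred)
print(round(f1, 2))
```

Because the autoencoder has only seen normal data, the out-of-distribution samples reconstruct poorly and are flagged, without any annotated anomalies at training time.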
Pragmatic Evaluation of Health Monitoring & Analysis Models from an Empirical Perspective
Designing health monitoring and analysis models requires implementing and deploying several linked modules that can conduct real-time analysis and recommendation on patient datasets. These datasets include, but are not limited to, blood test results, computed tomography (CT) scans, MRI scans, PET scans, and other imaging tests. A combination of signal-processing and image-processing methods is used to process them, covering data collection, pre-processing, feature extraction and selection, classification, and context-specific post-processing. Researchers have put forward a variety of machine learning (ML) and deep learning (DL) techniques to carry out these tasks, which help with the high-accuracy categorization of these datasets. However, these models differ in their internal operational features and in their quantitative and qualitative performance indicators, and they exhibit various functional subtleties, contextual benefits, application-specific constraints, and deployment-specific future research directions. Because of this vast range of performance, it is difficult for researchers to pinpoint models that perform well for their application-specific use cases. To reduce this uncertainty, this paper reviews several health monitoring and analysis models in terms of their internal operational features and performance measurements. Based on this discussion, readers will be able to recognise models appropriate for their application-specific use cases. Compared with other models, Convolutional Neural Networks (CNNs), Masked Region CNN (MRCNN), Recurrent NNs (RNNs), Q-Learning, and reinforcement-learning models showed greater analytical performance and are hence suitable for clinical use cases. Their poorer scaling performance is a result of their increased complexity and higher implementation costs.
To analyse such scenarios, this paper compares the evaluated models in terms of accuracy, computational latency, deployment complexity, scalability, and deployment cost. This comparison will help users choose the best models for their performance-specific use cases. The article also reviews a new Health Monitoring Metric (HMM), which integrates many performance indicators to identify the best-performing models under various real-time patient settings, making model selection even easier for real-time scenarios.
Detecting a Patient's Condition From Clinical Narratives Using Natural Language Representation
The rapid progress in clinical data management systems and artificial intelligence approaches is enabling the era of personalized medicine. Intensive care units (ICUs) are the ideal clinical research environment for such development because they collect large amounts of clinical data and are highly computerized environments. We designed a retrospective clinical study on a prospective ICU database, using clinical natural language to help in the early diagnosis of heart failure in critically ill children. The methodology consisted of empirical experiments with a learning algorithm to learn a hidden representation of the French clinical note data. This study included 1386 patients' clinical notes with 5444 single lines of notes. There were 1941 positive cases (36% of the total) and 3503 negative cases, classified by two independent physicians using a standardized approach. The multilayer perceptron neural network outperformed other discriminative and generative classifiers, yielding an overall classification performance of 89% accuracy, 88% recall, and 89% precision. Furthermore, a generative autoencoder learning algorithm was proposed to leverage sparsity reduction, achieving 91% accuracy, 91% recall, and 91% precision. This study successfully applied representation learning and machine learning algorithms to detect heart failure from clinical natural language in a single French institution. Further work is needed to apply the same methodology in other institutions and other languages.
Comment: Submitted to IEEE Transactions on Biomedical Engineering. arXiv admin note: text overlap with arXiv:2104.0393
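A note-line classifier of the kind this abstract evaluates can be sketched as a text-vectorization step feeding a multilayer perceptron. Everything below is illustrative: the synthetic "notes" are generated from two invented token vocabularies (not real clinical text), and TF-IDF is an assumed representation, since the paper's own learned representation is not reproduced here.

```python
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

random.seed(0)
# Hypothetical stand-in vocabularies: tokens suggestive of heart failure vs. not.
pos_terms = ["dyspnee", "oedeme", "tachycardie", "cardiomegalie", "hepatomegalie"]
neg_terms = ["fievre", "toux", "rhinite", "otite", "angine"]

def make_note(terms):
    return " ".join(random.choices(terms, k=6))

notes = [make_note(pos_terms) for _ in range(200)] + \
        [make_note(neg_terms) for _ in range(200)]
labels = [1] * 200 + [0] * 200

X_tr, X_te, y_tr, y_te = train_test_split(
    notes, labels, test_size=0.25, random_state=0, stratify=labels)

# TF-IDF bag-of-words over each note line, fed to an MLP classifier.
clf = make_pipeline(
    TfidfVectorizer(),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0))
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(round(acc, 2))
```

On real clinical narratives the vocabularies overlap heavily, which is why the study's reported accuracy sits near 89-91% rather than at the ceiling this toy setup reaches.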
Exploiting generative self-supervised learning for the assessment of biological images with lack of annotations: a COVID-19 case-study
Computer-aided analysis of biological images typically requires extensive training on large-scale annotated datasets, which is not viable in many situations. In this paper we present GAN-DL, a Discriminator Learner based on the StyleGAN2 architecture, which we employ for self-supervised image representation learning in the case of fluorescent biological images. We show that Wasserstein Generative Adversarial Networks combined with linear Support Vector Machines enable high-throughput compound screening based on raw images. We demonstrate this by classifying active and inactive compounds tested for the inhibition of SARS-CoV-2 infection in VERO and HRCE cell lines. In contrast to previous methods, our deep-learning-based approach does not require any annotation beyond what is normally collected during the sample preparation process. We test our technique on the RxRx19a SARS-CoV-2 image collection, a dataset of fluorescent images generated to assess the ability of compounds that are regulatory-approved or in late-stage clinical trials to modulate in vitro SARS-CoV-2 infection in both VERO and HRCE cell lines. We show that our technique can be exploited not only for classification tasks but also to derive a dose-response curve for the tested treatments in a self-supervised manner. Lastly, we demonstrate its generalization capabilities by successfully addressing a zero-shot learning task: the categorization of four different cell types from the RxRx1 fluorescent image collection.
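The downstream pattern described here (a frozen, self-supervised representation scored by a linear SVM, with no fine-tuning) can be sketched as follows. The fixed random projection below is only a stand-in for the StyleGAN2 discriminator embedding, and the two synthetic "compound" classes are invented for demonstration; nothing here reproduces GAN-DL itself.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Stand-in for the frozen discriminator: a fixed random projection followed by
# a ReLU, mapping flattened images to an embedding. The real pipeline would use
# StyleGAN2 discriminator activations instead.
W = rng.normal(size=(256, 64))
def embed(imgs):
    return np.maximum(0, imgs @ W)

# Hypothetical "active" / "inactive" compound images as flattened pixel vectors.
active = rng.normal(0.5, 1.0, size=(200, 256))
inactive = rng.normal(-0.5, 1.0, size=(200, 256))
X = embed(np.vstack([active, inactive]))
y = np.r_[np.ones(200), np.zeros(200)]

# Linear SVM on the frozen embedding -- the representation is never fine-tuned.
acc = cross_val_score(LinearSVC(dual=False), X, y, cv=5).mean()
print(round(acc, 2))
```

The design point is that all learning capacity lives in the self-supervised representation; the linear classifier on top only tests whether the classes are linearly separable in that space.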
GUIDE FOR THE COLLECTION OF INTRUSION DATA FOR MALWARE ANALYSIS AND DETECTION IN THE BUILD AND DEPLOYMENT PHASE
During the COVID-19 pandemic, when most businesses were not equipped for remote work and cloud computing, we saw a significant surge in ransomware attacks. This study aims to utilize machine learning and artificial intelligence to prevent known and unknown malware threats from being exploited by threat actors when developers build and deploy applications to the cloud. The study demonstrates an experimental quantitative research design using Aqua. The experiment's sample is a Docker image: Aqua checked the image for malware, sensitive data, Critical/High vulnerabilities, misconfiguration, and OSS licenses. The data collection approach is experimental. Our analysis demonstrated how unapproved images were prevented from running anywhere in our environment based on known vulnerabilities, embedded secrets, OSS licensing, dynamic threat analysis, and secure image configuration. In addition, the forensic data collected in the build and deployment phase comprise exploitable vulnerabilities, Critical/High vulnerability scores, misconfigurations, sensitive data, and root user (superuser) findings. Since Aqua generates a detailed audit record for every event during risk assessment and runtime, we viewed two events on the Audit page for our experiment. One event raised an alert due to two failed controls (Vulnerability Score, Super User); the other was a successful event, meaning the image was secure to deploy to the production environment. The primary finding of the study is the forensic data associated with these two events on Aqua's Audit page. In addition, Aqua validated our security controls and runtime policies based on that forensic data. Finally, the study's conclusions will help organizations mitigate the likelihood of falling victim to ransomware by preventing and limiting the damage caused by a malware attack.