2,068 research outputs found
A Novel Soft Computing Based Model For Symptom Analysis & Disease Classification
In countries like India, many deaths occur every year because diseases are not diagnosed in time. Many people remain deprived of medication because the doctor-to-population ratio is nearly 1:1700. Every human body and its physiological processes show some symptoms of a diseased condition. The model proposed in this paper would analyze those symptoms to identify the disease and its type. The model considers a few selected attributes that a person suspected of having a particular disease shows as symptoms. Those attributes are taken as input to the proposed symptom analysis and classification model, a soft computing model that first classifies a sample as diseased or disease free and then, if diseased, predicts its type (if any). A number of diseased and disease-free samples are to be collected. Each of these samples is a collection of attributes shown or expressed by a human body. With respect to a specific disease, the collected samples form two primary clusters: one diseased and the other disease free. The disease-free cluster may be discarded from further analysis. Every disease has some types, distinguished by the symptoms the diseased samples show. The diseased samples can therefore be re-clustered among themselves according to the types of the disease. Those clusters then become the classes of the multiclass classifier used to analyze a new incoming sample.
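The two-stage idea described in this abstract (diseased versus disease free first, disease type second) can be sketched as follows. This is a minimal illustration, not the authors' soft computing model: a nearest-centroid rule stands in for both stages, the disease types are given as labels rather than discovered by clustering, and the names `fit`, `predict`, `TRAIN` and the toy symptom vectors are all hypothetical.

```python
from math import dist


def centroid(rows):
    """Component-wise mean of equal-length numeric vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def nearest(label_to_centroid, x):
    """Return the label whose centroid is closest to x (Euclidean distance)."""
    return min(label_to_centroid, key=lambda lab: dist(label_to_centroid[lab], x))


def fit(samples):
    """samples: list of (symptom_vector, label); label is 'healthy' or a disease type."""
    healthy = [x for x, lab in samples if lab == "healthy"]
    diseased = [(x, lab) for x, lab in samples if lab != "healthy"]
    # Stage 1: one centroid for disease-free samples, one for all diseased samples.
    stage1 = {
        "healthy": centroid(healthy),
        "diseased": centroid([x for x, _ in diseased]),
    }
    # Stage 2: one centroid per disease type, built from diseased samples only.
    by_type = {}
    for x, lab in diseased:
        by_type.setdefault(lab, []).append(x)
    stage2 = {lab: centroid(rows) for lab, rows in by_type.items()}
    return stage1, stage2


def predict(model, x):
    """Classify x as 'healthy', or as a disease type if stage 1 says diseased."""
    stage1, stage2 = model
    if nearest(stage1, x) == "healthy":
        return "healthy"
    return nearest(stage2, x)


# Toy data (invented for illustration): two symptom attributes per sample.
TRAIN = [
    ([0.0, 0.1], "healthy"), ([0.1, 0.0], "healthy"),
    ([0.9, 0.1], "type_A"), ([1.0, 0.2], "type_A"),
    ([0.2, 0.9], "type_B"), ([0.1, 1.0], "type_B"),
]
model = fit(TRAIN)
print(predict(model, [0.95, 0.15]))
```

The disease-free centroid is used only in stage 1, which mirrors the abstract's remark that the disease-free cluster may be discarded before the type-level analysis.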
Predicting diabetes-related hospitalizations based on electronic health records
OBJECTIVE: To derive a predictive model to identify patients likely to be hospitalized during the following year due to complications attributed to Type II diabetes. METHODS: A variety of supervised machine learning classification methods were tested, and a new method was developed that discovers hidden patient clusters in the positive class (hospitalized) while, at the same time, deriving sparse linear support vector machine classifiers to separate positive samples from negative ones (non-hospitalized). The convergence of the new method was established and theoretical guarantees were proved on how the classifiers it produces generalize to a test set not seen during training. RESULTS: The methods were tested on a large set of patients from the Boston Medical Center - the largest safety net hospital in New England. It is found that our new joint clustering/classification method achieves an accuracy of 89% (measured in terms of area under the ROC curve) and yields informative clusters which can help interpret the classification results, thus increasing the trust of physicians in the algorithmic output and providing some guidance towards preventive measures. While it is possible to increase accuracy to 92% with other methods, this comes with increased computational cost and lack of interpretability. The analysis shows that even a modest probability of preventive actions being effective (more than 19%) suffices to generate significant hospital care savings. CONCLUSIONS: Predictive models are proposed that can help avert hospitalizations, improve health outcomes and drastically reduce hospital expenditures. The scope for savings is significant as it has been estimated that in the USA alone, about $5.8 billion are spent each year on diabetes-related hospitalizations that could be prevented. Accepted manuscript
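Since this abstract reports accuracy as area under the ROC curve, a short sketch of that metric may help. It uses the standard pairwise interpretation of AUC (the fraction of positive/negative pairs in which the positive sample gets the higher score); the function name `roc_auc` and the toy scores are hypothetical and are not taken from the paper's data.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = hospitalized, 0 = not hospitalized (invented values).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

The quadratic loop is fine for illustration; for large cohorts a rank-based computation would be the usual choice.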
A comprehensive study on disease risk predictions in machine learning
Over recent years, multiple disease risk prediction models have been developed. These models use various patient characteristics to estimate the probability of outcomes over a certain period of time and hold the potential to improve decision making and individualize care. Discovering hidden patterns and interactions in medical databases, together with the growing evaluation of disease prediction models, has become crucial. Traditional clinical findings require many trials, which can complicate disease prediction. This paper presents a comprehensive survey of the different strategies used to predict disease. Applying these techniques to healthcare data has improved risk prediction models that identify the patients who would benefit from disease management programs, reducing hospital readmissions and healthcare costs, but the results of these endeavours have been shifted
Bioinformatics Applications Based On Machine Learning
The great advances in information technology (IT) have implications for many sectors, such as bioinformatics, and have considerably increased their possibilities. This book presents a collection of 11 original research papers, all of them related to the application of IT-related techniques within the bioinformatics sector: from new applications created by adapting and applying existing techniques to the creation of new methodologies to solve existing problems
Identification of an Efficient Gene Expression Panel for Glioblastoma Classification.
We present here a novel genetic algorithm-based random forest (GARF) modeling technique that enables a reduction in the complexity of large gene disease signatures to highly accurate, greatly simplified gene panels. When applied to 803 glioblastoma multiforme samples, this method allowed the 840-gene Verhaak et al. gene panel (the standard in the field) to be reduced to a 48-gene classifier, while retaining 90.91% classification accuracy, and outperforming the best available alternative methods. Additionally, using this approach we produced a 32-gene panel which allows for better consistency between RNA-seq and microarray-based classifications, improving cross-platform classification retention from 69.67% to 86.07%. A webpage producing these classifications is available at http://simplegbm.semel.ucla.edu
Design and Analysis for Precision Medicine Subgroup Identification
In 2015 President Barack Obama announced the launch of the Precision Medicine Initiative, spurring an outpouring of interest in research on patient-specific health. Precision medicine is the reproducible research from which health care professionals can provide targeted treatments to their patients. Two objectives in precision medicine include (i) identifying treatment-response subgroups and (ii) identifying disease subgroups. In this manuscript, we will consider a place for traditional study designs in the new age of precision medicine by presenting the machine learning tools and statistical theory necessary to do so. We begin with a newly proposed method for estimating the individualized treatment regime from crossover studies. This method expands generalized outcome weighted learning into the 2x2 crossover study framework by considering the difference in treatment response as the observed reward and correcting for carryover effects, estimated through regression methods. Next, we propose a new technique for identifying disease subgroups by applying hierarchical clustering techniques to what can be interpreted as a set of denoised outcomes. These values are weighted averages of the observed and fitted outcomes, estimated by regressing on a set of features. Finally, we return to identifying treatment-response subgroups, but in the realm of case-control studies. We again expand on generalized outcome weighted learning, in addition to accounting for the difference in the covariate distribution between the selected study sample and the total population. Between this method and electronic health data, advancements for rare and expensive-to-study diseases may be closer than we think. Doctor of Philosophy
Neuropathy Classification of Corneal Nerve Images Using Artificial Intelligence
Nerve variations in the human cornea have been associated with alterations in
the neuropathy state of a patient suffering from chronic diseases. For some diseases,
such as diabetes, detection of neuropathy prior to visible symptoms is important,
whereas for others, such as multiple sclerosis, early prediction of disease worsening is
crucial. As current methods fail to provide early diagnosis of neuropathy, in vivo
corneal confocal microscopy enables very early insight into the nerve damage by
illuminating and magnifying the human cornea. This non-invasive method captures a
sequence of images from the corneal sub-basal nerve plexus. Current practices of
manual nerve tracing and classification impede the advancement of medical research in
this domain. Since corneal nerve analysis for neuropathy is in its initial stages, there is
a dire need for process automation.
To address this limitation, we seek to automate the two stages of this process:
nerve segmentation and neuropathy classification of images. For nerve segmentation,
we compare the performance of two existing solutions on multiple datasets to select the
appropriate method and proceed to the classification stage. Consequently, we approach
neuropathy classification of the images through artificial intelligence using Adaptive
Neuro-Fuzzy Inference System, Support Vector Machines, Naïve Bayes and k-nearest
neighbors. We further compare the performance of machine learning classifiers with
deep learning. We ascertained that nerve segmentation using convolutional neural networks provided a significant improvement in sensitivity and false negative rate by
at least 5% over the state-of-the-art software. For classification, ANFIS yielded the best
classification accuracy of 93.7% compared to other classifiers. Furthermore, for this
problem, machine learning approaches performed better in terms of classification
accuracy than deep learning
Predictive response-relevant clustering of expression data provides insights into disease processes
This article describes and illustrates a novel method of microarray data analysis that couples model-based clustering and binary classification to form clusters of 'response-relevant' genes; that is, genes that are informative when discriminating between the different values of the response. Predictions are subsequently made using an appropriate statistical summary of each gene cluster, which we call the 'meta-covariate' representation of the cluster, in a probit regression model. We first illustrate this method by analysing a leukaemia expression dataset, before focusing closely on the meta-covariate analysis of a renal gene expression dataset in a rat model of salt-sensitive hypertension. We explore the biological insights provided by our analysis of these data. In particular, we identify a highly influential cluster of 13 genes, including three transcription factors (Arntl, Bhlhe41 and Npas2), that is implicated as being protective against hypertension in response to increased dietary sodium. Functional and canonical pathway analysis of this cluster using Ingenuity Pathway Analysis implicated transcriptional activation and circadian rhythm signalling, respectively. Although we illustrate our method using only expression data, the method is applicable to any high-dimensional dataset
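One possible reading of the meta-covariate construction can be written out as follows. This is a sketch under stated assumptions, not the article's exact formulation: the abstract only says "an appropriate statistical summary", so the cluster mean used here is an assumption, and every symbol ($x_{ig}$ for the expression of gene $g$ in sample $i$, $C_c$ for the $c$-th gene cluster, $K$ for the number of clusters, $\beta$ for the regression coefficients) is introduced for illustration. $\Phi$ denotes the standard normal cumulative distribution function, which is what defines a probit model.

```latex
% Meta-covariate of cluster c for sample i (assumed here to be the cluster mean)
m_{ic} = \frac{1}{|C_c|} \sum_{g \in C_c} x_{ig}, \qquad c = 1, \dots, K

% Probit regression of the binary response on the K meta-covariates
P(y_i = 1 \mid m_{i1}, \dots, m_{iK}) = \Phi\!\left( \beta_0 + \sum_{c=1}^{K} \beta_c \, m_{ic} \right)
```

Under this reading, a cluster is influential when its coefficient $\beta_c$ is large in magnitude, and the sign of $\beta_c$ indicates whether higher expression in that cluster raises or lowers the probability of the response.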
Machine learning based data pre-processing for the purpose of medical data mining and decision support
Building an accurate and reliable prediction model for different application domains is one of the most significant challenges in knowledge discovery and data mining. Sometimes, improved data quality is itself the goal of the analysis, usually to improve processes in a production database and the design of decision support. As medicine moves forward there is a need for sophisticated decision support systems that make use of data mining to support more orthodox knowledge engineering and Health Informatics practice. However, real-life medical data rarely complies with the requirements of various data mining tools. It is often inconsistent, noisy, containing redundant attributes, in an unsuitable format, containing missing values and imbalanced with regard to the outcome class label. Many real-life data sets are incomplete, with missing values. In medical data mining the problem of missing values has become a challenging issue. In many clinical trials, the medical report pro-forma allows some attributes to be left blank, because they are inappropriate for some class of illness or the person providing the information feels that it is not appropriate to record the values for some attributes. The research reported in this thesis has explored the use of machine learning techniques as missing value imputation methods. The thesis also proposed a new way of imputing missing values by supervised learning. A classifier was used to learn the data patterns from a complete data sub-set and the model was later used to predict the missing values for the full dataset. The proposed machine learning based missing value imputation was applied to the thesis data and the results are compared with traditional Mean/Mode imputation. Experimental results show that all the machine learning methods which we explored outperformed the statistical method (Mean/Mode). The class imbalance problem has been found to hinder the performance of learning systems.
In fact, most medical datasets are found to be highly imbalanced in their class label. The solution to this problem is to reduce the gap between the minority class samples and the majority class samples. Over-sampling can be applied to increase the number of minority class samples to balance the data. The alternative to over-sampling is under-sampling, where the size of the majority class sample is reduced. The thesis proposed a cluster based under-sampling technique to reduce the gap between the majority and minority samples. Different under-sampling and over-sampling techniques were explored as ways to balance the data. The experimental results show that for the thesis data the newly proposed modified cluster based under-sampling technique performed better than other class balancing techniques. In further research it is found that the class imbalance problem not only affects the classification performance but also has an adverse effect on feature selection. The thesis proposed a new framework for feature selection for class imbalanced datasets. The research found that, using the proposed framework, the classifier needs fewer attributes to show high accuracy, and more attributes are needed if the data is highly imbalanced. The research described in the thesis contains the following four novel main contributions: a) Improved data mining methodology for mining medical data; b) Machine learning based missing value imputation method; c) Cluster based semi-supervised class balancing method; d) Feature selection framework for class imbalanced datasets. The performance analysis and comparative study show that the use of the proposed methods of missing value imputation, class balancing and feature selection framework can provide an effective approach to data preparation for building medical decision support
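The contrast this abstract draws between Mean/Mode imputation and classifier-based imputation can be sketched as follows. This is a minimal illustration, not the thesis method: a k-nearest-neighbour vote stands in for whatever classifiers the thesis used, the other columns are assumed numeric and complete, and the names `mode_impute`, `knn_impute` and `ROWS` plus the toy records are hypothetical. It does not reproduce any result from the thesis.

```python
from collections import Counter
from math import dist


def mode_impute(rows, col):
    """Baseline: fill missing entries (None) in column `col` with the most common observed value."""
    mode = Counter(r[col] for r in rows if r[col] is not None).most_common(1)[0][0]
    return [r[:col] + [mode] + r[col + 1:] if r[col] is None else list(r) for r in rows]


def knn_impute(rows, col, k=1):
    """Learn from the complete rows, then predict the missing column by a k-NN vote on the other columns."""

    def features(r):
        # All columns except the one being imputed (assumed numeric and complete).
        return [v for i, v in enumerate(r) if i != col]

    complete = [r for r in rows if r[col] is not None]
    filled = []
    for r in rows:
        if r[col] is not None:
            filled.append(list(r))
            continue
        neighbours = sorted(complete, key=lambda c: dist(features(c), features(r)))[:k]
        vote = Counter(c[col] for c in neighbours).most_common(1)[0][0]
        filled.append(r[:col] + [vote] + r[col + 1:])
    return filled


# Toy records (invented): two numeric measurements and one categorical attribute.
ROWS = [
    [1.0, 1.1, "low"],
    [0.9, 1.0, "low"],
    [1.1, 0.9, "low"],
    [5.0, 5.2, "high"],
    [5.1, 4.9, "high"],
    [5.0, 5.0, None],  # missing value; its neighbours are the "high" records
]
print(mode_impute(ROWS, 2)[5])
print(knn_impute(ROWS, 2)[5])
```

On this toy data the mode baseline fills in the majority value regardless of the record's other attributes, while the learned imputer uses those attributes, which is the intuition behind training on the complete sub-set.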