Quantized Compressive K-Means
The recent framework of compressive statistical learning aims at designing
tractable learning algorithms that use only a heavily compressed
representation, or sketch, of massive datasets. Compressive K-Means (CKM) is such
a method: it estimates the centroids of data clusters from pooled, non-linear,
random signatures of the learning examples. While this approach significantly
reduces computational time on very large datasets, its digital implementation
wastes acquisition resources because the learning examples are compressed only
after the sensing stage. The present work generalizes the sketching procedure
initially defined in Compressive K-Means to a large class of periodic
nonlinearities including hardware-friendly implementations that compressively
acquire entire datasets. This idea is exemplified in a Quantized Compressive
K-Means procedure, a variant of CKM that leverages 1-bit universal quantization
(i.e. retaining the least significant bit of a standard uniform quantizer) as
the periodic sketch nonlinearity. Trading the standard sketch for this
resource-efficient signature (standard in most acquisition schemes) has almost
no impact on clustering performance, as illustrated by numerical experiments.
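The pooled-sketch idea with a 1-bit universal quantization nonlinearity can be illustrated in a few lines. This is a minimal sketch only: the Gaussian frequency matrix, uniform dither, and step size below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def universal_quantize(t, delta=1.0):
    # 1-bit universal quantization: the least significant bit of a
    # uniform quantizer with step delta, mapped to {-1, +1}.
    return 2.0 * (np.floor(t / delta).astype(int) % 2) - 1.0

def quantized_sketch(X, W, xi, delta=1.0):
    # Pooled sketch: average the periodic nonlinearity of random
    # projections W @ x (plus a dither xi) over all examples.
    return universal_quantize(X @ W.T + xi, delta).mean(axis=0)

# Toy data: two well-separated clusters in 2-D.
X = np.vstack([rng.normal(-2, 0.3, (100, 2)),
               rng.normal(2, 0.3, (100, 2))])
m = 64                        # sketch size (assumed)
W = rng.normal(size=(m, 2))   # random frequencies (assumed Gaussian)
xi = rng.uniform(0, 1.0, m)   # random dither (assumed uniform)
z = quantized_sketch(X, W, xi)
print(z.shape)  # (64,)
```

The entire dataset is collapsed into the m-dimensional vector z; centroid estimation would then work on z alone rather than on the raw examples.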
Memory and Computation-Efficient Kernel SVM via Binary Embedding and Ternary Model Coefficients
Kernel approximation is widely used to scale up kernel SVM training and
prediction. However, the memory and computation costs of kernel approximation
models are still too high if we want to deploy them on memory-limited devices
such as mobile phones, smartwatches, and IoT devices. To address this
challenge, we propose a novel memory and computation-efficient kernel SVM model
by using both binary embedding and binary model coefficients. First, we propose
an efficient way to generate compact binary embedding of the data, preserving
the kernel similarity. Second, we propose a simple but effective algorithm to
learn a linear classification model with ternary coefficients that can support
different types of loss function and regularizer. Our algorithm can achieve
better generalization accuracy than existing works on learning binary
coefficients, since we allow each coefficient to be -1, 0, or +1 during the
training stage, and zero coefficients can be removed during model inference for
binary classification. Moreover, we provide a detailed analysis of the
convergence of our algorithm and the inference complexity of our model. The
analysis shows that the convergence to a local optimum is guaranteed, and the
inference complexity of our model is much lower than other competing methods.
Our experimental results on five large real-world datasets have demonstrated
that our proposed method can build accurate nonlinear SVM models with memory
costs of less than 30 KB.
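As a rough illustration of the two ingredients, the sketch below pairs a sign-random-projection binary embedding (one standard kernel-similarity-preserving choice; the paper's specific embedding may differ) with a ternary linear model whose zero coefficients drop out of inference entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Binary embedding via sign random projection: b = sign(R x) in
# {-1,+1}^m approximately preserves angular similarity between inputs.
d, m = 16, 512
R = rng.normal(size=(m, d))
x = rng.normal(size=d)
b = np.where(R @ x >= 0, 1, -1)

# Ternary linear model: each coefficient is -1, 0, or +1 (here drawn
# at random purely for illustration; the paper learns them).
w = rng.choice([-1, 0, 1], size=m)

# Inference needs no multiplications: sum the bits where w == +1,
# subtract the bits where w == -1, and skip the zeros.
pos, neg = w == 1, w == -1
score = b[pos].sum() - b[neg].sum()
assert score == w @ b
```

Since both b and w are small integers, the model can be stored bit-packed, which is where the sub-30 KB memory footprint comes from.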
Mortality Prediction of Various Cancer Patients via Relevant Feature Analysis and Machine Learning
Breast, lung, prostate, and stomach cancers are the most frequent cancer types globally. Early-stage detection and diagnosis of these cancers pose a challenge in the literature. When dealing with cancer patients, physicians must select among various treatment methods, each of which carries a risk factor. Since the risks of treatment may outweigh the benefits, the treatment schedule is critical in clinical decision making. Manually deciding which medications and treatments will be successful takes considerable expertise and can be difficult. In this paper, we offer a computational solution to predict the mortality of various types of cancer patients. The solution is based on the analysis of diagnosis, medication, and treatment parameters that can be easily acquired from electronic healthcare systems. A classification-based approach is introduced to predict the mortality outcome of cancer patients. Several classifiers are evaluated on the Medical Information Mart for Intensive Care IV (MIMIC-IV) dataset. Diagnosis, medication, and treatment features are extracted for breast, lung, prostate, and stomach cancer patients, and relevant feature selection is done with logistic regression. The best F1 scores were 0.74 for breast, 0.73 for lung, 0.82 for prostate, and 0.79 for stomach cancer. The best AUROC scores were 0.94 for breast, 0.91 for lung, 0.96 for prostate, and 0.88 for stomach cancer. In addition, using only the relevant features, results were very similar to the baseline for each cancer type. Using fewer features and a robust machine-learning model, the proposed approach can be easily implemented in hospitals when limited data and resources are available.
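The pipeline described above (feature extraction, logistic-regression-based relevant-feature selection, classification) can be mimicked end to end on synthetic data. Everything below is an illustrative assumption: the binary indicator features stand in for diagnosis/medication/treatment codes, and the gradient-descent logistic regression is a generic stand-in for the paper's models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for diagnosis/medication/treatment indicators:
# 20 binary features, of which only the first 5 are truly predictive.
n, d = 500, 20
X = rng.integers(0, 2, size=(n, d)).astype(float)
w_true = np.zeros(d)
w_true[:5] = [2, -2, 1.5, -1.5, 1]
p = 1 / (1 + np.exp(-(X @ w_true)))
y = (rng.uniform(size=n) < p).astype(float)

# Logistic regression fit by gradient descent; its coefficient
# magnitudes double as a relevant-feature ranking.
w = np.zeros(d)
for _ in range(2000):
    grad = X.T @ (1 / (1 + np.exp(-(X @ w))) - y) / n
    w -= 0.5 * grad

top = np.argsort(-np.abs(w))[:5]   # 5 most relevant features
print(sorted(top.tolist()))        # typically dominated by indices 0..4
```

In the paper's setting, the features kept by this ranking step would then feed the downstream classifiers, with F1 and AUROC reported per cancer type.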