An Efficient Imbalance-Aware Federated Learning Approach for Wearable Healthcare with Autoregressive Ratio Observation
Widely available healthcare services are becoming popular because of
advancements in wearable sensing techniques and mobile edge computing. People's
health information is collected by edge devices such as smartphones and
wearable bands and sent to servers for further analysis, which then send back
suggestions and alerts for abnormal conditions. The recent emergence of
federated learning (FL) allows users to train models on private data on local
devices while updating models collaboratively. However, the heterogeneous
distribution of health condition data may pose significant risks to model
performance due to class imbalance. Meanwhile, because FL training relies on
sharing only gradients with the server, the training data itself is almost
inaccessible, so conventional solutions to class imbalance do not work for
federated learning. In this work, we propose FedImT, a new federated learning
framework dedicated to addressing the challenges of class imbalance in
federated learning scenarios. FedImT contains an online scheme that can
estimate the data composition during each round of aggregation, then
introduces a self-attenuating iterative equivalent to track variations of
multiple estimations and promptly tweak the balance of the loss computation
for minority classes. Experiments demonstrate the effectiveness of FedImT in
solving the imbalance problem without extra energy consumption and while
avoiding privacy risks.
Comment: submitted to IEEE OJCS in Oct. 2023, under review
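The abstract does not spell out how FedImT estimates class composition or attenuates its updates. As a rough illustration only, the Python sketch below assumes that a per-round estimate of class counts is already available (how FedImT derives it from aggregated updates is not reproduced), smooths successive estimates with a step size that shrinks as rounds accumulate (one possible reading of "self-attenuating"), and turns the smoothed ratios into inverse-frequency weights for a weighted cross-entropy loss. The names ClassRatioTracker, power and weighted_cross_entropy are hypothetical.

```python
import math
from typing import List


class ClassRatioTracker:
    """Hypothetical sketch, not the FedImT estimator: smooths per-round
    class-ratio estimates with a step size that decays over rounds."""

    def __init__(self, num_classes: int, power: float = 0.75) -> None:
        self.ratios = [1.0 / num_classes] * num_classes  # start from a uniform guess
        self.power = power  # assumed knob: larger values attenuate the step faster
        self.round = 0

    def update(self, estimated_counts: List[float]) -> List[float]:
        total = sum(estimated_counts)
        if total <= 0:
            return self.ratios  # nothing observed this round; keep previous estimate
        observed = [c / total for c in estimated_counts]
        self.round += 1
        step = 1.0 / self.round ** self.power  # 1.0 on the first round, then shrinks
        # Convex combination of two distributions, so ratios still sum to 1.
        self.ratios = [r + step * (o - r) for r, o in zip(self.ratios, observed)]
        return self.ratios

    def class_weights(self, eps: float = 1e-8) -> List[float]:
        # Inverse-frequency weights, normalised so that they average to 1.
        inv = [1.0 / (r + eps) for r in self.ratios]
        mean_inv = sum(inv) / len(inv)
        return [w / mean_inv for w in inv]


def weighted_cross_entropy(logits: List[float], label: int, weights: List[float]) -> float:
    # Numerically stable log-sum-exp, then scale the loss by the label's weight.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return weights[label] * (log_z - logits[label])
```

For example, after a single round with estimated counts [90, 10], the tracked ratios are [0.9, 0.1] and the weights come out near [0.2, 1.8], so errors on the minority class cost roughly nine times as much.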
Semi-Supervised Learning for Mars Imagery Classification and Segmentation
With the progress of Mars exploration, large volumes of Mars image data are
being collected and need to be analyzed. However, due to the imbalance and
distortion of Martian data, the performance of existing computer vision models
is unsatisfactory. In this paper, we introduce a semi-supervised framework for
machine vision on Mars and address two specific tasks: classification and
segmentation. Contrastive learning is a powerful representation learning
technique. However, there is too much information overlap between Martian data
samples, which creates a contradiction between contrastive learning and Martian
data. Our key idea is to reconcile this contradiction with the help of
annotations and to further take advantage of unlabeled data to improve
performance. For classification, we propose to ignore inner-class pairs on
labeled data and to neglect negative pairs on unlabeled data, forming
supervised inter-class contrastive learning and unsupervised similarity
learning. For segmentation, we extend supervised inter-class contrastive
learning into an element-wise mode and use online pseudo labels for supervision
on unlabeled areas. Experimental results show that our learning strategies
improve the classification and segmentation models by a large margin and
outperform state-of-the-art approaches.
Comment: Accepted by ACM Trans. on Multimedia Computing, Communications and
Applications (TOMM)
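The abstract describes the two classification losses only in words. A minimal sketch, assuming an InfoNCE-style formulation with cosine similarity and a temperature of 0.1 (neither is stated in the abstract), shows what "ignoring inner-class pairs" and "neglecting negative pairs" could look like: in the labeled loss only samples from other classes act as negatives, and the unlabeled loss simply pulls two augmented views of each sample together. Function names are hypothetical, and the element-wise segmentation variant and the online pseudo labels are not shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)  # small constant avoids division by zero


def inter_class_contrastive_loss(view1: List[Vector], view2: List[Vector],
                                 labels: List[int], temperature: float = 0.1) -> float:
    """Assumed InfoNCE form: the positive is the other view of the same sample,
    and negatives come only from other classes, so same-class (inner-class)
    pairs are ignored instead of being pushed apart."""
    n = len(view1)
    total = 0.0
    for i in range(n):
        pos = math.exp(cosine(view1[i], view2[i]) / temperature)
        neg = sum(math.exp(cosine(view1[i], view2[j]) / temperature)
                  for j in range(n) if labels[j] != labels[i])
        total += -math.log(pos / (pos + neg))
    return total / n


def similarity_loss(view1: List[Vector], view2: List[Vector]) -> float:
    """Negative-free loss for unlabeled data: 1 - cosine similarity between
    two views of the same sample, with no negative pairs at all."""
    return sum(1.0 - cosine(a, b) for a, b in zip(view1, view2)) / len(view1)
```

When every labeled sample in a batch belongs to the same class, the inter-class loss has no negatives and drops to zero, which is exactly the "no pushing apart within a class" behaviour the abstract describes.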
A novel approach for the effective prediction of cardiovascular disease using applied artificial intelligence techniques
Aims: The objective of this research is to develop an effective cardiovascular disease prediction framework using machine learning techniques and to achieve high accuracy in predicting cardiovascular disease. Methods: In this paper, we used machine learning algorithms to predict cardiovascular disease on the basis of symptoms such as chest pain, age and blood pressure. This study incorporated five distinct datasets obtained from online sources: Heart UCI, Stroke, Heart Statlog, Framingham and Coronary Heart. The framework was implemented with the RapidMiner tool. The three-step approach consists of pre-processing the dataset, applying a feature selection method to the pre-processed dataset and then applying classification methods to predict the results. We addressed missing values by replacing them with the mean, and class imbalance was handled using sample bootstrapping. Various machine learning classifiers were applied, of which random forest with AdaBoost using 10-fold cross-validation provided the highest accuracy. Results: The proposed model achieves accuracies of 99.48% on the Heart Statlog dataset, 93.90% on Heart UCI, 96.25% on Stroke, 86% on Framingham and 78.36% on the Coronary Heart Disease dataset. Conclusions: The results of the study show the remarkable potential of the proposed framework. By handling class imbalance and missing values, a highly accurate framework has been established that could effectively contribute to the prediction of cardiovascular disease at early stages.
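The pre-processing steps named in the abstract (mean imputation and bootstrap resampling for class imbalance) and the 10-fold split can be sketched in plain Python. This is an illustrative approximation, not the RapidMiner workflow used in the study: the abstract does not say how the bootstrapping was configured, so oversampling each class up to the size of the largest class is an assumption, and the random forest/AdaBoost classifier is not reproduced. Function names are hypothetical.

```python
import random
from typing import Dict, Iterator, List, Optional, Tuple


def mean_impute(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Replace each missing value (None) with the mean of its column."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        present = [r[c] for r in rows if r[c] is not None]
        means.append(sum(present) / len(present) if present else 0.0)
    return [[means[c] if r[c] is None else r[c] for c in range(n_cols)] for r in rows]


def bootstrap_balance(X: List[List[float]], y: List[int],
                      seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Assumed scheme: draw extra rows with replacement from each smaller
    class until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class: Dict[int, List[List[float]]] = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out: List[List[float]] = []
    y_out: List[int] = []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def k_fold_indices(n: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]
```

In practice it is usually safer to apply bootstrap_balance only to the training indices of each fold, so that duplicated rows cannot appear in both the training and test folds; the abstract does not state which order the study used.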
A knowledge graph empowered online learning framework for access control decision-making
A knowledge graph, as an extension of the graph data structure, is used in a wide range of areas because it can store interrelated data and reveal interlinked relationships between different objects within a large system. This paper proposes an algorithm to construct an access control knowledge graph from user and resource attributes. Furthermore, an online learning framework for access control decision-making is proposed based on the constructed knowledge graph. Within the framework, we extract topological features to represent high-cardinality categorical user and resource attributes. Experimental results show that topological features extracted from the knowledge graph can improve access control performance in both offline and online learning scenarios with different degrees of class imbalance.
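The abstract does not specify the graph schema or which topological features were extracted. Assuming that users and resources are linked through shared "attribute=value" nodes, a minimal Python sketch might compute simple features such as node degrees, the number of attribute values a user and a resource share, and the average popularity of a user's attribute values. These give a fixed-length numeric representation no matter how many distinct values a categorical attribute has. The node naming scheme and function names are hypothetical, and the online learner itself is not shown.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def build_access_graph(users: Dict[str, Dict[str, str]],
                       resources: Dict[str, Dict[str, str]]) -> Graph:
    """Hypothetical schema: every user and resource node is linked to one
    'attr:key=value' node per attribute it carries, so a user and a resource
    that share an attribute value are two hops apart."""
    graph: Graph = {}
    for prefix, entities in (("user:", users), ("res:", resources)):
        for name, attrs in entities.items():
            node = prefix + name
            graph.setdefault(node, set())
            for key, value in attrs.items():
                attr_node = f"attr:{key}={value}"
                graph[node].add(attr_node)
                graph.setdefault(attr_node, set()).add(node)
    return graph


def topological_features(graph: Graph, user: str, resource: str) -> List[float]:
    """Return [user degree, resource degree, shared attribute values,
    mean degree of the user's attribute-value nodes]."""
    u_nbrs = graph.get("user:" + user, set())
    r_nbrs = graph.get("res:" + resource, set())
    shared = u_nbrs & r_nbrs
    # Popularity of each of the user's attribute values: how many entities carry it.
    attr_degrees = [len(graph[a]) for a in u_nbrs]
    mean_attr_degree = sum(attr_degrees) / len(attr_degrees) if attr_degrees else 0.0
    return [float(len(u_nbrs)), float(len(r_nbrs)), float(len(shared)), mean_attr_degree]
```

A feature vector like this could then be fed to any incremental classifier that is updated as new access requests and decisions arrive.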
GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning
Large-scale machine learning and deep models are extremely data-hungry.
Unfortunately, obtaining large amounts of labeled data is expensive, and
training state-of-the-art models (with hyperparameter tuning) requires
significant computing resources and time. Moreover, real-world data is noisy
and imbalanced. As a result, several recent papers try to make the training
process more efficient and robust. However, most existing work focuses on
either robustness or efficiency, but not both. In this work, we introduce Glister,
a GeneraLIzation based data Subset selecTion for Efficient and Robust learning
framework. We formulate Glister as a mixed discrete-continuous bi-level
optimization problem to select a subset of the training data that maximizes
the log-likelihood on a held-out validation set. Next, we propose an iterative
online algorithm, Glister-Online, which performs data selection iteratively
along with the parameter updates and can be applied to any loss-based learning
algorithm. We then show that for a rich class of loss functions, including
cross-entropy, hinge loss, squared loss, and logistic loss, the inner discrete
data selection is an instance of (weakly) submodular optimization, and we
analyze conditions under which Glister-Online reduces the validation loss and
converges. Finally, we propose Glister-Active, an extension to batch active
learning, and we empirically demonstrate the performance of Glister on a wide
range of tasks, including (a) data selection to reduce training time, (b)
robust learning under label noise and imbalance settings, and (c) batch active
learning with several deep and shallow models. We show that our framework
improves upon the state of the art in both efficiency and accuracy (in cases (a)
and (c)) and is more efficient than other state-of-the-art robust
learning algorithms in case (b).
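One way to see how a data-selection step can target validation log-likelihood is a first-order Taylor approximation: after one gradient step of size η on training example e, the validation loss changes by roughly -η ∇L_e · ∇L_val, so examples whose gradients align with the validation gradient are preferred. The Python sketch below applies this greedy rule to a small logistic-regression model. It is written in the spirit of Glister-Online under simplifying assumptions (plain greedy selection, one hypothetical gradient step per pick, no submodular acceleration or minibatching) and is not the authors' implementation; function names are hypothetical.

```python
import math
from typing import List, Sequence

Vec = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_grad(theta: Sequence[float], x: Sequence[float], y: int) -> Vec:
    """Gradient of -log p(y | x, theta) for logistic regression, y in {0, 1}."""
    p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    return [(p - y) * xi for xi in x]


def mean_grad(theta: Sequence[float], X: List[Vec], Y: List[int]) -> Vec:
    """Average loss gradient over a dataset (here, the validation set)."""
    grads = [logistic_grad(theta, x, y) for x, y in zip(X, Y)]
    return [sum(col) / len(grads) for col in zip(*grads)]


def greedy_taylor_subset(theta: Sequence[float], X_train: List[Vec], Y_train: List[int],
                         X_val: List[Vec], Y_val: List[int], budget: int,
                         lr: float = 0.1) -> List[int]:
    """Hypothetical greedy selection: at each step pick the training example
    whose one-step update most increases the first-order approximation of the
    validation log-likelihood, then apply that step to theta."""
    theta = list(theta)
    selected: List[int] = []
    remaining = set(range(len(X_train)))
    for _ in range(min(budget, len(X_train))):
        g_val = mean_grad(theta, X_val, Y_val)
        best, best_gain = -1, -math.inf
        for i in sorted(remaining):
            g_i = logistic_grad(theta, X_train[i], Y_train[i])
            # LL_val(theta - lr * g_i) ~= LL_val(theta) + lr * (g_i . g_val)
            gain = lr * sum(a * b for a, b in zip(g_i, g_val))
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
        remaining.remove(best)
        step = logistic_grad(theta, X_train[best], Y_train[best])
        theta = [t - lr * g for t, g in zip(theta, step)]
    return selected
```

With a correctly labeled and a mislabeled copy of the same input, the correctly labeled example is picked first, because its gradient points the same way as the validation gradient while the mislabeled one points the opposite way.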