Impact Of Missing Data Imputation On The Fairness And Accuracy Of Graph Node Classifiers
Analysis of the fairness of machine learning (ML) algorithms has recently
attracted considerable research interest. Most ML methods exhibit bias with
respect to protected groups, which limits the use of ML models in applications
such as crime rate prediction. Moreover, data may contain missing values, which
are known to further harm fairness if not handled appropriately. Many
imputation methods have been proposed to deal with missing data. However, the
effect of missing data imputation on fairness has not been well studied. In
this paper, we analyze the effect of graph data (node attribute) imputation on
fairness, using different embedding and neural network methods. Extensive
experiments on six datasets demonstrate severe fairness issues in missing data
imputation under graph node classification. We also find that the choice of
imputation method affects both fairness and accuracy. Our results provide
valuable insights into graph data fairness and how to handle missingness in
graphs efficiently. This work also provides directions for theoretical studies
on fairness in graph data.
Comment: Accepted at IEEE International Conference on Big Data (IEEE Big Data
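As a toy illustration of how the choice of imputation method alone can shift a fairness measure on node attributes, the Python sketch below compares global-mean imputation with a simple neighbor-mean imputation. It is not the paper's method: the eight-node graph, the attribute values, the threshold classifier (`predict`), and the use of statistical parity difference as the fairness metric are all hypothetical assumptions chosen for illustration.

```python
import numpy as np


def global_mean_impute(x):
    """Fill NaNs with the mean of all observed values (ignores graph structure)."""
    filled = x.copy()
    filled[np.isnan(x)] = np.nanmean(x)
    return filled


def neighbor_mean_impute(x, adj):
    """Fill each NaN with the mean of its observed neighbors (global mean if none)."""
    filled = x.copy()
    fallback = np.nanmean(x)
    for i in np.flatnonzero(np.isnan(x)):
        vals = x[adj[i] == 1]          # attributes of node i's neighbors
        vals = vals[~np.isnan(vals)]   # keep only observed neighbors
        filled[i] = vals.mean() if vals.size else fallback
    return filled


def predict(x, threshold=0.5):
    """Hypothetical node classifier: positive if the attribute exceeds a threshold."""
    return (x > threshold).astype(int)


def statistical_parity_difference(y_pred, group):
    """P(y_hat = 1 | group = 1) - P(y_hat = 1 | group = 0)."""
    return y_pred[group == 1].mean() - y_pred[group == 0].mean()


# Hypothetical data: nodes 4-7 form the protected group (group == 1).
true_x = np.array([0.9, 0.8, 0.3, 0.2, 0.9, 0.8, 0.3, 0.2])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# The two low-valued protected nodes have missing attributes.
observed_x = true_x.copy()
observed_x[[6, 7]] = np.nan

# Undirected toy graph as a symmetric adjacency matrix.
edges = [(0, 1), (2, 6), (3, 7), (6, 7), (4, 5)]
adj = np.zeros((8, 8), dtype=int)
for u, v in edges:
    adj[u, v] = adj[v, u] = 1

for name, x in [
    ("no missingness", true_x),
    ("global mean", global_mean_impute(observed_x)),
    ("neighbor mean", neighbor_mean_impute(observed_x, adj)),
]:
    spd = statistical_parity_difference(predict(x), group)
    print(f"{name}: SPD = {spd:+.2f}")
```

In this toy setup the missing protected nodes have low true values; filling them with the global mean (about 0.65) pushes both over the threshold, misclassifying them and opening a parity gap of 0.50, while averaging over observed graph neighbors keeps the gap at zero. The embedding and neural network imputers studied in the paper behave differently; the sketch only shows that the imputation step by itself can move both fairness and accuracy.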
Towards Data-centric Graph Machine Learning: Review and Outlook
Data-centric AI, with its primary focus on the collection, management, and
utilization of data to drive AI models and applications, has attracted
increasing attention in recent years. In this article, we conduct an in-depth
and comprehensive review, offering a forward-looking outlook on the current
efforts in data-centric AI pertaining to graph data, the fundamental data
structure for representing and capturing intricate dependencies among massive
and diverse real-life entities. We introduce a systematic framework,
Data-centric Graph Machine Learning (DC-GML), that encompasses all stages of
the graph data lifecycle, including graph data collection, exploration,
improvement, exploitation, and maintenance. A thorough taxonomy of each stage
is presented to answer three critical graph-centric questions: (1) how to
enhance graph data availability and quality; (2) how to learn from graph data
with limited availability and low quality; (3) how to build graph MLOps systems
from the graph data-centric view. Lastly, we pinpoint the future prospects of
the DC-GML domain, providing insights to navigate its advancements and
applications.
Comment: 42 pages, 9 figures
Deep Learning Approaches for Data Augmentation in Medical Imaging: A Review
Deep learning has become a popular tool for medical image analysis, but the
limited availability of training data remains a major challenge, particularly
in the medical field where data acquisition can be costly and subject to
privacy regulations. Data augmentation techniques offer a solution by
artificially increasing the number of training samples, but these techniques
often produce limited and unconvincing results. To address this issue, a
growing number of studies have proposed the use of deep generative models to
generate more realistic and diverse data that conform to the true distribution
of the data. In this review, we focus on three types of deep generative models
for medical image augmentation: variational autoencoders, generative
adversarial networks, and diffusion models. We provide an overview of the
current state of the art in each of these models and discuss their potential
for use in different downstream tasks in medical imaging, including
classification, segmentation, and cross-modal translation. We also evaluate the
strengths and limitations of each model and suggest directions for future
research in this field. Our goal is to provide a comprehensive review of the
use of deep generative models for medical image augmentation and to highlight
the potential of these models for improving the performance of deep learning
algorithms in medical image analysis.