125 research outputs found
A Comprehensive Survey on Trustworthy Graph Neural Networks: Privacy, Robustness, Fairness, and Explainability
Graph Neural Networks (GNNs) have made rapid developments in the recent
years. Due to their great ability in modeling graph-structured data, GNNs are
vastly used in various applications, including high-stakes scenarios such as
financial analysis, traffic predictions, and drug discovery. Despite their
great potential in benefiting humans in the real world, recent study shows that
GNNs can leak private information, are vulnerable to adversarial attacks, can
inherit and magnify societal bias from training data and lack interpretability,
which have risk of causing unintentional harm to the users and society. For
example, existing works demonstrate that attackers can fool the GNNs to give
the outcome they desire with unnoticeable perturbation on training graph. GNNs
trained on social networks may embed the discrimination in their decision
process, strengthening the undesirable societal bias. Consequently, trustworthy
GNNs in various aspects are emerging to prevent the harm from GNN models and
increase the users' trust in GNNs. In this paper, we give a comprehensive
survey of GNNs in the computational aspects of privacy, robustness, fairness,
and explainability. For each aspect, we give the taxonomy of the related
methods and formulate the general frameworks for the multiple categories of
trustworthy GNNs. We also discuss the future research directions of each aspect
and connections between these aspects to help achieve trustworthiness
Data Augmentation for Deep Graph Learning: A Survey
Graph neural networks, a powerful deep learning tool to model
graph-structured data, have demonstrated remarkable performance on numerous
graph learning tasks. To address the data noise and data scarcity issues in
deep graph learning, the research on graph data augmentation has intensified
lately. However, conventional data augmentation methods can hardly handle
graph-structured data which is defined in non-Euclidean space with
multi-modality. In this survey, we formally formulate the problem of graph data
augmentation and further review the representative techniques and their
applications in different deep graph learning problems. Specifically, we first
propose a taxonomy for graph data augmentation techniques and then provide a
structured review by categorizing the related work based on the augmented
information modalities. Moreover, we summarize the applications of graph data
augmentation in two representative problems in data-centric deep graph
learning: (1) reliable graph learning which focuses on enhancing the utility of
input graph as well as the model capacity via graph data augmentation; and (2)
low-resource graph learning which targets on enlarging the labeled training
data scale through graph data augmentation. For each problem, we also provide a
hierarchical problem taxonomy and review the existing literature related to
graph data augmentation. Finally, we point out promising research directions
and the challenges in future research.Comment: Accepted by SIGKDD Explorations Paper list:
https://github.com/kaize0409/awesome-graph-data-augmentaio
Towards Data-centric Graph Machine Learning: Review and Outlook
Data-centric AI, with its primary focus on the collection, management, and
utilization of data to drive AI models and applications, has attracted
increasing attention in recent years. In this article, we conduct an in-depth
and comprehensive review, offering a forward-looking outlook on the current
efforts in data-centric AI pertaining to graph data-the fundamental data
structure for representing and capturing intricate dependencies among massive
and diverse real-life entities. We introduce a systematic framework,
Data-centric Graph Machine Learning (DC-GML), that encompasses all stages of
the graph data lifecycle, including graph data collection, exploration,
improvement, exploitation, and maintenance. A thorough taxonomy of each stage
is presented to answer three critical graph-centric questions: (1) how to
enhance graph data availability and quality; (2) how to learn from graph data
with limited-availability and low-quality; (3) how to build graph MLOps systems
from the graph data-centric view. Lastly, we pinpoint the future prospects of
the DC-GML domain, providing insights to navigate its advancements and
applications.Comment: 42 pages, 9 figure
Online Social Networks: Measurements, Analysis and Solutions for Mining Challenges
In the last decade, online social networks showed enormous growth. With the rise
of these networks and the consequent availability of wealth social network data, Social
Network Analysis (SNA) led researchers to get the opportunity to access, analyse and
mine the social behaviour of millions of people, explore the way they communicate and
exchange information.
Despite the growing interest in analysing social networks, there are some challenges
and implications accompanying the analysis and mining of these networks. For example,
dealing with large-scale and evolving networks is not yet an easy task and still requires
a new mining solution. In addition, finding communities within these networks is a
challenging task and could open opportunities to see how people behave in groups on a
large scale. Also, the challenge of validating and optimizing communities without knowing
in advance the structure of the network due to the lack of ground truth is yet another
challenging barrier for validating the meaningfulness of the resulting communities.
In this thesis, we started by providing an overview of the necessary background and key
concepts required in the area of social networks analysis. Our main focus is to provide
solutions to tackle the key challenges in this area. For doing so, first, we introduce a predictive
technique to help in the prediction of the execution time of the analysis tasks for
evolving networks through employing predictive modeling techniques to the problem of
evolving and large-scale networks. Second, we study the performance of existing community
detection approaches to derive high quality community structure using a real email
network through analysing the exchange of emails and exploring community dynamics.
The aim is to study the community behavioral patterns and evaluate their quality within
an actual network. Finally, we propose an ensemble technique for deriving communities
using a rich internal enterprise real network in IBM that reflects real collaborations
and communications between employees. The technique aims to improve the community
detection process through the fusion of different algorithms
Built-In Return-Oriented Programs in Embedded Systems and Deep Learning for Hardware Trojan Detection
Microcontrollers and integrated circuits in general have become ubiquitous in the world today. All aspects of our lives depend on them from driving to work, to calling our friends, to checking our bank account balance. People who would do harm to individuals, corporations and nation states are aware of this and for that reason they seek to find or create and exploit vulnerabilities in integrated circuits. This dissertation contains three papers dealing with these types of vulnerabilities. The first paper talks about a vulnerability that was found on a microcontroller, which is a type of integrated circuit. The final two papers deal with hardware trojans. Hardware trojans are purposely added to the design of an integrated circuit in secret so that the manufacturer doesn’t know about it. They are used to damage the integrated circuit, leak confidential information, or in other ways alter the circuit. Hardware trojans are a major concern for anyone using integrated circuits because an attacker can alter a circuit in almost any way if they are successful in inserting one. A known method to prevent hardware trojan insertion is discussed and a type of circuit for which this method does not work is revealed. The discussion of hardware trojans is concluded with a new way to detect them before the integrated circuit is manufactured. Modern deep learning models are used to detect the portions of the hardware trojan called triggers that activate them
- …