25 research outputs found
Foundations and modelling of dynamic networks using Dynamic Graph Neural Networks: A survey
Dynamic networks are used in a wide range of fields, including social network
analysis, recommender systems, and epidemiology. Representing complex networks
as structures changing over time allows network models to leverage not only
structural but also temporal patterns. However, as the dynamic network literature
stems from diverse fields and makes use of inconsistent terminology, it is
challenging to navigate. Meanwhile, graph neural networks (GNNs) have gained a
lot of attention in recent years for their ability to perform well on a range
of network science tasks, such as link prediction and node classification.
Despite the popularity of graph neural networks and the proven benefits of
dynamic network models, there has been little focus on graph neural networks
for dynamic networks. To address the challenges resulting from the fact that
this research crosses diverse fields as well as to survey dynamic graph neural
networks, this work is split into two main parts. First, to address the
ambiguity of the dynamic network terminology we establish a foundation of
dynamic networks with consistent, detailed terminology and notation. Second, we
present a comprehensive survey of dynamic graph neural network models using the
proposed terminology.
Comment: 28 pages, 9 figures, 8 tables
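The discrete-time view of a dynamic network described in this survey can be sketched as a sequence of snapshots. The class and method names below are illustrative only, not the survey's notation:

```python
class DiscreteTimeDynamicNetwork:
    """Discrete-time representation: the dynamic network is a sequence of
    snapshots, each a static edge set observed during one time window."""

    def __init__(self, num_snapshots):
        self.snapshots = [set() for _ in range(num_snapshots)]

    def add_edge(self, t, u, v):
        # edges are undirected, so store them as frozensets
        self.snapshots[t].add(frozenset((u, v)))

    def edges_up_to(self, t):
        # cumulative view: all edges seen in windows 0..t,
        # the usual input for temporal link prediction
        seen = set()
        for snap in self.snapshots[: t + 1]:
            seen |= snap
        return seen
```

A temporal link predictor would then train on `edges_up_to(t)` and be evaluated on edges first appearing in snapshot `t + 1`.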
Hybrid Link Prediction Model
In network science, several topology-based link prediction methods have been developed so far. The classic social network link prediction approach takes a snapshot of the whole network as input. However, with human activity behind it, this social network keeps changing. In this paper, we treat link prediction as a time-series problem and propose a hybrid link prediction model that combines eight structure-based prediction methods and self-adapts the weights assigned to each included method. To test the model, we perform experiments on two real-world networks with both sliding and growing window scenarios. The results show that our model outperforms other structure-based methods when both precision and recall of the prediction results are considered.
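A minimal sketch of the idea: combine several structure-based link prediction scores with weights that adapt based on each predictor's recent error. The three predictors and the weight-update rule below are common textbook choices, not necessarily the eight methods or the adaptation scheme used in the paper:

```python
import math

def common_neighbours(adj, u, v):
    return len(adj[u] & adj[v])

def jaccard(adj, u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    # shared neighbours weighted inversely by the log of their degree
    return sum(1.0 / math.log(len(adj[w]))
               for w in adj[u] & adj[v] if len(adj[w]) > 1)

PREDICTORS = [common_neighbours, jaccard, adamic_adar]

def hybrid_score(adj, u, v, weights):
    # weighted combination of the individual structural scores
    return sum(w * p(adj, u, v) for w, p in zip(weights, PREDICTORS))

def update_weights(weights, errors, lr=0.1):
    # shift weight toward predictors with lower recent error, then renormalise
    new = [max(w - lr * e, 1e-6) for w, e in zip(weights, errors)]
    total = sum(new)
    return [w / total for w in new]
```

In a sliding-window setup, `update_weights` would be called after each window using the prediction errors measured on the next snapshot.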
Heterogeneous Feature Representation for Digital Twin-Oriented Complex Networked Systems
Building models of Complex Networked Systems (CNS) that can accurately
represent reality forms an important research area. To be able to reflect real
world systems, the modelling needs to consider not only the intensity of
interactions between the entities but also features of all the elements of the
system. This study aims to improve the expressive power of node features in
Digital Twin-Oriented Complex Networked Systems (DT-CNSs) with heterogeneous
feature representation principles. This involves representing features with
crisp feature values and fuzzy sets, each describing the objective and the
subjective inductions of the nodes' features and feature differences. Our
empirical analysis builds DT-CNSs to recreate realistic physical contact
networks in different countries from real node feature distributions based on
various representation principles and an optimised feature preference. We also
investigate their respective disaster resilience to an epidemic outbreak
starting from the most popular node. The results suggest that the increasing
flexibility of feature representation with fuzzy sets improves the expressive
power and enables more accurate modelling. In addition, the heterogeneous
features influence the network structure and the speed of the epidemic
outbreak, requiring various mitigation policies targeted at different people.
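The contrast between crisp and fuzzy feature representation described above can be sketched with triangular membership functions. The functions and the similarity measure below are a generic illustration, not the paper's exact representation principles:

```python
def triangular_membership(x, a, b, c):
    """Degree to which value x belongs to a fuzzy set
    with support (a, c) and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def crisp_difference(x, y):
    # objective view: absolute difference of raw feature values
    return abs(x - y)

def fuzzy_similarity(x, y, fuzzy_sets):
    # subjective view: overlap of the two nodes' membership profiles
    mx = [triangular_membership(x, *s) for s in fuzzy_sets]
    my = [triangular_membership(y, *s) for s in fuzzy_sets]
    num = sum(min(a, b) for a, b in zip(mx, my))
    den = sum(max(a, b) for a, b in zip(mx, my))
    return num / den if den else 0.0
```

Two nodes with identical raw values get fuzzy similarity 1.0, while values falling into disjoint fuzzy sets get 0.0, giving the model a tunable middle ground between these extremes.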
A Network Science perspective of Graph Convolutional Networks: A survey
The mining and exploitation of graph structural information have been the
focal points in the study of complex networks. Traditional structural measures
in Network Science focus on the analysis and modelling of complex networks from
the perspective of network structure, such as the centrality measures, the
clustering coefficient, and motifs and graphlets, and they have become basic
tools for studying and understanding graphs. In comparison, graph neural
networks, especially graph convolutional networks (GCNs), are particularly
effective at integrating node features into graph structures via neighbourhood
aggregation and message passing, and have been shown to significantly improve
the performances in a variety of learning tasks. These two classes of methods
are, however, typically treated separately with limited references to each
other. In this work, aiming to establish relationships between them, we provide
a network science perspective of GCNs. Our novel taxonomy classifies GCNs from
three structural information angles, i.e., the layer-wise message aggregation
scope, the message content, and the overall learning scope. Moreover, as a
prerequisite for reviewing GCNs via a network science perspective, we also
summarise traditional structural measures and propose a new taxonomy for them.
Finally and most importantly, we draw connections between traditional
structural approaches and graph convolutional networks, and discuss potential
directions for future research.
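The neighbourhood aggregation that the survey above contrasts with traditional structural measures can be sketched as a single graph-convolution layer with the standard symmetric degree normalisation (a common formulation, shown here for illustration rather than as the survey's own definition):

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One graph-convolution layer: symmetric degree-normalised
    neighbourhood aggregation, a linear map, then ReLU."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(deg ** -0.5)
    norm = d_inv_sqrt @ a_hat @ d_inv_sqrt    # D^{-1/2} (A + I) D^{-1/2}
    return np.maximum(norm @ features @ weight, 0.0)
```

The normalisation term is where the network-science connection is most direct: each node's message is scaled by the degrees of both endpoints, exactly the quantity that classical centrality measures are built from.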
Digital Twin-Oriented Complex Networked Systems based on Heterogeneous node features and interaction rules
This study proposes an extendable modelling framework for Digital
Twin-Oriented Complex Networked Systems (DT-CNSs) with the goal of generating
networks that faithfully represent real systems. The modelling process focuses on
(i) features of nodes and (ii) interaction rules for creating connections that
are built based on individual nodes' preferences. We conduct experiments on
simulation-based DT-CNSs that incorporate various features and rules about
network growth and different transmissibilities related to an epidemic spread
on these networks. We present a case study on disaster resilience of social
networks given an epidemic outbreak by investigating the infection occurrence
within specific time and social distance. The experimental results show how
different levels of the structural and dynamics complexities, concerned with
feature diversity and flexibility of interaction rules respectively, influence
network growth and epidemic spread. The analysis reveals that, to achieve
maximum disaster resilience, mitigation policies should be targeted at nodes
with preferred features, as they have higher infection risks and should be the
focus of epidemic control.
Pricing Options with Portfolio-based Option Trading Agents in Direct Double Auction
Options constitute an integral part of modern financial trades, and are priced according to the risk associated with buying or selling a certain asset in the future. The financial literature mostly concentrates on risk-neutral methods of pricing options, such as the Black-Scholes model. However, using trading agents with utility functions to determine an option's potential payoff is an emerging field in option pricing theory. In this paper, we use one such methodology, developed by Othman and Sandholm, to design portfolio-holding agents that are endowed with popular option portfolios, such as the bullish spread, bearish spread, butterfly spread, straddle, etc., to price options. Agents use their portfolios to evaluate how buying or selling a certain option would change their current payoff structure. We also develop a multi-unit direct double auction which preserves the atomicity of orders at the expense of budget balance. Agents are simulated in this mechanism and the emerging prices are compared to risk-neutral prices under different market conditions. Through an appropriate allocation of option portfolios to trading agents, we can simulate market conditions where the population of agents is bearish, bullish, neutral or non-neutral in its beliefs.
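The payoff structures of the named portfolios are standard and can be sketched directly; the strike values below are arbitrary placeholders, and these functions illustrate the portfolios only, not the agents' utility-based pricing:

```python
def call_payoff(strike, price):
    return max(price - strike, 0.0)

def put_payoff(strike, price):
    return max(strike - price, 0.0)

def bullish_spread(price, k_low=90.0, k_high=110.0):
    # long a call at the lower strike, short a call at the higher strike
    return call_payoff(k_low, price) - call_payoff(k_high, price)

def butterfly_spread(price, k_low=90.0, k_mid=100.0, k_high=110.0):
    # long calls at the wings, short two calls at the middle strike
    return (call_payoff(k_low, price)
            - 2 * call_payoff(k_mid, price)
            + call_payoff(k_high, price))

def straddle(price, k=100.0):
    # long a call and a put at the same strike: pays off on large moves
    return call_payoff(k, price) + put_payoff(k, price)
```

An agent endowed with one of these portfolios values an option by how buying or selling it would reshape the payoff curve above, which is what drives the prices emerging in the double auction.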
AutoWeka4MCPS-AVATAR: Accelerating Automated Machine Learning Pipeline Composition and Optimisation
Automated machine learning (ML) pipeline composition and optimisation aim at
automating the process of finding the most promising ML pipelines within
allocated resources (i.e., time, CPU and memory). Existing methods, such as
Bayesian-based and genetic-based optimisation, which are implemented in
Auto-Weka, Auto-sklearn and TPOT, evaluate pipelines by executing them.
Therefore, the pipeline composition and optimisation of these methods
frequently require a tremendous amount of time that prevents them from
exploring complex pipelines to find better predictive models. To further
explore this research challenge, we have conducted experiments showing that
many of the generated pipelines are invalid in the first place, and attempting
to execute them is a waste of time and resources. To address this issue, we
propose a novel method to evaluate the validity of ML pipelines, without their
execution, using a surrogate model (AVATAR). The AVATAR generates a knowledge
base by automatically learning the capabilities and effects of ML algorithms on
datasets' characteristics. This knowledge base is used for a simplified mapping
from an original ML pipeline to a surrogate model which is a Petri net based
pipeline. Instead of executing the original ML pipeline to evaluate its
validity, the AVATAR evaluates its surrogate model constructed by capabilities
and effects of the ML pipeline components and input/output simplified mappings.
Evaluating this surrogate model is less resource-intensive than the execution
of the original pipeline. As a result, the AVATAR enables the pipeline
composition and optimisation methods to evaluate more pipelines by quickly
rejecting invalid pipelines. We integrate the AVATAR into the sequential
model-based algorithm configuration (SMAC). Our experiments show that when SMAC
employs AVATAR, it finds better solutions than on its own.
Comment: arXiv admin note: substantial text overlap with arXiv:2001.1115
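The core idea of checking pipeline validity without execution can be sketched as propagating dataset properties through a capability table. The table below is a hypothetical toy, not the AVATAR knowledge base or its Petri-net surrogate:

```python
# hypothetical capability table: for each component, which dataset
# properties it cannot handle, and how it transforms the properties
CAPABILITIES = {
    "imputer":    {"forbids": set(),       "removes": {"missing"}, "adds": set()},
    "scaler":     {"forbids": {"missing"}, "removes": set(),       "adds": {"scaled"}},
    "classifier": {"forbids": {"missing"}, "removes": set(),       "adds": {"predictions"}},
}

def pipeline_is_valid(pipeline, data_properties):
    """Simulate a pipeline on dataset *properties* instead of data:
    each step must accept the current property set, then transforms it."""
    props = set(data_properties)
    for step in pipeline:
        spec = CAPABILITIES[step]
        if props & spec["forbids"]:
            return False  # step cannot handle the data in its current state
        props = (props - spec["removes"]) | spec["adds"]
    return True
```

Evaluating such a property-level simulation takes microseconds, which is why rejecting invalid pipelines this way lets the optimiser explore far more candidates in the same budget.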
NATS-Bench: Benchmarking NAS Algorithms for Architecture Topology and Size
Neural architecture search (NAS) has attracted a lot of attention and has
been illustrated to bring tangible benefits in a large number of applications
in the past few years. Architecture topology and architecture size have been
regarded as two of the most important aspects for the performance of deep
learning models and the community has spawned lots of searching algorithms for
both aspects of the neural architectures. However, the performance gain from
these searching algorithms is achieved under different search spaces and
training setups. This makes the overall performance of the algorithms to some
extent incomparable and leaves unclear how much improvement comes from a
sub-module of the searching model. In this paper, we propose NATS-Bench, a unified benchmark on
searching for both topology and size, for (almost) any up-to-date NAS
algorithm. NATS-Bench includes the search space of 15,625 neural cell
candidates for architecture topology and 32,768 for architecture size on three
datasets. We analyze the validity of our benchmark in terms of various criteria
and performance comparison of all candidates in the search space. We also show
the versatility of NATS-Bench by benchmarking 13 recent state-of-the-art NAS
algorithms on it. All logs and diagnostic information, obtained by training each
candidate under the same setup, are provided. This enables a much larger community
of researchers to focus on developing better NAS algorithms in a more
comparable and computationally affordable environment. All codes are
publicly available at: https://xuanyidong.com/assets/projects/NATS-Bench.
Comment: Accepted to IEEE TPAMI 2021, an extended version of NAS-Bench-201 (ICLR 2020) [arXiv:2001.00326]
The Technological Emergence of AutoML: A Survey of Performant Software and Applications in the Context of Industry
As with most technical fields, there exists a delay between fundamental academic
research and practical industrial uptake. Whilst some sciences have robust and
well-established processes for commercialisation, such as the pharmaceutical
practice of regimented drug trials, other fields face transitory periods in
which fundamental academic advancements diffuse gradually into the space of
commerce and industry. For the still relatively young field of
Automated/Autonomous Machine Learning (AutoML/AutonoML), that transitory period
is under way, spurred on by a burgeoning interest from broader society. Yet, to
date, little research has been undertaken to assess the current state of this
dissemination and its uptake. Thus, this review makes two primary contributions
to knowledge around this topic. Firstly, it provides the most up-to-date and
comprehensive survey of existing AutoML tools, both open-source and commercial.
Secondly, it motivates and outlines a framework for assessing whether an AutoML
solution designed for real-world application is 'performant'; this framework
extends beyond the limitations of typical academic criteria, considering a
variety of stakeholder needs and the human-computer interactions required to
service them. Thus, additionally supported by an extensive assessment and
comparison of academic and commercial case-studies, this review evaluates
mainstream engagement with AutoML in the early 2020s, identifying obstacles and
opportunities for accelerating future uptake.
On accuracy of PDF divergence estimators and their applicability to representative data sampling
Generalisation error estimation is an important issue in machine learning. Cross-validation, traditionally used for this purpose, requires building multiple models and repeating the whole procedure many times in order to produce reliable error estimates. It is, however, possible to accurately estimate the error using only a single model, if the training and test data are chosen appropriately. This paper investigates the possibility of using various probability density function (PDF) divergence measures for the purpose of representative data sampling. As it turns out, the first difficulty one needs to deal with is estimation of the divergence itself. In contrast to other publications on this subject, the experimental results provided in this study show that in many cases accurate estimation is not possible unless samples consisting of thousands of instances are used. Exhaustive experiments on divergence-guided representative data sampling have been performed using 26 publicly available benchmark datasets and 70 PDF divergence estimators, and their results have been analysed and discussed.
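The kind of estimator whose sample-size sensitivity the paper examines can be sketched as a plug-in histogram estimate of the Kullback-Leibler divergence. This is one simple estimator among the 70 studied, shown for illustration only:

```python
import math
import random

def kl_divergence_histogram(xs, ys, bins=20):
    """Plug-in KL(P || Q) estimate from two samples, using a shared
    histogram over the combined range of both samples."""
    lo = min(min(xs), min(ys))
    hi = max(max(xs), max(ys))
    width = (hi - lo) / bins or 1.0

    def hist(sample):
        counts = [1e-9] * bins  # small floor avoids log(0) on empty bins
        for v in sample:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        total = sum(counts)
        return [c / total for c in counts]

    p, q = hist(xs), hist(ys)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

With only dozens of instances per sample, the bin counts are dominated by noise, which is the paper's point: reliable divergence estimates tend to require samples of thousands of instances.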