4 research outputs found
Towards Improved Illicit Node Detection with Positive-Unlabelled Learning
Detecting illicit nodes on blockchain networks is a valuable task for
strengthening future regulation. Recent machine learning-based methods proposed
to tackle this task use blockchain transaction datasets in which a small
portion of samples is labeled positive and the rest are unlabelled (PU). Although
some works assume that a random sample of unlabeled nodes are normal nodes, we
argue that the labeling-mechanism assumption for the hidden positive labels and
its effect on the evaluation metrics are worth considering.
We further show that PU classifiers, which account for potential hidden positive
labels, can perform better than regular machine learning models. We test the PU
classifiers with several graph representation learning methods, which yield
different feature distributions for the same data, to obtain more reliable results.
Comment: accepted at the 5th edition of the IEEE International Conference on
Blockchain and Cryptocurrency (ICBC 2023)
Active Keyword Selection to Track Evolving Topics on Twitter
How can we study social interactions on evolving topics at a mass scale? Over
the past decade, researchers from diverse fields such as economics, political
science, and public health have often done this by querying Twitter's public
API endpoints with hand-picked topical keywords to search or stream
discussions. However, despite the API's accessibility, it remains difficult to
select and update keywords to collect high-quality data relevant to topics of
interest. In this paper, we propose an active learning method for rapidly
refining query keywords to increase both the yielded topic relevance and
dataset size. We leverage a large open-source COVID-19 Twitter dataset to
illustrate the applicability of our method in tracking Tweets around the key
sub-topics of Vaccine, Mask, and Lockdown. Our experiments show that our method
achieves an average topic-related keyword recall 2x higher than baselines. We
open-source our code along with a web interface for keyword selection to make
data collection from Twitter more systematic for researchers.
Comment: 10 pages, 3 figures
Towards Temporal Edge Regression: A Case Study on Agriculture Trade Between Nations
Recently, Graph Neural Networks (GNNs) have shown promising performance in
tasks on dynamic graphs such as node classification, link prediction and graph
regression. However, few works have studied the temporal edge regression task,
which has important real-world applications. In this paper, we explore the
application of GNNs to edge regression tasks in both static and dynamic
settings, focusing on predicting food and agriculture trade values between
nations. We introduce three simple yet strong baselines and comprehensively
evaluate one static and three dynamic GNN models using the UN Trade dataset.
Our experimental results reveal that the baselines exhibit remarkably strong
performance across various settings, highlighting the inadequacy of existing
GNNs. We also find that TGN outperforms other GNN models, suggesting TGN is a
more appropriate choice for edge regression tasks. Moreover, we note that the
proportion of negative edges in the training samples significantly affects the
test performance. The companion source code can be found at:
https://github.com/scylj1/GNN_Edge_Regression.
Comment: 12 pages, 4 figures, 4 tables
Temporal Graph Benchmark for Machine Learning on Temporal Graphs
We present the Temporal Graph Benchmark (TGB), a collection of challenging
and diverse benchmark datasets for realistic, reproducible, and robust
evaluation of machine learning models on temporal graphs. TGB datasets are of
large scale, span years in duration, incorporate both node- and edge-level
prediction tasks and cover a diverse set of domains including social, trade,
transaction, and transportation networks. For both tasks, we design evaluation
protocols based on realistic use-cases. We extensively benchmark each dataset
and find that the performance of common models can vary drastically across
datasets. In addition, on dynamic node property prediction tasks, we show that
simple methods often achieve superior performance compared to existing temporal
graph models. We believe that these findings open up opportunities for future
research on temporal graphs. Finally, TGB provides an automated machine
learning pipeline for reproducible and accessible temporal graph research,
including data loading, experiment setup and performance evaluation. TGB will
be maintained and updated on a regular basis and welcomes community feedback.
TGB datasets, data loaders, example code, evaluation setup, and leaderboards
are publicly available at https://tgb.complexdatalab.com/.
Comment: 20 pages, 7 figures, 7 tables, accepted at NeurIPS 2023 Datasets and
Benchmarks Track