8,646 research outputs found
Active Discovery of Network Roles for Predicting the Classes of Network Nodes
Nodes in real world networks often have class labels, or underlying
attributes, that are related to the way in which they connect to other nodes.
Sometimes this relationship is simple, for instance nodes of the same class are
may be more likely to be connected. In other cases, however, this is not true,
and the way that nodes link in a network exhibits a different, more complex
relationship to their attributes. Here, we consider networks in which we know
how the nodes are connected, but we do not know the class labels of the nodes
or how class labels relate to the network links. We wish to identify the best
subset of nodes to label in order to learn this relationship between node
attributes and network links. We can then use this discovered relationship to
accurately predict the class labels of the rest of the network nodes.
We present a model that identifies groups of nodes with similar link
patterns, which we call network roles, using a generative blockmodel. The model
then predicts labels by learning the mapping from network roles to class labels
using a maximum margin classifier. We choose a subset of nodes to label
according to an iterative margin-based active learning strategy. By integrating
the discovery of network roles with the classifier optimisation, the active
learning process can adapt the network roles to better represent the network
for node classification. We demonstrate the model by exploring a selection of
real world networks, including a marine food web and a network of English
words. We show that, in contrast to other network classifiers, this model
achieves good classification accuracy for a range of networks with different
relationships between class labels and network links
Transfer Meets Hybrid: A Synthetic Approach for Cross-Domain Collaborative Filtering with Text
Collaborative filtering (CF) is the key technique for recommender systems
(RSs). CF exploits user-item behavior interactions (e.g., clicks) only and
hence suffers from the data sparsity issue. One research thread is to integrate
auxiliary information such as product reviews and news titles, leading to
hybrid filtering methods. Another thread is to transfer knowledge from other
source domains such as improving the movie recommendation with the knowledge
from the book domain, leading to transfer learning methods. In real-world life,
no single service can satisfy a user's all information needs. Thus it motivates
us to exploit both auxiliary and source information for RSs in this paper. We
propose a novel neural model to smoothly enable Transfer Meeting Hybrid (TMH)
methods for cross-domain recommendation with unstructured text in an end-to-end
manner. TMH attentively extracts useful content from unstructured text via a
memory module and selectively transfers knowledge from a source domain via a
transfer network. On two real-world datasets, TMH shows better performance in
terms of three ranking metrics by comparing with various baselines. We conduct
thorough analyses to understand how the text content and transferred knowledge
help the proposed model.Comment: 11 pages, 7 figures, a full version for the WWW 2019 short pape
Distributed Online Big Data Classification Using Context Information
Distributed, online data mining systems have emerged as a result of
applications requiring analysis of large amounts of correlated and
high-dimensional data produced by multiple distributed data sources. We propose
a distributed online data classification framework where data is gathered by
distributed data sources and processed by a heterogeneous set of distributed
learners which learn online, at run-time, how to classify the different data
streams either by using their locally available classification functions or by
helping each other by classifying each other's data. Importantly, since the
data is gathered at different locations, sending the data to another learner to
process incurs additional costs such as delays, and hence this will be only
beneficial if the benefits obtained from a better classification will exceed
the costs. We model the problem of joint classification by the distributed and
heterogeneous learners from multiple data sources as a distributed contextual
bandit problem where each data is characterized by a specific context. We
develop a distributed online learning algorithm for which we can prove
sublinear regret. Compared to prior work in distributed online data mining, our
work is the first to provide analytic regret results characterizing the
performance of the proposed algorithm
HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks
The unsupervised detection of anomalies in time series data has important
applications in user behavioral modeling, fraud detection, and cybersecurity.
Anomaly detection has, in fact, been extensively studied in categorical
sequences. However, we often have access to time series data that represent
paths through networks. Examples include transaction sequences in financial
networks, click streams of users in networks of cross-referenced documents, or
travel itineraries in transportation networks. To reliably detect anomalies, we
must account for the fact that such data contain a large number of independent
observations of paths constrained by a graph topology. Moreover, the
heterogeneity of real systems rules out frequency-based anomaly detection
techniques, which do not account for highly skewed edge and degree statistics.
To address this problem, we introduce HYPA, a novel framework for the
unsupervised detection of anomalies in large corpora of variable-length
temporal paths in a graph. HYPA provides an efficient analytical method to
detect paths with anomalous frequencies that result from nodes being traversed
in unexpected chronological order.Comment: 11 pages with 8 figures and supplementary material. To appear at SIAM
Data Mining (SDM 2020
- …