83,051 research outputs found
Disentangled Graph Social Recommendation
Social recommender systems have drawn a lot of attention in many online web
services, because of the incorporation of social information between users in
improving recommendation results. Despite the significant progress made by
existing solutions, we argue that current methods have two
limitations: (1) Existing social-aware recommendation models only consider
collaborative similarity between items; how to incorporate item-wise semantic
relatedness is less explored in current recommendation paradigms. (2) Current
social recommender systems neglect the entanglement of the latent factors over
heterogeneous relations (e.g., social connections, user-item interactions).
Learning disentangled representations with relation heterogeneity poses
a great challenge for social recommendation. In this work, we design a
Disentangled Graph Neural Network (DGNN) with the integration of latent memory
units, which empowers DGNN to maintain factorized representations for
heterogeneous types of user and item connections. Additionally, we devise new
memory-augmented message propagation and aggregation schemes under the graph
neural architecture, allowing us to recursively distill semantic relatedness
into the representations of users and items in a fully automatic manner.
Extensive experiments on three benchmark datasets verify the effectiveness of
our model by achieving great improvement over state-of-the-art recommendation
techniques. The source code is publicly available at:
https://github.com/HKUDS/DGNN.
Comment: Accepted by IEEE ICDE 202
SgWalk: Location Recommendation by User Subgraph-Based Graph Embedding
The popularity of Location-based Social Networks (LBSNs) provides an opportunity to collect massive multi-modal datasets that contain geographical information, as well as time and social interactions. Such data is a useful resource for generating personalized location recommendations. Such heterogeneous data can be further extended with notions of trust between users, the popularity of locations, and the expertise of users. Recently, Heterogeneous Information Network (HIN) models and graph neural architectures have proven successful for recommendation problems. One limitation of such solutions is capturing the contextual relationships between the nodes in the heterogeneous network. In location recommendation, spatial context is a frequently used consideration, since users prefer to get recommendations within their spatial vicinity. To solve this challenging problem, we propose a novel Heterogeneous Information Network (HIN) embedding technique, SgWalk, which explores the proximity between users and locations and generates location recommendations via subgraph-based node embedding. SgWalk follows four steps: building user subgraphs according to location context, generating random walk sequences over user subgraphs, learning embeddings of nodes in the LBSN graph, and generating location recommendations using the vector representations of the nodes. SgWalk differs from existing techniques that rely on meta-paths or bipartite graphs in that it utilizes the contextual user subgraph. In this way, it aims to capture contextual relationships among heterogeneous nodes more effectively. The recommendation accuracy of SgWalk is analyzed through extensive experiments conducted on benchmark datasets in terms of top-n location recommendations. The accuracy evaluation results indicate a minimum 23% (@5 recommendation) average improvement in accuracy compared to baseline techniques and the state-of-the-art heterogeneous graph embedding techniques in the literature.
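The second of the four steps, generating random walk sequences over a user subgraph, can be illustrated with a minimal sketch. This is not the SgWalk implementation: it assumes plain uniform random walks, and the function name `random_walks`, its parameters, and the toy subgraph are hypothetical names chosen for illustration.

```python
import random
from typing import Dict, List

Graph = Dict[str, List[str]]  # adjacency list: node -> neighbor nodes


def random_walks(graph: Graph, walk_length: int, walks_per_node: int,
                 seed: int = 0) -> List[List[str]]:
    """Generate uniform random walks starting from every node of the graph.

    Assumption: every neighbor also appears as a key of `graph`.
    """
    rng = random.Random(seed)  # seeded so the walks are reproducible
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical toy subgraph mixing user and location nodes.
subgraph = {
    "user:alice": ["loc:cafe", "loc:park", "user:bob"],
    "user:bob": ["user:alice", "loc:park"],
    "loc:cafe": ["user:alice"],
    "loc:park": ["user:alice", "user:bob"],
}
walks = random_walks(subgraph, walk_length=5, walks_per_node=2)
```

Each walk is a sequence of node identifiers; in a pipeline of this kind, such sequences would typically be passed to a skip-gram style model to learn the node embeddings used in the later steps.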
TwHIN-BERT: A Socially-Enriched Pre-trained Language Model for Multilingual Tweet Representations
We present TwHIN-BERT, a multilingual language model trained on in-domain
data from the popular social network Twitter. TwHIN-BERT differs from prior
pre-trained language models as it is trained with not only text-based
self-supervision, but also with a social objective based on the rich social
engagements within a Twitter heterogeneous information network (TwHIN). Our
model is trained on 7 billion tweets covering over 100 distinct languages,
providing a valuable representation to model short, noisy, user-generated text.
We evaluate our model on a variety of multilingual social recommendation and
semantic understanding tasks and demonstrate significant metric improvement
over established pre-trained language models. We will freely open-source
TwHIN-BERT and our curated hashtag prediction and social engagement benchmark
datasets to the research community.
Source-Aware Embedding Training on Heterogeneous Information Networks
Heterogeneous information networks (HINs) have been extensively applied to
real-world tasks, such as recommendation systems, social networks, and citation
networks. While existing HIN representation learning methods can effectively
learn the semantic and structural features in the network, little attention has
been given to the distribution discrepancy of subgraphs within a single HIN.
However, we find that ignoring such distribution discrepancy among subgraphs
from multiple sources would hinder the effectiveness of graph embedding
learning algorithms. This motivates us to propose SUMSHINE (Scalable
Unsupervised Multi-Source Heterogeneous Information Network Embedding) -- a
scalable unsupervised framework to align the embedding distributions among
multiple sources of an HIN. Experimental results on real-world datasets in a
variety of downstream tasks validate the performance of our method over the
state-of-the-art heterogeneous information network embedding algorithms.
Comment: Published in Data Intelligence 202
Privacy Risk in Anonymized Heterogeneous Information Networks
Anonymized user datasets are often released for research or industry applications. As an example, t.qq.com released its anonymized users' profile, social interaction, and recommendation log data in KDD Cup 2012 to call for recommendation algorithms. Since the entities (users and so on) and edges (links among entities) are of multiple types, the released social network is a heterogeneous information network. Prior work has shown how privacy can be compromised in homogeneous information networks by the use of specific types of graph patterns. We show how the extra information derived from heterogeneity can be used to relax these assumptions. To characterize and demonstrate this added threat, we formally define privacy risk in an anonymized heterogeneous information network to identify the vulnerability in the possible way such data are released, and further present a new de-anonymization attack that exploits the vulnerability. Our attack successfully de-anonymized most individuals involved in the data: for an anonymized 1,000-user t.qq.com network of density 0.01, the attack precision is over 90% with a 2.3-million-user auxiliary network.
Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs
Many problems in areas as diverse as recommendation systems, social network
analysis, semantic search, and distributed root cause analysis can be modeled
as pattern search on labeled graphs (also called "heterogeneous information
networks" or HINs). Given a large graph and a query pattern with node and edge
label constraints, a fundamental challenge is to find the top-k matches
according to a ranking function over edge and node weights. For users, it is
difficult to select the value k. We therefore propose the novel notion of an any-k
ranking algorithm: for a given time budget, return as many of the top-ranked
results as possible. Then, given additional time, produce the next lower-ranked
results quickly as well. It can be stopped anytime, but may have to continue
until all results are returned. This paper focuses on acyclic patterns over
arbitrary labeled graphs. We are interested in practical algorithms that
effectively exploit (1) properties of heterogeneous networks, in particular
selective constraints on labels, and (2) that the users often explore only a
fraction of the top-ranked results. Our solution, KARPET, carefully integrates
aggressive pruning that leverages the acyclic nature of the query, and
incremental guided search. It enables us to prove strong non-trivial time and
space guarantees, which is generally considered very hard for this type of
graph search problem. Through experimental studies we show that KARPET achieves
running times in the order of milliseconds for tree patterns on large networks
with millions of nodes and edges.
Comment: To appear in WWW 201
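The any-k idea, returning results in ranked order so that the consumer can stop at any point and resume later, can be sketched with a lazy generator. This is not KARPET: as a simplifying assumption, the sketch ranks all combinations of one candidate per query node by the sum of their weights and ignores edge constraints between candidates. The name `ranked_combinations` and the toy weights are hypothetical.

```python
import heapq
from itertools import islice
from typing import Iterator, List, Tuple


def ranked_combinations(
    lists: List[List[float]],
) -> Iterator[Tuple[float, Tuple[int, ...]]]:
    """Lazily yield (total_weight, index_tuple) in nondecreasing total weight.

    Each inner list holds candidate weights for one query node and must be
    non-empty and sorted in ascending order.
    """
    start = tuple(0 for _ in lists)
    heap = [(sum(weights[0] for weights in lists), start)]
    seen = {start}
    while heap:
        total, idx = heapq.heappop(heap)
        yield total, idx
        # Successors: advance exactly one position to its next-best candidate.
        for pos in range(len(lists)):
            if idx[pos] + 1 < len(lists[pos]):
                nxt = idx[:pos] + (idx[pos] + 1,) + idx[pos + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    delta = lists[pos][idx[pos] + 1] - lists[pos][idx[pos]]
                    heapq.heappush(heap, (total + delta, nxt))


# Hypothetical candidate weights for three query nodes (lower is better).
candidates = [[1, 4], [2, 3], [0, 5]]
results = ranked_combinations(candidates)
first_batch = list(islice(results, 3))  # stop after the current top 3
remaining = list(results)               # resume later for lower-ranked results
```

Because the generator keeps its heap between calls, the caller never has to choose k in advance: it takes as many results as its budget allows and continues from the same point when more time is available.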
Mining and Analyzing the Academic Network
Social Network research has attracted the interest of many researchers, not only in analyzing online social networking applications, such as Facebook and Twitter, but also in providing comprehensive services in the scientific research domain. We define an Academic Network as a social network which integrates scientific factors, such as authors, papers, affiliations, publishing venues, and their relationships, such as co-authorship among authors and citations among papers. By mining and analyzing the academic network, we can provide users with comprehensive services such as searching for research experts, published papers, and conferences, as well as detecting research communities or the evolution of hot research topics. We can also provide recommendations to users on with whom to collaborate, whom to cite and where to submit. In this dissertation, we investigate two main tasks that have fundamental applications in academic network research. In the first, we address the problem of expertise retrieval, also known as expert finding or ranking, in which we identify and return a ranked list of researchers, based upon their estimated expertise or reputation, to user-specified queries. In the second, we address the problem of research action recommendation (prediction), specifically, the tasks of publishing venue recommendation, citation recommendation and coauthor recommendation. For both tasks, our principal goal is to effectively mine and integrate heterogeneous information and thereby develop well-functioning ranking or recommender systems.
For the task of expertise retrieval, we first proposed or applied three modified versions of PageRank-like algorithms to citation network analysis; we then proposed an enhanced author-topic model by simultaneously modeling citation and publishing venue information; we finally incorporated the pair-wise learning-to-rank algorithm into the traditional topic modeling process, and further improved the model by integrating groups of author-specific features. For the task of research action recommendation, we first proposed an improved neighborhood-based collaborative filtering approach for publishing venue recommendation; we then applied our proposed enhanced author-topic model and demonstrated its effectiveness in both cited author prediction and publishing venue prediction; finally we proposed an extended latent factor model that can jointly model several relations in an academic environment in a unified way and verified its performance in four recommendation tasks: the recommendation on author-co-authorship, author-paper citation, paper-paper citation and paper-venue submission. Extensive experiments conducted on large-scale real-world data sets demonstrated the superiority of our proposed models over other existing state-of-the-art methods.
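As background for the PageRank-like ranking mentioned above, the following is a minimal sketch of standard PageRank on a small citation graph. It is the textbook power iteration, not one of the dissertation's modified variants; the function name `pagerank`, the damping value, and the toy citation data are assumptions made for illustration.

```python
from typing import Dict, List


def pagerank(cites: Dict[str, List[str]], damping: float = 0.85,
             iters: int = 50) -> Dict[str, float]:
    """Standard PageRank by power iteration; `cites` maps paper -> cited papers."""
    nodes = sorted(set(cites) | {t for targets in cites.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = cites.get(v, [])
            if out:
                # Split this paper's rank evenly among the papers it cites.
                share = damping * rank[v] / len(out)
                for t in out:
                    nxt[t] += share
            else:
                # Dangling paper (cites nothing): spread its rank uniformly.
                share = damping * rank[v] / n
                for t in nodes:
                    nxt[t] += share
        rank = nxt
    return rank


# Hypothetical toy citation network: p3 is cited by every other paper.
citations = {"p1": ["p3"], "p2": ["p3"], "p3": [], "p4": ["p1", "p3"]}
scores = pagerank(citations)
top_paper = max(scores, key=scores.get)
```

Ranking authors rather than papers would require aggregating such scores over each author's papers or running the iteration on an author-level graph, which this sketch does not attempt.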