S2: An Efficient Graph Based Active Learning Algorithm with Application to Nonparametric Classification
This paper investigates the problem of active learning for binary label
prediction on a graph. We introduce a simple and label-efficient algorithm
called S2 for this task. At each step, S2 selects the vertex to be labeled
based on the structure of the graph and all previously gathered labels.
Specifically, S2 queries for the label of the vertex that bisects the *shortest
shortest* path between any pair of oppositely labeled vertices. We present a
theoretical estimate of the number of queries S2 needs in terms of a novel
parametrization of the complexity of binary functions on graphs. We also
present experimental results demonstrating the performance of S2 on both real
and synthetic data. While other graph-based active learning algorithms have
shown promise in practice, our algorithm is the first with both good
performance and theoretical guarantees. Finally, we demonstrate the
implications of the S2 algorithm for the theory of nonparametric active
learning. In particular, we show that S2 achieves near minimax optimal excess
risk for an important class of nonparametric classification problems.
Comment: A version of this paper appears in the Conference on Learning Theory (COLT) 201
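The query-selection rule described above can be sketched directly: among all pairs of oppositely labeled vertices, find the shortest connecting path, and query the vertex that bisects it. A minimal sketch, assuming an adjacency-list graph and labels in {+1, -1}; the function names are illustrative, not from the paper's code:

```python
from collections import deque

def shortest_path(graph, src, dst):
    """BFS shortest path in an unweighted graph; returns a vertex list or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for u in graph[v]:
            if u not in parent:
                parent[u] = v
                queue.append(u)
    return None

def s2_query(graph, labels):
    """Pick the next vertex to label: the midpoint of the *shortest
    shortest* path between any oppositely labeled pair of vertices."""
    pos = [v for v, y in labels.items() if y == 1]
    neg = [v for v, y in labels.items() if y == -1]
    best = None
    for p in pos:
        for n in neg:
            path = shortest_path(graph, p, n)
            if path and (best is None or len(path) < len(best)):
                best = path
    if best is None or len(best) <= 2:
        return None  # no interior vertex left to bisect
    return best[len(best) // 2]  # the midpoint bisects the path
```

On a path graph 0-1-2-3-4 with vertex 0 labeled +1 and vertex 4 labeled -1, the rule queries vertex 2, halving the region where the label boundary can lie.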
Graph Meets LLM: A Novel Approach to Collaborative Filtering for Robust Conversational Understanding
Conversational AI systems such as Alexa need to understand defective queries
to ensure robust conversational understanding and reduce user friction. These
defective queries often arise from user ambiguities, mistakes, or errors in
automatic speech recognition (ASR) and natural language understanding (NLU).
Personalized query rewriting is an approach that focuses on reducing defects
in queries by taking into account the user's individual behavior and
preferences. It typically relies on an index of past successful user
interactions with the conversational AI. However, unseen interactions within
the user's history present additional challenges for personalized query
rewriting. This paper presents our "Collaborative Query Rewriting" approach,
which specifically addresses the task of rewriting new user interactions that
have not been previously observed in the user's history. This approach builds a
"User Feedback Interaction Graph" (FIG) of historical user-entity interactions
and leverages multi-hop graph traversal to enrich each user's index to cover
future unseen defective queries. The enriched user index is called a
Collaborative User Index and contains hundreds of additional entries. To
counteract precision degradation from the enlarged index, we add additional
transformer layers to the L1 retrieval model and incorporate graph-based and
guardrail features into the L2 ranking model.
Since the user index can be pre-computed, we further investigate the
utilization of a Large Language Model (LLM) to enhance the FIG for user-entity
link prediction in the Video/Music domains. Specifically, this paper
investigates the Dolly-V2 7B model. We found that the user index augmented by
the fine-tuned Dolly-V2 generations significantly enhanced the coverage of
future unseen user interactions, thereby boosting query rewriting performance
on unseen queries compared with the graph-traversal-only approach.
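The multi-hop enrichment step above can be illustrated on a toy bipartite user-entity graph: a user's index grows to include entities interacted with by other users who share an entity with them. A minimal sketch under that assumption; the function names and two-hop depth are illustrative, not the paper's implementation:

```python
from collections import defaultdict

def build_fig(interactions):
    """Build a bipartite user-entity interaction graph from (user, entity)
    pairs; a toy stand-in for the paper's User Feedback Interaction Graph."""
    user_to_entities = defaultdict(set)
    entity_to_users = defaultdict(set)
    for user, entity in interactions:
        user_to_entities[user].add(entity)
        entity_to_users[entity].add(user)
    return user_to_entities, entity_to_users

def collaborative_index(user, user_to_entities, entity_to_users):
    """Two-hop traversal: user -> entities -> co-interacting users ->
    their entities. Returns the enriched (collaborative) user index."""
    index = set(user_to_entities[user])
    for entity in user_to_entities[user]:
        for neighbor in entity_to_users[entity]:
            index |= user_to_entities[neighbor]
    return index
```

If u1 has only played e1, but u2 has played both e1 and e2, the traversal adds e2 to u1's index, so a future defective query from u1 about e2 can still be rewritten.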
Inductive Logical Query Answering in Knowledge Graphs
Formulating and answering logical queries is a standard communication
interface for knowledge graphs (KGs). Alleviating the notorious incompleteness
of real-world KGs, neural methods achieved impressive results in link
prediction and complex query answering tasks by learning representations of
entities, relations, and queries. Still, most existing query answering methods
rely on transductive entity embeddings and cannot generalize to KGs containing
new entities without retraining the entity embeddings. In this work, we study
the inductive query answering task where inference is performed on a graph
containing new entities with queries over both seen and unseen entities. To
this end, we devise two mechanisms leveraging inductive node and relational
structure representations powered by graph neural networks (GNNs).
Experimentally, we show that inductive models are able to perform logical
reasoning at inference time over unseen nodes, generalizing to graphs up to
500% larger than the training ones. Exploring the efficiency-effectiveness trade-off,
we find the inductive relational structure representation method generally
achieves higher performance, while the inductive node representation method is
able to answer complex queries in the inference-only regime without any
training on queries and scales to graphs of millions of nodes. Code is
available at https://github.com/DeepGraphLearning/InductiveQE.
Comment: Accepted at NeurIPS 202
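To make the inductive setting concrete, consider the symbolic baseline the neural methods aim to generalize: answering a path-shaped logical query by set traversal over KG triples. Unlike transductive entity embeddings, traversal needs no per-entity parameters, so it trivially covers entities added after training; this toy sketch is our illustration of the task, not the paper's GNN-based method:

```python
from collections import defaultdict

def build_kg(triples):
    """Index (head, relation) -> set of tails for a toy knowledge graph."""
    index = defaultdict(set)
    for h, r, t in triples:
        index[(h, r)].add(t)
    return index

def answer_path_query(kg, anchor, relations):
    """Answer a path query anchor -r1-> ? -r2-> ? ... by set traversal.
    Works unchanged on entities unseen at training time, because no
    entity-specific embedding is required (toy illustration only)."""
    frontier = {anchor}
    for r in relations:
        frontier = set().union(*[kg.get((e, r), set()) for e in frontier])
    return frontier
```

Neural query answering replaces these discrete sets with learned representations to cope with missing edges; the inductive variants studied here recover the traversal-style generalization to new entities while keeping that robustness.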
Integrating Graphs with Large Language Models: Methods and Prospects
Large language models (LLMs) such as GPT-4 have emerged as frontrunners,
showcasing unparalleled prowess in diverse applications, including answering
queries, code generation, and more. In parallel, graph-structured data, an
intrinsic data type, is pervasive in real-world scenarios. Merging the
capabilities of LLMs with graph-structured data has been a topic of keen
interest. This paper divides such integrations into two predominant
categories. The first leverages LLMs for graph learning, where LLMs can not
only augment existing graph algorithms but also stand as prediction models for
various graph tasks. Conversely, the second category underscores the pivotal
role of graphs in advancing LLMs. Mirroring human cognition, complex tasks
can be solved by adopting graphs in either reasoning or collaboration, and
integrating LLMs with such structures can significantly boost their
performance on various complicated tasks. We also discuss and propose open
questions on integrating LLMs with graph-structured data as future directions
for the field.
The Secure Link Prediction Problem
Link Prediction is an important and well-studied problem for social networks.
Given a snapshot of a graph, the link prediction problem predicts which new
interactions between members are most likely to occur in the near future. As
networks grow in size, data owners are often forced to store the data on
remote cloud servers, which can reveal sensitive information about the
network. The graphs are therefore stored in encrypted form.
We study the link prediction problem on encrypted graphs. To the best of our
knowledge, this secure link prediction problem has not been studied before. We
use the number of common neighbors for prediction. We present three
algorithms for the secure link prediction problem, design prototypes of the
schemes, formally prove their security, and evaluate the algorithms on
real-life datasets.
Comment: This has been accepted for publication in Advances in Mathematics of Communications (AMC) journal
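The plaintext primitive underlying the schemes is simple: score each candidate pair by how many neighbors the two vertices share. A minimal sketch of that common-neighbors heuristic, assuming an adjacency-set representation; the secure protocols compute this same quantity over the encrypted graph:

```python
def common_neighbor_scores(adj, u):
    """Score each non-neighbor v of u by |N(u) & N(v)|, the common-
    neighbors link prediction heuristic; higher scores mark pairs
    more likely to form a link in the near future."""
    scores = {}
    for v in adj:
        if v != u and v not in adj[u]:
            scores[v] = len(adj[u] & adj[v])
    return scores
```

For example, if u and v are not yet connected but share one neighbor, the pair (u, v) receives score 1 and is ranked above pairs sharing none.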