Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts
Zero-shot Visual Question Answering (VQA) is a prominent vision-language task
that examines both the visual and textual understanding capabilities of systems
in the absence of training data. Recently, converting images into captions has
bridged information across modalities, allowing Large Language Models (LLMs)
to apply their strong zero-shot generalization capability to unseen questions.
To design ideal prompts for solving VQA via LLMs, several studies have explored
different strategies to select or generate question-answer pairs as exemplar
prompts, which guide LLMs to answer the current questions effectively. However,
they entirely overlook the role of question prompts. The original questions in
VQA tasks often contain ellipses and ambiguities that require intermediate
reasoning. To this end, we present Reasoning Question Prompts for VQA tasks,
which can further activate the potential of LLMs in zero-shot scenarios.
Specifically, for each question, we first generate self-contained questions as
reasoning question prompts via an unsupervised question editing module that
considers sentence fluency, semantic integrity and syntactic invariance. Each
reasoning question prompt clearly indicates the intent of the original
question, and answering these prompts yields a set of candidate answers. Then,
the candidate answers, together with their confidence scores, are fed into
LLMs as answer heuristics to produce the final answer. We evaluate reasoning
question prompts on three VQA challenges. Experimental results demonstrate that
they significantly improve the results of LLMs in the zero-shot setting and
outperform existing state-of-the-art zero-shot methods on three out of four
data sets. Our source code is publicly released at
https://github.com/ECNU-DASE-NLP/RQP
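The final answering step (candidate answers plus confidence scores handed to an LLM as heuristics) can be sketched as follows. This is a minimal illustration under stated assumptions, not the released RQP code: the `Candidate` container, the `merge_candidates` and `build_answer_prompt` helpers, the prompt layout, and the policy of keeping each answer's highest confidence across prompts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate answer with its confidence score (hypothetical container)."""
    answer: str
    confidence: float


def merge_candidates(per_prompt: list[list[Candidate]]) -> list[Candidate]:
    """Pool candidates produced for all reasoning question prompts.

    Answers are normalized by case and surrounding whitespace; each answer
    keeps the highest confidence it received from any prompt (an assumed policy).
    """
    best: dict[str, float] = {}
    for candidates in per_prompt:
        for c in candidates:
            key = c.answer.strip().lower()
            best[key] = max(best.get(key, 0.0), c.confidence)
    return [Candidate(answer, score) for answer, score in best.items()]


def build_answer_prompt(caption: str, question: str,
                        candidates: list[Candidate], top_k: int = 3) -> str:
    """Build a zero-shot LLM prompt listing the top-k candidates as heuristics."""
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)[:top_k]
    hints = ", ".join(f"{c.answer} ({c.confidence:.2f})" for c in ranked)
    return (
        f"Context: {caption}\n"
        f"Question: {question}\n"
        f"Candidates: {hints}\n"
        "Answer:"
    )


if __name__ == "__main__":
    pooled = merge_candidates([
        [Candidate("surfboard", 0.9), Candidate("board", 0.4)],
        [Candidate("Surfboard", 0.7), Candidate("wave", 0.2)],
    ])
    print(build_answer_prompt("A man riding a wave on a surfboard.",
                              "What is he standing on?", pooled))
```

Ranking by confidence before truncating to `top_k` keeps the prompt short while still exposing the strongest hints; how many candidates to show is a tunable assumption here.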
Prompting Large Language Models with Chain-of-Thought for Few-Shot Knowledge Base Question Generation
The task of Question Generation over Knowledge Bases (KBQG) aims to convert a
logical form into a natural language question. Because large-scale question
annotation is expensive, KBQG methods for low-resource scenarios urgently need
to be developed. However, current methods rely heavily on annotated data for
fine-tuning, which makes them ill-suited to few-shot question generation. Large
Language Models (LLMs) have shown impressive generalization ability in few-shot
tasks. Inspired by Chain-of-Thought (CoT) prompting, an in-context learning
strategy for reasoning, we formulate the KBQG task as a reasoning problem in
which the generation of a complete question is split into a series of
sub-question generation steps. Our proposed prompting method, KQG-CoT, first
retrieves supportive logical forms from the unlabeled data pool, taking into
account the characteristics of the logical form. Then, we write a prompt that
makes explicit the reasoning chain for generating complicated questions based
on the selected demonstrations. To further ensure prompt quality, we extend
KQG-CoT into KQG-CoT+ by sorting the logical forms by their complexity. We
conduct extensive experiments on three public KBQG datasets. The results
demonstrate that our prompting method consistently outperforms other prompting
baselines on the evaluated datasets. Remarkably, our KQG-CoT+ method surpasses
the existing few-shot SoTA results on the PathQuestions dataset by 18.25, 10.72,
and 10.18 absolute points on BLEU-4, METEOR, and ROUGE-L, respectively.
Comment: Accepted by EMNLP 2023 main conference
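The KQG-CoT+ idea of ordering demonstrations by logical-form complexity can be sketched as below. This is a hypothetical illustration, not the authors' method: the complexity score (maximum parenthesis nesting depth, then number of sub-expressions of an S-expression), the simple-to-complex ordering direction, and the prompt layout are all assumptions.

```python
Demo = tuple[str, list[str], str]  # (logical form, sub-question steps, final question)


def complexity(logical_form: str) -> tuple[int, int]:
    """Hypothetical complexity score for an S-expression logical form.

    Returns (maximum nesting depth, number of sub-expressions); tuples compare
    lexicographically, so nesting depth is the primary sort key.
    """
    depth = max_depth = count = 0
    for ch in logical_form:
        if ch == "(":
            depth += 1
            count += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
    return max_depth, count


def build_cot_prompt(demos: list[Demo], target_lf: str) -> str:
    """Order demonstrations from simple to complex and lay out each reasoning chain."""
    blocks = []
    for lf, steps, question in sorted(demos, key=lambda d: complexity(d[0])):
        lines = [f"Logical form: {lf}"]
        lines += [f"Step {i}: {step}" for i, step in enumerate(steps, start=1)]
        lines.append(f"Question: {question}")
        blocks.append("\n".join(lines))
    # Leave the chain open so the LLM continues it for the target logical form.
    blocks.append(f"Logical form: {target_lf}\nStep 1:")
    return "\n\n".join(blocks)
```

Any other monotone measure (for example, the number of relations in the logical form) could replace `complexity` without changing `build_cot_prompt`.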
Aligning Large Language Models to a Domain-specific Graph Database
Graph Databases (Graph DB) are widely applied in various fields, including
finance, social networks, and medicine. However, translating Natural Language
(NL) into the Graph Query Language (GQL), a task commonly known as NL2GQL,
proves challenging due to its inherent complexity and specialized nature. Some
approaches have sought to use Large Language Models (LLMs) to address
analogous tasks such as text2SQL. Nevertheless, for NL2GQL tasks in a
particular domain, the absence of domain-specific NL-GQL data pairs makes it
difficult to establish alignment between LLMs and the graph DB. To address this
challenge, we propose a well-defined pipeline. Specifically, we use ChatGPT
to create NL-GQL data pairs based on the given graph DB with self-instruct.
Then, we use the created data to fine-tune LLMs, thereby achieving alignment
between LLMs and the graph DB. Additionally, during inference, we propose a
method that extracts the schema relevant to the queried NL and uses it as the
input context to guide LLMs in generating accurate GQLs. We evaluate our method
on two constructed datasets derived from graph DBs in the finance and medicine
domains, namely FinGQL and MediGQL. Experimental results demonstrate that our
method significantly outperforms a set of baseline methods, with improvements
of 5.90 and 6.36 absolute points on EM, and 6.00 and 7.09 absolute points on
EX, respectively.
Comment: 13 pages, 2 figures
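The inference-time step of extracting the schema relevant to a question can be illustrated with a simple lexical filter. This sketch assumes token overlap between the question and schema names as the relevance test, a hypothetical `node_props`/`edges` schema representation, and a Cypher-style serialization; the paper's actual extraction method may differ.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, e.g. 'HOLDS_SHARES' -> {'holds', 'shares'}."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def extract_schema(question: str,
                   node_props: dict[str, list[str]],
                   edges: list[tuple[str, str, str]]) -> str:
    """Keep only schema elements that share at least one token with the question.

    node_props maps a node label to its property names; edges holds
    (edge type, source label, target label) triples.
    """
    q = tokens(question)
    labels = {label for label in node_props if tokens(label) & q}
    kept = []
    for etype, src, dst in edges:
        if tokens(etype) & q:
            kept.append((etype, src, dst))
            labels.update((src, dst))  # keep both endpoints of a matched edge
    lines = [f"(:{label} {{{', '.join(node_props.get(label, []))}}})"
             for label in sorted(labels)]
    lines += [f"(:{src})-[:{etype}]->(:{dst})" for etype, src, dst in kept]
    return "\n".join(lines)
```

For the hypothetical question "Which person holds shares of company Acme?", an unrelated `Hospital` label or `TREATED_AT` edge would be dropped from the context, which keeps the prompt short for large schemas. A purely lexical match misses synonyms and plurals, so a learned retriever would be a natural replacement.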
DualToken-ViT: Position-aware Efficient Vision Transformer with Dual Token Fusion
Self-attention-based vision transformers (ViTs) have emerged as a highly
competitive architecture in computer vision. Unlike convolutional neural
networks (CNNs), ViTs are capable of global information sharing. With the
development of various ViT structures, ViTs have become increasingly
advantageous for many vision tasks. However, the quadratic complexity of
self-attention renders ViTs computationally intensive, and their lack of the
inductive biases of locality and translation equivariance means they require
larger models than CNNs to learn visual features effectively. In this paper,
we propose a lightweight and efficient vision transformer model called
DualToken-ViT that leverages the advantages of both CNNs and ViTs.
DualToken-ViT effectively fuses the token with local information obtained by a
convolution-based structure and the token with global information obtained by
a self-attention-based structure to achieve an efficient attention structure.
In addition, we use position-aware global tokens throughout all stages to
enrich the global information, further strengthening the effect of
DualToken-ViT. Position-aware global tokens also contain the position
information of the image, which makes our model better suited to vision tasks.
We conducted extensive experiments on image
classification, object detection and semantic segmentation tasks to demonstrate
the effectiveness of DualToken-ViT. On the ImageNet-1K dataset, our models of
different scales achieve accuracies of 75.4% and 79.4% with only 0.5G and 1.0G
FLOPs, respectively, and our model with 1.0G FLOPs outperforms LightViT-T using
global tokens by 0.7%
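The core idea of fusing a convolution-derived local token with an attention-derived global token can be shown with a toy model. This is a hypothetical pure-Python sketch, not the DualToken-ViT architecture: it works on a 1D token sequence rather than a 2D feature map, uses a fixed smoothing kernel, identity query/key/value projections, and element-wise addition as the fusion, and omits position-aware global tokens entirely.

```python
import math

Vector = list[float]


def local_branch(tokens: list[Vector],
                 kernel: tuple[float, ...] = (0.25, 0.5, 0.25)) -> list[Vector]:
    """Depthwise 1D convolution over neighbouring tokens with zero padding."""
    n, half = len(tokens), len(kernel) // 2
    out = []
    for i in range(n):
        acc = [0.0] * len(tokens[i])
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:
                acc = [a + weight * x for a, x in zip(acc, tokens[j])]
        out.append(acc)
    return out


def global_branch(tokens: list[Vector]) -> list[Vector]:
    """Single-head self-attention with identity Q/K/V projections.

    Every token attends to every token, so the cost is O(N^2 * d).
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) / z
                    for c in range(d)])
    return out


def dual_token_fusion(tokens: list[Vector]) -> list[Vector]:
    """Fuse local and global features by element-wise addition."""
    return [[lv + gv for lv, gv in zip(loc, glo)]
            for loc, glo in zip(local_branch(tokens), global_branch(tokens))]
```

The nested loop over all token pairs in `global_branch` is the quadratic cost the abstract refers to, while `local_branch` touches only a fixed neighbourhood per token.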
Impact of Multimedia in Sina Weibo: Popularity and Life Span
National Research Foundation (NRF) Singapore under International Research Centres in Singapore Funding Initiative
Learning Knowledge-Enhanced Contextual Language Representations for Domain Natural Language Understanding
Knowledge-Enhanced Pre-trained Language Models (KEPLMs) improve the
performance of various downstream NLP tasks by injecting knowledge facts from
large-scale Knowledge Graphs (KGs). However, existing methods for pre-training
KEPLMs with relational triples are difficult to adapt to closed domains due
to the lack of sufficient domain graph semantics. In this paper, we propose a
Knowledge-enhanced lANGuAge Representation learning framework for various
clOsed dOmains (KANGAROO) that captures the implicit graph structure among the
entities. Specifically, since the entity coverage rates of closed-domain KGs
can be relatively low and may exhibit the global sparsity phenomenon for
knowledge injection, we consider not only the shallow relational
representations of triples but also the hyperbolic embeddings of deep
hierarchical entity-class structures for effective knowledge fusion. Moreover,
as two closed-domain entities under the same entity-class often have locally
dense neighbor subgraphs, as measured by the maximum point biconnected
component, we further propose a data augmentation strategy based on
contrastive learning over subgraphs to construct higher-quality hard negative
samples. This helps the underlying KEPLMs better distinguish the semantics of
these neighboring entities, further compensating for the global semantic
sparsity. In the experiments, we evaluate KANGAROO on various knowledge-aware
and general NLP tasks in both full and few-shot learning settings, where it
significantly outperforms various KEPLM training paradigms in closed domains.
Comment: emnlp 202
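Why hard negatives help in contrastive learning can be seen from a standard InfoNCE-style loss. The abstract does not state the exact objective, so InfoNCE with cosine similarity and a temperature is an assumption here, and the plain vectors stand in for hypothetical entity or subgraph embeddings.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor: list[float], positive: list[float],
             negatives: list[list[float]], temperature: float = 0.1) -> float:
    """InfoNCE loss: negative log softmax probability of the positive pair."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # log-sum-exp trick for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]
```

A negative that lies close to the anchor (for example, a neighbouring entity of the same entity-class) yields a much larger loss, and hence a stronger gradient, than a negative pointing in the opposite direction. An easy negative contributes almost nothing, which is the motivation for constructing harder ones.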
Adaptively detecting aggregation bursts in data streams
Finding bursts in data streams is attracting much attention in the research community due to its broad applications. Existing burst detection methods suffer from three problems: (1) the window size and the absolute burst threshold, which are hard to determine a priori, must be given in advance; (2) only one-sided bursts, i.e., either increasing or decreasing bursts, can be detected; and (3) bumps, which are changes in aggregated data caused by noise, are often reported as bursts. The disturbance caused by bumps adds considerable effort to the subsequent exploration of mining results. In this paper, a general burst model is introduced to overcome these three problems. We develop an efficient algorithm for detecting adaptive aggregation bursts in a data stream given a burst ratio. With the help of a novel inverted histogram, the statistical summary is compressed to fit in limited main memory, so that bursts on windows of any length can be detected accurately and efficiently online. Theoretical analysis shows that the space and time complexity bounds of this method are relatively good, while experimental results demonstrate the applicability and efficiency of our algorithm in different application settings.
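A brute-force baseline makes the ratio-based burst notion concrete. This is a hypothetical sketch, not the paper's inverted-histogram algorithm: it assumes a burst means the sum over the latest window of length w differs from the sum over the preceding w items by at least the ratio `beta` in either direction. That assumption removes the absolute threshold, checks every window length, and reports both increasing and decreasing bursts, but it keeps the full prefix-sum array in memory and does not filter bumps.

```python
from itertools import accumulate
from typing import Optional


def detect_bursts(stream: list[float], beta: float = 2.0,
                  max_window: Optional[int] = None) -> list[tuple[int, int, str]]:
    """Report (end index, window length, direction) for every ratio burst.

    A window of length w ending at the current item is an 'up' burst if its sum
    is at least beta times the sum of the preceding w items, and a 'down' burst
    if it is at most 1/beta of that sum. Values are assumed non-negative.
    """
    if beta <= 1:
        raise ValueError("beta must be greater than 1")
    prefix = [0.0, *accumulate(stream)]  # prefix[i] == sum(stream[:i])
    bursts = []
    for t in range(1, len(stream) + 1):  # t items have arrived so far
        limit = t // 2 if max_window is None else min(t // 2, max_window)
        for w in range(1, limit + 1):
            current = prefix[t] - prefix[t - w]
            previous = prefix[t - w] - prefix[t - 2 * w]
            if previous <= 0:
                continue  # ratio undefined; skip rather than guess
            if current >= beta * previous:
                bursts.append((t - 1, w, "up"))
            elif current * beta <= previous:
                bursts.append((t - 1, w, "down"))
    return bursts
```

On a stream such as `[1, 1, 1, 1, 4, 4]`, the jump is reported on several window lengths at once, which illustrates why a fixed window size is unnecessary under a ratio definition. The quadratic work and linear memory of this baseline are exactly what a compressed summary like the paper's inverted histogram is meant to avoid.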
- …