3D Question Answering
Visual Question Answering (VQA) has witnessed tremendous progress in recent
years. However, most efforts focus only on 2D image question answering tasks.
In this paper, we present the first attempt at extending VQA to the 3D
domain, which can facilitate artificial intelligence's perception of 3D
real-world scenarios. Unlike image-based VQA, 3D Question Answering (3DQA)
takes a colored point cloud as input and requires comprehension of both
appearance and 3D geometry to answer 3D-related questions. To this end,
we propose a novel transformer-based 3DQA framework, "3DQA-TR", which consists
of two encoders that exploit appearance and geometry information, respectively.
The multi-modal appearance, geometry, and question features then attend to one
another via a 3D-Linguistic BERT to predict the target answers. To verify the
effectiveness of our proposed 3DQA
framework, we further develop the first 3DQA dataset, "ScanQA", which builds on
the ScanNet dataset and contains 6K questions and 30K answers over its scenes.
Extensive experiments on this dataset demonstrate the clear superiority of our
proposed 3DQA framework over existing VQA frameworks and the effectiveness of
our major designs. Our code and dataset will be made
publicly available to facilitate research in this direction.

Comment: To appear in IEEE Transactions on Visualization and Computer Graphics (TVCG) 202
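As a rough illustration of this two-encoder design, the sketch below wires per-point appearance and geometry encoders together with question embeddings and lets a plain transformer encoder stand in for the 3D-Linguistic BERT. All module choices, dimensions, and the mean-pooled answer head are illustrative assumptions, not the authors' implementation.

# Minimal two-encoder 3DQA sketch in the spirit of 3DQA-TR (assumptions, not the paper's code).
import torch
import torch.nn as nn

class ToyThreeDQA(nn.Module):
    def __init__(self, vocab_size=10000, hidden=256, num_answers=1000):
        super().__init__()
        self.appearance_enc = nn.Linear(3, hidden)   # per-point RGB -> appearance token
        self.geometry_enc = nn.Linear(3, hidden)     # per-point XYZ -> geometry token
        self.word_emb = nn.Embedding(vocab_size, hidden)
        # Stand-in for the 3D-Linguistic BERT: appearance, geometry, and question
        # tokens attend to one another inside a plain transformer encoder.
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=4)
        self.answer_head = nn.Linear(hidden, num_answers)

    def forward(self, rgb, xyz, question_ids):
        # rgb, xyz: (B, N, 3) colored point cloud; question_ids: (B, L) token ids.
        tokens = torch.cat([
            self.appearance_enc(rgb),
            self.geometry_enc(xyz),
            self.word_emb(question_ids),
        ], dim=1)
        fused = self.fusion(tokens)
        return self.answer_head(fused.mean(dim=1))   # answer logits over a fixed vocabulary

model = ToyThreeDQA()
logits = model(torch.rand(2, 1024, 3), torch.rand(2, 1024, 3),
               torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 1000])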
LOGEN: Few-shot Logical Knowledge-Conditioned Text Generation with Self-training
Natural language generation from structured data mainly focuses on
surface-level descriptions, suffering from uncontrollable content selection and
low fidelity. Previous works leverage logical forms to facilitate logical
knowledge-conditioned text generation. Though achieving remarkable progress,
they are data-hungry, which makes adoption in real-world applications with
limited data challenging. To this end, this paper proposes a unified
framework for logical knowledge-conditioned text generation in the few-shot
setting. With only a few seed logical forms (e.g., 20/100-shot), our approach
leverages self-training and samples pseudo logical forms based on content and
structure consistency. Experimental results demonstrate that our approach can
obtain better few-shot performance than baselines.

Comment: Work in progress
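The self-training loop behind such an approach can be sketched generically: pseudo logical forms are predicted for unlabeled texts and kept only if they pass content and structure consistency checks before being added to the training pool. The toy consistency checks, the stand-in predict_lf model, and the fixed number of rounds below are assumptions for illustration, not the paper's actual criteria.

# Generic self-training with consistency-filtered pseudo logical forms (illustrative only).
from dataclasses import dataclass

@dataclass
class Example:
    logical_form: str
    text: str

def structure_consistent(lf: str) -> bool:
    # Toy structural check: the form is at least well bracketed.
    return lf.count("(") == lf.count(")") and lf.count("(") > 0

def content_consistent(lf: str, text: str) -> bool:
    # Toy content check: every argument of the logical form is mentioned in the text.
    args = lf.replace("(", " ").replace(")", " ").replace(",", " ").split()[1:]
    return all(a.lower() in text.lower() for a in args)

def self_train(seed, unlabeled, predict_lf, rounds=3):
    labeled, remaining = list(seed), list(unlabeled)
    for _ in range(rounds):
        kept, still_unlabeled = [], []
        for text in remaining:
            lf = predict_lf(text, labeled)            # model (re)trained on current labeled set
            if structure_consistent(lf) and content_consistent(lf, text):
                kept.append(Example(lf, text))        # keep only consistent pseudo pairs
            else:
                still_unlabeled.append(text)
        labeled.extend(kept)
        remaining = still_unlabeled
    return labeled

# Tiny usage example with a trivial stand-in "model".
seed = [Example("capital(France, Paris)", "Paris is the capital of France.")]
unlabeled = ["Berlin is the capital of Germany."]
fake_model = lambda text, labeled: "capital(Germany, Berlin)"
print(len(self_train(seed, unlabeled, fake_model)))   # 2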
Contrastive Demonstration Tuning for Pre-trained Language Models
Pretrained language models can be effectively stimulated by textual prompts
or demonstrations, especially in low-data scenarios. Recent works have focused
on automatically searching for discrete or continuous prompts or optimized
verbalizers, yet studies of demonstrations remain limited. In fact, the choice
of demonstration examples is crucial to the final performance of
prompt-tuning. In this paper, we propose a novel pluggable, extensible, and
efficient approach named contrastive demonstration tuning, which is free of
demonstration sampling. Furthermore, the proposed approach can be (i) plugged
into any previous prompt-tuning approach and (ii) extended to a wide range of
classification tasks with a large number of categories. Experimental results on
16 datasets illustrate that our method integrated with previous approaches
LM-BFF and P-tuning can yield better performance. Code is available at
https://github.com/zjunlp/PromptKG/tree/main/research/Demo-Tuning.

Comment: Work in progress
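One way such a contrastive term over demonstrations can be bolted onto an existing prompt-tuning objective is sketched below: two learnable "virtual demonstration" embeddings per class act as positive pairs under an InfoNCE loss that is simply added to the task loss. The pairing scheme, loss form, and weighting are assumptions made for illustration, not the exact objective used in the paper.

# Illustrative contrastive term over learnable demonstration embeddings.
import torch
import torch.nn.functional as F

def info_nce(view_a, view_b, temperature=0.1):
    # view_a, view_b: (num_classes, hidden); matching rows are positives.
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature          # (C, C) similarity matrix
    targets = torch.arange(a.size(0))         # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

num_classes, hidden = 4, 256
demo_a = torch.nn.Parameter(torch.randn(num_classes, hidden))   # virtual demonstrations, view A
demo_b = torch.nn.Parameter(torch.randn(num_classes, hidden))   # virtual demonstrations, view B

task_loss = torch.tensor(0.9)                 # stands in for the usual prompt-tuning loss
loss = task_loss + 0.1 * info_nce(demo_a, demo_b)
loss.backward()                               # gradients flow into the demonstration embeddings
print(float(loss))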
Joint Inference for Knowledge Base Population
Populating a Knowledge Base (KB) with new knowledge facts from reliable text resources usually consists of linking name mentions to KB entities and identifying relationships between entity pairs. However, the task often suffers from errors propagating from upstream entity linkers to downstream relation extractors. In this paper, we propose a novel joint inference framework that allows interactions between the two subtasks and finds an optimal assignment by addressing the coherence among preliminary local predictions: whether the types of entities meet the expectations of the relations, explicitly or implicitly, and whether the local predictions are globally compatible. We further measure the confidence of the extracted triples by examining the details of the complete extraction process. Experiments show that the proposed framework significantly reduces error propagation and thus obtains more reliable facts, outperforming competitive baselines with state-of-the-art relation extraction models.
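To make the joint-assignment idea concrete, the toy search below scores every combination of entity-linking and relation candidates and rewards assignments whose linked entity types match the relation's expected argument types. The candidates, type inventory, and scoring weights are invented for illustration and do not reproduce the paper's inference procedure.

# Toy joint decoding over entity-linking and relation-extraction candidates.
from itertools import product

# Local candidates with confidences from (hypothetical) upstream models.
entity_candidates = {
    "Jordan": [("Michael_Jordan/PERSON", 0.6), ("Jordan_(country)/LOCATION", 0.4)],
    "Chicago Bulls": [("Chicago_Bulls/ORGANIZATION", 0.9)],
}
relation_candidates = [("plays_for", 0.7, ("PERSON", "ORGANIZATION")),
                       ("located_in", 0.3, ("LOCATION", "LOCATION"))]

def joint_decode(entity_candidates, relation_candidates, coherence_weight=1.0):
    mentions = list(entity_candidates)
    best, best_score = None, float("-inf")
    for links in product(*(entity_candidates[m] for m in mentions)):
        for rel, rel_conf, expected_types in relation_candidates:
            types = tuple(link[0].split("/")[1] for link in links)
            coherent = types == expected_types                    # relation/type agreement
            score = (sum(conf for _, conf in links) + rel_conf
                     + coherence_weight * coherent)
            if score > best_score:
                best, best_score = (dict(zip(mentions, links)), rel), score
    return best, best_score

# The coherence bonus steers decoding toward Michael_Jordan + plays_for rather than
# locally plausible but globally incompatible alternatives.
print(joint_decode(entity_candidates, relation_candidates))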
Harder Tasks Need More Experts: Dynamic Routing in MoE Models
In this paper, we introduce a novel dynamic expert selection framework for
Mixture of Experts (MoE) models, aiming to enhance computational efficiency and
model performance by adjusting the number of activated experts based on input
difficulty. Unlike traditional MoE approaches that rely on fixed Top-K routing,
which activates a predetermined number of experts regardless of the input's
complexity, our method dynamically selects experts based on the confidence
level in expert selection for each input. This allows for a more efficient
utilization of computational resources, activating more experts for complex
tasks requiring advanced reasoning and fewer for simpler tasks. Through
extensive evaluations, our dynamic routing method demonstrates substantial
improvements over conventional Top-2 routing across various benchmarks,
achieving an average improvement of 0.7% while activating fewer than 90% of the
parameters. Further analysis shows that our model dispatches more experts to tasks
requiring complex reasoning skills, like BBH, confirming its ability to
dynamically allocate computational resources in alignment with the input's
complexity. Our findings also highlight a variation in the number of experts
needed across different layers of the transformer model, offering insights into
the potential for designing heterogeneous MoE frameworks. The code and models
are available at https://github.com/ZhenweiAn/Dynamic_MoE
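A minimal way to express this confidence-based routing is to add experts in order of router probability until their cumulative probability crosses a threshold, so tokens with flat (low-confidence) router distributions receive more experts. The threshold value, the tiny MLP experts, and the per-token loop below are simplifying assumptions rather than the released implementation.

# Confidence-thresholded dynamic routing sketch (illustrative, not the released code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicMoE(nn.Module):
    def __init__(self, hidden=64, num_experts=8, threshold=0.5):
        super().__init__()
        self.router = nn.Linear(hidden, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden, hidden), nn.GELU(), nn.Linear(hidden, hidden))
            for _ in range(num_experts)
        ])
        self.threshold = threshold

    def forward(self, x):                              # x: (tokens, hidden)
        probs = F.softmax(self.router(x), dim=-1)      # (tokens, num_experts)
        sorted_p, order = probs.sort(dim=-1, descending=True)
        # Per-token number of experts needed to reach the confidence threshold.
        need = (sorted_p.cumsum(-1) < self.threshold).sum(-1) + 1
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                     # simple loop; real kernels batch this
            k = int(need[t])
            idx = order[t, :k]
            w = probs[t, idx] / probs[t, idx].sum()    # renormalize the kept routing weights
            out[t] = sum(w[j] * self.experts[int(idx[j])](x[t]) for j in range(k))
        return out, need                               # need = experts activated per token

moe = DynamicMoE()
y, counts = moe(torch.randn(4, 64))
print(counts.tolist())   # number of experts activated for each token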