Entity Linking for Queries by Searching Wikipedia Sentences
We present a simple yet effective approach for linking entities in queries.
The key idea is to search Wikipedia articles for sentences similar to a query
and directly use the human-annotated entities in those sentences as
candidate entities for the query. Then, we employ a rich set of features, such
as link-probability, context-matching, word embeddings, and relatedness among
candidate entities as well as their related entities, to rank the candidates
under a regression-based framework. The advantages of our approach lie in two
aspects, which contribute to the ranking process and final linking result.
First, it can greatly reduce the number of candidate entities by using the
words in the query to filter out irrelevant entities. Second, we can obtain a
query-sensitive prior probability in addition to the static link-probability
derived from all Wikipedia articles. We conduct experiments on two benchmark
datasets on entity linking for queries, namely the ERD14 dataset and the GERDAQ
dataset. Experimental results show that our method outperforms state-of-the-art
systems, achieving an F1 of 75.0% on the ERD14 dataset and 56.9% on the GERDAQ
dataset.
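The candidate-generation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the toy corpus, the Jaccard word-overlap retrieval, the `top_k` cutoff, and the function names are all assumptions made for illustration. The sketch retrieves the annotated sentences most similar to a query and turns the entity counts in those sentences into a query-sensitive prior.

```python
from collections import Counter

# Hypothetical toy corpus: each sentence comes with its human-annotated entities.
SENTENCES = [
    ("the jaguar is a large cat native to the americas", ["Jaguar"]),
    ("jaguar cars is a british luxury car maker", ["Jaguar_Cars"]),
    ("the jaguar xk is a luxury car built by jaguar cars", ["Jaguar_XK", "Jaguar_Cars"]),
    ("python is a programming language", ["Python_(programming_language)"]),
]

def jaccard(a, b):
    """Word-overlap similarity between two token lists (an assumed stand-in for real retrieval)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def candidate_priors(query, sentences, top_k=2):
    """Collect entities from the top_k most similar sentences and normalize their counts."""
    q = query.lower().split()
    scored = [(jaccard(q, text.split()), ents) for text, ents in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Only sentences that share at least one word with the query contribute candidates.
    counts = Counter(e for score, ents in scored[:top_k] if score > 0 for e in ents)
    total = sum(counts.values())
    return {e: c / total for e, c in counts.items()}

priors = candidate_priors("jaguar luxury car", SENTENCES)
print(priors)
```

For the query "jaguar luxury car", the two car-related sentences are retrieved, so the animal entity never becomes a candidate and `Jaguar_Cars` receives the larger prior. In the approach described above, such a prior would be one feature among several in the ranking model.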
How well do Large Language Models perform in Arithmetic tasks?
Large language models exhibit emergent abilities, including chain-of-thought
reasoning, that let them answer math word problems step by step. Solving math
word problems requires not only the ability to decompose problems via
chain-of-thought but also the ability to evaluate the arithmetic expression at
each step correctly. To the best of our knowledge, no prior work focuses on
evaluating the arithmetic ability of large language models. In this work, we
propose an arithmetic dataset, MATH 401, to test the latest large language
models, including GPT-4, ChatGPT, InstructGPT, Galactica, and LLaMA, on various
arithmetic expressions, and we provide a detailed analysis of their ability.
MATH 401 and the evaluation code are released at
https://github.com/GanjinZero/math401-llm.
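As a rough illustration of how free-text model answers to arithmetic expressions might be scored, here is a minimal sketch. It is not the released evaluation code: the number-extraction regex, the relative tolerance, and the function names are assumptions.

```python
import math
import re

# Assumed pattern: optional sign, digits, optional decimal part, optional exponent.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")

def last_number(text):
    """Return the last number in a model output, or None if there is none."""
    matches = NUMBER.findall(text.replace(",", ""))  # drop thousands separators
    return float(matches[-1]) if matches else None

def is_correct(output, expected, rel_tol=1e-3):
    """Treat an answer as correct if it is within an assumed relative tolerance."""
    value = last_number(output)
    if value is None:
        return False
    return math.isclose(value, expected, rel_tol=rel_tol, abs_tol=1e-9)

def accuracy(pairs):
    """Fraction of (model_output, expected_value) pairs judged correct."""
    return sum(is_correct(out, exp) for out, exp in pairs) / len(pairs)

results = [("12 * 12 = 144.", 144), ("7 * 6 = 41", 42), ("I am not sure.", 5)]
print(accuracy(results))
```

Taking the last number in the output is a simplification that suits step-by-step answers, where the final result usually comes last; outputs containing no number at all are counted as wrong.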
Boosting In-Context Learning with Factual Knowledge
In-Context Learning (ICL) over large language models (LLMs) aims at solving
previously unseen tasks by conditioning on a few training examples, eliminating
the need for parameter updates and achieving competitive performance. In this
paper, we demonstrate that factual knowledge is imperative for the performance
of ICL in three core facets, i.e., the inherent knowledge learned in LLMs, the
factual knowledge derived from the selected in-context examples, and the
knowledge biases in LLMs for output generation. To unleash the power of LLMs in
few-shot learning scenarios, we introduce a novel Knowledgeable In-Context
Tuning (KICT) framework to further improve the performance of ICL: 1) injecting
factual knowledge into LLMs during continual self-supervised pre-training, 2)
judiciously selecting the examples with high knowledge relevance, and 3)
calibrating the prediction results based on prior knowledge. We evaluate the
proposed approaches on auto-regressive LLMs (e.g., GPT-style models) over
multiple text classification and question answering tasks. Experimental results
demonstrate that KICT substantially outperforms strong baselines, improving
accuracy by more than 13% and 7% on text classification and question
answering tasks, respectively.
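The third step, calibrating predictions with prior knowledge, can be pictured with a generic prior-correction sketch. The abstract does not specify KICT's calibration rule, so this is an assumption: each label probability is divided by an estimated prior for that label and the result is renormalized. The function name and the example numbers are hypothetical.

```python
def calibrate(probs, prior):
    """Divide each label probability by its prior and renormalize.

    Generic prior correction, used here as an assumed stand-in for the
    calibration step; it is not taken from the KICT paper.
    """
    adjusted = {label: probs[label] / prior[label] for label in probs}
    total = sum(adjusted.values())
    return {label: value / total for label, value in adjusted.items()}

# Hypothetical numbers: the model leans "pos", but it is biased toward "pos" a priori.
raw = {"pos": 0.6, "neg": 0.4}
prior = {"pos": 0.75, "neg": 0.25}
calibrated = calibrate(raw, prior)
print(max(calibrated, key=calibrated.get))
```

With these numbers the raw prediction is "pos", but after correcting for the model's prior bias the calibrated prediction flips to "neg".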
Sharing, Teaching and Aligning: Knowledgeable Transfer Learning for Cross-Lingual Machine Reading Comprehension
In cross-lingual language understanding, machine translation is often
utilized to enhance the transferability of models across languages, either by
translating the training data from the source language to the target, or from
the target to the source to aid inference. However, in cross-lingual machine
reading comprehension (MRC), it is difficult to perform a deep level of
assistance to enhance cross-lingual transfer because of the variation of answer
span positions in different languages. In this paper, we propose X-STA, a new
approach for cross-lingual MRC. Specifically, we leverage an attentive teacher
to subtly transfer the answer spans of the source language to the answer output
space of the target. A Gradient-Disentangled Knowledge Sharing technique is
proposed as an improved cross-attention block. In addition, we force the model
to learn semantic alignments from multiple granularities and calibrate the
model outputs with teacher guidance to enhance cross-lingual transferability.
Experiments on three multi-lingual MRC datasets show the effectiveness of our
method, outperforming state-of-the-art approaches.
Comment: emnlp 202