An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks
Access to external knowledge is essential for many natural language processing tasks, such as question answering and dialogue. Existing methods often rely on a parametric model that stores knowledge in its parameters, or on a retrieval-augmented model that has access to an external knowledge source. Parametric and retrieval-augmented models have complementary strengths in terms of computational efficiency and predictive accuracy. To combine the strengths of both approaches, we propose the Efficient Memory-Augmented Transformer (EMAT): it encodes external knowledge into a key-value memory and exploits fast maximum inner product search for memory querying. We also introduce pre-training tasks that allow EMAT to encode informative key-value representations and to learn an implicit strategy for integrating multiple memory slots into the transformer. Experiments on various knowledge-intensive tasks such as question-answering and dialogue datasets show that simply augmenting parametric models (T5-base) with our method produces more accurate results (e.g., 25.8 → 44.3 EM on NQ) while retaining high throughput (e.g., 1000 queries/s on NQ). Compared to retrieval-augmented models, EMAT runs substantially faster across the board and produces more accurate results on WoW and ELI5.
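The key-value memory lookup at the heart of EMAT can be sketched as a maximum inner product search over a matrix of key encodings. The minimal sketch below uses exact search over toy dense matrices; the function name, dimensions, and exact-search approach are illustrative, not EMAT's actual implementation.

```python
import numpy as np

def query_memory(query, keys, values, top_k=2):
    """Toy key-value memory lookup via maximum inner product search.

    keys:   (n_slots, d) matrix of key encodings
    values: (n_slots, d_v) matrix of value encodings
    Returns the value slots with the highest inner-product scores.
    """
    scores = keys @ query              # inner product with every key
    idx = np.argsort(-scores)[:top_k]  # indices of the top_k highest scores
    return values[idx], scores[idx]
```

In practice, exact search over a large memory is replaced with an approximate MIPS index (e.g., FAISS) to reach the throughput figures reported above.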
Entity centric neural models for natural language processing
This thesis explores how to enhance natural language understanding by incorporating entity information into neural network models. It tackles three key questions:
1. Leveraging entities for understanding tasks: This work introduces Entity-GCN, a model that performs multi-step reasoning on a graph where nodes represent entity mentions and edges represent relationships. This method achieved state-of-the-art results on a multi-document question-answering dataset.
2. Identifying and disambiguating entities using large language models: This research proposes a novel system that retrieves entities by generating their names token-by-token, overcoming limitations of traditional methods and significantly reducing memory footprint. This approach is also extended to a multilingual setting and further optimized for efficiency.
3. Interpreting and controlling entity knowledge within models: This thesis presents a post-hoc interpretation technique to analyze how decisions are made across layers in neural models, allowing for visualization and analysis of knowledge representation. Additionally, a method for editing factual knowledge about entities is proposed, enabling correction of model predictions without costly retraining.
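Generating entity names token-by-token relies on constrained decoding: at each step, the model may only emit tokens that extend some valid entity name. A minimal sketch using a nested-dict trie follows; the whitespace tokenization and `<eos>` end-marker are illustrative assumptions, not the system's actual vocabulary.

```python
def build_trie(names):
    # Each entity name is a sequence of tokens; build a nested-dict trie.
    trie = {}
    for name in names:
        node = trie
        for tok in name:
            node = node.setdefault(tok, {})
        node["<eos>"] = {}  # mark a complete entity name
    return trie

def allowed_next(trie, prefix):
    # Tokens the decoder may emit after the given prefix; empty set
    # means the prefix matches no entity name.
    node = trie
    for tok in prefix:
        if tok not in node:
            return set()
        node = node[tok]
    return set(node)
```

Because only the trie (not a dense embedding per entity) must be stored, this style of generation-based retrieval is where the memory-footprint reduction comes from.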
Answer Similarity Grouping and Diversification in Question Answering Systems
The rise in popularity of mobile and voice search has led to a shift in IR from document retrieval to passage retrieval for non-factoid questions. Various datasets, such as MSMarco, as well as efficient retrieval models, have been developed to identify the single best answer passage for this task. However, such models do not specifically address questions that could have multiple or alternative answers. In this dissertation, we focus on this new research area, which involves studying answer passage relationships and how they can be applied to passage retrieval tasks.
We first create a high-quality dataset for the answer passage similarity task in the context of question answering. Manual annotation of passage pairs is performed to set the similarity labels, from which answer group information is automatically generated. We next investigate different types of representations that could be used to create effective clusters. We experiment with various unsupervised representations and show that distributional representations outperform term-based representations for this task. Next, weak supervision is leveraged to further improve cluster modeling performance. We use BERT as the underlying model for training and show the relative performance of various weak signals, such as GloVe and term-based language modeling, for this task. To apply these clusters to the answer passage retrieval task for multi-answer questions, we use a modified version of the Maximal Marginal Relevance (MMR) diversification model. We demonstrate that answers retrieved using this model are more diverse (i.e., they cover more answer types with low redundancy) while maximizing relevance, relative to the baselines. So far, we have used passage clustering as a means to identify answer groups corresponding to a question and applied them in a question answering task. We extend this a step further by looking at related questions within a conversation. For this purpose, we expand the definition of Reciprocal Rank Fusion (RRF) and use it to identify pertinent history passages for such questions. Updated question rewrites generated using these passages are then used to improve the conversational search task.
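MMR diversification greedily selects each next answer by trading off its relevance against its redundancy with already-selected answers. The sketch below is vanilla MMR, not the dissertation's modified variant; the λ weight and toy scores are illustrative assumptions.

```python
def mmr_rank(relevance, similarity, lam=0.5, k=3):
    """Greedy Maximal Marginal Relevance selection.

    relevance:  list of relevance scores, one per candidate
    similarity: function sim(i, j) between two candidates
    lam:        trade-off between relevance and diversity
    """
    candidates = list(range(len(relevance)))
    selected = []
    while candidates and len(selected) < k:
        def mmr_score(i):
            # Redundancy is the max similarity to anything already chosen.
            redundancy = max((similarity(i, j) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With λ = 0.5 a near-duplicate of an already-selected answer is penalized heavily, so a less relevant but novel answer type can outrank it, which is exactly the diversity behavior described above.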
In addition to being the first work that looks at answer relationships, our specific contributions can be summarized as follows: (1) creation of new datasets with passage similarity and answer type information; (2) effective passage similarity clustering models using unsupervised representations and weak supervision methods; (3) application of the passage similarity/clustering information to a diversification framework; (4) identification of good response history candidates using answer passage clustering for the conversational search task.
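The expanded Reciprocal Rank Fusion used for selecting history passages builds on standard RRF, which scores an item by summing 1/(k + rank) over every ranking it appears in. A minimal sketch of the standard formulation follows; the constant k = 60 is the commonly used default, not necessarily the dissertation's setting.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists with standard Reciprocal Rank Fusion.

    rankings: list of ranked lists of item ids, best first
    Returns item ids ordered by fused score, best first.
    """
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            # Each appearance contributes 1 / (k + rank) to the item's score.
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```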
PockEngine: Sparse and Efficient Fine-tuning in a Pocket
On-device learning and efficient fine-tuning enable continuous and privacy-preserving customization (e.g., locally fine-tuning large language models on personalized data). However, existing training frameworks are designed for cloud servers with powerful accelerators (e.g., GPUs, TPUs) and lack the optimizations needed for learning on the edge, which faces the challenges of resource limitations and edge hardware diversity. We introduce PockEngine: a tiny, sparse, and efficient engine that enables fine-tuning on various edge devices. First, PockEngine supports sparse backpropagation: it prunes the backward graph and sparsely updates the model, with measured memory savings and latency reduction, while maintaining model quality. Second, PockEngine is compilation-first: the entire training graph (including forward, backward, and optimization steps) is derived at compile time, which reduces runtime overhead and opens opportunities for graph transformations. PockEngine also integrates a rich set of training graph optimizations, including operator reordering and backend switching, which further reduce training cost. PockEngine supports diverse applications, frontends, and hardware backends: it flexibly compiles and tunes models defined in PyTorch/TensorFlow/Jax and deploys binaries to mobile CPUs/GPUs/DSPs. We evaluated PockEngine on both vision models and large language models. PockEngine achieves up to 15× speedup over off-the-shelf TensorFlow (Raspberry Pi) and 5.6× memory saving in backpropagation (Jetson AGX Orin). Remarkably, PockEngine enables fine-tuning LLaMav2-7B on NVIDIA Jetson AGX Orin at 550 tokens/s, 7.9× faster than PyTorch.
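Sparse backpropagation as described above prunes parts of the backward graph so that frozen layers incur no gradient computation or activation storage. A toy numpy analogue for a two-layer linear model with squared-error loss is sketched below; the layer split, learning rate, and function name are illustrative only, since PockEngine performs this pruning at compile time over real training graphs.

```python
import numpy as np

def sparse_step(x, y, W1, W2, lr=0.1, train_W1=False):
    # Forward pass through both layers.
    h = x @ W1
    pred = h @ W2
    err = pred - y                       # dL/dpred for 0.5 * ||pred - y||^2
    # The last layer is always trained: its gradient is cheap.
    new_W2 = W2 - lr * np.outer(h, err)
    new_W1 = W1
    if train_W1:
        # Pruned by default: skipping this branch avoids computing grad_h
        # and grad_W1, which is where the memory/latency savings come from.
        grad_h = W2 @ err
        new_W1 = W1 - lr * np.outer(x, grad_h)
    return new_W1, new_W2
```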
S-KMN: Integrating Semantic Features Learning and Knowledge Mapping Network for Automatic Quiz Question Annotation
Quiz question annotation aims to assign the most relevant knowledge point to a question, and it is a key technology for supporting intelligent education applications. However, existing methods only extract the explicit semantic information that reveals the literal meaning of a question and ignore the implicit knowledge information that highlights the knowledge intention. To this end, an innovative dual-channel model, the Semantic-Knowledge Mapping Network (S-KMN), is proposed to enrich the question representation from two perspectives, semantic and knowledge, simultaneously. It integrates semantic feature learning and a knowledge mapping network (KMN) to extract the explicit semantic features and implicit knowledge features of questions, respectively. Designing the KMN to extract implicit knowledge features is the focus of this study. First, the context-aware and sequential information of knowledge attribute words in the question text is integrated into a knowledge attribute graph to form the knowledge representation of each question. Second, a projection matrix is learned that maps the knowledge representation to a latent knowledge space spanned by scene base vectors, and the weighted summation of these base vectors serves as the knowledge features. To enrich the question representation, an attention mechanism is introduced to fuse the explicit semantic features and implicit knowledge features, realizing further cognitive processing on top of semantic understanding. Experimental results on 19,410 real-world physics quiz questions covering 30 knowledge points demonstrate that S-KMN outperforms the state-of-the-art text-classification-based question annotation method. Comprehensive analysis and ablation studies validate the superiority of our model in selecting knowledge-specific features.
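The projection step described above can be sketched as follows: the knowledge representation is mapped into a latent space, scored against each scene base vector, and the weighted sum of those base vectors becomes the knowledge feature. All matrices below are random stand-ins for learned parameters, and the softmax weighting is an assumption about how the "weighted summation" is computed, not the paper's exact formulation.

```python
import numpy as np

def knowledge_features(k, P, B):
    """k: knowledge representation (d,), P: learned projection (m, d),
    B: scene base vectors as rows (n, m)."""
    z = P @ k                          # map into the latent knowledge space
    logits = B @ z                     # affinity of each base vector with z
    w = np.exp(logits - logits.max())
    w = w / w.sum()                    # softmax weights over base vectors
    return w @ B                       # weighted summation of base vectors
```

Because the weights form a convex combination, the output always lies inside the span of the base vectors, which keeps the knowledge feature in the same latent space the attention-based fusion step operates on.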