37 research outputs found

    Combining contextualized and non-contextualized embeddings for domain adaptation and beyond

    An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks

    Access to external knowledge is essential for many natural language processing tasks, such as question answering and dialogue. Existing methods often rely on a parametric model that stores knowledge in its parameters, or on a retrieval-augmented model with access to an external knowledge source. Parametric and retrieval-augmented models have complementary strengths in terms of computational efficiency and predictive accuracy. To combine the strengths of both approaches, we propose the Efficient Memory-Augmented Transformer (EMAT): it encodes external knowledge into a key-value memory and exploits fast maximum inner product search for memory querying. We also introduce pre-training tasks that allow EMAT to encode informative key-value representations and to learn an implicit strategy for integrating multiple memory slots into the transformer. Experiments on various knowledge-intensive tasks, including question answering and dialogue datasets, show that simply augmenting parametric models (T5-base) with our method produces more accurate results (e.g., 25.8 → 44.3 EM on NQ) while retaining high throughput (e.g., 1000 queries/s on NQ). Compared to retrieval-augmented models, EMAT runs substantially faster across the board and produces more accurate results on WoW and ELI5.
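
    As a hedged illustration of the key-value memory lookup described above, the sketch below performs brute-force maximum inner product search (MIPS) over a random key-value memory. The array names, sizes, and the exact-search shortcut are illustrative assumptions; EMAT's actual encoder, memory construction, and transformer integration are not reproduced here.

```python
# Hypothetical sketch of EMAT-style key-value memory querying via
# maximum inner product search (MIPS). Names and shapes are illustrative
# assumptions, not the paper's actual implementation.
import numpy as np

rng = np.random.default_rng(0)
num_slots, key_dim, value_dim = 10_000, 128, 128

keys = rng.standard_normal((num_slots, key_dim)).astype(np.float32)      # memory keys
values = rng.standard_normal((num_slots, value_dim)).astype(np.float32)  # memory values

def query_memory(q: np.ndarray, top_k: int = 4) -> np.ndarray:
    """Return the top-k value slots whose keys have the largest inner product with q."""
    scores = keys @ q                              # exact MIPS by brute force; a real
    idx = np.argpartition(-scores, top_k)[:top_k]  # system would use an ANN index
    idx = idx[np.argsort(-scores[idx])]            # order the k hits by score
    return values[idx]                             # (top_k, value_dim) slots for the model

retrieved = query_memory(rng.standard_normal(key_dim).astype(np.float32))
print(retrieved.shape)  # (4, 128)
```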

    Entity centric neural models for natural language processing

    This thesis explores how to enhance natural language understanding by incorporating entity information into neural network models. It tackles three key questions:

    1. Leveraging entities for understanding tasks: This work introduces Entity-GCN, a model that performs multi-step reasoning on a graph where nodes represent entity mentions and edges represent relationships. This method achieved state-of-the-art results on a multi-document question-answering dataset.
    2. Identifying and disambiguating entities using large language models: This research proposes a novel system that retrieves entities by generating their names token-by-token, overcoming limitations of traditional methods and significantly reducing the memory footprint. This approach is also extended to a multilingual setting and further optimized for efficiency. (A constrained-decoding sketch follows this entry.)
    3. Interpreting and controlling entity knowledge within models: This thesis presents a post-hoc interpretation technique to analyze how decisions are made across layers in neural models, allowing for visualization and analysis of knowledge representation. Additionally, a method for editing factual knowledge about entities is proposed, enabling correction of model predictions without costly retraining.
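
    The sketch below illustrates the token-by-token entity retrieval idea from item 2: a prefix trie over known entity names constrains decoding so only valid names can be produced. The example names, whitespace tokenization, and stand-in scoring are assumptions, not the thesis's actual system.

```python
# Illustrative sketch of constrained, token-by-token entity name generation.
# A prefix trie over the entity catalogue restricts which token the decoder
# may emit next; the names and tokenization here are assumptions.

entity_names = ["Barack Obama", "Barack Obama Sr.", "Michelle Obama"]

def build_trie(names):
    """Nested-dict trie over whitespace tokens, terminated by <eos>."""
    trie = {}
    for name in names:
        node = trie
        for token in name.split() + ["<eos>"]:
            node = node.setdefault(token, {})
    return trie

trie = build_trie(entity_names)

def allowed_next_tokens(prefix_tokens):
    """Tokens the decoder may emit after prefix_tokens, per the trie."""
    node = trie
    for token in prefix_tokens:
        node = node[token]
    return list(node.keys())

print(allowed_next_tokens([]))                   # ['Barack', 'Michelle']
print(allowed_next_tokens(["Barack", "Obama"]))  # ['<eos>', 'Sr.']
```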

    PockEngine: Sparse and Efficient Fine-tuning in a Pocket

    On-device learning and efficient fine-tuning enable continuous and privacy-preserving customization (e.g., locally fine-tuning large language models on personalized data). However, existing training frameworks are designed for cloud servers with powerful accelerators (e.g., GPUs, TPUs) and lack optimizations for learning on the edge, which faces challenges of resource limitations and edge hardware diversity. We introduce PockEngine: a tiny, sparse and efficient engine that enables fine-tuning on various edge devices. PockEngine supports sparse backpropagation: it prunes the backward graph and sparsely updates the model, with measured memory savings and latency reduction, while maintaining model quality. Second, PockEngine is compilation-first: the entire training graph (including forward, backward and optimization steps) is derived at compile time, which reduces runtime overhead and opens opportunities for graph transformations. PockEngine also integrates a rich set of training graph optimizations, including operator reordering and backend switching, further reducing training cost. PockEngine supports diverse applications, frontends and hardware backends: it flexibly compiles and tunes models defined in PyTorch/TensorFlow/Jax and deploys binaries to mobile CPUs/GPUs/DSPs. We evaluated PockEngine on both vision models and large language models. PockEngine achieves up to a 15× speedup over off-the-shelf TensorFlow (Raspberry Pi) and 5.6× memory savings for backpropagation (Jetson AGX Orin). Remarkably, PockEngine enables fine-tuning LLaMav2-7B on NVIDIA Jetson AGX Orin at 550 tokens/s, 7.9× faster than PyTorch.
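
    As a rough illustration of the sparse-update idea, the sketch below freezes most parameters and trains only the last layer plus all biases. PockEngine performs the analogous pruning on a compiled backward graph; this eager-mode PyTorch approximation, with assumed layer names and sizes, only skips gradient computation for frozen leaves.

```python
# Minimal eager-mode sketch of sparse backpropagation: update only the final
# layer's weights and every bias, freezing everything else. The model,
# layer indices, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

for name, param in model.named_parameters():
    # Keep gradients only for the final layer (index 4) and all biases.
    param.requires_grad = name.startswith("4.") or name.endswith("bias")

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3)

x, y = torch.randn(8, 512), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()   # autograd skips grad computation for the frozen weights
optimizer.step()  # sparse update touches only the trainable subset
```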

    S-KMN: Integrating Semantic Features Learning and Knowledge Mapping Network for Automatic Quiz Question Annotation

    Quiz question annotation aims to assign the most relevant knowledge point to a question, a key technology for supporting intelligent education applications. However, existing methods extract only the explicit semantic information that reveals the literal meaning of a question and ignore the implicit knowledge information that highlights the knowledge intention. To this end, an innovative dual-channel model, the Semantic-Knowledge Mapping Network (S-KMN), is proposed to enrich the question representation from two perspectives simultaneously: semantic and knowledge. It integrates semantic feature learning and a knowledge mapping network (KMN) to extract explicit semantic features and implicit knowledge features of questions, respectively. Designing the KMN to extract implicit knowledge features is the focus of this study. First, the context-aware and sequence information of knowledge attribute words in the question text is integrated into a knowledge attribute graph to form the knowledge representation of each question. Second, a projection matrix is learned that maps the knowledge representation into a latent knowledge space spanned by scene base vectors, and the weighted summation of these base vectors serves as the knowledge features. To enrich the question representation, an attention mechanism is introduced to fuse explicit semantic features and implicit knowledge features, realizing further cognitive processing on the basis of semantic understanding. Experimental results on 19,410 real-world physics quiz questions covering 30 knowledge points demonstrate that S-KMN outperforms the state-of-the-art text-classification-based question annotation method. Comprehensive analysis and ablation studies validate the superiority of our model in selecting knowledge-specific features.
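
    The sketch below gives a hedged reading of the projection and fusion steps described above: a learned matrix maps the knowledge representation to weights over scene base vectors, their weighted sum becomes the knowledge feature, and an attention score fuses it with the semantic feature. All dimensions, module names, and the softmax choices are assumptions, not the paper's exact design.

```python
# Hedged sketch of a KMN-style projection onto scene base vectors followed
# by attention fusion of semantic and knowledge features. Every size and
# design detail here is an illustrative assumption.
import torch
import torch.nn as nn

class KnowledgeMapping(nn.Module):
    def __init__(self, dim=256, num_bases=30):
        super().__init__()
        self.bases = nn.Parameter(torch.randn(num_bases, dim))  # scene base vectors
        self.proj = nn.Linear(dim, num_bases)                   # learned projection matrix
        self.attn = nn.Linear(dim, 1)                           # scores each feature channel

    def forward(self, semantic_feat, knowledge_repr):
        # Map the knowledge representation into the latent knowledge space.
        weights = torch.softmax(self.proj(knowledge_repr), dim=-1)  # (batch, num_bases)
        knowledge_feat = weights @ self.bases                       # weighted sum of bases
        # Attention fusion of explicit semantic and implicit knowledge channels.
        stacked = torch.stack([semantic_feat, knowledge_feat], dim=1)  # (batch, 2, dim)
        alpha = torch.softmax(self.attn(stacked), dim=1)               # (batch, 2, 1)
        return (alpha * stacked).sum(dim=1)                            # fused question repr

fused = KnowledgeMapping()(torch.randn(4, 256), torch.randn(4, 256))
print(fused.shape)  # torch.Size([4, 256])
```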