37 research outputs found

    Combining contextualized and non-contextualized embeddings for domain adaptation and beyond

    An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks

    Access to external knowledge is essential for many natural language processing tasks, such as question answering and dialogue. Existing methods often rely on a parametric model that stores knowledge in its parameters, or on a retrieval-augmented model with access to an external knowledge source. Parametric and retrieval-augmented models have complementary strengths in terms of computational efficiency and predictive accuracy. To combine the strengths of both approaches, we propose the Efficient Memory-Augmented Transformer (EMAT): it encodes external knowledge into a key-value memory and exploits fast maximum inner product search for memory querying. We also introduce pre-training tasks that allow EMAT to encode informative key-value representations and to learn an implicit strategy for integrating multiple memory slots into the transformer. Experiments on various knowledge-intensive tasks, including question answering and dialogue datasets, show that simply augmenting parametric models (T5-base) with our method produces more accurate results (e.g., 25.8 → 44.3 EM on NQ) while retaining high throughput (e.g., 1000 queries/s on NQ). Compared to retrieval-augmented models, EMAT runs substantially faster across the board and produces more accurate results on WoW and ELI5.
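
    As a hedged illustration of the key-value memory lookup described above, the sketch below performs brute-force maximum inner product search (MIPS) over a random key-value memory. The array names, sizes, and the exact-search shortcut are illustrative assumptions; EMAT's actual encoder, memory construction, and transformer integration are not reproduced here.

```python
# Hypothetical sketch of EMAT-style key-value memory querying via
# maximum inner product search (MIPS). Names and shapes are illustrative
# assumptions, not the paper's actual implementation.
import numpy as np

rng = np.random.default_rng(0)
num_slots, key_dim, value_dim = 10_000, 128, 128

keys = rng.standard_normal((num_slots, key_dim)).astype(np.float32)      # memory keys
values = rng.standard_normal((num_slots, value_dim)).astype(np.float32)  # memory values

def query_memory(q: np.ndarray, top_k: int = 4) -> np.ndarray:
    """Return the top-k value slots whose keys have the largest inner product with q."""
    scores = keys @ q                              # exact MIPS by brute force; a real
    idx = np.argpartition(-scores, top_k)[:top_k]  # system would use an ANN index
    idx = idx[np.argsort(-scores[idx])]            # order the k hits by score
    return values[idx]                             # (top_k, value_dim) slots for the model

retrieved = query_memory(rng.standard_normal(key_dim).astype(np.float32))
print(retrieved.shape)  # (4, 128)
```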

    Entity centric neural models for natural language processing

    This thesis explores how to enhance natural language understanding by incorporating entity information into neural network models. It tackles three key questions:

    1. Leveraging entities for understanding tasks: This work introduces Entity-GCN, a model that performs multi-step reasoning on a graph where nodes represent entity mentions and edges represent relationships. This method achieved state-of-the-art results on a multi-document question-answering dataset.
    2. Identifying and disambiguating entities using large language models: This research proposes a novel system that retrieves entities by generating their names token-by-token, overcoming limitations of traditional methods and significantly reducing the memory footprint. This approach is also extended to a multilingual setting and further optimized for efficiency. (A constrained-decoding sketch follows this entry.)
    3. Interpreting and controlling entity knowledge within models: This thesis presents a post-hoc interpretation technique to analyze how decisions are made across layers in neural models, allowing for visualization and analysis of knowledge representation. Additionally, a method for editing factual knowledge about entities is proposed, enabling correction of model predictions without costly retraining.
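
    The sketch below illustrates the token-by-token entity retrieval idea from item 2: a prefix trie over known entity names constrains decoding so only valid names can be produced. The example names, whitespace tokenization, and stand-in scoring are assumptions, not the thesis's actual system.

```python
# Illustrative sketch of constrained, token-by-token entity name generation.
# A prefix trie over the entity catalogue restricts which token the decoder
# may emit next; the names and tokenization here are assumptions.

entity_names = ["Barack Obama", "Barack Obama Sr.", "Michelle Obama"]

def build_trie(names):
    """Nested-dict trie over whitespace tokens, terminated by <eos>."""
    trie = {}
    for name in names:
        node = trie
        for token in name.split() + ["<eos>"]:
            node = node.setdefault(token, {})
    return trie

trie = build_trie(entity_names)

def allowed_next_tokens(prefix_tokens):
    """Tokens the decoder may emit after prefix_tokens, per the trie."""
    node = trie
    for token in prefix_tokens:
        node = node[token]
    return list(node.keys())

print(allowed_next_tokens([]))                   # ['Barack', 'Michelle']
print(allowed_next_tokens(["Barack", "Obama"]))  # ['<eos>', 'Sr.']
```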

    PockEngine: Sparse and Efficient Fine-tuning in a Pocket

    On-device learning and efficient fine-tuning enable continuous and privacy-preserving customization (e.g., locally fine-tuning large language models on personalized data). However, existing training frameworks are designed for cloud servers with powerful accelerators (e.g., GPUs, TPUs) and lack optimizations for learning on the edge, which faces challenges of resource limitations and edge hardware diversity. We introduce PockEngine: a tiny, sparse and efficient engine that enables fine-tuning on various edge devices. PockEngine supports sparse backpropagation: it prunes the backward graph and sparsely updates the model, with measured memory savings and latency reduction, while maintaining model quality. Second, PockEngine is compilation-first: the entire training graph (including forward, backward and optimization steps) is derived at compile time, which reduces runtime overhead and opens opportunities for graph transformations. PockEngine also integrates a rich set of training graph optimizations, including operator reordering and backend switching, further reducing training cost. PockEngine supports diverse applications, frontends and hardware backends: it flexibly compiles and tunes models defined in PyTorch/TensorFlow/Jax and deploys binaries to mobile CPUs/GPUs/DSPs. We evaluated PockEngine on both vision models and large language models. PockEngine achieves up to a 15× speedup over off-the-shelf TensorFlow (Raspberry Pi) and 5.6× memory savings for backpropagation (Jetson AGX Orin). Remarkably, PockEngine enables fine-tuning LLaMav2-7B on NVIDIA Jetson AGX Orin at 550 tokens/s, 7.9× faster than PyTorch.
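
    As a rough illustration of the sparse-update idea, the sketch below freezes most parameters and trains only the last layer plus all biases. PockEngine performs the analogous pruning on a compiled backward graph; this eager-mode PyTorch approximation, with assumed layer names and sizes, only skips gradient computation for frozen leaves.

```python
# Minimal eager-mode sketch of sparse backpropagation: update only the final
# layer's weights and every bias, freezing everything else. The model,
# layer indices, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

for name, param in model.named_parameters():
    # Keep gradients only for the final layer (index 4) and all biases.
    param.requires_grad = name.startswith("4.") or name.endswith("bias")

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3)

x, y = torch.randn(8, 512), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()   # autograd skips grad computation for the frozen weights
optimizer.step()  # sparse update touches only the trainable subset
```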

    S-KMN: Integrating Semantic Features Learning and Knowledge Mapping Network for Automatic Quiz Question Annotation

    Quiz question annotation aims to assign the most relevant knowledge point to a question, a key technology for supporting intelligent education applications. However, existing methods extract only the explicit semantic information that reveals the literal meaning of a question and ignore the implicit knowledge information that highlights the knowledge intention. To this end, an innovative dual-channel model, the Semantic-Knowledge Mapping Network (S-KMN), is proposed to enrich the question representation from two perspectives simultaneously: semantic and knowledge. It integrates semantic feature learning and a knowledge mapping network (KMN) to extract explicit semantic features and implicit knowledge features of questions, respectively. Designing the KMN to extract implicit knowledge features is the focus of this study. First, the context-aware and sequence information of knowledge attribute words in the question text is integrated into a knowledge attribute graph to form the knowledge representation of each question. Second, a projection matrix is learned that maps the knowledge representation into a latent knowledge space spanned by scene base vectors, and the weighted summation of these base vectors serves as the knowledge features. To enrich the question representation, an attention mechanism is introduced to fuse explicit semantic features and implicit knowledge features, realizing further cognitive processing on the basis of semantic understanding. Experimental results on 19,410 real-world physics quiz questions covering 30 knowledge points demonstrate that S-KMN outperforms the state-of-the-art text-classification-based question annotation method. Comprehensive analysis and ablation studies validate the superiority of our model in selecting knowledge-specific features.
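
    The sketch below gives a hedged reading of the projection and fusion steps described above: a learned matrix maps the knowledge representation to weights over scene base vectors, their weighted sum becomes the knowledge feature, and an attention score fuses it with the semantic feature. All dimensions, module names, and the softmax choices are assumptions, not the paper's exact design.

```python
# Hedged sketch of a KMN-style projection onto scene base vectors followed
# by attention fusion of semantic and knowledge features. Every size and
# design detail here is an illustrative assumption.
import torch
import torch.nn as nn

class KnowledgeMapping(nn.Module):
    def __init__(self, dim=256, num_bases=30):
        super().__init__()
        self.bases = nn.Parameter(torch.randn(num_bases, dim))  # scene base vectors
        self.proj = nn.Linear(dim, num_bases)                   # learned projection matrix
        self.attn = nn.Linear(dim, 1)                           # scores each feature channel

    def forward(self, semantic_feat, knowledge_repr):
        # Map the knowledge representation into the latent knowledge space.
        weights = torch.softmax(self.proj(knowledge_repr), dim=-1)  # (batch, num_bases)
        knowledge_feat = weights @ self.bases                       # weighted sum of bases
        # Attention fusion of explicit semantic and implicit knowledge channels.
        stacked = torch.stack([semantic_feat, knowledge_feat], dim=1)  # (batch, 2, dim)
        alpha = torch.softmax(self.attn(stacked), dim=1)               # (batch, 2, 1)
        return (alpha * stacked).sum(dim=1)                            # fused question repr

fused = KnowledgeMapping()(torch.randn(4, 256), torch.randn(4, 256))
print(fused.shape)  # torch.Size([4, 256])
```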