Adapting Sequence Models for Sentence Correction
In a controlled experiment of sequence-to-sequence approaches for the task of
sentence correction, we find that character-based models are generally more
effective than word-based models and models that encode subword information via
convolutions, and that modeling the output data as a series of diffs improves
effectiveness over standard approaches. Our strongest sequence-to-sequence
model improves over our strongest phrase-based statistical machine translation
model, with access to the same data, by 6 M2 (0.5 GLEU) points. Additionally,
in the data environment of the standard CoNLL-2014 setup, we demonstrate that
modeling (and tuning against) diffs yields similar or better M2 scores with
simpler models and/or significantly less data than previous
sequence-to-sequence approaches.
Comment: EMNLP 201
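To make the idea of modeling the output as a series of diffs concrete, here is a minimal sketch, not the paper's implementation. It rewrites a (source, corrected) sentence pair as the source tokens with inline deletion and insertion spans, and shows that the corrected sentence can be recovered from that encoding. The tag names `<del>`/`<ins>`, the whitespace tokenization, and the function names `to_diff_tokens` and `apply_diff_tokens` are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def to_diff_tokens(source, target):
    """Encode a correction as source tokens with inline <del>/<ins> spans.

    Hypothetical encoding for illustration; the tags used in the paper may differ.
    """
    src, tgt = source.split(), target.split()
    out = []
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            # Unchanged tokens are copied through as-is.
            out.extend(src[i1:i2])
            continue
        if op in ("delete", "replace"):
            out.extend(["<del>", *src[i1:i2], "</del>"])
        if op in ("insert", "replace"):
            out.extend(["<ins>", *tgt[j1:j2], "</ins>"])
    return out


def apply_diff_tokens(tokens):
    """Recover the corrected sentence: drop <del> spans, unwrap <ins> spans."""
    out, deleting = [], False
    for tok in tokens:
        if tok == "<del>":
            deleting = True
        elif tok == "</del>":
            deleting = False
        elif tok in ("<ins>", "</ins>"):
            continue
        elif not deleting:
            out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    encoded = to_diff_tokens("She go to school yesterday",
                             "She went to school yesterday")
    print(" ".join(encoded))
    print(apply_diff_tokens(encoded))
```

Under this encoding, most target tokens are plain copies of the source, which is one plausible reason a diff-style target could be easier to learn and to tune against than a fully rewritten sentence.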
Hybrid Retrieval-Augmented Generation for Real-time Composition Assistance
Retrieval-augmented models show promise in enhancing traditional language
models by improving their contextual understanding, integrating private data,
and reducing hallucination. However, the processing time required for
retrieval-augmented large language models poses a challenge when applying them
to tasks that require real-time responses, such as composition assistance.
To overcome this limitation, we propose the Hybrid Retrieval-Augmented
Generation (HybridRAG) framework, which combines client and cloud models.
HybridRAG incorporates retrieval-augmented memory
generated asynchronously by a Large Language Model (LLM) in the cloud. By
integrating this retrieval-augmented memory, the client model acquires the
capability to generate highly effective responses, benefiting from the LLM's
capabilities. Furthermore, through asynchronous memory integration, the client
model is capable of delivering real-time responses to user requests without the
need to wait for memory synchronization from the cloud. Our experiments on
Wikitext and Pile subsets show that HybridRAG achieves lower latency than a
cloud-based retrieval-augmented LLM, while outperforming client-only models in
utility.
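The asynchronous memory integration described above can be sketched as follows. This is a toy illustration under stated assumptions, not the HybridRAG implementation: the class `HybridClient`, its methods, the string-based "memory", and the simulated cloud delay are all hypothetical stand-ins. The point it shows is that the client answers from whatever memory it already holds, while a cloud refresh runs in the background and only benefits later requests.

```python
import asyncio


class HybridClient:
    """Toy client that never blocks on the cloud (hypothetical design)."""

    def __init__(self, cloud_delay=0.05):
        self.memory = []            # latest memory from the cloud; may be empty or stale
        self.cloud_delay = cloud_delay
        self._pending = set()       # keep references so background tasks are not lost

    async def _cloud_generate_memory(self, context):
        # Stand-in for retrieval plus an LLM call in the cloud.
        await asyncio.sleep(self.cloud_delay)
        return [f"note about: {context}"]

    async def _refresh(self, context):
        self.memory = await self._cloud_generate_memory(context)

    def complete(self, prompt):
        """Answer immediately from cached memory, then refresh in the background."""
        hint = self.memory[-1] if self.memory else "no memory yet"
        task = asyncio.create_task(self._refresh(prompt))
        self._pending.add(task)
        task.add_done_callback(self._pending.discard)
        return f"{prompt} ... [{hint}]"

    async def drain(self):
        """Wait for outstanding refreshes (used here only to make the demo deterministic)."""
        await asyncio.gather(*list(self._pending))


async def demo():
    client = HybridClient()
    first = client.complete("The meeting is")            # no cloud memory available yet
    await client.drain()                                 # the cloud memory arrives later
    second = client.complete("The meeting is scheduled") # uses memory from the first request
    await client.drain()
    return first, second


if __name__ == "__main__":
    for line in asyncio.run(demo()):
        print(line)
```

In this sketch the first response is produced without any cloud memory, and the second response reuses the memory generated for the first request, so its latency is independent of the cloud round trip.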
- …