Think Before You Speak: Cultivating Communication Skills of Large Language Models via Inner Monologue
The emergence of large language models (LLMs) has further improved the
capabilities of open-domain dialogue systems, which can now generate fluent,
coherent, and diverse responses. However, LLMs still lack an important ability:
communication skills, which makes them more like information-seeking tools than
anthropomorphic chatbots. To make LLMs more anthropomorphic and proactive
during the conversation, we add five communication skills to the response
generation process: topic transition, proactively asking questions, concept
guidance, empathy, and summarising often. Adding communication skills
increases users' interest in the conversation and encourages them to chat
for longer. To enable LLMs to better understand and use communication skills, we
design an inner monologue and add it to LLMs. The whole process is implemented
through prompt engineering and in-context learning. To evaluate communication
skills, we construct a benchmark named Cskills for evaluating various
communication skills, which also evaluates the dialogue generation ability of
the model more comprehensively. Experimental results show that the proposed
CSIM strategy improves the backbone models and outperforms the baselines in
both automatic and human evaluations.
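Since the abstract says the inner monologue is added purely through prompt engineering and in-context learning, a minimal sketch of such a prompt may help. It assumes a hypothetical `Think:`/`Say:` exemplar format that is not taken from the paper; `Exemplar`, `build_prompt`, and `split_output` are illustrative names only.

```python
from dataclasses import dataclass

# The five communication skills listed in the abstract.
SKILLS = ("topic transition", "proactively asking questions",
          "concept guidance", "empathy", "summarising often")


@dataclass
class Exemplar:
    context: str    # dialogue history shown to the model
    monologue: str  # hypothetical inner monologue naming the skill to use
    response: str   # the reply actually spoken to the user


def build_prompt(exemplars, history):
    """Assemble a few-shot prompt in which every reply is preceded by an
    inner monologue. The 'Think:'/'Say:' tags are an assumed format."""
    lines = ["You can use these communication skills: " + ", ".join(SKILLS) + "."]
    for ex in exemplars:
        lines.append(f"Dialogue: {ex.context}")
        lines.append(f"Think: {ex.monologue}")
        lines.append(f"Say: {ex.response}")
    lines.append(f"Dialogue: {history}")
    lines.append("Think:")  # the model continues with its own monologue, then 'Say:'
    return "\n".join(lines)


def split_output(completion):
    """Separate the monologue from the reply in a completion that
    continues after the final 'Think:' tag."""
    monologue, _, reply = completion.partition("Say:")
    return monologue.strip(), reply.strip()
```

In this sketch the monologue is kept apart from the reply so that only the `Say:` text would be shown to the user; whether CSIM hides the monologue the same way is an assumption.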
RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling
Retrieval-augmented language models show promise in addressing issues like
outdated information and hallucinations in language models (LMs). However,
current research faces two main problems: 1) determining what information to
retrieve, and 2) effectively combining retrieved information during generation.
We argue that valuable retrieved information should not only be related to the
current source text but also consider the future target text, given the nature
of LMs that model future tokens. Moreover, we propose that aggregation using
latent variables derived from a compact latent space is more efficient than
utilizing explicit raw text, which is limited by context length and susceptible
to noise. Therefore, we introduce RegaVAE, a retrieval-augmented language model
built upon the variational auto-encoder (VAE). It encodes the text corpus into
a latent space, capturing current and future information from both source and
target text. Additionally, we leverage the VAE to initialize the latent space
and adopt the probabilistic form of the retrieval generation paradigm by
expanding the Gaussian prior distribution into a Gaussian mixture distribution.
Theoretical analysis provides an optimizable upper bound for RegaVAE.
Experimental results on various datasets demonstrate significant improvements
in text generation quality and hallucination removal.
Comment: Accepted to the Findings of EMNLP 202
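The abstract states that the Gaussian prior is expanded into a Gaussian mixture. The sketch below shows how such a mixture prior's log-density can be evaluated stably with log-sum-exp. It assumes diagonal-covariance components whose means and variances come from the latents of K retrieved neighbours, with weights given by a softmax over retrieval scores. These choices and the helper names are illustrative assumptions, not details given in the abstract.

```python
import numpy as np


def diag_gaussian_logpdf(z, mu, var):
    """log N(z; mu, diag(var)) for a single D-dimensional vector z."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (z - mu) ** 2 / var)


def retrieval_weights(scores):
    """Hypothetical mixture weights: a softmax over retrieval scores."""
    s = np.asarray(scores, dtype=float)
    e = np.exp(s - s.max())
    return e / e.sum()


def mixture_prior_logpdf(z, mus, vars_, weights):
    """log sum_k w_k N(z; mu_k, diag(var_k)), computed with log-sum-exp.

    mus, vars_: (K, D) arrays, assumed here to be the latent means and
    variances of K retrieved neighbours; weights: (K,) positive, sum to 1.
    """
    comps = np.array([np.log(w) + diag_gaussian_logpdf(z, m, v)
                      for w, m, v in zip(weights, mus, vars_)])
    top = comps.max()  # subtract the max before exponentiating for stability
    return top + np.log(np.sum(np.exp(comps - top)))
```

With a single component of weight 1 this reduces to the ordinary Gaussian prior, which is the sense in which the mixture "expands" it.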
Effects of metal film on transmission characteristics of single-dielectric-slab THz waveguide
The effects of a symmetrical metal film on the transmission characteristics of the TM mode in a thicker single-dielectric-slab THz waveguide are analyzed theoretically. We find that the metal-film coating produces a large difference in the attenuation coefficient of the TM mode, and this difference grows as the THz frequency increases. For a thicker single-dielectric-slab THz waveguide with low absorption loss, the influence of the metal film on the loss of the TM mode cannot be ignored. We further study the influence of the metal film on the mode field distribution of the TM mode and find that the mode field distribution in the thicker dielectric slab changes significantly after coating.
Search-in-the-Chain: Towards the Accurate, Credible and Traceable Content Generation for Complex Knowledge-intensive Tasks
With the wide application of large language models (LLMs) such as ChatGPT,
making LLM-generated content accurate and credible has become very
important, especially in complex knowledge-intensive tasks. In this paper, we
propose a novel framework called Search-in-the-Chain (SearChain) to improve the
accuracy, credibility and traceability of LLM-generated content for multi-hop
question answering, which is a typical complex knowledge-intensive task.
SearChain is a framework that deeply integrates the LLM and information retrieval
(IR). In SearChain, the LLM constructs a chain-of-query, which is the decomposition
of the multi-hop question. Each node of the chain is a query-answer pair
consisting of an IR-oriented query and the answer generated by the LLM for this
query. IR verifies, completes, and traces the information of each node of the
chain, so as to guide the LLM to construct the correct chain-of-query and finally
answer the multi-hop question. When faced with a multi-hop question, SearChain
shifts the LLM from trying to give an answer directly to constructing the
chain-of-query, which stimulates its knowledge-reasoning ability and provides an
interface for IR to be deeply involved in the LLM's reasoning process. IR
interacts with each node of the LLM's chain-of-query: it verifies the node's
information and supplies knowledge the LLM lacks, which ensures the accuracy of
the whole chain while the LLM generates the answer. In addition, the content
returned to the user includes not only the final answer but also the reasoning
process for the question, that is, the chain-of-query and the supporting
documents retrieved by IR for each node of the chain, which improves the
credibility and traceability of the LLM-generated content. Experimental results
show that SearChain outperforms related baselines on four multi-hop
question-answering datasets.
Comment: work in progress
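The interaction between the chain-of-query and IR described above can be sketched as a verify-and-replan loop. This is a minimal illustration, not the authors' implementation: `plan`, `retrieve`, and `read` are hypothetical callables standing in for the LLM, the retriever, and an answer reader. Exact string comparison stands in for whatever verification SearChain actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One query-answer pair in the chain-of-query."""
    query: str             # IR-oriented sub-question
    answer: Optional[str]  # LLM's answer; None means "unknown to the LLM"
    evidence: List[str] = field(default_factory=list)  # documents kept for tracing


def search_in_the_chain(
    question: str,
    plan: Callable[[str, List[Node]], List[Node]],
    retrieve: Callable[[str], List[str]],
    read: Callable[[str, List[str]], str],
    max_rounds: int = 5,
) -> List[Node]:
    """Hypothetical SearChain-style loop.

    plan(question, verified) -> next nodes proposed by the LLM after `verified`
    retrieve(query)          -> documents returned by the IR system
    read(query, docs)        -> answer extracted from those documents
    Returns the verified chain; its last node carries the final answer.
    """
    verified: List[Node] = []
    for _ in range(max_rounds):
        corrected = False
        for node in plan(question, verified):
            docs = retrieve(node.query)
            ir_answer = read(node.query, docs)
            node.evidence = docs          # keep supporting documents for traceability
            if node.answer != ir_answer:  # unknown (None) or conflicting answer
                node.answer = ir_answer   # IR completes or corrects the node
                verified.append(node)
                corrected = True
                break                     # let the LLM re-plan from the fixed node
            verified.append(node)
        if not corrected:                 # the whole proposed chain agreed with IR
            break
    return verified
```

Breaking out after each correction mirrors the idea that IR guides the LLM to rebuild the rest of the chain from a corrected node. Returning the nodes together with their `evidence` corresponds to showing the user the reasoning process and supporting documents.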
Broadband THz transmission within the symmetrical plastic film coated parallel-plate waveguide
We report broadband THz transmission within a symmetrical plastic-film-coated parallel-plate waveguide. We theoretically study the antiresonant reflecting mechanism of the waveguide and find that broadband THz waves can propagate in this waveguide with ultra-low loss. The loss of the TM mode in this waveguide can be 4 orders of magnitude lower than in the uncoated parallel-plate waveguide. The transmission bandwidth of this waveguide is up to 5.12 THz. We further present the mode field distributions, which explain the loss mechanism.