7 research outputs found
Teaching Text-to-Image Models to Communicate in Dialog
A picture is worth a thousand words; thus, it is crucial for conversational
agents to understand, perceive, and effectively respond with pictures. However,
we find that directly employing conventional image generation techniques is
inadequate for conversational agents to produce image responses effectively. In
this paper, we focus on the innovative dialog-to-image generation task, where
the model synthesizes a high-resolution image aligned with the given dialog
context as a response. To tackle this problem, we design a tailored fine-tuning
approach on top of state-of-the-art text-to-image generation models to
fully exploit the structural and semantic features in dialog context during
image generation. Concretely, we linearize the dialog context with specific
indicators to maintain the dialog structure, and employ in-domain data to
alleviate the style mismatch between dialog-to-image and conventional image
generation tasks. Empirical results on PhotoChat and MMDialog Corpus show that
our approach brings consistent and remarkable improvement with 3
state-of-the-art pre-trained text-to-image generation backbones.
Comment: Work in progress
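The linearization step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not name the actual indicators, so the speaker tags, the `linearize_dialog` helper, and the `max_turns` parameter below are all hypothetical; this only shows the general idea of flattening a multi-turn dialog into one prompt string while keeping turn structure visible.

```python
from typing import List, Tuple

# Hypothetical speaker markers; the abstract does not specify the real indicators.
SPEAKER_TAGS = {"A": "[SPEAKER_A]", "B": "[SPEAKER_B]"}


def linearize_dialog(turns: List[Tuple[str, str]], max_turns: int = 3) -> str:
    """Flatten the last `max_turns` (speaker, utterance) pairs into one string."""
    recent = turns[-max_turns:]
    # Prefix each utterance with its speaker tag so turn boundaries survive flattening.
    parts = [f"{SPEAKER_TAGS[speaker]} {utterance.strip()}" for speaker, utterance in recent]
    return " ".join(parts)


dialog = [
    ("A", "How was the hike?"),
    ("B", "Great, we reached the lake by noon."),
    ("A", "Send me a photo!"),
]
prompt = linearize_dialog(dialog)
print(prompt)
```

The resulting string could then be passed as the text condition to a text-to-image model; how the paper's fine-tuning consumes it is not described in the abstract.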
A Step Closer to Comprehensive Answers: Constrained Multi-Stage Question Decomposition with Large Language Models
While large language models exhibit remarkable performance in the Question
Answering task, they are susceptible to hallucinations. Challenges arise when
these models grapple with understanding multi-hop relations in complex
questions or lack the necessary knowledge for a comprehensive response. To
address this issue, we introduce the "Decompose-and-Query" framework (D&Q).
This framework guides the model to think and utilize external knowledge similar
to ReAct, while also restricting its thinking to reliable information,
effectively mitigating the risk of hallucinations. Experiments confirm the
effectiveness of D&Q: On our ChitChatQA dataset, D&Q does not lose to ChatGPT
in 67% of cases; on the HotPotQA question-only setting, D&Q achieved an F1
score of 59.6%. Our code is available at
https://github.com/alkaidpku/DQ-ToolQA
Language Models can be Logical Solvers
Logical reasoning is a fundamental aspect of human intelligence and a key
component of tasks like problem-solving and decision-making. Recent
advancements have enabled Large Language Models (LLMs) to potentially exhibit
reasoning capabilities, but complex logical reasoning remains a challenge.
State-of-the-art solver-augmented language models use LLMs to parse natural
language logical questions into symbolic representations first and then adopt
external logical solvers to take in the symbolic representations and output the
answers. Despite their impressive performance, any parsing errors will
inevitably result in the failure of the execution of the external logical
solver and no answer to the logical questions. In this paper, we introduce
LoGiPT, a novel language model that directly emulates the reasoning processes
of logical solvers and bypasses the parsing errors by learning to strictly
adhere to solver syntax and grammar. LoGiPT is fine-tuned on a newly
constructed instruction-tuning dataset derived from revealing and refining the
invisible reasoning process of deductive solvers. Experimental results on two
public deductive reasoning datasets demonstrate that LoGiPT outperforms
state-of-the-art solver-augmented LMs and few-shot prompting methods on
competitive LLMs like ChatGPT or GPT-4.
Comment: Preprint
WizardLM: Empowering Large Language Models to Follow Complex Instructions
Training large language models (LLMs) with open-domain instruction-following
data has brought colossal success. However, manually creating such instruction data
is very time-consuming and labor-intensive. Moreover, humans may struggle to
produce high-complexity instructions. In this paper, we show an avenue for
creating large amounts of instruction data with varying levels of complexity
using an LLM instead of humans. Starting with an initial set of instructions, we
use our proposed Evol-Instruct to rewrite them step by step into more complex
instructions. Then, we mix all generated instruction data to fine-tune LLaMA.
We call the resulting model WizardLM. Human evaluations on a
complexity-balanced test bed show that instructions from Evol-Instruct are
superior to human-created ones. By analyzing the human evaluation results of
the high complexity part, we demonstrate that outputs from our WizardLM model
are preferred to outputs from OpenAI ChatGPT. Even though WizardLM still lags
behind ChatGPT in some aspects, our findings suggest that fine-tuning with
AI-evolved instructions is a promising direction for enhancing large language
models. Our code and generated data are public at
https://github.com/nlpxucan/WizardLM
Comment: large language model, instruction fine-tun
Synergistic Interplay between Search and Large Language Models for Information Retrieval
Information retrieval (IR) plays a crucial role in locating relevant
resources from vast amounts of data, and its applications have evolved from
traditional knowledge bases to modern retrieval models (RMs). The emergence of
large language models (LLMs) has further revolutionized the IR field by
enabling users to interact with search systems in natural languages. In this
paper, we explore the advantages and disadvantages of LLMs and RMs,
highlighting their respective strengths in understanding user-issued queries
and retrieving up-to-date information. To leverage the benefits of both
paradigms while circumventing their limitations, we propose InteR, a novel
framework that facilitates information refinement through synergy between RMs
and LLMs. InteR allows RMs to expand knowledge in queries using LLM-generated
knowledge collections and enables LLMs to enhance prompt formulation using
retrieved documents. This iterative refinement process augments the inputs of
RMs and LLMs, leading to more accurate retrieval. Experiments on large-scale
retrieval benchmarks involving web search and low-resource retrieval tasks
demonstrate that InteR achieves overall superior zero-shot retrieval
performance compared to state-of-the-art methods, even those using relevance
judgment. Source code is available at https://github.com/Cyril-JZ/InteR
Comment: Pre-print. Work in progress
Topological defect and sp3/sp2 carbon interface derived from ZIF-8 with linker vacancies for oxygen reduction reaction
Defects in nanocarbon materials can trigger their intriguing electrochemical properties and potential applications, but their synthesis is challenging. Herein, we report the synthesis of ultrathin nitrogen-doped carbon nanosheets with intrinsic defects through the pyrolysis of ZIF-8 with linker vacancies. The as-synthesized electrocatalyst exhibits excellent oxygen reduction reaction (ORR) activity with an onset potential and half-wave potential of 1.05 and 0.873 V vs. RHE, respectively, outperforming the reported metal-free ORR electrocatalysts. It also shows performance comparable to commercial Pt/C in a zinc–air battery, with a power density of 154.4 mW cm−2. Characterization and DFT calculation results suggest that the adjacent sp3-carbon in the carbon pentagon can significantly strengthen the adsorption and activation of oxygen molecules on sp2-carbon; hence, the potential-determining step is altered and the ORR overpotential is lowered. This work highlights a promising green synthesis strategy of MOF-derived metal-free nanocarbon materials for wide application in advanced energy technologies.