Retrieval-Augmented Generation (RAG) has become prevalent in
question-answering (QA) tasks because it can use search engines to
enhance the quality of long-form question answering (LFQA). Despite the
emergence of various open source methods and web-enhanced commercial systems
such as Bing Chat, two critical problems remain unsolved: the generated
long-form answers lack factuality and clear logic. In this paper,
we address these issues through a systematic study of answer generation in
web-enhanced LFQA. Specifically, we first propose a novel outline-enhanced
generator to achieve clear logic in the generation of multifaceted answers and
construct two datasets accordingly. Then we propose a factuality optimization
method based on a carefully designed doubly fine-grained RLHF framework, which
contains automatic evaluation and reward modeling at different levels of
granularity. Our general framework includes conventional fine-grained RLHF
methods as special cases. Extensive experiments verify the superiority of our
proposed \textit{Factuality-optimized RAG (FoRAG)} method on both English and
Chinese benchmarks. In particular, when our method is applied to Llama2-7B-chat,
the derived model FoRAG-L-7B outperforms WebGPT-175B on three commonly
used metrics (coherence, helpfulness, and factuality) while using far fewer
parameters (only 1/24 of those of WebGPT-175B). Our datasets
and models are made publicly available for better reproducibility:
https://huggingface.co/forag