Large language models (LLMs) have gained enormous attention from both
academia and industry, owing to their exceptional language-generation
ability and powerful generalization. However, current LLMs still produce
unreliable content in practical reasoning tasks due to inherent issues
such as hallucination. To better understand this problem, in this paper we
conduct an in-depth investigation to systematically explore the capability of
LLMs in logical reasoning. More specifically, we first investigate the
deficiencies of LLMs in logical reasoning on different tasks, including event
relation extraction and deductive reasoning. Our study demonstrates that LLMs
are not good reasoners on tasks that demand rigorous reasoning, and will produce
counterfactual answers that must be iteratively refined. Therefore, we
comprehensively explore different strategies to endow LLMs with logical
reasoning ability, and thus enable them to generate more logically consistent
answers across different scenarios. Based on our approach, we also contribute a
synthesized dataset (LLM-LR) involving multi-hop reasoning for evaluation and
pre-training. Extensive quantitative and qualitative analyses on different
tasks further validate the effectiveness and necessity of teaching LLMs with
logic, and provide insights for solving practical tasks with LLMs in future work.