Large language models (LLMs) have demonstrated substantial commonsense
understanding through numerous benchmark evaluations. However, their
understanding of cultural commonsense remains largely unexamined. In this
paper, we conduct a comprehensive examination of the capabilities and
limitations of several state-of-the-art LLMs in the context of cultural
commonsense tasks. Using several general and cultural commonsense benchmarks,
we find that (1) LLMs have a significant discrepancy in performance when tested
on culture-specific commonsense knowledge for different cultures; (2) LLMs'
general commonsense capability is affected by cultural context; and (3) the
language used to query the LLMs can impact their performance on
culture-related tasks. Our study points to the inherent bias in the cultural
understanding of LLMs and provides insights that can help develop culturally
aware language models.