Large language models (LLMs) generate information at a rapid pace, and users increasingly rely on and trust their output. Despite remarkable advances, the information LLMs produce is not fully trustworthy, owing to challenges in information quality. In particular, information quality degrades due to unreliable and biased tokenization during LLM pre-training. These quality issues, in turn, give rise to hallucinations and fabricated information. Unreliable information can lead to flawed business decisions, which in turn affects economic activity.
In this work, we introduce a novel mathematical evaluation of LLM information quality; we further analyze and highlight information quality challenges, as well as the scaling laws used to systematically scale language models.