Large Language Models (LLMs) have demonstrated remarkable performance across
various natural language tasks, marking significant strides towards general
artificial intelligence. While general artificial intelligence is leveraged by
developing increasingly large-scale models, there could be another branch to
develop lightweight custom models that better serve certain domains, taking
into account the high cost of training and deploying LLMs and the scarcity of
resources. In this paper, we present MindLLM, a novel series of bilingual
lightweight large language models, trained from scratch, alleviating such
burdens by offering models with 1.3 billion and 3 billion parameters. A
thorough account of experiences accrued during large model development is
given, covering every step of the process, including data construction, model
architecture, evaluation, and applications. Such insights are hopefully
valuable for fellow academics and developers. MindLLM consistently matches or
surpasses the performance of other open-source larger models on some public
benchmarks. We also introduce an innovative instruction tuning framework
tailored for smaller models to enhance their capabilities efficiently.
Moreover, we explore the application of MindLLM in specific vertical domains
such as law and finance, underscoring the agility and adaptability of our
lightweight models.Comment: Working in progres