We present TinyLLaVA Factory, an open-source modular codebase for small-scale
large multimodal models (LMMs) with a focus on simplicity of code
implementation, extensibility to new features, and reproducibility of training
results. Following the design philosophy of the factory pattern in software
engineering, TinyLLaVA Factory modularizes the entire system into
interchangeable components, each of which integrates a suite of
cutting-edge models and methods while leaving room for extension to further
features. In addition to allowing users to customize their own LMMs, TinyLLaVA
Factory provides popular training recipes to let users pretrain and finetune
their models with less coding effort. Experiments validate the
effectiveness of our codebase. The goal of TinyLLaVA Factory is to assist
researchers and practitioners in exploring the wide landscape of designing and
training small-scale LMMs with affordable computational resources.

Comment: Our codebase is made public at
https://github.com/TinyLLaVA/TinyLLaVA_Factory with documentation available
at https://tinyllava-factory.readthedocs.io/en/latest