<p>Water quality in lakes can be understood through the lens of metabolism, the balance of primary production and respiration. Understanding lake metabolism at regional to continental scales is challenging, in part because data are sparse and the factors controlling processes at broad scales are poorly known. To address the limited scalability of contemporary lake metabolism models, we are leveraging Ecology Knowledge-guided Machine Learning (Eco-KGML). Within the Eco-KGML paradigm, the Modular Compositional Learning (MCL) framework divides a model into smaller modules, each of which can be either process-based or machine learning-based. Different combinations of modules can then be tested to find the most effective predictor of a target variable, which for our metabolism model is water quality. MCL metabolism models can be trained on well-studied systems and then applied to lakes with sparse data. Although our MCL metabolism model is still in early development, MCL models for simulating lake physics have shown better prediction skill than purely process-based or purely machine learning models. The integration of machine learning into ecological modeling is a novel concept made possible only by the ecological insights and unique data collected by the North Temperate Lakes Long-Term Ecological Research (NTL-LTER) program and the National Ecological Observatory Network (NEON).</p>