Methods were developed to assess and quantify the predictive quality of simulation models, with the intent of contributing to the evaluation of model studies by non-scientists. In a case study, two models of different complexity, LINTUL and SUCROS87, were used to predict the yield of forage maize under Dutch meteorological conditions. The models predict yield under potential conditions, i.e. temperature- and radiation-limited yield, assuming all other production factors to be optimal.

After a review of concerns voiced in model-based applied research, the simulation models were described in a systematic manner to simplify access to the software code. A model analysis showed that the models contain switches describing abrupt changes in the crop (e.g. the change from temperature-driven to photosynthesis-driven leaf area growth; the onset of leaf senescence). Some switches introduce discontinuities in the relation between state variables and parameters. Such properties make non-standard approaches to parameter estimation necessary.

Subsequently, the empirical basis of the simulation models was reviewed in terms of parameter values and their uncertainty, as derived from the literature. The results were used to evaluate the predictive quality given the parameter uncertainty. That quality was low; parameter estimation to adapt the models to local conditions proved necessary.

Different procedures to calibrate the models were presented and discussed. For the combination of models and the data available in this case study, a subset of parameters had to be selected for estimation. Selection was based on ranking the parameters by their contribution to output uncertainty; non-selected parameters were fixed at their default values. Calibration using a controlled random search algorithm as a point-estimation procedure was executed for both models.
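The controlled random search step can be illustrated with a minimal sketch of Price's controlled random search algorithm. The objective function, bounds, and population settings below are hypothetical stand-ins for the actual model-to-data least-squares criterion used in the thesis:

```python
import random

def controlled_random_search(objective, bounds, n_pop=50, iters=3000, seed=1):
    """Minimise `objective` over box `bounds` with controlled random search.

    A population of random points is iteratively refined: a random simplex
    of dim+1 points is drawn, the last point is reflected through the
    centroid of the others, and the trial point replaces the current worst
    population member if it is better and lies inside the bounds.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_pop)]
    vals = [objective(p) for p in pop]
    for _ in range(iters):
        # Draw dim+1 distinct population members at random.
        idx = rng.sample(range(n_pop), dim + 1)
        centroid = [sum(pop[i][d] for i in idx[:-1]) / dim for d in range(dim)]
        trial = [2.0 * centroid[d] - pop[idx[-1]][d] for d in range(dim)]
        if all(lo <= t <= hi for t, (lo, hi) in zip(trial, bounds)):
            f = objective(trial)
            worst = max(range(n_pop), key=vals.__getitem__)
            if f < vals[worst]:
                pop[worst], vals[worst] = trial, f
    best = min(range(n_pop), key=vals.__getitem__)
    return pop[best], vals[best]

# Toy usage: fit two hypothetical parameters by minimising a quadratic
# surrogate of a sum-of-squared-errors surface.
params, sse = controlled_random_search(
    lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2,
    bounds=[(-5.0, 5.0), (-5.0, 5.0)],
)
```

Being a direct-search method, this kind of algorithm needs no derivatives, which matters here because the switches in the models introduce discontinuities in the parameter-output relation.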
In the estimation procedure a compromise was sought between different types of problems: estimation bias, parameter identifiability and local minima.

The parameter estimates were used to generate predictions. A comparison between predictions and measured data was used to evaluate the predictive quality of the models in terms relevant for the application. To do so, the concept of a link hypothesis was introduced: it defines the anticipated relation between prediction and measurement, and deviations from that relation were used to quantify predictive quality. Predictive quality was shown to depend strongly on the procedure used to generate predictions: procedures combining results from multiple calibration sets yielded better predictions than predictions based on a single data set.

To translate predictive quality into terms of usefulness of the simulation models, prediction errors were compared to those of benchmark predictors (simple statistical predictors). LINTUL and SUCROS87 differed in their performance relative to the benchmark predictors.

The procedures developed in this thesis suggest that facilitating model evaluation requires actions that are not easily executed within the context of project-based, often time-limited, applied research. Investment in the methodological basis and in the empirical basis of the models prior to their application will be required.
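The benchmark comparison can be sketched as follows; all yield values are invented for illustration and are not data from the thesis, and the mean-of-past-observations predictor stands in for whichever simple statistical benchmarks were actually used:

```python
def rmse(predicted, observed):
    """Root-mean-square prediction error."""
    n = len(observed)
    return (sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n) ** 0.5

def mean_benchmark(training_yields, n_predictions):
    """Simplest benchmark predictor: forecast the mean of past observed yields."""
    mean = sum(training_yields) / len(training_yields)
    return [mean] * n_predictions

# Invented example values (t dry matter / ha), purely illustrative.
observed = [14.2, 15.1, 13.8, 16.0]
model_predictions = [14.0, 15.5, 13.5, 15.6]        # hypothetical model output
benchmark_predictions = mean_benchmark([13.9, 14.8, 15.2], len(observed))

model_error = rmse(model_predictions, observed)
benchmark_error = rmse(benchmark_predictions, observed)
# The simulation model adds value only when model_error < benchmark_error.
```

Framing usefulness this way makes the evaluation criterion concrete: a calibrated model that cannot beat a trivial statistical predictor offers no practical advantage for the application.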