Testing is acknowledged as indispensable for scientific software
development and for assuring the software quality needed to produce
trustworthy simulation results. In practice, however, testing in software
frameworks developed at research facilities is often restricted to unit
tests or simple benchmark programs. Yet in a modern numerical
software framework, such as deal.II, FEniCS, or Dune, the number of
possible feature combinations constituting a program is vast. Only
system testing, meaning testing within a possible end-user environment
that also emulates this variability, can assess software quality and
the reproducibility of numerical results. We discuss tools for defining system
tests that cover both runtime and compile-time variation, and we further
discuss the implementation of quality measures tailored to numerical
frameworks for the solution of PDEs. We also share experiences with
using continuous integration systems (GitLab CI) for numerical software
frameworks.

Poster presented at SIAM CSE17, PP108 Minisymposterium: Software Productivity and Sustainability for CSE and Data Science
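The combined compile-time and runtime variation mentioned in the abstract can be pictured as a test matrix that a system-test driver sweeps. The sketch below is only an illustration under assumed conventions; the option names (`DIM`, `WITH_MPI`, `refinement`) are hypothetical and do not come from the poster or from any specific framework.

```python
# Minimal sketch of a system-test matrix spanning compile-time and runtime
# variation; all option names and values are hypothetical illustrations.
import itertools

# Hypothetical compile-time options (e.g. passed to the build system).
COMPILE_OPTS = {"DIM": ["2", "3"], "WITH_MPI": ["ON", "OFF"]}
# Hypothetical runtime parameters (e.g. read from a parameter file).
RUNTIME_OPTS = {"refinement": ["2", "4"]}

def configurations():
    """Yield one dict per combination of compile-time and runtime settings."""
    keys = list(COMPILE_OPTS) + list(RUNTIME_OPTS)
    value_lists = list(COMPILE_OPTS.values()) + list(RUNTIME_OPTS.values())
    for combo in itertools.product(*value_lists):
        yield dict(zip(keys, combo))

if __name__ == "__main__":
    for cfg in configurations():
        # A real driver would configure, build, and run the program for each
        # configuration, then compare its numerical output against stored
        # reference results to assess reproducibility.
        print(cfg)
```

Even this toy matrix yields eight configurations, which hints at why the number of feature combinations in a full framework quickly becomes too vast for manual or purely unit-level testing.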