Evaluation has always been a key challenge in the development of artificial
intelligence (AI)-based software, due to the technical complexity of the
software artifact and, often, its embedding in complex sociotechnical
processes. Recent advances in machine learning (ML) enabled by deep neural
networks have exacerbated the challenge of evaluating such software due to the
opaque nature of these ML-based artifacts. A key related issue is the
(in)ability of such systems to generate useful explanations of their outputs,
and we argue that the explanation and evaluation problems are closely linked.
The paper models the elements of an ML-based AI system in the context of public
sector decision (PSD) applications involving both artificial and human
intelligence, and maps these elements against issues in both evaluation and
explanation, showing how the two are related. We consider a number of common
PSD application patterns in the light of our model, and identify a set of key
issues connected to explanation and evaluation in each case. Finally, we
propose multiple strategies to promote wider adoption of AI/ML technologies in
PSD, where each is distinguished by a focus on different elements of our model,
allowing PSD policy makers to adopt an approach that best fits their context
and concerns.

Comment: Presented at AAAI FSS-18: Artificial Intelligence in Government and Public Sector, Arlington, Virginia, USA.