96 research outputs found
Reproducibility in Multiple Instance Learning: A Case For Algorithmic Unit Tests
Multiple Instance Learning (MIL) is a sub-domain of classification problems
with positive and negative labels and a "bag" of inputs, where the label is
positive if and only if a positive element is contained within the bag, and
otherwise is negative. Training in this context requires associating the
bag-wide label to instance-level information, and implicitly contains a causal
assumption and asymmetry to the task (i.e., you can't swap the labels without
changing the semantics). MIL problems occur in healthcare (one malignant cell
indicates cancer), cyber security (one malicious executable makes an infected
computer), and many other tasks. In this work, we examine five of the most
prominent deep-MIL models and find that none of them respects the standard MIL
assumption. They are able to learn anti-correlated instances, i.e., defaulting
to "positive" labels until seeing a negative counter-example, which should not
be possible for a correct MIL model. We suspect that enhancements and other
works derived from these models will share the same issue. In any context in
which these models are being used, this creates the potential for learning
incorrect models, which creates risk of operational failure. We identify and
demonstrate this problem via a proposed "algorithmic unit test", where we
create synthetic datasets that can be solved by a MIL respecting model, and
which clearly reveal learning that violates MIL assumptions. The five evaluated
methods each fail one or more of these tests. This provides a model-agnostic
way to identify violations of modeling assumptions, which we hope will be
useful for future development and evaluation of MIL models.Comment: To appear in the 37th Conference on Neural Information Processing
Systems (NeurIPS 2023
- …