The goal of this paper is to interactively refine automatic segmentations of
challenging structures on which automatic methods fall behind human
performance, either because annotations are scarce or because the problem
itself is difficult, for example when segmenting cancer or small organs. Specifically, we
propose a novel Transformer-based architecture for Interactive Segmentation
(TIS) that treats the refinement task as a procedure for grouping pixels whose
features are similar to those of the clicks given by the end user. The
architecture is composed of Transformer Decoder variants, which naturally
perform feature comparison through their attention mechanisms. In contrast to
existing approaches, TIS is not limited to binary segmentation and allows the
user to edit masks for an arbitrary number of categories. To
validate the proposed approach, we conduct extensive experiments on three
challenging datasets and demonstrate superior performance over the existing
state-of-the-art methods. The project page is https://wtliu7.github.io/tis/.

Comment: Accepted to MICCAI 202