Building data for automatic post-editing (APE) requires extensive, expert-level
human effort, because it involves an elaborate process of identifying errors in
sentences and providing suitable revisions. We therefore develop a
self-supervised data generation tool, deployable as a web application, that
minimizes human supervision and constructs personalized APE data from a
parallel corpus for several language pairs with English as the target language.
This tool enables data-centric APE research on many language pairs that have
not been studied thus far owing to a lack of suitable data.

Comment: Accepted for DataPerf workshop at ICML 202