Piano covers of pop music are widely enjoyed, but the task of automatically
generating them remains understudied. This is partly due to the lack of
synchronized {Pop, Piano Cover} data pairs, which makes it challenging to apply
the latest data-intensive deep learning-based methods. To leverage the power of
the data-driven approach, we build a large amount of paired and synchronized
{Pop, Piano Cover} data using an automated pipeline. In
this paper, we present Pop2Piano, a Transformer network that generates piano
covers given waveforms of pop music. To the best of our knowledge, this is the
first model to directly generate a piano cover from pop audio without melody
and chord extraction modules. We show that Pop2Piano, trained with our dataset,
can generate plausible piano covers.