Music Structure Analysis (MSA) is the task aiming at identifying musical
segments that compose a music track and possibly label them based on their
similarity. In this paper we propose a supervised approach for the task of
music boundary detection. In our approach we simultaneously learn features and
convolution kernels. For this we jointly optimize -- a loss based on the
Self-Similarity-Matrix (SSM) obtained with the learned features, denoted by
SSM-loss, and -- a loss based on the novelty score obtained applying the
learned kernels to the estimated SSM, denoted by novelty-loss. We also
demonstrate that relative feature learning, through self-attention, is
beneficial for the task of MSA. Finally, we compare the performances of our
approach to previously proposed approaches on the standard RWC-Pop, and various
subsets of SALAMI