Automated melodic phrase detection and segmentation is a classical task in
content-based music information retrieval and a key step towards automated
music structure analysis. However, traditional methods still cannot satisfy
practical requirements. In this paper, we explore and adapt various neural
network architectures to see if they can be generalized to work with the
symbolic representation of music and produce satisfactory melodic phrase
segmentation. The main obstacle to applying deep-learning methods to phrase
detection is the sparse labeling of training sets. We propose two tailored
label-engineering schemes, with corresponding training techniques for different
neural networks, so that decisions are made at the sequence level. Experimental
results show that the CNN-CRF architecture performs best, offering finer
segmentation and faster training, while CNN, Bi-LSTM-CNN, and Bi-LSTM-CRF are
acceptable alternatives.