In this paper, we propose to learn a Mahalanobis distance to perform
alignment of multivariate time series. The learning examples for this task are
time series for which the true alignment is known. We cast the alignment
problem as a structured prediction task, and propose realistic losses between
alignments for which the optimization is tractable. We provide experiments on
real data in the audio to audio context, where we show that the learning of a
similarity measure leads to improvements in the performance of the alignment
task. We also propose to use this metric learning framework to perform feature
selection and, from basic audio features, build a combination of these with
better performance for the alignment