Sound event localization and detection is a novel area of research that
emerged from the combined interest in analyzing the acoustic scene in terms of
the spatial and temporal activity of sounds of interest. This paper presents an
overview of the first international evaluation on sound event localization and
detection, organized as a task of the DCASE 2019 Challenge. A large-scale
realistic dataset of spatialized sound events was generated for the challenge,
to be used for training learning-based approaches and for evaluating the
submissions on an unlabeled subset. The overview presents in detail how the
systems were evaluated and ranked and the characteristics of the
best-performing systems. Common strategies in terms of input features, model
architectures, training approaches, exploitation of prior knowledge, and data
augmentation are discussed. Since ranking in the challenge was based on
evaluating localization and event classification performance separately, part
of the overview focuses on presenting metrics for the joint measurement of the
two, together with a reevaluation of submissions using these new metrics. The
new analysis reveals submissions that performed better on the joint task of
detecting the correct type of event close to its original location than some of
the submissions that were ranked higher in the challenge. Consequently, the
ranking of submissions that performed strongly when evaluated separately on
detection or localization, but not jointly on both, was negatively affected.