Automatic summarization of sports video content has been the object of
great interest for many years. Although semantic description techniques have
been proposed, many of the approaches still rely on low-level video descriptors
that yield rather limited results, owing both to the complexity of the problem and
to the limited ability of these descriptors to represent semantic content. In this
paper, a new approach for the automatic generation of highlight summaries of
soccer videos using audio-visual descriptors is presented. The approach is
based on the segmentation of the video sequence into shots that are further
analyzed to determine their relevance and interest. Of special interest in the
approach is the use of audio information, which provides additional
robustness to the overall performance of the summarization system. For every
video shot, a set of low- and mid-level audio-visual descriptors is computed and
then suitably combined to obtain different relevance measures
based on empirical knowledge rules. The final summary is generated by selecting
the shots with the highest interest according to the user's specifications
and the resulting relevance measures. Results on a variety of real soccer video
sequences are presented that demonstrate the validity of the approach.
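As an illustration only, the step of combining per-shot descriptors into a relevance measure and then selecting shots according to a user specification could be sketched as follows. Everything here is a hypothetical assumption rather than the rules used in this paper: the descriptor names (`audio_energy`, `motion`, `is_close_up`, `grass_ratio`), their normalization to [0, 1], the weights and thresholds of the "empirical rules", and the interpretation of the user specification as a maximum summary duration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Shot:
    """One video shot with hypothetical, normalized descriptors in [0, 1]."""
    start: float         # shot start time in seconds
    end: float           # shot end time in seconds
    audio_energy: float  # low-level audio descriptor (e.g. crowd/commentator loudness)
    motion: float        # low-level visual descriptor (amount of camera/object motion)
    is_close_up: bool    # mid-level visual descriptor (player close-up detected)
    grass_ratio: float   # mid-level visual descriptor (fraction of playfield pixels)

    @property
    def duration(self) -> float:
        return self.end - self.start


def relevance(shot: Shot) -> float:
    """Combine descriptors into one score with illustrative rules (weights are assumptions)."""
    score = 0.5 * shot.audio_energy + 0.3 * shot.motion
    # Hypothetical rule: loud audio together with a close-up often follows a key event.
    if shot.audio_energy > 0.7 and shot.is_close_up:
        score += 0.2
    # Hypothetical rule: quiet shots showing little playfield (crowd, studio) are penalized.
    if shot.grass_ratio < 0.2 and shot.audio_energy < 0.5:
        score -= 0.3
    return score


def summarize(shots: list[Shot], max_duration: float, min_score: float = 0.0) -> list[Shot]:
    """Greedily keep the most relevant shots that fit the user's duration budget,
    then return them in chronological order."""
    selected: list[Shot] = []
    total = 0.0
    for shot in sorted(shots, key=relevance, reverse=True):
        if relevance(shot) <= min_score:
            break  # shots are sorted by score, so no remaining shot can pass
        if total + shot.duration <= max_duration:
            selected.append(shot)
            total += shot.duration
    return sorted(selected, key=lambda s: s.start)


shots = [
    Shot(0.0, 10.0, audio_energy=0.2, motion=0.3, is_close_up=False, grass_ratio=0.9),
    Shot(10.0, 18.0, audio_energy=0.9, motion=0.8, is_close_up=True, grass_ratio=0.6),
    Shot(18.0, 30.0, audio_energy=0.1, motion=0.1, is_close_up=False, grass_ratio=0.1),
    Shot(30.0, 42.0, audio_energy=0.8, motion=0.6, is_close_up=False, grass_ratio=0.8),
]
summary = summarize(shots, max_duration=25.0)
print([(s.start, s.end) for s in summary])  # [(10.0, 18.0), (30.0, 42.0)]
```

With a 25-second budget, the two highest-scoring shots (8 s and 12 s) fit, while the third-ranked 10-second shot would exceed the budget and the quiet crowd shot is discarded for its negative score. Greedy selection by score is a simple choice; treating the budget as a knapsack problem would be an alternative when fitting the duration tightly matters.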