666 research outputs found
Improving Sequential Determinantal Point Processes for Supervised Video Summarization
It is now much easier than ever before to produce videos. While the
ubiquitous video data is a great source for information discovery and
extraction, the computational challenges are unparalleled. Automatically
summarizing the videos has become a substantial need for browsing, searching,
and indexing visual content. This paper is in the vein of supervised video
summarization using sequential determinantal point process (SeqDPP), which
models diversity by a probabilistic distribution. We improve this model in two
folds. In terms of learning, we propose a large-margin algorithm to address the
exposure bias problem in SeqDPP. In terms of modeling, we design a new
probabilistic distribution such that, when it is integrated into SeqDPP, the
resulting model accepts user input about the expected length of the summary.
Moreover, we also significantly extend a popular video summarization dataset by
1) more egocentric videos, 2) dense user annotations, and 3) a refined
evaluation scheme. We conduct extensive experiments on this dataset (about 60
hours of videos in total) and compare our approach to several competitive
baselines
Towards End-to-end Speech-to-text Summarization
Speech-to-text (S2T) summarization is a time-saving technique for filtering
and keeping up with the broadcast news uploaded online on a daily basis. The
rise of large language models from deep learning with impressive text
generation capabilities has placed the research focus on summarization systems
that produce paraphrased compact versions of the document content, also known
as abstractive summaries. End-to-end (E2E) modelling of S2T abstractive
summarization is a promising approach that offers the possibility of generating
rich latent representations that leverage non-verbal and acoustic information,
as opposed to the use of only linguistic information from automatically
generated transcripts in cascade systems. However, the few literature on E2E
modelling of this task fails on exploring different domains, namely broadcast
news, which is challenging domain where large and diversified volumes of data
are presented to the user every day. We model S2T summarization both with a
cascade and an E2E system for a corpus of broadcast news in French. Our novel
E2E model leverages external data by resorting to transfer learning from a
pre-trained T2T summarizer. Experiments show that both our cascade and E2E
abstractive summarizers are stronger than an extractive baseline. However, the
performance of the E2E model still lies behind the cascade one, which is object
of an extensive analysis that includes future directions to close that gap.Comment: Accepted to the 26th International Conference of Text, Speech and
Dialogue (TSD2023
- …