An interactive and multi-level framework for summarising user generated videos
We present an interactive and multi-level abstraction framework for user-generated video (UGV) summarisation, allowing the user to select a summarisation criterion from a number of methods provided by the system. First, a given raw video is segmented into shots, and each shot is further decomposed into sub-shots according to changes in the dominant camera motion. Secondly, principal component analysis (PCA) is applied to the colour representation of the collection of sub-shots, and a content map is created using the first few components. Each sub-shot is represented by a "footprint" on the content map, which reveals its content significance (coverage) and its most dynamic segment. The final stage of abstraction is devised in a user-assisted manner, whereby the user specifies a desired summary length, with options to interactively perform abstraction at different granularities of visual comprehension. The results show that the framework can significantly alleviate the laborious user intervention associated with conventional video editing and browsing.
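As a loose illustration of the content-map idea (our sketch, not the authors' code), the snippet below projects per-sub-shot colour histograms onto the first two principal components; the histogram representation and all names here are assumptions.

```python
# Hypothetical sketch: build a 2-D "content map" by projecting per-sub-shot
# colour histograms onto the first two principal components.
import numpy as np
from sklearn.decomposition import PCA

def content_map(subshot_histograms):
    """subshot_histograms: (n_subshots, n_bins) colour histograms."""
    pca = PCA(n_components=2)
    coords = pca.fit_transform(subshot_histograms)  # one 2-D point per sub-shot
    return coords, pca.explained_variance_ratio_

# Toy usage: 50 sub-shots described by 64-bin colour histograms.
rng = np.random.default_rng(0)
coords, var = content_map(rng.random((50, 64)))
print(coords.shape, var)  # (50, 2) and the variance captured per component
```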
Dublin City University at the TRECVid 2008 BBC rushes summarisation task
We describe the video summarisation systems submitted by Dublin City University to the TRECVid 2008 BBC Rushes Summarisation task. We introduce a new approach to redundant video summarisation based on principal component analysis and linear discriminant analysis. The resulting low-dimensional representation of each shot offers a simple way to compare and select representative shots of the original video. The final summary is constructed as a dynamic storyboard. Both types of summaries were evaluated and the results are discussed.
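A minimal sketch of the PCA+LDA representation described above, assuming frames are labelled by the shot they belong to; the feature choice, dimensionalities, and centroid-based comparison are illustrative assumptions, not DCU's implementation.

```python
# Hypothetical sketch: compress frame features with PCA, then use LDA (with
# shot labels as classes) to find a space that separates shots; shot centroids
# in that space can then be compared to pick representative shots.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def shot_centroids(frame_feats, shot_labels, n_pca=50, n_lda=5):
    """Return one low-dimensional centroid per shot."""
    X = PCA(n_components=n_pca).fit_transform(frame_feats)
    Z = LinearDiscriminantAnalysis(n_components=n_lda).fit_transform(X, shot_labels)
    shots = np.unique(shot_labels)
    return shots, np.stack([Z[shot_labels == s].mean(axis=0) for s in shots])

# Toy usage: 200 frames with 128-dim features spread over 10 shots; shots whose
# centroids lie far apart in the LDA space are natural summary candidates.
rng = np.random.default_rng(0)
shots, cents = shot_centroids(rng.random((200, 128)), rng.integers(0, 10, 200))
print(cents.shape)  # (10, 5)
```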
Towards a better integration of written names for unsupervised speakers identification in videos
Existing methods for unsupervised identification of speakers in TV broadcast usually rely on the output of a speaker diarization module and try to name each cluster using names provided by another source of information: we call this "late naming". Written names extracted from title blocks tend to yield high-precision identification, although they cannot correct errors made during the clustering step. In this paper, we extend our previous "late naming" approach in two ways: "integrated naming" and "early naming". While "late naming" relies on a speaker diarization module optimized for speaker diarization, "integrated naming" jointly optimizes speaker diarization and name propagation in terms of identification errors. "Early naming" modifies the speaker diarization module by adding constraints that prevent two clusters with different written names from being merged. While "integrated naming" yields identification performance similar to "late naming" (with better precision), "early naming" improves over this baseline both in identification error rate and in the stability of the clustering stopping criterion.
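To make the "early naming" constraint concrete, here is a minimal sketch (our assumption, not the paper's code) of greedy single-linkage merging that refuses to merge two clusters carrying different written names:

```python
# Hypothetical sketch of constrained agglomerative merging: clusters that
# already carry two different written names are never merged together.
import numpy as np

def constrained_merge(dist, names, threshold):
    """dist: symmetric (n, n) distances between initial clusters;
    names: written name attached to each initial cluster, or None."""
    clusters = [{i} for i in range(len(names))]
    cluster_names = [{names[i]} - {None} for i in range(len(names))]
    while True:
        best, pair = threshold, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # the "early naming" constraint: never mix two different names
                if cluster_names[a] and cluster_names[b] and cluster_names[a] != cluster_names[b]:
                    continue
                d = min(dist[i, j] for i in clusters[a] for j in clusters[b])
                if d < best:
                    best, pair = d, (a, b)
        if pair is None:  # no mergeable pair below the distance threshold
            return clusters, cluster_names
        a, b = pair
        clusters[a] |= clusters.pop(b)
        cluster_names[a] |= cluster_names.pop(b)

# Toy usage: clusters 0 and 1 may merge; "Bob" can never join "Alice".
dist = np.array([[0.0, 0.1, 0.9], [0.1, 0.0, 0.8], [0.9, 0.8, 0.0]])
print(constrained_merge(dist, ["Alice", None, "Bob"], threshold=0.5))
```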
LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization
Neural network approaches have achieved considerable improvements in the submodules of speaker diarization systems, including speaker change detection and segment-wise speaker embedding extraction. Still, in the clustering stage, traditional algorithms such as probabilistic linear discriminant analysis (PLDA) are widely used to score the similarity between two speech segments. In this paper, we propose a supervised method that measures the similarity matrix between all segments of an audio recording with sequential bidirectional long short-term memory networks (Bi-LSTM). Spectral clustering is applied on top of the similarity matrix to further improve performance. Experimental results show that our system significantly outperforms state-of-the-art methods and achieves a diarization error rate of 6.63% on the NIST SRE 2000 CALLHOME database.
Comment: Accepted for INTERSPEECH 201
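The Bi-LSTM scoring itself is beyond a short sketch, but the final step is plain spectral clustering over a precomputed similarity matrix, which scikit-learn supports directly; the matrix below is random stand-in data, not Bi-LSTM output.

```python
# Sketch of the clustering stage only: spectral clustering over a precomputed
# segment-by-segment similarity matrix (here random; in the paper, Bi-LSTM scores).
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
sim = rng.random((20, 20))
sim = (sim + sim.T) / 2      # symmetrise the matrix
np.fill_diagonal(sim, 1.0)   # each segment is fully similar to itself

labels = SpectralClustering(
    n_clusters=2, affinity="precomputed", random_state=0
).fit_predict(sim)
print(labels)  # one speaker index per segment
```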
An open-source voice type classifier for child-centered daylong recordings
Spontaneous conversations in real-world settings, such as those found in child-centered recordings, have been shown to be among the most challenging audio files to process. Nevertheless, building speech processing models that handle such a wide variety of conditions would be particularly useful for language acquisition studies, in which researchers are interested in the quantity and quality of the speech that children hear and produce, as well as for early diagnosis and measuring the effects of remediation. In this paper, we present our approach to designing an open-source neural network that classifies audio segments into vocalizations produced by the child wearing the recording device, vocalizations produced by other children, adult male speech, and adult female speech. To this end, we gathered diverse child-centered corpora which sum to a total of 260 hours of recordings and cover 10 languages. Our model can be used as input for downstream tasks such as estimating the number of words produced by adult speakers, or the number of linguistic units produced by children. Our architecture combines SincNet filters with a stack of recurrent layers and outperforms by a large margin the state-of-the-art system, the Language ENvironment Analysis (LENA) system, which has been used in numerous child language studies.
Comment: accepted to Interspeech 202
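A skeletal PyTorch model in the spirit of the described architecture; an ordinary Conv1d stands in for the SincNet filterbank, and all layer sizes are illustrative assumptions rather than the paper's configuration.

```python
# Hypothetical skeleton: learnable filterbank over raw waveform (a plain Conv1d
# standing in for SincNet), recurrent layers, and per-frame class scores.
import torch
import torch.nn as nn

class VoiceTypeClassifier(nn.Module):
    def __init__(self, n_classes=4):  # key child, other child, adult male, adult female
        super().__init__()
        self.filterbank = nn.Conv1d(1, 80, kernel_size=251, stride=10)
        self.pool = nn.MaxPool1d(3)
        self.rnn = nn.LSTM(80, 128, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 128, n_classes)

    def forward(self, wav):                            # wav: (batch, samples)
        x = self.filterbank(wav.unsqueeze(1))          # (batch, 80, frames)
        x = self.pool(torch.relu(x)).transpose(1, 2)   # (batch, frames, 80)
        x, _ = self.rnn(x)
        return torch.sigmoid(self.head(x))             # (batch, frames, n_classes)

model = VoiceTypeClassifier()
scores = model(torch.randn(2, 16000))  # two 1-second clips at 16 kHz
print(scores.shape)
```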
Collaborative Annotation for Person Identification in TV Shows
This paper presents a collaborative annotation framework for person identification in TV shows. The web annotation front-end will be demonstrated during the Show and Tell session. All the annotation code is available on GitHub. The tool can also be used in a crowd-sourcing environment.
Hierarchical Late Fusion for Concept Detection in Videos
We deal with the issue of combining dozens of classifiers into a better one for concept detection in videos. We compare three fusion approaches that share a common structure: they all start with a classifier clustering stage, continue with an intra-cluster fusion, and end with an inter-cluster fusion. The main difference between them lies in the first stage. The first approach relies on a priori knowledge about the internals of each classifier (low-level descriptors and classification algorithm) to group the set of available classifiers by similarity. The second and third approaches obtain classifier similarity measures directly from their outputs and group them using agglomerative clustering in the second approach and community detection in the third.
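A hedged sketch of the shared three-stage structure: group classifiers by the correlation of their output scores, average within each group (intra-cluster fusion), then average the group scores (inter-cluster fusion). Simple averaging and correlation-based distances are illustrative choices, not the paper's exact operators.

```python
# Hypothetical sketch: cluster classifiers by output correlation, then fuse
# scores within each cluster and across clusters by averaging.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def hierarchical_late_fusion(scores, n_groups=3):
    """scores: (n_classifiers, n_samples) detection scores for one concept."""
    dist = 1.0 - np.corrcoef(scores)                     # output-based dissimilarity
    Z = linkage(squareform(dist, checks=False), method="average")
    groups = fcluster(Z, t=n_groups, criterion="maxclust")
    intra = [scores[groups == g].mean(axis=0) for g in np.unique(groups)]
    return np.mean(intra, axis=0)                        # inter-cluster fusion

rng = np.random.default_rng(0)
fused = hierarchical_late_fusion(rng.random((12, 100)))
print(fused.shape)  # (100,): one fused score per sample
```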
- …