
    Automated speech and audio analysis for semantic access to multimedia

    The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content and, as a consequence, improve the effectiveness of conceptual access tools. This paper overviews the various ways in which automatic speech and audio analysis can contribute to increased granularity of automatically extracted metadata. A number of techniques will be presented, including the alignment of speech and text resources, large vocabulary speech recognition, keyword spotting and speaker classification. The applicability of these techniques will be discussed from a media-crossing perspective. The added value of the techniques and their potential contribution to the content value chain will be illustrated by the description of two (complementary) demonstrators for browsing broadcast news archives.

    The AMIDA 2009 Meeting Transcription System

    We present the AMIDA 2009 system for participation in the NIST RT’2009 STT evaluations. Systems for close-talking, far-field and speaker-attributed STT conditions are described. Improvements to our previous systems are: segmentation and diarisation; stacked bottle-neck posterior feature extraction; fMPE training of acoustic models; adaptation on complete meetings; improvements to WFST decoding; automatic optimisation of decoders and system graphs. Overall these changes gave a 6-13% relative reduction in word error rate while at the same time reducing the real-time factor by a factor of five and using considerably less data for acoustic model training.

    Visual recognition of gestures in a meeting to detect when documents being talked about are missing

    Meetings frequently involve discussion of documents and can be significantly affected if a document is absent. An agent system capable of spontaneously retrieving a document at the point it is needed would have to judge whether a meeting is talking about a particular document and whether that document is already present. We report the exploratory application of agent techniques for making these two judgements. To obtain examples from which an agent system can learn, we first conducted a study of participants making these judgements with video recordings of meetings. We then show that interactions between hands and paper documents in meetings can be used to recognise when a document being talked about is not to hand. The work demonstrates the potential for multimodal agent systems using these techniques to learn to perform specific, discourse-level tasks during meetings.

    An Information Theoretic Combination of MFCC and TDOA Features for Speaker Diarization

    Full text link

    Recognition and Understanding of Meetings The AMI and AMIDA Projects

    The AMI and AMIDA projects are concerned with the recognition and interpretation of multiparty meetings. Within these projects we have: developed an infrastructure for recording meetings using multiple microphones and cameras; released a 100-hour annotated corpus of meetings; developed techniques for the recognition and interpretation of meetings based primarily on speech recognition and computer vision; and developed an evaluation framework at both component and system levels. In this paper we present an overview of these projects, with an emphasis on speech recognition and content extraction.

    Multidisciplinary perspectives on automatic analysis of children's language samples: where do we go from here?

    BACKGROUND: Language sample analysis (LSA) is invaluable for describing and understanding child language use and development, both for clinical purposes and for research. Digital tools supporting LSA are available, but many of the LSA steps have not been automated. Nevertheless, programs that include automatic speech recognition (ASR), the first step of LSA, have already reached mainstream applicability. SUMMARY: To better understand the complexity, challenges, and future needs of automatic LSA from a technological perspective, including the tasks of transcribing, annotating, and analysing natural child language samples, this article takes on a multidisciplinary view. Requirements of a fully automated LSA process are characterized, features of existing LSA software tools are compared, and prior work from the disciplines of information science and computational linguistics is reviewed. KEY MESSAGES: Existing tools vary in the extent of automation provided across the process of LSA. Advances in machine learning for speech recognition and processing have potential to facilitate LSA, but the specifics of child speech and language as well as the lack of child data complicate software design. A transdisciplinary approach is recommended as feasible to support future software development for LSA.

    Acoustic Beamforming for Speaker Diarization of Meetings

    Full text link

    An Information Theoretic Approach to Speaker Diarization of Meeting Recordings

    In this thesis we investigate a non-parametric approach to speaker diarization for meeting recordings based on an information-theoretic framework. The problem is formulated using the Information Bottleneck (IB) principle. Unlike other approaches, where the distance between speaker segments is arbitrarily introduced, the IB method seeks the partition that maximizes the mutual information between observations and variables relevant for the problem while minimizing the distortion between observations. The distance between speech segments is chosen as the Jensen-Shannon divergence, as it arises from the optimization of the IB objective function. In the first part of the thesis, we explore IB-based diarization with Mel frequency cepstral coefficients (MFCC) as input features. We study issues related to IB-based speaker diarization, such as optimization of the IB objective function and criteria for inferring the number of speakers. Furthermore, we benchmark the proposed system against a state-of-the-art system on the NIST RT06 (Rich Transcription) meeting data for speaker diarization. The IB-based system achieves a similar speaker error rate (16.8%) to a baseline HMM/GMM system (17.0%). Being non-parametric clustering, this approach performs diarization six times faster than real time, while the baseline is slower than real time. The second part of the thesis proposes a novel feature combination system in the context of IB diarization. Both the speaker clustering and speaker realignment steps are discussed. In contrast to current systems, the proposed method avoids combining features by averaging log-likelihood scores. Two different sets of features were considered: (a) a combination of MFCC features with time delay of arrival (TDOA) features; (b) a four-feature-stream combination of MFCC, TDOA, modulation spectrum and frequency domain linear prediction.
    Experiments show that the proposed system achieves a 5% absolute improvement over the baseline in the case of the two-feature combination, and 7% in the case of the four-feature combination. The increase in algorithmic complexity of the IB system with more features is minimal. The system with four input feature streams runs in real time, ten times faster than the GMM-based system.
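    The Jensen-Shannon divergence used as the segment distance above can be sketched in a few lines. This is a minimal illustrative computation on toy discrete distributions, not the thesis system; the example distributions are invented for demonstration.

    ```python
    import math

    def kl(p, q):
        """Kullback-Leibler divergence KL(p || q) for discrete distributions (nats)."""
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    def js_divergence(p, q):
        """Jensen-Shannon divergence: symmetric and bounded by log 2 (nats)."""
        m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    # Toy posterior distributions for two hypothetical speech segments
    p = [0.7, 0.2, 0.1]
    q = [0.1, 0.3, 0.6]

    print(js_divergence(p, q))   # small positive value, same if arguments are swapped
    print(js_divergence(p, p))   # → 0.0 (identical distributions)
    ```

    Unlike KL divergence, this quantity is symmetric and always finite, which is one reason it is convenient as a clustering distance between segments.
    
    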