Automatic Decision Detection in Meeting Speech
Decision making is an important aspect of meetings in organisational settings, and archives of meeting recordings constitute a valuable source of information about the decisions made. However, standard utilities such as playback and keyword search are not sufficient for locating decision points in meeting archives. In this paper, we present the AMI DecisionDetector, a system that automatically detects and highlights decision-related conversations. We apply the models developed in our previous work [1], which detect decision-related dialogue acts (DAs) in parts of the transcripts that have been manually annotated as extract-worthy, to the task of detecting decision-related DAs and topic segments directly from complete transcripts. Results show that we need to combine features extracted from multiple knowledge sources (e.g., lexical, prosodic, DA-related, and topical class) in order to yield the model with the highest precision. We also provide a quantitative account of the feature class effects. As our ultimate goal is to operate AMI DecisionDetector in a fully automatic fashion, we also investigate the impact of using automatically generated features, for example, the 5-class DA features obtained in [2].
Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation
We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topically coherent units. We propose two methods for combining lexical and prosodic information using hidden Markov models and decision trees. Lexical information is obtained from a speech recognizer, and prosodic features are extracted automatically from speech waveforms. We evaluate our approach on the Broadcast News corpus, using the DARPA-TDT evaluation metrics. Results show that the prosodic model alone is competitive with word-based segmentation methods. Furthermore, we achieve a significant reduction in error by combining the prosodic and word-based knowledge sources. Comment: 27 pages, 8 figures
Multi-party Interaction in a Virtual Meeting Room
This paper presents an overview of the work carried out at the HMI group of the University of Twente in the domain of multi-party interaction. The process from automatic observation of behavioral aspects, through interpretation, to recognized behavior is discussed for various modalities and levels. We show how a virtual meeting room can be used for visualization and evaluation of behavioral models as well as a research tool for studying the effect of modified stimuli on the perception of behavior.
Speech and crosstalk detection in multichannel audio
The analysis of scenarios in which a number of microphones record the activity of speakers, such as in a round-table meeting, presents a number of computational challenges. For example, if each participant wears a microphone, speech from both the microphone's wearer (local speech) and from other participants (crosstalk) is received. The recorded audio can be broadly classified in four ways: local speech, crosstalk plus local speech, crosstalk alone and silence. We describe two experiments related to the automatic classification of audio into these four classes. The first experiment attempted to optimize a set of acoustic features for use with a Gaussian mixture model (GMM) classifier. A large set of potential acoustic features was considered, some of which have been employed in previous studies. The best-performing features were found to be kurtosis, "fundamentalness," and cross-correlation metrics. The second experiment used these features to train an ergodic hidden Markov model classifier. Tests performed on a large corpus of recorded meetings show classification accuracies of up to 96%, and automatic speech recognition performance close to that obtained using ground truth segmentation.
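As a rough illustration of two of the feature types named in this abstract, the sketch below computes the excess kurtosis of one audio frame and the peak normalized cross-correlation between two channels. The function names, the population-moment form of kurtosis, and the lag search are assumptions made for illustration; the abstract does not give the paper's exact feature definitions.

```python
import math


def kurtosis(frame):
    """Excess kurtosis of a frame: fourth standardized moment minus 3.

    Uses population (biased) moments. This is an illustrative choice,
    not necessarily the definition used in the paper.
    """
    n = len(frame)
    mean = sum(frame) / n
    var = sum((v - mean) ** 2 for v in frame) / n
    if var == 0:
        return 0.0
    m4 = sum((v - mean) ** 4 for v in frame) / n
    return m4 / (var ** 2) - 3.0


def peak_normalized_xcorr(a, b, max_lag):
    """Largest normalized cross-correlation between two equal-length
    frames over integer lags in [-max_lag, max_lag]."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    a0 = [v - mean_a for v in a]
    b0 = [v - mean_b for v in b]
    norm = math.sqrt(sum(v * v for v in a0) * sum(v * v for v in b0))
    if norm == 0:
        return 0.0
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        # Correlate a0[i] with b0[i + lag], skipping samples that fall off the frame.
        s = sum(a0[i] * b0[i + lag] for i in range(n) if 0 <= i + lag < n)
        best = max(best, s / norm)
    return best
```

A high cross-correlation peak between two participants' channels suggests that both microphones picked up the same source, which is one cue that a channel contains crosstalk rather than local speech.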
Predicting continuous conflict perception with Bayesian Gaussian processes
Conflict is one of the most important phenomena of social life, but it is still largely neglected by the computing community. This work proposes an approach that detects common conversational social signals (loudness, overlapping speech, etc.) and predicts the conflict level perceived by human observers in continuous, non-categorical terms. The proposed regression approach is fully Bayesian and it adopts Automatic Relevance Determination to identify the social signals that most influence the outcome of the prediction. The experiments are performed over the SSPNet Conflict Corpus, a publicly available collection of 1430 clips extracted from televised political debates (roughly 12 hours of material for 138 subjects in total). The results show that it is possible to achieve a correlation close to 0.8 between actual and predicted conflict perception.
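To make the idea of Automatic Relevance Determination concrete, here is a minimal sketch of a squared-exponential covariance function with one length scale per input feature, a common way to realize ARD in Gaussian process regression. The abstract does not state which kernel the authors used, so the kernel form, the function name, and the parameter names are assumptions.

```python
import math


def ard_rbf_kernel(x, y, length_scales, signal_variance=1.0):
    """Squared-exponential kernel with a separate length scale per feature.

    A very large length scale for a feature makes differences in that
    feature nearly irrelevant to the covariance; this is how ARD marks
    a feature as unimportant. (Illustrative form, assumed here.)
    """
    sq_dist = sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, length_scales))
    return signal_variance * math.exp(-0.5 * sq_dist)


# Hypothetical two-feature example: the second feature has a huge
# length scale, so changing it barely affects the covariance.
scales = [1.0, 1e6]
k_first_differs = ard_rbf_kernel([0.0, 0.0], [1.0, 0.0], scales)
k_second_differs = ard_rbf_kernel([0.0, 0.0], [0.0, 5.0], scales)
```

In a full model the length scales would be learned from data, and the features with the smallest learned length scales would be read as the most relevant social signals.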
Automatic Segmentation of Multiparty Dialogue
In this paper, we investigate the problem of automatically predicting segment boundaries in spoken multiparty dialogue. We extend prior work in two ways. We first apply approaches that have been proposed for predicting top-level topic shifts to the problem of identifying subtopic boundaries. We then explore the impact on performance of using ASR output as opposed to human transcription. Examination of the effect of features shows that predicting top-level and predicting subtopic boundaries are two distinct tasks: (1) for predicting subtopic boundaries, the lexical cohesion-based approach alone can achieve competitive results, (2) for predicting top-level boundaries, the machine learning approach that combines lexical-cohesion and conversational features performs best, and (3) conversational cues, such as cue phrases and overlapping speech, are better indicators for the top-level prediction task. We also find that the transcription errors inevitable in ASR output have a negative impact on models that combine lexical-cohesion and conversational features, but do not change the general preference of approach for the two tasks.
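The lexical cohesion idea mentioned in this abstract can be sketched in a simplified form: score each gap between utterances by the cosine similarity of the word counts on either side, and treat the lowest-scoring gap as a candidate boundary. This is a generic window-comparison sketch in the spirit of TextTiling, not the specific algorithm evaluated in the paper; the function names, the whitespace tokenizer, and the window size are assumptions.

```python
import math
from collections import Counter


def cosine(c1, c2):
    """Cosine similarity between two word-count vectors."""
    dot = sum(c1[w] * c2[w] for w in c1 if w in c2)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def cohesion_scores(utterances, window=2):
    """Score each gap (before utterance 1, 2, ...) by the similarity of
    the `window` utterances on its left and right."""
    tokens = [u.lower().split() for u in utterances]
    scores = []
    for gap in range(1, len(tokens)):
        left = Counter(w for t in tokens[max(0, gap - window):gap] for w in t)
        right = Counter(w for t in tokens[gap:gap + window] for w in t)
        scores.append(cosine(left, right))
    return scores


def lowest_cohesion_gap(utterances, window=2):
    """Index of the utterance that starts the least cohesive gap."""
    scores = cohesion_scores(utterances, window)
    return scores.index(min(scores)) + 1


# Hypothetical toy dialogue: the topic changes at utterance index 2.
dialogue = [
    "the remote needs a new battery",
    "battery life of the remote matters",
    "lunch budget is small",
    "budget for lunch was cut",
]
boundary = lowest_cohesion_gap(dialogue)
```

A real system would add stop-word removal, stemming, and smoothing of the score curve, and, as the abstract notes, conversational features such as cue phrases and overlapping speech for top-level boundaries.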