Robust Multi-Person Tracking from Moving Platforms
In this paper, we address the problem of multi-person tracking in busy pedestrian
zones, using a stereo rig mounted on a mobile platform. The
complexity of the problem calls for an integrated solution, which
extracts as much visual information as possible and combines it
through cognitive feedback. We propose such an approach, which
jointly estimates camera position, stereo depth, object detection,
and tracking. We model the interplay between these components
using a graphical model. Since the model has to
incorporate object-object interactions, and temporal links to past
frames, direct inference is intractable. We therefore propose a two-stage
procedure: for each frame we first solve a simplified version of the
model (disregarding interactions and temporal continuity) to
estimate the scene geometry and an overcomplete set of object
detections. Conditioned on these results, we then address object
interactions, tracking, and prediction in a second step. The
approach is experimentally evaluated on several long and difficult
video sequences from busy inner-city locations. Our results show
that the proposed integration makes it possible to deliver stable
tracking performance in scenes of realistic complexity.
LARP LHC 4.8 GHZ Schottky System Initial Commissioning with Beam
The LHC Schottky system consists of four independent 4.8 GHz triple down
conversion receivers with associated data acquisition systems. Each system is
capable of measuring tune, chromaticity, momentum spread in either horizontal
or vertical planes; two systems per beam. The hardware commissioning has taken
place from spring through fall of 2010. With nominal bunch beam currents of
10^11 protons, the first incoherent Schottky signals were detected and analyzed.
This paper will report on these initial commissioning results. A companion
paper will report on the data analysis curve fitting and remote control user
interface of the system.
Comment: 3 pp. Particle Accelerator, 24th Conference (PAC'11) 2011. 28 Mar - 1 Apr 2011. New York, US
Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech
We describe a statistical approach for modeling dialogue acts in
conversational speech, i.e., speech-act-like units such as Statement, Question,
Backchannel, Agreement, Disagreement, and Apology. Our model detects and
predicts dialogue acts based on lexical, collocational, and prosodic cues, as
well as on the discourse coherence of the dialogue act sequence. The dialogue
model is based on treating the discourse structure of a conversation as a
hidden Markov model and the individual dialogue acts as observations emanating
from the model states. Constraints on the likely sequence of dialogue acts are
modeled via a dialogue act n-gram. The statistical dialogue grammar is combined
with word n-grams, decision trees, and neural networks modeling the
idiosyncratic lexical and prosodic manifestations of each dialogue act. We
develop a probabilistic integration of speech recognition with dialogue
modeling, to improve both speech recognition and dialogue act classification
accuracy. Models are trained and evaluated using a large hand-labeled database
of 1,155 conversations from the Switchboard corpus of spontaneous
human-to-human telephone speech. We achieved good dialogue act labeling
accuracy (65% based on errorful, automatically recognized words and prosody,
and 71% based on word transcripts, compared to a chance baseline accuracy of
35% and human accuracy of 84%) and a small reduction in word recognition error.
Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling changed
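The decoding idea in the abstract above (a dialogue act n-gram constraining the label sequence, combined with per-utterance lexical and prosodic evidence) can be illustrated with a minimal Viterbi sketch. This is not the authors' implementation. It assumes a bigram discourse grammar, one state per dialogue act label, and per-utterance log-likelihoods that have already been computed by some other model; the function name, its arguments, and all probabilities in the usage example are hypothetical.

```python
import math


def viterbi_dialogue_acts(log_likelihoods, log_transitions, log_initial):
    """Return the most likely dialogue act sequence under a bigram grammar.

    log_likelihoods: one dict per utterance, label -> log P(evidence | label)
    log_transitions: dict, (previous label, label) -> log P(label | previous label)
    log_initial:     dict, label -> log P(label) for the first utterance
    All three inputs are assumed to be supplied by the caller.
    """
    if not log_likelihoods:
        return []
    labels = list(log_initial)
    # Best log score of any path ending in each label after the first utterance.
    best = {da: log_initial[da] + log_likelihoods[0][da] for da in labels}
    backpointers = []
    for evidence in log_likelihoods[1:]:
        new_best, pointers = {}, {}
        for da in labels:
            # Pick the predecessor that maximizes path score plus transition.
            prev = max(labels, key=lambda p: best[p] + log_transitions[(p, da)])
            new_best[da] = best[prev] + log_transitions[(prev, da)] + evidence[da]
            pointers[da] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda da: best[da])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy usage with two labels and made-up probabilities.
initial = {"Statement": math.log(0.6), "Question": math.log(0.4)}
transitions = {
    ("Statement", "Statement"): math.log(0.7),
    ("Statement", "Question"): math.log(0.3),
    ("Question", "Statement"): math.log(0.8),
    ("Question", "Question"): math.log(0.2),
}
evidence = [
    {"Statement": math.log(0.2), "Question": math.log(0.8)},
    {"Statement": math.log(0.9), "Question": math.log(0.1)},
]
decoded = viterbi_dialogue_acts(evidence, transitions, initial)
```

In this toy case the first utterance is decoded as a Question and the second as a Statement. A higher-order n-gram would need the state to carry more history, which the sketch does not attempt.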
Importance of size representation and morphology in modelling optical properties of black carbon: comparison between laboratory measurements and model simulations
Black carbon (BC) from incomplete combustion of biomass or fossil fuels is the strongest absorbing aerosol component in the atmosphere. Optical properties of BC are essential in climate models for quantification of their impact on radiative forcing. The global climate models, however, consider BC to be spherical particles, which causes uncertainties in their optical properties. Based on this, an increasing number of model-based studies provide databases and parameterization schemes for the optical properties of BC, using more realistic fractal aggregate morphologies. In this study, the reliability of the different modelling techniques of BC was investigated by comparing them to laboratory measurements. The modelling techniques were examined for bare BC particles in the first step and for BC particles with organic material in the second step. A total of six morphological representations of BC particles were compared, three each for spherical and fractal aggregate morphologies. In general, the aggregate representation performed well for modelling the particle light absorption coefficient σ_abs, single-scattering albedo SSA, and mass absorption cross-section MAC_BC for laboratory-generated BC particles with volume mean mobility diameters d_p,V larger than 100 nm. However, for modelling the Ångström absorption exponent AAE, it was difficult to suggest a method due to size dependence, although the spherical assumption was in better agreement in some cases. The BC fractal aggregates are usually modelled using monodispersed particles, since their optical simulations are computationally expensive. In such studies, the modelled optical properties showed a 25% uncertainty in using the monodisperse size method. It is shown that using the polydisperse size distribution in combination with fractal aggregate morphology reduces the uncertainty in measured σ_abs to 10% for particles with d_p,V between 60 and 160 nm. Furthermore, the sensitivities of the BC optical properties to the various model input parameters such as the real and imaginary parts of the refractive index (m_re and m_im), the fractal dimension (D_f), and the primary particle radius (a_pp) of an aggregate were investigated. When the BC particle is small and rather fresh, the change in D_f had relatively little effect on the optical properties. There was, however, a significant relationship between a_pp and the particle light scattering, which increased by a factor of up to 6 with increasing total particle size. The modelled optical properties of BC are well aligned with laboratory-measured values when the following assumptions are used in the fractal aggregate representation: m_re between 1.6 and 2, m_im between 0.50 and 1, D_f from 1.7 to 1.9, and a_pp between 10 and 14 nm. Overall, this study provides experimental support for emphasizing the importance of an appropriate size representation (polydisperse size method) and an appropriate morphological representation for optical modelling and parameterization scheme development of BC.
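For readers unfamiliar with the quantities compared in the abstract above, the following sketch shows how single-scattering albedo, the Ångström absorption exponent, and the mass absorption cross-section are commonly derived from absorption and scattering coefficients. These are the usual textbook definitions and are an assumption here; the abstract does not state the exact formulas the study used. Function names, units, and the numbers in the usage lines are illustrative only and are not taken from the study.

```python
import math


def single_scattering_albedo(sigma_sca, sigma_abs):
    """SSA: fraction of light extinction that is due to scattering."""
    return sigma_sca / (sigma_sca + sigma_abs)


def absorption_angstrom_exponent(sigma_abs_1, sigma_abs_2, wavelength_1, wavelength_2):
    """AAE from absorption coefficients at two wavelengths.

    Assumes a power law, sigma_abs proportional to wavelength ** (-AAE).
    The two wavelengths must share one unit, as must the two coefficients.
    """
    return -math.log(sigma_abs_1 / sigma_abs_2) / math.log(wavelength_1 / wavelength_2)


def mass_absorption_cross_section(sigma_abs, mass_concentration):
    """MAC in m^2 g^-1, given sigma_abs in Mm^-1 and mass in ug m^-3.

    The 1e-6 factors of both units cancel, so a plain ratio is enough.
    """
    return sigma_abs / mass_concentration


# Illustrative values only (not measurements from the study).
ssa = single_scattering_albedo(sigma_sca=20.0, sigma_abs=80.0)
aae = absorption_angstrom_exponent(100.0, 50.0, 450.0, 900.0)
mac = mass_absorption_cross_section(sigma_abs=75.0, mass_concentration=10.0)
```

With these inputs the absorption halves when the wavelength doubles, which corresponds to an AAE of 1 under the assumed power law.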
Can Prosody Aid the Automatic Classification of Dialog Acts in Conversational Speech?
Identifying whether an utterance is a statement, question, greeting, and so forth is integral to effective automatic understanding of natural dialog. Little is known, however, about how such dialog acts (DAs) can be automatically classified in truly natural conversation. This study asks whether current approaches, which use mainly word information, could be improved by adding prosodic information. The study examines over 1000 conversations from the Switchboard corpus. DAs were hand-annotated, and prosodic features (duration, pause, F0, energy and speaking-rate features) were automatically extracted for each DA. In training, decision trees based on these features were inferred; trees were then applied to unseen test data to evaluate performance. For an all-way classification as well as three subtasks, prosody allowed highly significant classification over chance. Feature-specific analyses further revealed that although canonical features (such as F0 for questions) were important, less obvious features could compensate if canonical features were removed. Finally, in each task, integrating the prosodic model with a DA-specific statistical language model improved performance over that of the language model alone. Results suggest that DAs are redundantly marked in natural conversation, and that a variety of automatically extractable prosodic features could aid dialog processing in speech applications.
Automatic detection of discourse structure for speech recognition and understanding.
We describe a new approach for statistical modeling and detection of discourse structure
for natural conversational speech. Our model is based on 42 “Dialog Acts” (DAs)
(question, answer, backchannel, agreement, disagreement, apology, etc.). We labeled
1155 conversations from the Switchboard (SWBD) database (Godfrey et al. 1992) of
human-to-human telephone conversations with these 42 types and trained a Dialog Act
detector based on three distinct knowledge sources: sequences of words which characterize
a dialog act, prosodic features which characterize a dialog act, and a statistical
Discourse Grammar. Our combined detector, although still in preliminary stages, already
achieves a 65% Dialog Act detection rate based on acoustic waveforms, and 72%
accuracy based on word transcripts. Using this detector to switch among the 42 Dialog-
Act-Specific trigram LMs also gave us an encouraging but not statistically significant
reduction in SWBD word error