2,679 research outputs found
Sonification of guidance data during road crossing for people with visual impairments or blindness
In recent years, several solutions have been proposed to support people with
visual impairments or blindness during road crossing. These solutions focus on
computer vision techniques for recognizing pedestrian crosswalks and computing
their position relative to the user. This contribution instead addresses a
different problem: the design of an auditory interface that can effectively
guide the user during road crossing. Two original auditory guiding modes based
on data sonification are presented and compared with a guiding mode based on
speech messages.
Experimental evaluation shows that there is no guiding mode that is best
suited for all test subjects. The average time to align and cross is not
significantly different among the three guiding modes, and test subjects
distribute their preferences for the best guiding mode almost uniformly among
the three solutions. The experiments also show that decoding the sonified
instructions requires more effort than the speech instructions, and that test
subjects need frequent "hints" (in the form of speech messages). Despite this,
more than two thirds of the test subjects prefer one of the two guiding modes
based on sonification. There are two main reasons for this: first, speech
messages make it harder to hear the sounds of the environment, and second,
sonified messages convey information about the "quantity" of the expected
movement.
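As a toy illustration of the general idea (the mapping, reference frequency, and semitone scale here are assumptions for illustration, not the guiding modes evaluated in the paper), a guidance quantity such as the user's deviation angle can be sonified by shifting the pitch of a reference tone, so that pitch distance encodes how much corrective movement is needed:

```python
import numpy as np

def sonify_deviation(angle_deg, sr=16000, dur=0.3,
                     f_center=440.0, semitones_per_deg=0.5):
    """Map an alignment error (degrees off the crosswalk axis) to a tone.

    The further the user deviates, the further the pitch moves from the
    reference tone, so pitch distance conveys the "quantity" of the
    expected movement. All parameter values are illustrative.
    """
    # Shift the pitch by a number of semitones proportional to the deviation.
    freq = f_center * 2 ** (semitones_per_deg * angle_deg / 12.0)
    t = np.arange(int(sr * dur)) / sr
    return freq, np.sin(2 * np.pi * freq * t)

# 12 degrees off-axis: the tone is audibly higher than the 440 Hz reference.
freq, tone = sonify_deviation(12.0)
```

A zero deviation reproduces the reference pitch, so the user can judge alignment by how far the tone has drifted from a familiar anchor sound.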
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user-interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
A frequency-selective feedback model of auditory efferent suppression and its implications for the recognition of speech in noise
The potential contribution of the peripheral auditory efferent system to our understanding of speech in a background of competing noise was studied using a computer model of the auditory periphery and assessed using an automatic speech recognition system. A previous study had shown that a fixed efferent attenuation applied to all channels of a multi-channel model could improve the recognition of connected digit triplets in noise [G. J. Brown, R. T. Ferry, and R. Meddis, J. Acoust. Soc. Am. 127, 943-954 (2010)]. In the current study an anatomically justified feedback loop was used to automatically regulate separate attenuation values for each auditory channel. This arrangement resulted in a further enhancement of speech recognition over fixed-attenuation conditions. Comparisons between multi-talker babble and pink noise interference conditions suggest that the benefit originates from the model's ability to modify the amount of suppression in each channel separately according to the spectral shape of the interfering sounds.
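The core mechanism can be sketched as a simple per-channel feedback loop (a minimal illustration, assuming a dB-domain level target and linear update rule; it is not the published auditory-periphery model): each channel's attenuation grows when its output level exceeds a target, so channels dominated by an interferer are suppressed more than others.

```python
import numpy as np

def regulate_attenuation(channel_levels, target_db=60.0,
                         gain=0.1, n_iter=50):
    """Illustrative per-channel efferent feedback loop.

    channel_levels: input level per auditory channel in dB (hypothetical).
    Each iteration, attenuation increases in proportion to how far the
    attenuated output still exceeds the target; channels below target
    are left untouched (suppression only, no amplification).
    """
    atten_db = np.zeros_like(channel_levels, dtype=float)
    for _ in range(n_iter):
        out_db = channel_levels - atten_db          # attenuated output
        error = out_db - target_db                  # excess over target
        atten_db += gain * np.maximum(error, 0.0)   # suppress, never amplify
    return atten_db

# Hypothetical levels: a quiet speech-band channel vs. two noisier channels.
att = regulate_attenuation(np.array([55.0, 70.0, 85.0]))
```

Because attenuation is regulated independently per channel, the suppression pattern tracks the spectral shape of the interference, which is the property the abstract credits for the recognition benefit.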
Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight
This paper strives to recognize activities in the dark, as well as in the
day. As our first contribution, we establish that state-of-the-art activity
recognizers are effective during the day, but not trustworthy in the dark. The
main causes are the limited availability of labeled dark videos as well as the
distribution shift from the lower color contrast. To compensate for the lack of
labeled dark videos, our second contribution is to introduce a
pseudo-supervised learning scheme, which utilizes unlabeled and task-irrelevant
dark videos to improve an activity recognizer in low light. As the lower color
contrast results in visual information loss, we propose to incorporate the
complementary activity information within audio, which is invariant to
illumination. Since the usefulness of audio and visual features differs
depending on the amount of illumination, we introduce our "darkness-adaptive"
audio-visual recognizer as the third contribution. Experiments on
EPIC-Kitchens, Kinetics-Sound, and Charades demonstrate our proposals are
superior to image enhancement, domain adaptation and alternative audio-visual
fusion methods, and can even improve robustness to occlusions.
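The adaptive-fusion intuition can be sketched in a few lines (the names, luma-based weighting, and linear blend below are assumptions for illustration, not the paper's recognizer architecture): the darker the frame, the more the prediction leans on audio, which is invariant to illumination.

```python
import numpy as np

def darkness_adaptive_fusion(visual_scores, audio_scores, frame_luma):
    """Blend visual and audio class scores by estimated illumination.

    frame_luma: mean frame brightness in [0, 1] (hypothetical estimate).
    Bright frames trust the visual stream; dark frames shift weight to
    the illumination-invariant audio stream.
    """
    w_visual = float(np.clip(frame_luma, 0.0, 1.0))
    w_audio = 1.0 - w_visual
    return w_visual * visual_scores + w_audio * audio_scores

v = np.array([0.9, 0.1])   # visual class scores (hypothetical)
a = np.array([0.2, 0.8])   # audio class scores (hypothetical)
fused_day = darkness_adaptive_fusion(v, a, frame_luma=0.9)   # follows vision
fused_dark = darkness_adaptive_fusion(v, a, frame_luma=0.1)  # follows audio
```

In the actual paper the adaptivity is learned rather than a fixed linear blend, but the sketch shows why the relative usefulness of the two modalities must change with illumination.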
Segmentation of Speech and Humming in Vocal Input
Non-verbal vocal interaction (NVVI) is an interaction method in which sounds other than speech, such as humming, are used as input. NVVI complements traditional speech recognition systems with continuous control. In order to combine the two approaches (e.g. "volume up, mmm") it is necessary to perform a speech/NVVI segmentation of the input sound signal. This paper presents two novel methods of speech and humming segmentation. The first method classifies MFCC and RMS parameters using a neural network (MFCC method), while the other computes volume changes in the signal (IAC method). The two methods are compared using a corpus collected from 13 speakers. The results indicate that the MFCC method outperforms IAC in terms of accuracy, precision, and recall.
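The volume-change intuition behind the second approach can be sketched as follows (a toy segmenter with assumed framing and threshold, loosely inspired by the volume-change idea; it is not the paper's IAC method): speech has rapidly fluctuating frame energy, while steady humming keeps frame-to-frame RMS nearly constant.

```python
import numpy as np

def rms_frames(signal, frame_len=400, hop=160):
    """Frame-wise RMS energy (400-sample frames ~= 25 ms at 16 kHz)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])

def segment_by_volume_change(signal, change_thresh=0.5):
    """Label a signal chunk as speech or humming by energy fluctuation.

    Relative frame-to-frame RMS change is near zero for a sustained hum
    and large for bursty speech-like signals. Threshold is illustrative.
    """
    rms = rms_frames(signal)
    rel_change = np.abs(np.diff(rms)) / (rms[:-1] + 1e-8)
    return "speech" if np.mean(rel_change) > change_thresh else "humming"

# A sustained 200 Hz tone stands in for humming.
t = np.arange(16000) / 16000
label = segment_by_volume_change(np.sin(2 * np.pi * 200 * t))
```

A real system would classify overlapping windows rather than whole chunks, and the published MFCC method additionally feeds cepstral features to a neural network, which is what gives it the edge reported in the evaluation.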
Extending ACL2 with SMT Solvers
We present our extension of ACL2 with Satisfiability Modulo Theories (SMT)
solvers using ACL2's trusted clause processor mechanism. We are particularly
interested in the verification of physical systems including Analog and
Mixed-Signal (AMS) designs. ACL2 offers strong induction abilities for
reasoning about sequences, and SMT solvers complement deduction methods like
ACL2's with fast nonlinear arithmetic solving procedures. While SAT solvers have been
integrated into ACL2 in previous work, SMT methods raise new issues because of
their support for a broader range of domains including real numbers and
uninterpreted functions. This paper presents Smtlink, our clause processor for
integrating SMT solvers into ACL2. We describe key design and implementation
issues and report our experience with its use.
Comment: In Proceedings ACL2 2015, arXiv:1509.0552