2,679 research outputs found
Sonification of guidance data during road crossing for people with visual impairments or blindness
In recent years, several solutions have been proposed to support people with
visual impairments or blindness during road crossing. These solutions focus on
computer vision techniques for recognizing pedestrian crosswalks and computing
their position relative to the user. This contribution instead addresses a
different problem: the design of an auditory interface that can effectively
guide the user during road crossing. Two original auditory guiding modes based
on data sonification are presented and compared with a guiding mode based on
speech messages.
Experimental evaluation shows that there is no guiding mode that is best
suited for all test subjects. The average time to align and cross is not
significantly different among the three guiding modes, and test subjects
distribute their preferences for the best guiding mode almost uniformly among
the three solutions. The experiments also show that decoding the sonified
instructions requires more effort than the speech instructions, and that test
subjects need frequent "hints" (in the form of speech messages). Despite this,
more than two thirds of the test subjects prefer one of the two guiding modes
based on sonification. There are two main reasons for this: first, speech
messages make it harder to hear the sounds of the environment, and second,
sonified messages convey information about the "quantity" of the expected
movement.
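As a toy illustration of the general idea (the mapping, reference frequency, and semitone scale here are assumptions for illustration, not the guiding modes evaluated in the paper), a guidance quantity such as the user's deviation angle can be sonified by shifting the pitch of a reference tone, so that pitch distance encodes how much corrective movement is needed:

```python
import numpy as np

def sonify_deviation(angle_deg, sr=16000, dur=0.3,
                     f_center=440.0, semitones_per_deg=0.5):
    """Map an alignment error (degrees off the crosswalk axis) to a tone.

    The further the user deviates, the further the pitch moves from the
    reference tone, so pitch distance conveys the "quantity" of the
    expected movement. All parameter values are illustrative.
    """
    # Shift the pitch by a number of semitones proportional to the deviation.
    freq = f_center * 2 ** (semitones_per_deg * angle_deg / 12.0)
    t = np.arange(int(sr * dur)) / sr
    return freq, np.sin(2 * np.pi * freq * t)

# 12 degrees off-axis: the tone is audibly higher than the 440 Hz reference.
freq, tone = sonify_deviation(12.0)
```

A zero deviation reproduces the reference pitch, so the user can judge alignment by how far the tone has drifted from a familiar anchor sound.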
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user-interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
A frequency-selective feedback model of auditory efferent suppression and its implications for the recognition of speech in noise
The potential contribution of the peripheral auditory efferent system to our understanding of speech in a background of competing noise was studied using a computer model of the auditory periphery and assessed using an automatic speech recognition system. A previous study had shown that a fixed efferent attenuation applied to all channels of a multi-channel model could improve the recognition of connected digit triplets in noise [G. J. Brown, R. T. Ferry, and R. Meddis, J. Acoust. Soc. Am. 127, 943-954 (2010)]. In the current study an anatomically justified feedback loop was used to automatically regulate separate attenuation values for each auditory channel. This arrangement resulted in a further enhancement of speech recognition over fixed-attenuation conditions. Comparisons between multi-talker babble and pink noise interference conditions suggest that the benefit originates from the model's ability to modify the amount of suppression in each channel separately according to the spectral shape of the interfering sounds.
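The core mechanism can be sketched as a simple per-channel feedback loop (a minimal illustration, assuming a dB-domain level target and linear update rule; it is not the published auditory-periphery model): each channel's attenuation grows when its output level exceeds a target, so channels dominated by an interferer are suppressed more than others.

```python
import numpy as np

def regulate_attenuation(channel_levels, target_db=60.0,
                         gain=0.1, n_iter=50):
    """Illustrative per-channel efferent feedback loop.

    channel_levels: input level per auditory channel in dB (hypothetical).
    Each iteration, attenuation increases in proportion to how far the
    attenuated output still exceeds the target; channels below target
    are left untouched (suppression only, no amplification).
    """
    atten_db = np.zeros_like(channel_levels, dtype=float)
    for _ in range(n_iter):
        out_db = channel_levels - atten_db          # attenuated output
        error = out_db - target_db                  # excess over target
        atten_db += gain * np.maximum(error, 0.0)   # suppress, never amplify
    return atten_db

# Hypothetical levels: a quiet speech-band channel vs. two noisier channels.
att = regulate_attenuation(np.array([55.0, 70.0, 85.0]))
```

Because attenuation is regulated independently per channel, the suppression pattern tracks the spectral shape of the interference, which is the property the abstract credits for the recognition benefit.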
Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight
This paper strives to recognize activities in the dark, as well as in the
day. As our first contribution, we establish that state-of-the-art activity
recognizers are effective during the day, but not trustworthy in the dark. The
main causes are the limited availability of labeled dark videos as well as the
distribution shift from the lower color contrast. To compensate for the lack of
labeled dark videos, our second contribution is to introduce a
pseudo-supervised learning scheme, which utilizes unlabeled and task-irrelevant
dark videos to improve an activity recognizer in low light. As the lower color
contrast results in visual information loss, we propose to incorporate the
complementary activity information within audio, which is invariant to
illumination. Since the usefulness of audio and visual features differs
depending on the amount of illumination, we introduce our "darkness-adaptive"
audio-visual recognizer as the third contribution. Experiments on
EPIC-Kitchens, Kinetics-Sound, and Charades demonstrate our proposals are
superior to image enhancement, domain adaptation and alternative audio-visual
fusion methods, and can even improve robustness to occlusions.
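The adaptive-fusion intuition can be sketched in a few lines (the names, luma-based weighting, and linear blend below are assumptions for illustration, not the paper's recognizer architecture): the darker the frame, the more the prediction leans on audio, which is invariant to illumination.

```python
import numpy as np

def darkness_adaptive_fusion(visual_scores, audio_scores, frame_luma):
    """Blend visual and audio class scores by estimated illumination.

    frame_luma: mean frame brightness in [0, 1] (hypothetical estimate).
    Bright frames trust the visual stream; dark frames shift weight to
    the illumination-invariant audio stream.
    """
    w_visual = float(np.clip(frame_luma, 0.0, 1.0))
    w_audio = 1.0 - w_visual
    return w_visual * visual_scores + w_audio * audio_scores

v = np.array([0.9, 0.1])   # visual class scores (hypothetical)
a = np.array([0.2, 0.8])   # audio class scores (hypothetical)
fused_day = darkness_adaptive_fusion(v, a, frame_luma=0.9)   # follows vision
fused_dark = darkness_adaptive_fusion(v, a, frame_luma=0.1)  # follows audio
```

In the actual paper the adaptivity is learned rather than a fixed linear blend, but the sketch shows why the relative usefulness of the two modalities must change with illumination.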
Segmentation of Speech and Humming in Vocal Input
Non-verbal vocal interaction (NVVI) is an interaction method in which sounds other than speech, such as humming, are used as input. NVVI complements traditional speech recognition systems with continuous control. In order to combine the two approaches (e.g. "volume up, mmm") it is necessary to perform a speech/NVVI segmentation of the input sound signal. This paper presents two novel methods of speech and humming segmentation. The first method classifies MFCC and RMS parameters using a neural network (MFCC method), while the other computes volume changes in the signal (IAC method). The two methods are compared using a corpus collected from 13 speakers. The results indicate that the MFCC method outperforms IAC in terms of accuracy, precision, and recall.
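The volume-change intuition behind the second approach can be sketched as follows (a toy segmenter with assumed framing and threshold, loosely inspired by the volume-change idea; it is not the paper's IAC method): speech has rapidly fluctuating frame energy, while steady humming keeps frame-to-frame RMS nearly constant.

```python
import numpy as np

def rms_frames(signal, frame_len=400, hop=160):
    """Frame-wise RMS energy (400-sample frames ~= 25 ms at 16 kHz)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])

def segment_by_volume_change(signal, change_thresh=0.5):
    """Label a signal chunk as speech or humming by energy fluctuation.

    Relative frame-to-frame RMS change is near zero for a sustained hum
    and large for bursty speech-like signals. Threshold is illustrative.
    """
    rms = rms_frames(signal)
    rel_change = np.abs(np.diff(rms)) / (rms[:-1] + 1e-8)
    return "speech" if np.mean(rel_change) > change_thresh else "humming"

# A sustained 200 Hz tone stands in for humming.
t = np.arange(16000) / 16000
label = segment_by_volume_change(np.sin(2 * np.pi * 200 * t))
```

A real system would classify overlapping windows rather than whole chunks, and the published MFCC method additionally feeds cepstral features to a neural network, which is what gives it the edge reported in the evaluation.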
Extending ACL2 with SMT Solvers
We present our extension of ACL2 with Satisfiability Modulo Theories (SMT)
solvers using ACL2's trusted clause processor mechanism. We are particularly
interested in the verification of physical systems including Analog and
Mixed-Signal (AMS) designs. ACL2 offers strong induction abilities for
reasoning about sequences, and SMT solvers complement deduction methods like
ACL2's with fast nonlinear arithmetic solving procedures. While SAT solvers have been
integrated into ACL2 in previous work, SMT methods raise new issues because of
their support for a broader range of domains including real numbers and
uninterpreted functions. This paper presents Smtlink, our clause processor for
integrating SMT solvers into ACL2. We describe key design and implementation
issues and report our experience with its use.
Comment: In Proceedings ACL2 2015, arXiv:1509.0552