GEMINI: A Generic Multi-Modal Natural Interface Framework for Videogames
In recent years, videogame companies have recognized the role of player
engagement as a major factor in user experience and enjoyment. This encouraged
a greater investment in new types of game controllers such as the WiiMote, Rock
Band instruments and the Kinect. However, the native software of these
controllers was not originally designed to be used in other game applications.
This work addresses this issue by building a middleware framework, which maps
body poses or voice commands to actions in any game. This not only ensures a
more natural and customized user experience but also defines an
interoperable virtual controller. In this version of the framework, body poses
and voice commands are respectively recognized through the Kinect's built-in
cameras and microphones. The acquired data is then translated into the native
interaction scheme in real time using a lightweight method based on spatial
restrictions. The system is also prepared to use Nintendo's Wiimote as an
auxiliary and unobtrusive gamepad for physically or verbally impractical
commands. System validation was performed by analyzing the performance of
selected tasks and examining user reports. Both confirmed this approach as a
practical and appealing alternative to the game's native interaction scheme. In
sum, this framework provides a fully customizable and highly flexible
game-controlling tool, thus broadening the market of game consumers.
Comment: WorldCIST'13 International Conference
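The lightweight pose-mapping method described above can be illustrated with a small sketch: a pose counts as recognized when tracked joints satisfy spatial restrictions relative to other joints, and each pose is mapped to a virtual-controller action. All names here (the `Joint` type, the example poses, and the action labels) are illustrative, not the framework's actual API.

```python
# Sketch of spatial-restriction pose mapping (illustrative names only).
from dataclasses import dataclass

@dataclass
class Joint:
    x: float
    y: float
    z: float

def right_hand_raised(joints):
    """Spatial restriction: right hand above the head."""
    return joints["right_hand"].y > joints["head"].y

def hands_together(joints):
    """Spatial restriction: both hands within 0.2 m of each other."""
    l, r = joints["left_hand"], joints["right_hand"]
    return abs(l.x - r.x) < 0.2 and abs(l.y - r.y) < 0.2

# Virtual controller: pose predicate -> action in the target game.
POSE_TO_ACTION = [
    (right_hand_raised, "JUMP"),
    (hands_together, "PAUSE"),
]

def map_pose(joints):
    """Return the first action whose spatial restrictions are satisfied."""
    for predicate, action in POSE_TO_ACTION:
        if predicate(joints):
            return action
    return None
```

Because each mapping is just a predicate plus an action label, remapping the virtual controller to a different game only means editing the table, which is what makes the interface customizable.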
A hybrid noise suppression filter for accuracy enhancement of commercial speech recognizers in varying noisy conditions
Commercial speech recognizers have made possible many speech-control applications for the disabled and paraplegic, such as wheelchairs, tone-phones, multifunctional robotic arms and remote controls. However, they share a common limitation: recognition errors are likely when background noise surrounds the spoken command, creating potential dangers for the disabled if such errors occur in the control systems. In this paper, a hybrid noise suppression filter is proposed to interface with commercial speech recognizers in order to enhance recognition accuracy under varying noisy conditions. It aims to decrease recognition errors when commercial speech recognizers operate in a noisy environment. It is based on a sigmoid function, which can effectively enhance noisy speech using simple computational operations, while a robust estimator based on an adaptive-network-based fuzzy inference system determines the appropriate operational parameters for the sigmoid function so that effective speech enhancement is produced under varying noisy conditions. The proposed hybrid noise suppression filter addresses the following limitations: (i) the inbuilt parameters of commercial speech recognizers cannot be tuned to obtain better accuracy; (ii) existing noise suppression filters are too complicated to implement for real-time speech recognition; and (iii) existing sigmoid-function-based filters can operate only under a single noisy condition, not under varying noisy conditions. The performance of the hybrid noise suppression filter was evaluated by interfacing it with a commercial speech recognizer commonly used in electronic products.
Experimental results show that improvements in recognition accuracy and computational time can be achieved by the hybrid noise suppression filter when the commercial recognizer operates under various noisy environments in factories.
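The sigmoid-based enhancement idea can be sketched as a per-bin spectral gain: each frequency bin's magnitude is scaled by a sigmoid of its estimated SNR, so clean bins pass through and noisy bins are attenuated. This is a minimal illustration, not the paper's exact filter; the slope `a` and midpoint `b` stand in for the operational parameters that the paper's fuzzy-inference estimator would adapt to the noise condition, and are fixed here to arbitrary values.

```python
# Minimal sketch of sigmoid-based spectral gain (parameters are illustrative).
import math

def sigmoid_gain(snr_db, a=0.5, b=5.0):
    """Gain in (0, 1): near 1 well above the midpoint b (in dB), near 0 below it."""
    return 1.0 / (1.0 + math.exp(-a * (snr_db - b)))

def enhance_spectrum(noisy_mags, noise_mags):
    """Scale each noisy magnitude bin by the sigmoid gain of its estimated SNR."""
    enhanced = []
    for y, n in zip(noisy_mags, noise_mags):
        snr_db = 20.0 * math.log10(max(y, 1e-12) / max(n, 1e-12))
        enhanced.append(sigmoid_gain(snr_db) * y)
    return enhanced
```

The appeal for real-time use is that the gain is a single closed-form function per bin, so the per-frame cost stays far below that of iterative enhancement schemes.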
uC: Ubiquitous Collaboration Platform for Multimodal Team Interaction Support
A human-centered computing platform that improves teamwork and transforms the "human-computer interaction experience" for distributed teams is presented. This Ubiquitous Collaboration, or uC ("you see"), platform's objective is to transform distributed teamwork (i.e., work occurring when teams of workers and learners are geographically dispersed and often interacting at different times). It achieves this goal through a multimodal team interaction interface realized through a reconfigurable open architecture. The approach taken is to integrate: (1) an intuitive speech- and video-centric multi-modal interface to augment more conventional methods (e.g., mouse, stylus and touch), (2) an open and reconfigurable architecture supporting information gathering, and (3) a machine-intelligent approach to analysis and management of heterogeneous live and stored sensor data to support collaboration. The system will transform how teams of people interact with computers by drawing on both the virtual and physical environment.
Multichannel filters for speech recognition using a particle swarm optimization
Speech recognition has been used in various real-world applications such as automotive control, electronic toys and electronic appliances. In many applications involving speech-control functions, a commercial speech recognizer is used to identify the speech commands voiced by the users, and the recognized command is used to perform the appropriate operation. However, users' commands are often corrupted by surrounding ambient noise, which decreases the effectiveness of speech recognition and prevents the commands from being implemented accurately. This paper proposes a multichannel filter to enhance noisy speech commands, in order to improve the accuracy of commercial speech recognizers working under noisy environments. An innovative particle swarm optimization (PSO) is proposed to optimize the parameters of the multichannel filter. The effectiveness of the multichannel filter was evaluated by interfacing it with a commercial speech recognizer operated in a warehouse.
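For readers unfamiliar with PSO, a standard (not the paper's "innovative") variant is sketched below: each particle is a candidate vector of filter parameters, iteratively pulled toward its own best position and the swarm's best. In the paper's setting the fitness would score recognition accuracy on enhanced speech; here a toy quadratic stands in so the sketch is self-contained, and all hyperparameters are conventional defaults rather than the paper's.

```python
# Standard particle swarm optimization sketch (minimization).
import random

def pso(fitness, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # each particle's best position
    pbest_val = [fitness(p) for p in pos]
    g = pbest[min(range(n_particles), key=lambda i: pbest_val[i])][:]  # global best

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # cognitive pull
                             + c2 * r2 * (g[d] - pos[i][d]))         # social pull
                pos[i][d] += vel[i][d]
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < fitness(g):
                    g = pos[i][:]
    return g

# Toy fitness: recover target "filter coefficients" (1, -2, 0.5).
random.seed(0)
target = [1.0, -2.0, 0.5]
best = pso(lambda p: sum((a - b) ** 2 for a, b in zip(p, target)), dim=3)
```

PSO is attractive here because the fitness (recognizer accuracy) is a black box with no usable gradient, which rules out classical gradient-based filter design.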
Autonomous Vision Based Facial and voice Recognition on the Unmanned Aerial Vehicle
The development of human navigation and tracking in real-time environments will enable more advanced tasks to be performed by autonomous robots. To that end, we propose a new intelligent algorithm for human identification that combines facial and speech recognition, which can substantially improve the recognition rate compared with single-biometric identification, yielding a more robust system. The system recognizes faces using an Eigenface recognizer with principal component analysis (PCA) and recognizes the human voice using a Hidden Markov Model (HMM). Combining modified Eigenface, the Haar-Cascade classifier, PCA and HMM resulted in a more robust system for facial and speech recognition. The proposed system was implemented on an AR Drone 2.0 using the Microsoft Visual Studio 2015 platform together with EmguCV. Testing was carried out in an indoor environment to evaluate performance in terms of detection distance, angle of detection and detection accuracy. 500 images of different people were used for face recognition at various detection distances. The best average result of 92.22% was obtained at a detection
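The multi-biometric idea above can be sketched as score-level fusion: the face matcher and the voice matcher each produce a per-identity similarity score, and a combined score makes the decision. The real system uses Eigenface/PCA and an HMM for the two matchers; here both are stand-ins returning pre-computed normalized scores, and the weighted-sum rule and its weights are illustrative assumptions, not necessarily the paper's fusion method.

```python
# Hedged sketch of score-level fusion of two biometric matchers.
def fuse_scores(face_scores, voice_scores, w_face=0.6, w_voice=0.4):
    """Weighted-sum fusion of normalized per-identity similarity scores."""
    ids = face_scores.keys() & voice_scores.keys()
    combined = {i: w_face * face_scores[i] + w_voice * voice_scores[i]
                for i in ids}
    best = max(combined, key=combined.get)
    return best, combined[best]

# Face alone would pick "bob"; the voice evidence flips the decision to "alice",
# illustrating why fusing modalities is more robust than either one alone.
face = {"alice": 0.55, "bob": 0.60}
voice = {"alice": 0.90, "bob": 0.30}
```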
Development Considerations for Implementing a Voice-Controlled Spacecraft System
As computational power and speech recognition algorithms improve, the consumer market will see better-performing speech recognition applications. The cell phone and Internet-related service industry have further enhanced speech recognition applications using artificial intelligence and statistical data-mining techniques. These improvements to speech recognition technology (SRT) may one day help astronauts on future deep space human missions that require control of complex spacecraft systems or spacesuit applications by voice. Though SRT and more advanced speech recognition techniques show promise, use of this technology for a space application such as vehicle/habitat/spacesuit requires careful consideration. This paper provides considerations and guidance for the use of SRT in voice-controlled spacecraft systems (VCSS) applications for space missions, specifically in command-and-control (C2) applications where the commanding is user-initiated. First, current SRT limitations as known at the time of this report are given. Then, highlights of SRT used in the space program provide the reader with a history of some of the human spaceflight applications and research. Next, an overview of the speech production process and the intrinsic variations of speech are provided. Finally, general guidance and considerations are given for the development of a VCSS using a human-centered design approach for space applications that includes vocabulary selection and performance testing, as well as VCSS considerations for C2 dialogue management design, feedback, error handling, and evaluation/usability testing
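The C2 dialogue-management concerns named above (feedback, error handling, confirmation of user-initiated commands) can be sketched as a minimal handling loop. The vocabulary, the confidence threshold, and the set of safety-critical commands below are invented for illustration; they are not from the report.

```python
# Illustrative C2 command handling: reject low-confidence recognitions
# rather than guess, and require explicit confirmation for critical commands.
CRITICAL = {"depressurize airlock", "abort burn"}   # hypothetical examples

def handle_command(text, confidence, confirm):
    """Return the action taken for one recognized utterance.

    `confirm` is a callback that reads the command back to the user and
    returns True only on an explicit acknowledgement.
    """
    if confidence < 0.7:                 # error handling: never act on a guess
        return "REJECTED: please repeat"
    if text in CRITICAL and not confirm(text):
        return "CANCELLED"               # read-back confirmation failed
    return f"EXECUTED: {text}"
```

The key property for safety is that every path returns explicit feedback to the user, so a misrecognition is always visible rather than silently executed.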
MOBILE VOICE TO SIGN LANGUAGE SYSTEM
This report presents recent technologies that use mobile devices as a medium for
interaction between hearing people and people with hearing disabilities. The
system uses Java mobile technologies to perform voice processing on a mobile
platform, allowing users to capture speech and match it with the appropriate
sign. Voice recognition for controlling devices such as robots has already been
implemented in Java; this report presents ideas for implementing it with J2ME,
which targets small mobile devices, to make it practical with current
technology. There are at least 29 million people around the world who suffer
from speech and hearing disabilities. It is difficult for others to interact
with them because of the unfamiliar language they use to communicate with each
other. Sign language is a form of communication widely used by deaf and mute
people; thus, the only way to communicate with them is to learn their language,
sign language. As with verbal languages, sign language differs from one region
to another. However, when people using different sign languages meet,
communication is significantly easier than when people speaking different
spoken languages meet. Sign language, in this respect, gives the international
deaf community a means to communicate. This report describes a solution whereby
one does not need to learn sign language to communicate with the disabled: the
system converts English into sign language
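The translation step described above can be sketched simply, assuming the recognizer already returns English text: each recognized word is looked up in a dictionary of sign resources, and unknown words fall back to fingerspelling letter by letter. The dictionary contents and file names are invented for illustration.

```python
# Sketch of voice-to-sign mapping after speech recognition (names illustrative).
SIGN_DICT = {
    "hello": "signs/hello.gif",
    "thank": "signs/thank.gif",
    "you": "signs/you.gif",
}
LETTER_SIGNS = {c: f"letters/{c}.gif" for c in "abcdefghijklmnopqrstuvwxyz"}

def text_to_signs(text):
    """Map recognized English text to an ordered list of sign resources,
    fingerspelling any word not in the sign dictionary."""
    signs = []
    for word in text.lower().split():
        if word in SIGN_DICT:
            signs.append(SIGN_DICT[word])
        else:
            signs.extend(LETTER_SIGNS[c] for c in word if c in LETTER_SIGNS)
    return signs
```

On a J2ME device the returned resources would then be displayed in sequence; the fingerspelling fallback keeps the system usable even with a small sign vocabulary.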