
    GEMINI: A Generic Multi-Modal Natural Interface Framework for Videogames

    In recent years, videogame companies have recognized player engagement as a major factor in user experience and enjoyment. This has encouraged greater investment in new types of game controllers, such as the WiiMote, Rock Band instruments and the Kinect. However, the native software of these controllers was not originally designed to be used in other game applications. This work addresses the issue by building a middleware framework that maps body poses or voice commands to actions in any game. This not only warrants a more natural and customized user experience but also defines an interoperable virtual controller. In this version of the framework, body poses and voice commands are recognized through the Kinect's built-in cameras and microphones, respectively. The acquired data is then translated into the game's native interaction scheme in real time, using a lightweight method based on spatial restrictions. The system is also prepared to use Nintendo's Wiimote as an auxiliary and unobtrusive gamepad for physically or verbally impractical commands. System validation was performed by analyzing the performance of certain tasks and examining user reports; both confirmed this approach as a practical and alluring alternative to the games' native interaction schemes. In sum, this framework provides a game-controlling tool that is fully customizable and very flexible, thus expanding the market of game consumers.
    Comment: WorldCIST'13 International Conference
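The "lightweight method based on spatial restrictions" for mapping poses to game actions can be sketched as simple relational tests on tracked joint positions. This is a hypothetical illustration, not the framework's actual code: the joint names, coordinate convention (y up, z toward the scene), and thresholds are all assumptions.

```python
# Hypothetical sketch: map a tracked skeleton pose to a game action
# using spatial restrictions (relative joint positions). Joint names,
# the coordinate convention and the 0.3 m threshold are illustrative.

def pose_to_action(joints):
    """joints: dict of joint name -> (x, y, z) in metres, camera space."""
    head = joints["head"]
    torso = joints["torso"]
    right_hand = joints["right_hand"]
    left_hand = joints["left_hand"]

    # Spatial restriction: right hand raised above the head -> "jump".
    if right_hand[1] > head[1]:
        return "jump"
    # Both hands clearly forward of the torso -> "push".
    if right_hand[2] < torso[2] - 0.3 and left_hand[2] < torso[2] - 0.3:
        return "push"
    return None

joints = {"head": (0.0, 1.6, 2.0),
          "torso": (0.0, 1.2, 2.0),
          "right_hand": (0.3, 1.8, 2.0),
          "left_hand": (-0.3, 1.0, 2.0)}
print(pose_to_action(joints))  # right hand is above the head -> "jump"
```

The appeal of such restriction tests is that they run in constant time per frame, which is what makes real-time translation into a game's native interaction scheme feasible.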

    A hybrid noise suppression filter for accuracy enhancement of commercial speech recognizers in varying noisy conditions

    Commercial speech recognizers have made possible many speech-control applications for the disabled and paraplegic, such as wheelchairs, tone-phones, multifunctional robotic arms and remote controls. However, they share a common limitation: recognition errors are likely when background noise surrounds the spoken command, creating potential dangers for the disabled if recognition errors occur in the control systems. In this paper, a hybrid noise suppression filter is proposed to interface with commercial speech recognizers in order to enhance recognition accuracy under varying noisy conditions, with the aim of decreasing recognition errors when the recognizers operate in noisy environments. It is based on a sigmoid function, which can effectively enhance noisy speech using simple computational operations, while a robust estimator based on an adaptive-network-based fuzzy inference system determines the appropriate operational parameters for the sigmoid function so that effective speech enhancement is produced under varying noisy conditions. The proposed hybrid noise suppression filter addresses the following limitations: (i) it is not possible to tune the inbuilt parameters of commercial speech recognizers to obtain better accuracy; (ii) existing noise suppression filters are too complicated to implement for real-time speech recognition; and (iii) existing sigmoid-function-based filters can operate only under a single noisy condition, not under varying noisy conditions. The performance of the hybrid noise suppression filter was evaluated by interfacing it with a commercial speech recognizer commonly used in electronic products.
Experimental results show that the hybrid noise suppression filter improves recognition accuracy and computational time when the commercial recognizer operates under various noisy factory environments.
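The core idea of a sigmoid-based suppression filter can be sketched as a per-bin spectral gain: bins well above an estimated noise floor pass almost unchanged, bins below it are attenuated smoothly. This is an illustrative sketch only; the slope `a` and centre `c` are the kinds of operational parameters the paper's ANFIS estimator would adapt to the noise condition, and here they are simply fixed.

```python
import numpy as np

# Illustrative sigmoid-based noise-suppression gain. The slope `a` and
# centre `c` (roughly the noise floor in dB) are assumed fixed here; in
# the described filter they would be adapted to the noise condition.

def sigmoid_gain(mag, a=0.5, c=10.0):
    """Return per-bin gains in (0, 1) for one frame of spectral magnitudes."""
    mag_db = 20.0 * np.log10(np.maximum(mag, 1e-12))
    return 1.0 / (1.0 + np.exp(-a * (mag_db - c)))

frame = np.array([0.01, 0.1, 1.0, 10.0, 100.0])  # toy magnitude spectrum
enhanced = frame * sigmoid_gain(frame)
print(enhanced)  # small (noise-like) bins are suppressed, large ones kept
```

The attraction noted in the abstract is visible here: the whole operation is an elementwise log, exponential and multiply, cheap enough for real-time use in front of an off-the-shelf recognizer.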

    Multichannel filters for speech recognition using a particle swarm optimization

    Speech recognition has been used in various real-world applications such as automotive control, electronic toys and electronic appliances. In many applications involving speech-control functions, a commercial speech recognizer identifies the speech commands voiced by users, and the recognized command is used to perform the appropriate operation. However, users' commands are often corrupted by surrounding ambient noise, which decreases the effectiveness of speech recognition and hence the accuracy with which commands are carried out. This paper proposes a multichannel filter to enhance noisy speech commands, in order to improve the accuracy of commercial speech recognizers working under noisy environments. An innovative particle swarm optimization (PSO) is proposed to optimize the parameters of the multichannel filter. The effectiveness of the multichannel filter was evaluated by interfacing it with a commercial speech recognizer operating in a warehouse.
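The parameter-tuning step can be illustrated with a minimal standard PSO loop. This is a generic sketch, not the paper's "innovative" variant: the objective here is a stand-in sphere function, whereas in the paper the fitness would be the recognizer's accuracy on commands enhanced by the candidate filter parameters.

```python
import numpy as np

# Minimal standard particle swarm optimization. The sphere objective is
# a placeholder; in the described system the fitness of a particle would
# be the recognition accuracy achieved with those filter parameters.

rng = np.random.default_rng(0)

def pso(objective, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    pos = rng.uniform(-5, 5, (n_particles, dim))       # particle positions
    vel = np.zeros((n_particles, dim))                 # particle velocities
    pbest = pos.copy()                                 # personal bests
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()           # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos += vel
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, float(pbest_val.min())

best, val = pso(lambda x: float(np.sum(x ** 2)), dim=3)
print(val)  # converges toward the minimum at 0
```

Because PSO only needs fitness evaluations, not gradients, it suits this setting: the recognizer is a black box whose accuracy cannot be differentiated with respect to the filter parameters.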

    Autonomous Vision-Based Facial and Voice Recognition on the Unmanned Aerial Vehicle

    The development of human navigation and tracking in a real-time environment will lead to more advanced tasks that can be performed by autonomous robots. We therefore propose a new intelligent algorithm for human identification that combines facial and speech recognition, which can substantially improve the recognition rate compared to single-modality biometric identification and yields a more robust system. The system recognizes faces using an Eigenface recognizer with principal component analysis (PCA) and recognizes the human voice using a Hidden Markov Model (HMM). The combination of algorithms (modified Eigenface, Haar-cascade classifier, PCA and HMM) results in a more robust system for facial and speech recognition. The proposed system was implemented on an AR drone 2.0 using the Microsoft Visual Studio 2015 platform together with EmguCV. Testing was carried out in an indoor environment to evaluate performance in terms of detection distance, angle of detection and detection accuracy. 500 images of different people were used for face recognition at the tested detection distances. The best average result of 92.22% was obtained at a detection
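The Eigenface/PCA step described above amounts to projecting face images onto a small number of principal components and classifying by nearest neighbour in that subspace. The sketch below uses random placeholder "images" rather than real faces, and NumPy rather than EmguCV, purely to show the structure of the computation.

```python
import numpy as np

# Eigenface sketch: fit PCA on flattened training faces, project a probe
# image into the eigenface subspace, and classify by nearest neighbour.
# The data here is random placeholder imagery, not real faces.

rng = np.random.default_rng(1)

def fit_eigenfaces(train, k):
    """train: (n_samples, n_pixels) matrix of flattened face images."""
    mean = train.mean(axis=0)
    centred = train - mean
    # SVD of the centred data yields the principal components (eigenfaces).
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:k]

def project(x, mean, components):
    return (x - mean) @ components.T

# Toy gallery: 10 "faces" of 64 pixels, belonging to 2 identities.
train = rng.normal(size=(10, 64))
labels = np.array([0] * 5 + [1] * 5)
mean, comps = fit_eigenfaces(train, k=4)
coords = project(train, mean, comps)

probe = train[2] + 0.01 * rng.normal(size=64)   # noisy copy of sample 2
pc = project(probe, mean, comps)
nearest = int(np.argmin(np.linalg.norm(coords - pc, axis=1)))
print(labels[nearest])  # identity of the closest gallery face
```

In the paper's pipeline the Haar-cascade classifier would first crop the face region from the drone's camera frame; the crop is then what gets flattened and projected as above.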

    Development Considerations for Implementing a Voice-Controlled Spacecraft System

    As computational power and speech recognition algorithms improve, the consumer market will see better-performing speech recognition applications. The cell phone and Internet-related service industries have further enhanced speech recognition applications using artificial intelligence and statistical data-mining techniques. These improvements to speech recognition technology (SRT) may one day help astronauts on future deep-space human missions that require voice control of complex spacecraft systems or spacesuit applications. Though SRT and more advanced speech recognition techniques show promise, use of this technology for a space application such as a vehicle, habitat or spacesuit requires careful consideration. This paper provides considerations and guidance for the use of SRT in voice-controlled spacecraft system (VCSS) applications for space missions, specifically in command-and-control (C2) applications where the commanding is user-initiated. First, current SRT limitations as known at the time of this report are given. Then, highlights of SRT used in the space program provide the reader with a history of some of the human spaceflight applications and research. Next, an overview of the speech production process and the intrinsic variations of speech is provided. Finally, general guidance and considerations are given for the development of a VCSS using a human-centered design approach for space applications, including vocabulary selection and performance testing, as well as VCSS considerations for C2 dialogue management design, feedback, error handling, and evaluation/usability testing.
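One common pattern behind the dialogue-management, feedback and error-handling considerations listed above is a confidence-gated command loop: accept a recognized command only above a confidence threshold, and require explicit confirmation for safety-critical commands. The sketch below is hypothetical; the vocabulary, threshold and criticality list are invented for illustration and are not from the report.

```python
# Hypothetical confidence-gated C2 command handler. The command
# vocabulary, the 0.80 threshold and the critical-command set are all
# illustrative assumptions, not values from the report.

CRITICAL = {"vent cabin", "abort"}   # commands requiring confirmation
ACCEPT_THRESHOLD = 0.80              # minimum recognizer confidence

def handle_command(command, confidence):
    if confidence < ACCEPT_THRESHOLD:
        return "reprompt"   # feedback: ask the user to repeat
    if command in CRITICAL:
        return "confirm"    # dialogue management: two-step confirmation
    return "execute"

print(handle_command("lights on", 0.95))  # execute
print(handle_command("abort", 0.95))      # confirm
print(handle_command("lights on", 0.40))  # reprompt
```

This structure keeps the recognizer's uncertainty visible to the user, which matters when a misrecognized command could affect vehicle or crew safety.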

    MOBILE VOICE TO SIGN LANGUAGE SYSTEM

    This report presents recent technologies that use mobile devices as a medium of interaction between hearing people and people with hearing disabilities. The system uses Java mobile technologies to perform voice processing on a mobile platform, allowing users to capture speech and match it with the appropriate sign. Voice recognition for controlling devices such as robots has already been implemented in Java; here, the idea is implemented using J2ME, which targets small mobile devices, to make it practical with current technology. There are at least 29 million people around the world who suffer from speech and hearing disabilities. It is difficult to interact with them without knowing the language they use to communicate with each other. Sign language is a form of communication widely used by deaf and mute people; thus, the usual way to communicate with them is to learn sign language. As with verbal languages, sign language differs from one region to another. However, when people using different signed languages meet, communication is significantly easier than when people speaking different languages meet; sign language, in this respect, gives the international deaf community a shared means of communication. This report presents a solution whereby one does not need to learn sign language to communicate with the disabled: the system converts English into sign language.
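The conversion step, matching each recognized English word to a stored sign, can be sketched as a dictionary lookup over the system's sign vocabulary. This is an illustrative sketch in Python rather than J2ME, and the vocabulary and resource file names are invented for the example.

```python
# Illustrative voice-to-sign lookup. The vocabulary and resource paths
# are placeholders; the report's system would perform this mapping on a
# J2ME device after voice recognition.

SIGN_DICTIONARY = {
    "hello": "signs/hello.gif",
    "thank": "signs/thank.gif",
    "you": "signs/you.gif",
}

def words_to_signs(sentence):
    """Return the sign resource for each recognized in-vocabulary word.

    Out-of-vocabulary words are skipped here; a fuller system might
    fall back to fingerspelling them letter by letter.
    """
    return [SIGN_DICTIONARY[w] for w in sentence.lower().split()
            if w in SIGN_DICTIONARY]

print(words_to_signs("Hello thank you"))
# ['signs/hello.gif', 'signs/thank.gif', 'signs/you.gif']
```

Keeping the mapping as a flat dictionary suits the constrained memory of the small mobile devices J2ME targets.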