
    A Voice Controlled E-Commerce Web Application

    Automatic voice-controlled systems have changed the way humans interact with computers. Voice or speech recognition systems allow a user to make a hands-free request to the computer, which in turn processes the request and serves the user with appropriate responses. After years of research and development in machine learning and artificial intelligence, voice-controlled technologies have become more efficient and are widely applied in many domains to enable and improve human-to-human and human-to-computer interaction. State-of-the-art e-commerce applications, with the help of web technologies, offer interactive and user-friendly interfaces. However, there are instances where people, especially those with visual disabilities, are not able to fully experience the serviceability of such applications. A voice-controlled system embedded in a web application can enhance the user experience and provide voice as a means to control the functionality of e-commerce websites. In this paper, we propose a taxonomy of speech recognition systems (SRS) and present a voice-controlled commodity-purchase e-commerce application using IBM Watson speech-to-text to demonstrate its usability. The prototype can be extended to other application scenarios, such as government service kiosks, and enables analytics of the converted text data for scenarios such as medical diagnosis at clinics.
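    The abstract does not reproduce the implementation, but as a rough illustration, the sketch below shows how a backend might send an uploaded audio clip to the IBM Watson speech-to-text service and map the transcript onto a simple shop action. It assumes the ibm-watson Python SDK; the API key, service URL, file name, and command keywords are placeholders, not details from the paper.

```python
# Minimal sketch (not the paper's code): transcribe an audio clip with the
# IBM Watson speech-to-text service and map the transcript to a store action.
# API key, service URL, file name, and command keywords are placeholders.
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator("YOUR_API_KEY")
stt = SpeechToTextV1(authenticator=authenticator)
stt.set_service_url("https://api.us-south.speech-to-text.watson.cloud.ibm.com")

with open("voice_command.wav", "rb") as audio_file:
    response = stt.recognize(audio=audio_file, content_type="audio/wav").get_result()

transcript = response["results"][0]["alternatives"][0]["transcript"].lower()

# Very simple keyword mapping from the recognized text to a shop action.
if "add" in transcript and "cart" in transcript:
    action = "add_to_cart"
elif "checkout" in transcript:
    action = "checkout"
else:
    action = "search"
print(transcript, "->", action)
```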

    Review of Noise Reduction Techniques in Speech Processing

    Recent advances in speech processing systems aim at providing robust and reliable interfaces for practical deployment. Achieving robust performance of these systems in adverse and noisy environments is one of the major challenges in applications such as dictation, voice-controlled devices, human-computer dialog systems, and navigation systems. The performance of speech recognition systems degrades strongly in the presence of background noise, such as the driving noise inside a car. In contrast to existing works, we reduce the influence of noise at several levels of speech recognition: feature extraction, feature enhancement, speech modelling, and training. We thereby provide an overview of noise modelling concepts, speech enhancement techniques, training methods, and model architectures, which are applied to a spelling recognition task considering noises produced by various conditions. DOI: 10.17762/ijritcc2321-8169.15075
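    The review surveys enhancement at several stages of the pipeline; purely as an illustration of the speech-enhancement stage, the sketch below implements classic spectral subtraction with NumPy/SciPy. This is not necessarily the specific method evaluated in the paper, and the frame length, noise-estimation window, and spectral floor are arbitrary assumptions.

```python
# Minimal spectral-subtraction sketch: estimate the noise magnitude spectrum
# from a leading noise-only segment and subtract it from every frame.
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs, noise_seconds=0.5, floor=0.02, nperseg=512):
    """Enhance a noisy signal, assuming its first `noise_seconds` are noise only."""
    _, _, Zxx = stft(noisy, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(Zxx), np.angle(Zxx)

    # Average the leading noise-only frames to estimate the noise spectrum.
    hop = nperseg // 2
    noise_frames = max(int(noise_seconds * fs / hop), 1)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)

    # Subtract and apply a spectral floor to limit musical-noise artifacts.
    clean_mag = np.maximum(mag - noise_mag, floor * noise_mag)

    _, enhanced = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return enhanced
```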

    The Arena: An indoor mixed reality space

    In this paper, we introduce the Arena, an indoor space for mobile mixed reality interaction. The Arena includes a new user tracking system appropriate for AR/MR applications and a new toolkit oriented to the augmented and mixed reality applications developer, the MX Toolkit. This toolkit is defined at a somewhat higher abstraction level, hiding low-level implementation details from the programmer and facilitating AR/MR object-oriented programming. The system handles, uniformly, video input, video output (for headsets and monitors), sound auralisation, and multimodal human-computer interaction in AR/MR, including tangible interfaces, speech recognition, and gesture recognition.

    Voice Operated Information System in Slovak

    Speech communication interfaces (SCI) are nowadays widely used in several domains. Automated spoken language human-computer interaction can replace human-human interaction if needed. Automatic speech recognition (ASR), a key technology of SCI, has been extensively studied during the past few decades. Most present systems are based on statistical modeling, both at the acoustic and linguistic levels. Increased attention has recently been paid to speech recognition in adverse conditions, since noise robustness has become one of the major bottlenecks for practical use of speech recognizers. Although many techniques have been developed, many challenges still have to be overcome before the ultimate goal -- creating machines capable of communicating with humans naturally -- can be achieved. In this paper we describe the research and development of the first Slovak spoken language dialogue system (SLDS). The dialogue system is based on the DARPA Communicator architecture. The proposed system consists of the Galaxy hub and telephony, automatic speech recognition, text-to-speech, backend, transport and VoiceXML dialogue management modules. The SCI enables multi-user interaction in the Slovak language. Functionality of the SLDS is demonstrated and tested via two pilot applications, "Weather forecast for Slovakia" and "Timetable of Slovak Railways". The required information is retrieved from Internet resources in multi-user mode through PSTN, ISDN, GSM and/or VoIP networks.
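    The abstract names the modules but not their interfaces, so the following is a purely hypothetical sketch of how a single dialogue turn could be routed through a central hub in the spirit of the DARPA Communicator / Galaxy layout (ASR, dialogue management, backend lookup, TTS as separate services). Every class, service name, and message field here is invented for illustration.

```python
# Hypothetical hub-and-module wiring for one dialogue turn; all names invented.
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Hub:
    """Central message router: modules register handlers under a service name."""
    services: Dict[str, Callable[[dict], dict]] = field(default_factory=dict)

    def register(self, name: str, handler: Callable[[dict], dict]) -> None:
        self.services[name] = handler

    def call(self, name: str, message: dict) -> dict:
        return self.services[name](message)

def run_turn(hub: Hub, audio: bytes) -> bytes:
    """One turn: recognize speech, decide the next action, fetch data, synthesize."""
    text = hub.call("asr", {"audio": audio})["text"]
    action = hub.call("dialogue", {"utterance": text})   # e.g. {"query": "weather", "city": "Bratislava"}
    data = hub.call("backend", action)                    # fetch forecast or timetable
    return hub.call("tts", {"text": data["answer"]})["audio"]
```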

    ICMI 2012 chairs' welcome

    Welcome to Santa Monica and to the 14th edition of the International Conference on Multimodal Interaction, ICMI 2012. ICMI is the premier international forum for multidisciplinary research on multimodal human-human and human-computer interaction, interfaces, and system development. We had a record number of submissions this year: 147 (74 long papers, 49 short papers, 5 special session papers and 19 demo papers). From these submissions, we accepted 15 papers for long oral presentation (20.3% acceptance rate), 10 papers for short oral presentation (20.4% acceptance rate) and 19 papers presented as posters, for a total acceptance rate of 35.8% across all short and long papers. 12 of the 19 demo papers were accepted. All 5 special session papers were directly invited by the organizers and were accepted. In addition, the program includes three invited keynote talks. One of the two novelties introduced at ICMI this year is the Multimodal Grand Challenges. Developing systems that can robustly understand human-human communication or respond to human input requires identifying the best algorithms and their failure modes. In fields such as computer vision, speech recognition, and computational linguistics, the availability of datasets and common tasks has led to great progress. This year, we accepted four challenge workshops: the Audio-Visual Emotion Challenge (AVEC), the Haptic Voice Recognition challenge, the D-META challenge and the Brain-Computer Interface challenge. Stefanie Tellex and Daniel Gatica-Perez are co-chairing the grand challenges this year. All four Grand Challenges will be presented on Monday, October 22nd, and a summary session will take place on the afternoon of Wednesday, October 24th, during the main conference. The second novelty at ICMI this year is the Doctoral Consortium, a separate one-day event taking place on Monday, October 22nd, co-chaired by Bilge Mutlu and Carlos Busso. The goal of the Doctoral Consortium is to provide Ph.D. students with an opportunity to present their work to a group of mentors and peers from a diverse set of academic and industrial backgrounds and institutions, to receive feedback on their doctoral research plan and progress, and to build a cohort of young researchers interested in designing multimodal interfaces. All accepted students receive a travel grant to attend the conference. From among 25 applications, 14 students were accepted for participation and to receive travel funding. The organizers thank the National Science Foundation (award IIS-1249319) and conference sponsors for financial support.

    Direct Classification of All American English Phonemes Using Signals From Functional Speech Motor Cortex

    Although brain-computer interfaces (BCIs) can be used in several different ways to restore communication, communicative BCI has not approached the rate or efficiency of natural human speech. Electrocorticography (ECoG) has precise spatiotemporal resolution that enables recording of brain activity distributed over a wide area of cortex, such as during speech production. In this study, we investigated words that span the entire set of phonemes in the General American accent using ECoG with 4 subjects. We classified phonemes with up to 36% accuracy when classifying all phonemes and up to 63% accuracy for a single phoneme. Further, misclassified phonemes follow the articulation organization described in the phonology literature, aiding classification of whole words. Precise temporal alignment to phoneme onset was crucial for classification success. We identified specific spatiotemporal features that aid classification, which could guide future applications. Word identification was equivalent to information transfer rates as high as 3.0 bits/s (33.6 words/min), supporting the pursuit of speech articulation for BCI control.
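    The abstract does not state how the bits-per-second figure was derived. A common convention for reporting BCI performance is Wolpaw's information transfer rate, sketched below; the class count, accuracy, and selection rate plugged in are illustrative assumptions, not numbers taken from the paper.

```python
# Wolpaw information transfer rate: bits conveyed per N-class selection made
# with accuracy P, scaled by the selection rate. Shown only to illustrate how
# a figure like "3.0 bits/s" can be computed; the inputs below are made up.
import math

def wolpaw_bits_per_selection(n_classes: int, accuracy: float) -> float:
    p, n = accuracy, n_classes
    bits = math.log2(n) + p * math.log2(p)
    if p < 1.0:
        bits += (1 - p) * math.log2((1 - p) / (n - 1))
    return bits

# Illustrative only: a 39-phoneme inventory decoded at 36% accuracy,
# with an assumed rate of 2 phoneme decisions per second.
bits = wolpaw_bits_per_selection(39, 0.36)
selections_per_second = 2.0
print(f"{bits:.2f} bits/selection -> {bits * selections_per_second:.2f} bits/s")
```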

    The limits of speech recognition: Understanding acoustic memory and appreciating prosody (2000)

    Human-human relationships are rarely a good model for the design of effective user interfaces. Spoken language is effective for human-human interaction (HHI), but it often has severe limitations when applied to human-computer interaction (HCI). Speech is slow for presenting information, it is difficult to review or edit, and it interferes with other cognitive tasks. However, speech has proven to be useful for store-and-forward messages, alerts in busy environments, and input-output for blind or motor-impaired users. Speech recognition for control is helpful in hands-busy, eyes-busy, mobility-required, or hostile environments, and it shows promise for use in telephone-based services. Dictation input is increasingly accurate, but adoption outside the disabled-user community has been slow compared to visual interfaces. Obvious physical problems include fatigue from speaking continuously and the disruption in an office filled with people speaking. By understanding the cognitive processes surrounding human acoustic memory and processing, interface designers may be able to integrate speech more effectively and guide users more successfully. Then, by appreciating the differences between HHI and HCI, designers may be able to choose appropriate applications for human use of speech with computers. The key distinction may be the rich emotional content conveyed by prosody -- the pacing, intonation, and amplitude in spoken language. Prosody is potent for HHI, but may be disruptive for HCI.