
    Object Referring in Visual Scene with Spoken Language

    Object referring has important applications, especially for human-machine interaction. While the task has received great attention, it is mainly attacked with written language (text) as input rather than spoken language (speech), which is more natural. This paper investigates Object Referring with Spoken Language (ORSpoken) by presenting two datasets and one novel approach. Objects are annotated with their locations in images, text descriptions and speech descriptions, making the datasets ideal for multi-modality learning. The approach is developed by carefully decomposing the ORSpoken problem into three sub-problems and introducing task-specific vision-language interactions at the corresponding levels. Experiments show that our method outperforms competing methods consistently and significantly. The approach is also evaluated in the presence of audio noise, showing the efficacy of the proposed vision-language interaction methods in counteracting background noise. Comment: 10 pages, Submitted to WACV 201
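    The abstract does not spell out the three sub-problems, so, purely as an illustration of the end task (grounding a possibly noisy spoken description to one of several candidate object regions), here is a minimal retrieval-style sketch in Python. The encoders and the shared embedding space are placeholder assumptions, not the paper's actual architecture.

```python
# Illustrative sketch only: ground a spoken description to one of several
# candidate object regions by similarity in a shared embedding space.
# The random vectors stand in for real speech and region encoders.
import numpy as np

def l2_normalize(x):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-9)

def refer(speech_embedding, region_embeddings):
    """Return the index of the region whose embedding best matches the
    spoken description, plus the per-region cosine similarity scores."""
    s = l2_normalize(speech_embedding)
    r = l2_normalize(region_embeddings)
    scores = r @ s                             # one score per candidate region
    return int(np.argmax(scores)), scores

# Toy stand-ins: a real system would use a noise-robust speech encoder and
# a visual region encoder trained jointly on the paired annotations.
rng = np.random.default_rng(0)
speech_vec = rng.normal(size=128)              # embedding of the spoken phrase
region_vecs = rng.normal(size=(5, 128))        # embeddings of 5 candidate boxes
best, scores = refer(speech_vec, region_vecs)
print(f"referred object: region {best}")
```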

    Towards human-like conversational search systems

    Voice search is currently widely available on the majority of mobile devices through Virtual Personal Assistants. However, despite its general availability, the use of voice interaction remains sporadic and is limited to basic search tasks such as checking weather updates and looking up answers to factual queries. Present-day voice search systems struggle to use relevant contextual information to maintain conversational state, and lack the conversational initiative needed to clarify the user's intent, which hampers their usability and prevents users from engaging in more complex interaction activities. This research investigates the potential of a hypothesised interactive information retrieval system with human-like conversational abilities. To this end, we propose a series of usability studies that involve a working prototype of a conversational system that uses real-time speech synthesis. The proposed experiments seek to provide empirical evidence that enabling a voice search system with human-like conversational abilities can increase the likelihood of its adoption.

    Facilitating Keyboard Use While Wearing a Head-Mounted Display

    Virtual reality (VR) headsets are becoming more common and will require evolving input mechanisms to support a growing range of applications. Because VR devices require users to wear head-mounted displays, accommodations must be made to support specific input devices. One such device, the keyboard, serves as a useful tool for text entry, but many users will require assistance in using a keyboard while wearing a head-mounted display. Developers have explored new mechanisms to overcome the challenges of text entry in virtual reality. Several games have toyed with the idea of using motion controllers as a text entry mechanism; however, few investigations have been made into how to assist users in using a physical keyboard while wearing a head-mounted display. As an alternative to controller-based text input, I propose that a software tool could facilitate the use of a physical keyboard in virtual reality. Using computer vision, a user's hands could be projected into the virtual world. With the ability to see the location of their hands relative to the keyboard, users will be able to type despite the obstruction caused by the head-mounted display (HMD). The viability of this approach was tested and the tool released as a plugin for the Unity development platform. The potential uses for the plugin go beyond text entry, and the project can be expanded to include many physical input devices.
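    The released tool is a Unity plugin, but the core computer-vision step (isolating the user's hands from a keyboard-facing camera so they can be composited into the virtual scene) can be illustrated independently of the engine. Below is a minimal Python/OpenCV sketch using a naive HSV skin-tone threshold; the threshold values and the alpha-mask compositing are assumptions for illustration, not the plugin's actual method.

```python
# Minimal sketch: segment the user's hands from a keyboard-facing webcam
# frame so the camera pixels can be overlaid on the virtual keyboard view.
import cv2
import numpy as np

# Rough skin-tone band in HSV; a real tool would need per-user calibration.
SKIN_LO = np.array([0, 40, 60], dtype=np.uint8)
SKIN_HI = np.array([25, 180, 255], dtype=np.uint8)

def hand_overlay(frame_bgr):
    """Return a BGRA image: camera pixels where hands are detected,
    transparent everywhere else, ready to composite onto the rendered scene."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, SKIN_LO, SKIN_HI)
    # Morphological open/close removes speckle so the overlay doesn't flicker.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    bgra = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2BGRA)
    bgra[:, :, 3] = mask                    # alpha channel = hand mask
    return bgra

cap = cv2.VideoCapture(0)                   # webcam pointed at the keyboard
ok, frame = cap.read()
if ok:
    cv2.imwrite("hands_overlay.png", hand_overlay(frame))
cap.release()
```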

    Text input tools’ complementarity in immersive virtual environments

    This study presents a user test to ascertain the advantages and disadvantages of three text input methods in immersive virtual environments: individual Speech-to-Text, collective Speech-to-Text, and a virtual keyboard named the Drum-Like Keyboard. We measured participants' user experience, especially usability and utility, in order to offer relevant recommendations to people seeking to integrate text input into virtual reality. Our results show that Speech-to-Text and the virtual keyboard have complementary qualities, which can be combined for optimal results and experience.

    Improving the Accuracy of Mobile Touchscreen QWERTY Keyboards

    In this thesis we explore alternative keyboard layouts in hopes of finding one that increases the accuracy of text input on mobile touchscreen devices. In particular, we investigate whether a single swap of 2 keys can significantly improve accuracy on mobile touchscreen QWERTY keyboards. We do so by carefully considering the placement of keys, exploiting a specific vulnerability of a keyboard layout: the placement of particular keys next to others may increase errors when typing. We simulate the act of typing on a mobile touchscreen QWERTY keyboard, beginning by modeling the typographical errors that can occur when doing so. We then construct a simple autocorrector using Bayesian methods, describing how we can autocorrect user input and evaluate the keyboard's ability to output the correct text. Then, using our models, we provide methods of testing and define a metric, the WAR rating, which gives us a way of comparing the accuracy of keyboard layouts. After running our tests on all 325 2-key swap layouts against the original QWERTY layout, we show that more than one 2-key swap increases the accuracy of the current QWERTY layout, and that the best 2-key swap is i ↔ t, increasing accuracy by nearly 0.18 percent.
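    To make the simulate-then-autocorrect pipeline concrete, here is a minimal Python sketch that models neighbour-slip typos on the physical QWERTY grid, applies a maximum-likelihood corrector over a toy lexicon, and scores each of the 325 possible 2-key swaps against the unmodified layout. The error model, toy lexicon, and scoring function are illustrative stand-ins, not the thesis's actual models or its WAR rating.

```python
import itertools
import random

# Physical QWERTY rows; key positions are labelled by their QWERTY character.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
NEIGHBORS = {}
for row in ROWS:
    for i, key in enumerate(row):
        NEIGHBORS[key] = list(row[max(i - 1, 0):i] + row[i + 1:i + 2])

ERROR_RATE = 0.1

def simulate_typing(word, swap, rng):
    """Type `word` under `swap` (a self-inverse char<->char remapping):
    aim at each character's key, slipping onto a horizontal neighbour
    with probability ERROR_RATE."""
    out = []
    for ch in word:
        key = swap.get(ch, ch)           # the key this character sits on
        if rng.random() < ERROR_RATE and NEIGHBORS.get(key):
            key = rng.choice(NEIGHBORS[key])
        out.append(swap.get(key, key))   # the character that key prints
    return "".join(out)

def likelihood(typed, intended, swap):
    """P(typed | intended) under the neighbour-slip error model."""
    if len(typed) != len(intended):
        return 0.0
    p = 1.0
    for t, c in zip(typed, intended):
        key_t, key_c = swap.get(t, t), swap.get(c, c)
        if key_t == key_c:
            p *= 1.0 - ERROR_RATE
        elif key_t in NEIGHBORS.get(key_c, []):
            p *= ERROR_RATE / len(NEIGHBORS[key_c])
        else:
            return 0.0
    return p

def autocorrect(typed, lexicon, swap):
    """Maximum-likelihood correction assuming a uniform prior over words."""
    return max(lexicon, key=lambda w: likelihood(typed, w, swap))

def accuracy(swap, lexicon, trials=200, seed=0):
    """Share of simulated, autocorrected typings matching the intended word;
    a toy stand-in for the thesis's WAR rating."""
    rng = random.Random(seed)
    hits = total = 0
    for word in lexicon:
        for _ in range(trials):
            typed = simulate_typing(word, swap, rng)
            hits += autocorrect(typed, lexicon, swap) == word
            total += 1
    return hits / total

LEXICON = ["it", "is", "in", "if", "us", "on", "to"]   # toy lexicon
baseline = accuracy({}, LEXICON)

# Evaluate all C(26, 2) = 325 single swaps against unmodified QWERTY.
for a, b in itertools.combinations("abcdefghijklmnopqrstuvwxyz", 2):
    swap = {a: b, b: a}
    gain = accuracy(swap, LEXICON) - baseline
    if gain > 0:
        print(f"swap {a} <-> {b}: +{gain:.4f} simulated accuracy")
```

    The reason a swap can matter at all in such a pipeline is that it changes which characters are physically adjacent, and therefore which typo outputs collide with other valid words; collisions are exactly the errors an autocorrector cannot recover.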

    Augmentative and alternative communication (AAC) advances: A review of configurations for individuals with a speech disability

    High-tech augmentative and alternative communication (AAC) methods are advancing steadily; however, the interaction between the user and the assistive technology still falls short of an optimal user experience centered around the desired activity. This review presents a range of signal sensing and acquisition methods utilized in conjunction with existing high-tech AAC platforms for individuals with a speech disability, including imaging methods, touch-enabled systems, mechanical and electro-mechanical access, breath-activated methods, and brain–computer interfaces (BCI). The listed AAC sensing modalities are compared in terms of ease of access, affordability, complexity, portability, and typical conversational speeds. An overview of the associated AAC signal processing, encoding, and retrieval methods highlights the roles of machine learning (ML) and deep learning (DL) in the development of intelligent AAC solutions. The demands and cost of most systems hinder the widespread use of high-tech AAC. Further research is needed to develop intelligent AAC applications that reduce the associated costs and enhance the portability of the solutions for real users' environments. Integrating natural language processing with current solutions also needs further exploration to improve conversational speeds. Recommendations for prospective advances in upcoming high-tech AAC are addressed in terms of developments supporting mobile health communication applications.