14 research outputs found

    Generating speech user interfaces from interaction acts

    We have applied interaction acts, an abstract user-service interaction specification, to speech user interfaces to investigate how well the specification lends itself to a new type of user interface. We used interaction acts to generate a VoiceXML-based speech user interface and identified two main issues arising from the differences between graphical user interfaces (GUIs) and speech user interfaces. The first issue concerns the structure of the user interface: generating speech user interfaces and GUIs from the same underlying structure easily results in a speech user interface that is too hierarchical and difficult to use. The second issue is user input: interpreting spoken user input is fundamentally different from handling user input in GUIs. We have shown that it is possible to generate speech user interfaces based on interaction acts, and a small user study supports the results. We discuss these issues, some possible solutions, and some results from preliminary user studies.
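
    To make the structural issue concrete, here is a minimal sketch (assuming a hypothetical InteractionAct class that stands in for the abstract's interaction acts, not the authors' actual generator) of how nested acts might be rendered as VoiceXML menus, and why a structure that is fine for a GUI turns into a deep chain of spoken menus.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-in for an interaction act: one unit of user-service
# interaction that must be rendered either as a GUI widget or a dialogue step.
@dataclass
class InteractionAct:
    prompt: str
    children: List["InteractionAct"] = field(default_factory=list)

def to_voicexml_menu(act: InteractionAct) -> str:
    """Render an interaction-act subtree as nested VoiceXML menus.

    A GUI can show all children of an act at once; a speech interface must
    read them out level by level, so every extra level of nesting becomes
    another spoken menu the user has to traverse.
    """
    if not act.children:
        return f"<prompt>{act.prompt}</prompt>\n"
    out = "<menu>\n"
    out += f"  <prompt>{act.prompt}</prompt>\n"
    for child in act.children:
        out += f'  <choice next="#{child.prompt}">{child.prompt}</choice>\n'
    out += "</menu>\n"
    for child in act.children:
        out += to_voicexml_menu(child)
    return out

# Even a shallow GUI structure already produces three levels of spoken menus.
root = InteractionAct("Main", [
    InteractionAct("Accounts", [InteractionAct("Balance"), InteractionAct("Transfers")]),
    InteractionAct("Support"),
])
print(to_voicexml_menu(root))
```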

    Hear Me Out: A Study on the Use of the Voice Modality for Crowdsourced Relevance Assessments

    The creation of relevance assessments by human assessors (often nowadays crowdworkers) is a vital step when building IR test collections. Prior works have investigated assessor quality and behaviour, though not the impact of a document's presentation modality on assessor efficiency and effectiveness. Given the rise of voice-based interfaces, we investigate whether it is feasible for assessors to judge the relevance of text documents via a voice-based interface. We ran a user study (n = 49) on a crowdsourcing platform in which participants judged the relevance of short and long documents sampled from the TREC Deep Learning corpus, presented to them in either the text or the voice modality. We found that: (i) participants are equally accurate in their judgements across both the text and voice modalities; (ii) with increased document length it takes participants significantly longer to make relevance judgements in the voice condition (for documents longer than 120 words, almost twice as long); and (iii) the ability of assessors to ignore stimuli that are not relevant (i.e., inhibition) impacts assessment quality in the voice modality: assessors with higher inhibition are significantly more accurate than those with lower inhibition. Our results indicate that we can reliably leverage the voice modality as a means to effectively collect relevance labels from crowdworkers. Accepted at SIGIR 2023.

    Mobile Search Interfaces for isiXhosa Speakers: A Comparison Between Voice and Text

    Search interfaces have for a long time been targeted at resource-rich languages such as English. However, due to the increased use of mobile phones in developing countries, these interfaces can now be adapted to languages in these settings to support information access on the Web. In this study, we propose two mobile search interfaces, text and voice, to support isiXhosa speakers in searching for information on the Web. Experiments were conducted with 34 native isiXhosa speakers to measure satisfaction with the two interfaces. The results show that isiXhosa speakers were more satisfied with the mobile text interface.

    Spoken conversational search: speech-only interactive information retrieval

    This research investigates a new interface paradigm for interactive information retrieval (IIR) which forces us to shift away from the classic "ten blue links" search engine results page. Instead, we investigate how to present search results through a conversation over a speech-only communication channel where no screen is available. Accessing information via speech is becoming increasingly pervasive and is already important for people with a visual impairment. However, presenting search results over a speech-only communication channel is challenging due to cognitive limitations and the transient nature of audio. Studies have indicated that speech recognizers and screen readers must be carefully designed and cannot simply be added onto an existing system. Therefore, the aim of this research is to develop a new interaction framework for effective and efficient IIR over a speech-only channel: a Spoken Conversational Search System (SCSS), which provides a conversational approach to defining user information needs, presenting results, and enabling search reformulations. In order to contribute to a more efficient and effective search experience when using a SCSS, we aim for a tighter integration between document search and conversational processes.
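
    As an illustration of the interaction style described above, the following is a minimal sketch of a speech-only search loop. The names speak, listen, search and summarise are assumed placeholders rather than the SCSS framework itself; the point is that only a few results are spoken per turn (audio is transient) and the user refines the query conversationally instead of scanning a results page.

```python
# Minimal sketch of a speech-only search loop; all callables are placeholders.
def spoken_search_session(speak, listen, search, summarise, max_results=3):
    query = listen("What would you like to search for?")
    while query:
        results = search(query)[:max_results]          # keep the spoken list short:
        for rank, doc in enumerate(results, start=1):  # audio is transient, so only a
            speak(f"Result {rank}: {summarise(doc)}")  # few results can be remembered
        query = listen("Say a new query to refine your search, or stay silent to stop.")

# Text stand-ins so the sketch runs without an actual speech channel.
if __name__ == "__main__":
    docs = ["presenting results over audio", "query reformulation strategies"]
    spoken_search_session(
        speak=print,
        listen=lambda prompt: input(prompt + " "),
        search=lambda q: [d for d in docs if q.lower() in d.lower()] or docs,
        summarise=lambda d: d,
    )
```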

    Automatic translation of formal data specifications to voice data-input applications.

    This thesis introduces a complete solution for the automatic translation of formal data specifications into voice data-input applications. The objective of the research is to automatically generate applications for entering data through speech from specifications of the structure of that data. The formal data specifications are XML DTDs. A new formalism called Grammar-DTD (G-DTD) is introduced as an extended DTD that contains grammars describing the valid values of DTD elements and attributes. G-DTDs facilitate the automatic generation of VoiceXML applications that correspond to the original DTD structure. The development of the automatic application generator included identifying constraints on the G-DTD to ensure a feasible translation, using predicate calculus to build a knowledge base of inference rules that describes the mapping procedure, and writing an algorithm for the automatic translation based on the inference rules. Thesis (M.Sc.), University of Windsor (Canada), 2006.
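
    A rough sketch of the kind of mapping the thesis automates, using a toy list of fields with allowed values in place of a real G-DTD (the element names, values, and output layout are invented for illustration, and the inference-rule machinery is not reproduced): each specified data element becomes a VoiceXML field whose grammar constrains what the caller may say.

```python
# Toy stand-in for a G-DTD: element names paired with the grammars (here,
# simple value lists) that describe their valid spoken values.
g_dtd_fields = [
    ("department", ["sales", "support", "engineering"]),
    ("priority",   ["low", "medium", "high"]),
]

def field_to_vxml(name, values):
    """Map one data element to a VoiceXML <field> with an inline one-of grammar."""
    items = "".join(f"<item>{v}</item>" for v in values)
    return (
        f'  <field name="{name}">\n'
        f"    <prompt>Please say the {name}.</prompt>\n"
        f'    <grammar root="{name}"><rule id="{name}"><one-of>{items}</one-of></rule></grammar>\n'
        f"  </field>\n"
    )

def generate_voice_form(fields):
    """Assemble the generated fields into a single VoiceXML data-input form."""
    body = "".join(field_to_vxml(name, values) for name, values in fields)
    return f'<vxml version="2.0">\n  <form id="data_input">\n{body}  </form>\n</vxml>'

print(generate_voice_form(g_dtd_fields))
```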

    A comparison of mobile search interfaces for isiXhosa speakers

    Search interfaces have for a long time been targeted at resource-rich languages, such as English. There has been little effort to support African (Bantu) languages in search interfaces, particularly the isiXhosa language. However, due to the increased use of mobile phones in developing countries, these interfaces can now be adapted to languages in these settings to support information access on the Web. This study proposes mobile search interfaces to support isiXhosa speakers in searching for information on the Web using isiXhosa as a discovery language. The isiXhosa language is considered a low-resourced African (Bantu) language spoken in resource-constrained environments in South Africa. The language is spoken by over eight million people, yet there has been no search interface specifically targeted at supporting isiXhosa speakers. Two mobile search interfaces were developed in an Android application: one text based and one voice based. The design of the interfaces was based on feedback from four native isiXhosa speakers in a design focus group and on guidelines from the literature. Using the developed interfaces, an experiment was conducted with 34 native isiXhosa-speaking students at the University of Cape Town, South Africa, to investigate which interface could better support isiXhosa speakers in searching for information on the Web using mobile phones. Quantitative data was collected using application log files. User feedback was then obtained using the standard Software Usability Measurement Inventory (SUMI) instrument, and both interfaces were confirmed as usable. In contrast to what was expected, users preferred the text interface in general and according to most SUMI subscales. This could be because of greater familiarity with text search interfaces or because of the relative scarcity of voice interfaces in African (Bantu) languages. Where users are not literate, the voice interface may be the only option, so the fact that it was deemed usable is an important independent finding. Search in African (Bantu) language collections is still a largely unexplored field, and more work needs to be done on the interfaces as the algorithms and collections are developed in parallel.

    WebVoice: Speech Access to Traditional Web Content for Blind Users

    Traditional web content and navigation features are made available to blind users by converting a web page into a speech-enabled X+V application, which allows blind users to follow the links present in the page via speech commands. The application can also read the individual paragraphs and search for a word. This X+V application runs on the Opera browser.
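
    A small sketch of the underlying idea, using Python's standard html.parser rather than the X+V machinery the paper describes: extract each link's text from the page so that it can be offered as a speech command leading to the link's target.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect (link text, href) pairs so each link can become a spoken command."""
    def __init__(self):
        super().__init__()
        self.links = []    # finished (text, href) pairs
        self._href = None  # href of the <a> element currently being read
        self._text = []    # text fragments collected inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

page = '<p>See the <a href="/news">latest news</a> or <a href="/contact">contact us</a>.</p>'
parser = LinkExtractor()
parser.feed(page)
for text, href in parser.links:
    # In the X+V application these link texts would become grammar entries;
    # here we only show the resulting command-to-target mapping.
    print(f'Say "{text}" to follow {href}')
```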

    Designing multimodal interaction for the visually impaired

    Although multimodal computer input is believed to have advantages over unimodal input, little has been done to understand how to design a multimodal input mechanism to facilitate visually impaired users' information access. This research investigates sighted and visually impaired users' multimodal interaction choices when given an interaction grammar that supports speech and touch input modalities. It investigates whether task type, working memory load, or the prevalence of errors in a given modality impacts a user's choice. Theories of human memory and attention are used to explain the users' speech and touch input coordination. Among the abundant findings from this research, the following are the most important in guiding system design: (1) Multimodal input is likely to be used when it is available. (2) Users select input modalities based on the type of task undertaken: they prefer touch input for navigation operations but speech input for non-navigation operations. (3) When errors occur, users prefer to stay in the failing modality instead of switching to another modality for error correction. (4) Despite the common multimodal usage patterns, there is still a high degree of individual difference in modality choices. Additional findings include: (1) Modality switching becomes more prevalent when lower working memory and attentional resources are required for the performance of other concurrent tasks. (2) Higher error rates increase modality switching, but only under duress. (3) Training order affects modality usage: teaching a modality first rather than second increases the use of that modality in users' task performance. In addition to discovering the multimodal interaction patterns above, this research contributes to the field of human-computer interaction design by: (1) presenting a design of an eyes-free multimodal information browser, and (2) presenting a Wizard of Oz method for working with visually impaired users in order to observe their multimodal interaction. The overall contribution of this work is that of one of the early investigations into how speech and touch might be combined into a non-visual multimodal system that can effectively be used for eyes-free tasks.