
    ENHANCING EXPRESSIVITY OF DOCUMENT-CENTERED COLLABORATION WITH MULTIMODAL ANNOTATIONS

    As knowledge work moves online, digital documents have become a staple of human collaboration. To communicate beyond the constraints of time and space, remote and asynchronous collaborators create digital annotations over documents, substituting face-to-face meetings with online conversations. However, existing document annotation interfaces depend primarily on text commenting, which is not as expressive or nuanced as in-person communication where interlocutors can speak and gesture over physical documents. To expand the communicative capacity of digital documents, we need to enrich annotation interfaces with face-to-face-like multimodal expressions (e.g., talking and pointing over texts). This thesis makes three major contributions toward multimodal annotation interfaces for enriching collaboration around digital documents. The first contribution is a set of design requirements for multimodal annotations drawn from our user studies and explorative literature surveys. We found that the major challenges were to support lightweight access to recorded voice, to control visual occlusions of graphically rich audio interfaces, and to reduce speech anxiety in voice comment production. Second, to address these challenges, we present RichReview, a novel multimodal annotation system. RichReview is designed to capture natural communicative expressions in face-to-face document descriptions as the combination of multimodal user inputs (e.g., speech, pen-writing, and deictic pen-hovering). To balance the consumption and production of speech comments, the system employs (1) cross-modal indexing interfaces for faster audio navigation, (2) fluid document-annotation layout for reduced visual clutter, and (3) voice synthesis-based speech editing for reduced speech anxiety. The third contribution is a series of evaluations that examines the effectiveness of our design solutions. 
Results of our lab studies show that RichReview successfully addresses the above-mentioned interface problems of multimodal annotations. A subsequent series of field deployment studies tested the real-world efficacy of RichReview by deploying the system for document-centered conversation activities in classrooms, such as instructor feedback on student assignments and peer discussions about course material. The results suggest that rich annotations help students better understand the instructor's comments and make them feel more valued. From the peer-discussion study, we learned that retaining the richness of the original speech is key to the success of speech commenting. The thesis closes with a discussion of the benefits, challenges, and future of multimodal annotation interfaces, and of the technical innovations required to realize this vision.
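The cross-modal indexing idea — letting a pen stroke act as a handle into the voice recording — can be illustrated with a minimal sketch. Everything here (the class and method names, the one-second lead-in before playback) is hypothetical and only meant to show how stroke timestamps pair with audio positions; it is not RichReview's actual implementation:

```python
import bisect
from dataclasses import dataclass, field

@dataclass
class CrossModalIndex:
    """Keeps pen strokes sorted by the audio timestamp at which
    they were drawn, so a tapped stroke can seek the recording."""
    entries: list = field(default_factory=list)  # sorted (timestamp_s, stroke_id)

    def log_stroke(self, timestamp_s: float, stroke_id: str) -> None:
        # Insert while keeping the list ordered by timestamp.
        bisect.insort(self.entries, (timestamp_s, stroke_id))

    def seek_for(self, stroke_id: str, lead_in_s: float = 1.0) -> float:
        """Playback position for a tapped stroke, backed up slightly
        so the spoken context around the stroke is not clipped."""
        for timestamp_s, sid in self.entries:
            if sid == stroke_id:
                return max(0.0, timestamp_s - lead_in_s)
        raise KeyError(stroke_id)

idx = CrossModalIndex()
idx.log_stroke(2.5, "underline-1")   # drawn 2.5 s into the recording
idx.log_stroke(8.0, "arrow-2")       # drawn 8.0 s in
print(idx.seek_for("arrow-2"))       # 7.0
```

Because the entries stay sorted by timestamp, the same structure can drive the reverse direction as well: as the audio plays, the strokes logged up to the current position can be highlighted, indexing the recording visually.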

    Integrating Automatic Transcription into the Language Documentation Workflow: Experiments with Na Data and the Persephone Toolkit

    Automatic speech recognition tools have potential for facilitating language documentation, but in practice these tools remain little used by linguists for a variety of reasons: the technology is still new (and evolving rapidly), user-friendly interfaces are still under development, and case studies demonstrating the practical usefulness of automatic recognition in a low-resource setting remain few. This article reports on a success story in integrating automatic transcription into the language documentation workflow, specifically for Yongning Na, a language of Southwest China. Using Persephone, an open-source toolkit, a single-speaker speech transcription tool was trained on five hours of manually transcribed speech. The experiments found that this method can achieve a remarkably low error rate (on the order of 17%) and that automatic transcriptions were useful as a canvas for the linguist. The present report is intended for linguists with little or no knowledge of speech processing. It aims to provide insights into (i) the way the tool operates and (ii) the process of collaborating with natural language processing specialists. Practical recommendations are offered on how to anticipate the requirements of this type of technology from the early stages of data collection in the field.
    National Foreign Language Resource Center
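The reported figure ("on the order of 17%") is the kind of label error rate conventionally computed as the edit distance between the automatic and the manual transcription, normalised by the length of the manual reference. As an illustration of the metric only — not of Persephone's internals — a minimal sketch:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences
    (insertions, deletions, substitutions each cost 1)."""
    prev = list(range(len(hyp) + 1))          # row for an empty reference
    for i, r in enumerate(ref, 1):
        cur = [i]                             # cost of deleting i reference labels
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (r != h)))     # substitution (0 if equal)
        prev = cur
    return prev[-1]

def label_error_rate(ref, hyp):
    """Edit distance normalised by the reference length."""
    return edit_distance(ref, hyp) / len(ref)

# One substituted label out of four gives a 25% error rate.
print(label_error_rate(list("abcd"), list("abed")))  # 0.25
```

For a phonemically transcribed language like Na, the sequences would be phoneme (and tone) labels rather than ASCII characters, but the computation is the same.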

    Visually Impaired Usability Requirements for Accessible Mobile Applications: A Checklist for Mobile E-book Applications

    An e-book is a book in an electronic format, which can benefit all readers, particularly those who struggle with print books because of vision impairments. Nevertheless, visually impaired readers cannot use regular e-books, because these do not meet their specific needs; they require more accessible e-books to gain the same advantages as sighted readers. Because there is no clear list of these needs, developers are unaware of the specific requirements the visually impaired have for e-book applications. This paper analyses the visually impaired's usability requirements for usable and accessible e-book applications. Three main activities were conducted: reviewing the literature, conducting an online survey of the visually impaired, and comparing the two sets of results to obtain verified usability requirements. The study reviewed work on the usability and accessibility of e-books from 2010 to 2022, and also reviewed common accessibility needs and standards for mobile applications. A total of 24 usability requirements were identified from the literature and compared with ten results from seven visually impaired respondents to the online survey. Designers and practitioners can use these verified usability requirements as a checklist to ensure all needs are considered when designing mobile e-books for the visually impaired.

    Human-Computer Interaction

    In this book the reader will find a collection of 31 papers presenting different facets of Human Computer Interaction: the results of research projects and experiments, as well as new approaches to designing user interfaces. The book is organized by the following main topics, in sequential order: new interaction paradigms, multimodality, usability studies of several interaction mechanisms, human factors, universal design, and development methodologies and tools.

    Voice and Touch Diagrams (VATagrams): Diagrams for the Visually Impaired

    If a picture is worth a thousand words, would you rather read two pages of text or simply view the image? Most would choose to view the image; however, for the visually impaired this isn't always an option. Diagrams help people visualize relationships between objects, and most often serve as a source for quickly referencing information about those relationships. Diagrams are highly visual, and as such there are few tools to support diagram creation by visually impaired individuals. To give the visually impaired the same advantages in school and work as their sighted colleagues, an accessible diagram tool is needed. A suitable tool for the visually impaired to create diagrams should allow these individuals to: 1. easily define the type of relationship-based diagram to be created, 2. easily create the components of a relationship-based diagram, 3. easily modify the components of a relationship-based diagram, 4. quickly understand the structure of a relationship-based diagram, 5. create a visual representation which can be used by the sighted, and 6. easily access reference points for tracking diagram components. To this end, a series of prototypes was developed that allow visually impaired users to read, create, modify, and share relationship-based diagrams using sound and gestural touches. This was accomplished by creating a series of applications that run on an iPad using an overlay that restricts the areas in which a user can perform gestures. The prototypes were tested for usability using measures of efficiency, effectiveness, and satisfaction, with visually impaired, blindfolded, and sighted participants. The results of the evaluation indicate that the prototypes contain the main building blocks needed to complete a fully functioning iPad application.

    MULTI-MODAL TASK INSTRUCTIONS TO ROBOTS BY NAIVE USERS

    This thesis presents a theoretical framework for the design of user-programmable robots. The objective of the work is to investigate multi-modal, unconstrained natural instructions given to robots, in order to design a learning robot. A corpus-centred approach is used to design an agent that can reason, learn, and interact with a human in a natural, unconstrained way. The corpus-centred design approach is formalised and developed in detail. It requires the developer to record a human during interaction and analyse the recordings to find instruction primitives, which are then implemented in a robot. The focus of this work has been on how to combine speech and gesture using rules extracted from the analysis of a corpus. A multi-modal integration algorithm is presented that uses timing and semantics to group, match, and unify gesture and language. The algorithm always achieves correct pairings on the corpus and initiates questions to the user in ambiguous cases or when information is missing. The domain of card games was investigated because of its variety of games, which are rich in rules and contain sequences. A further focus of the work is the translation of rule-based instructions; most multi-modal interfaces to date have only considered sequential instructions. A combination of frame-based reasoning, a knowledge base organised as an ontology, and a problem-solver engine is used to store these rules. Understanding rule instructions, which contain conditional and imaginary situations, requires an agent with complex reasoning capabilities. A test system of the agent implementation is also described, together with tests that confirm the implementation by playing back the corpus. Furthermore, deployment test results with the implemented agent and human subjects are presented and discussed.
The tests showed that the rate of errors caused by sentences not being defined in the grammar does not decrease at an acceptable rate as new grammar is introduced. This was particularly the case for complex verbal rule instructions, which can be expressed in a large variety of ways.
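The timing-based side of such gesture-language integration can be illustrated with a minimal, hypothetical sketch: each deictic word is paired with the closest gesture inside a time window, and a clarification question is raised when no gesture qualifies. The names, the window size, and the purely temporal matching are assumptions made for illustration; the algorithm described above also uses semantics to group, match, and unify the modalities:

```python
from dataclasses import dataclass

@dataclass
class Gesture:
    kind: str        # e.g. "point"
    target: str      # object the gesture refers to
    time_s: float    # when the gesture occurred

DEICTIC = {"this", "that", "here", "there"}

def pair_deictics(words, gestures, window_s=1.5):
    """Pair each deictic word with the temporally closest gesture.

    words is a list of (word, time_s) pairs. Returns (pairings,
    questions): pairings maps a word index to a Gesture, and a
    clarification question is generated whenever no gesture falls
    inside the time window around a deictic word."""
    pairings, questions = {}, []
    for i, (word, t) in enumerate(words):
        if word.lower() not in DEICTIC:
            continue
        near = [g for g in gestures if abs(g.time_s - t) <= window_s]
        if not near:
            questions.append(f"Which object do you mean by '{word}'?")
        else:
            pairings[i] = min(near, key=lambda g: abs(g.time_s - t))
    return pairings, questions

words = [("put", 0.0), ("this", 0.4), ("there", 1.2)]
gestures = [Gesture("point", "card", 0.5), Gesture("point", "pile", 1.3)]
pairings, questions = pair_deictics(words, gestures)
print(pairings[1].target, pairings[2].target)  # card pile
```

If the speaker says "put this there" without pointing, `pair_deictics` produces no pairing and instead emits clarification questions, mirroring the behaviour described above of asking the user about ambiguous or missing information.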