
    A Study of User's Performance and Satisfaction on the Web Based Photo Annotation with Speech Interaction

    This paper reports an empirical evaluation of users' performance and satisfaction with a prototype web-based photo annotation system with speech interaction. Participants were Johor Bahru citizens from various backgrounds. They completed two parts of an annotation task: part A used PhotoASys, a photo annotation system with the proposed speech interaction, and part B used the Microsoft Vista speech interaction style. Each part comprised eight tasks, including system login and selection of an album and photos. Users' performance was recorded with screen recording software, capturing task completion time, and participants completed a subjective satisfaction questionnaire after finishing the tasks. The performance data compare the proposed speech interaction with the Microsoft Vista style as applied in the photo annotation system PhotoASys. On average, the proposed speech interaction style reduced annotation time by 64.72% relative to the Microsoft Vista style. The analysis showed statistically significant differences in annotation performance and subjective satisfaction between the two interaction styles. These results could inform the design of related software for managing personal belongings. Comment: IEEE Publication Format, https://sites.google.com/site/journalofcomputing
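    To make the reported comparison concrete, the sketch below shows how a percentage reduction in mean task completion time (such as the 64.72% quoted above) and a within-subjects significance test are typically computed from paired timings. The arrays are hypothetical placeholders, not the study's data, and the analysis choice (a paired t-test) is an assumption for illustration.

```python
# Illustrative only: computing a percentage reduction in mean completion time
# and a paired significance test for a within-subjects comparison.
from statistics import mean
from scipy import stats  # paired t-test; assumes SciPy is available

proposed = [41.2, 38.5, 44.0, 39.8]      # seconds per task, proposed interaction (hypothetical)
vista    = [118.6, 105.3, 121.4, 110.9]  # seconds per task, Vista-style interaction (hypothetical)

reduction = (mean(vista) - mean(proposed)) / mean(vista) * 100
t, p = stats.ttest_rel(vista, proposed)  # same participants completed both parts

print(f"Mean reduction in completion time: {reduction:.2f}%")
print(f"Paired t-test: t = {t:.2f}, p = {p:.4f}")
```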

    Human Factors of Integrating Speech and Manual Input Devices: The Case of Computer Aided Design

    The thesis investigates integrating the use of speech input and manual input devices in human-computer systems. The domain of computer aided design (CAD) is used as a case study. A methodology for empirical evaluation of CAD systems is presented. The methodology is based on a framework that describes the input/output processes presumed to underlie performance in design activities, using behaviour protocols and performance indices as data. For modelling system behaviour, a framework derived from the Blackboard architecture of design is described. The framework employs knowledge sources to represent different behaviour types recruited during CAD performance. Variability in user behaviour throughout the investigation is explained with reference to the model. The problems that expert CAD users experience in using manual input devices are first documented in an observational study conducted at their workplace. This demonstrates that the unitary use of manual input resulted in non-optimal behaviour. Possible solutions to these problems, using speech input for some command and data entry tasks, are explored in three experiments. In each experiment, a comparative analysis of alternative systems is made using data obtained from naive and novice users. In Experiment 1, the use of speech as a unitary solution to the problems of manual input was also found to result in non-optimal behaviour and performance. The solution explored in Experiment 2 was to allocate some commands and alphanumeric data to each input device, using the frequency of use principle. This approach, however, entailed the additional problem of remembering which device to use. Experiment 3 evaluated the separate allocation of commands to speech input and numeric plus graphical data to manual input. Additionally, performance aids and feedback facilities were provided to users. This clear-cut assignment of device to task characteristics and the use of such aids led to an enhancement in speech performance, in addition to improving behaviour. The findings from this research are used to develop guidelines for an integrated CAD system involving speech and manual input. The guidelines, which are intended for use by end users, CAD implementors and system designers, were validated in the workplace by the latter. Lastly, the thesis contextualises the research within an ergonomics framework, mapping the research development from problem specification to application and synthesis. Problems with the investigation are also discussed, and suggestions made as to how these might be resolved
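    The blackboard framework described above can be illustrated with a minimal sketch. The Python below is a toy rendering of the pattern under stated assumptions: a shared blackboard holds the evolving state, and knowledge sources representing behaviour types contribute when they apply. Class names, state fields, and the control loop are illustrative, not the thesis's actual model.

```python
# Minimal blackboard pattern: knowledge sources inspect shared state and contribute.
class Blackboard:
    def __init__(self):
        self.state = {"pending_commands": ["draw_line"], "protocol": []}

class KnowledgeSource:
    def applies(self, bb): ...
    def contribute(self, bb): ...

class CommandEntryKS(KnowledgeSource):
    """Stands in for command-entry behaviour (e.g. via speech or menus)."""
    def applies(self, bb):
        return bool(bb.state["pending_commands"])
    def contribute(self, bb):
        cmd = bb.state["pending_commands"].pop(0)
        bb.state["protocol"].append(f"command issued: {cmd}")

def run(bb, sources):
    # Simple control loop: fire each applicable knowledge source in turn.
    for ks in sources:
        if ks.applies(bb):
            ks.contribute(bb)

bb = Blackboard()
run(bb, [CommandEntryKS()])
print(bb.state["protocol"])
```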

    SpeechMirror: A Multimodal Visual Analytics System for Personalized Reflection of Online Public Speaking Effectiveness

    As communications increasingly take place virtually, the ability to present well online is becoming an indispensable skill. Online speakers face unique challenges in engaging remote audiences. However, there has been a lack of evidence-based analytical systems that let people comprehensively evaluate online speeches and discover possibilities for improvement. This paper introduces SpeechMirror, a visual analytics system that facilitates reflection on a speech based on insights from a collection of online speeches. The system estimates the impact of different speech techniques on effectiveness and applies these estimates to a given speech, making users aware of how their speech techniques perform. A similarity recommendation approach based on speech factors or script content supports guided exploration, expanding users' knowledge of presentation evidence and accelerating the discovery of speech delivery possibilities. SpeechMirror provides intuitive visualizations and interactions for understanding speech factors. Among them, SpeechTwin, a novel multimodal visual summary of a speech, supports rapid understanding of critical speech factors and comparison of different speech samples, while SpeechPlayer augments the speech video by integrating visualization of the speaker's body language with interaction for focused analysis. The system uses visualizations suited to the distinct nature of different speech factors to aid user comprehension. The proposed system and visualization techniques were evaluated with domain experts and amateurs, demonstrating usability for users with low visualization literacy and efficacy in helping users develop insights for potential improvement. Comment: Main paper (11 pages, 6 figures) and supplemental document (11 pages, 11 figures). Accepted by VIS 202
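    As a rough illustration of the similarity recommendation step described above, the sketch below ranks a small speech collection by cosine similarity to the user's speech, using either numeric speech-factor vectors or script text. The feature names, data, and use of scikit-learn are assumptions for illustration, not SpeechMirror's actual pipeline.

```python
# Hedged sketch: similarity-based recommendation over speech factors or scripts.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Option 1: numeric speech factors (e.g. speaking rate, pitch variation, gesture rate).
user_factors = np.array([[0.62, 0.40, 0.75]])
corpus_factors = np.array([[0.60, 0.35, 0.80],
                           [0.20, 0.90, 0.10]])
factor_ranking = cosine_similarity(user_factors, corpus_factors).argsort()[0][::-1]

# Option 2: script content via TF-IDF.
scripts = ["thank you all for joining this talk on remote collaboration",
           "today I will walk through our quarterly results"]
vectorizer = TfidfVectorizer()
corpus_tfidf = vectorizer.fit_transform(scripts)
user_tfidf = vectorizer.transform(["a short talk about collaborating remotely"])
script_ranking = cosine_similarity(user_tfidf, corpus_tfidf).argsort()[0][::-1]

print("most similar by factors:", factor_ranking[0])
print("most similar by script:", script_ranking[0])
```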

    Application of Machine Learning within Visual Content Production

    We are living in an era in which digital content is being produced at a dazzling pace. The heterogeneity of contents and contexts is so varied that numerous applications have been created to respond to people's and the market's demands. The visual content production pipeline is a generalisation of the process that allows a content editor to create and evaluate their product, such as a video, an image, or a 3D model. Such data are then displayed on one or more devices such as TVs, PC monitors, virtual reality head-mounted displays, tablets, mobiles, or even smartwatches. Content creation can be as simple as clicking a button to film a video and share it on a social network, or as complex as managing a dense user interface full of parameters with keyboard and mouse to generate a realistic 3D model for a VR game. In the second example, such sophistication results in a steep learning curve for beginner-level users, while expert users regularly need to refine their skills via expensive lessons, time-consuming tutorials, or experience. Thus, user interaction plays an essential role in the diffusion of content creation software, primarily when it is targeted at untrained people. In particular, the fast spread of virtual reality devices into the consumer market has created new opportunities for designing reliable and intuitive interfaces. These new interactions need to take a step beyond the point-and-click interaction typical of the 2D desktop environment: they need to be smart, intuitive, and reliable, interpreting 3D gestures, and therefore more accurate algorithms are needed to recognise patterns. In recent years, machine learning, and in particular deep learning, has achieved outstanding results in many branches of computer science, such as computer graphics and human-computer interaction, outperforming algorithms that were considered state of the art; however, there have been only fleeting efforts to translate this into virtual reality. In this thesis, we seek to apply and take advantage of deep learning models in two areas of the content production pipeline: advanced methods for user interaction and visual quality assessment. First, we focus on 3D sketching to retrieve models from an extensive database of complex geometries and textures while the user is immersed in a virtual environment. We explore both 2D and 3D strokes as tools for model retrieval in VR and implement a novel system for improving accuracy in searching for a 3D model. We contribute an efficient method to describe models through 3D sketches via iterative descriptor generation, focusing on both accuracy and user experience, and design a user study to compare different interactions for sketch generation. Second, we explore the combination of sketch input and vocal description to correct and fine-tune the search for 3D models in a database containing fine-grained variation. We analyse sketch and speech queries, identifying a way to incorporate both into our system's interaction loop. Third, in the context of the visual content production pipeline, we present a detailed study of visual metrics and propose a novel method for detecting rendering-based artefacts in images, which exploits deep learning algorithms analogous to those used to extract features from sketches
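    The descriptor-based retrieval idea above can be sketched briefly: embed the user's 3D sketch and every database model into a common descriptor space, then return the nearest models. The encoder below is a deliberately crude stand-in (an occupancy histogram) rather than the learned descriptors the thesis describes, and all names and data are hypothetical.

```python
# Toy sketch-to-model retrieval via nearest neighbours in a descriptor space.
import numpy as np

def encode(points: np.ndarray, bins: int = 8) -> np.ndarray:
    """Toy descriptor: a normalised occupancy histogram over x coordinates."""
    hist, _ = np.histogram(points[:, 0], bins=bins, range=(-1.0, 1.0))
    return hist / max(hist.sum(), 1)

rng = np.random.default_rng(0)
database = {f"model_{i}": encode(rng.uniform(-1, 1, size=(500, 3))) for i in range(100)}

def retrieve(sketch_points: np.ndarray, k: int = 5):
    q = encode(sketch_points)
    scored = sorted(database.items(), key=lambda kv: np.linalg.norm(kv[1] - q))
    return [name for name, _ in scored[:k]]

print(retrieve(rng.uniform(-1, 1, size=(200, 3))))
```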

    Human-Computer Interaction

    In this book the reader will find a collection of 31 papers presenting different facets of human-computer interaction, the results of research projects and experiments as well as new approaches to designing user interfaces. The book is organized according to the following main topics, in sequential order: new interaction paradigms, multimodality, usability studies on several interaction mechanisms, human factors, universal design, and development methodologies and tools

    Mixing Modalities of 3D Sketching and Speech for Interactive Model Retrieval in Virtual Reality

    Sketch and speech are intuitive interaction methods that convey complementary information and have been independently used for 3D model retrieval in virtual environments. While sketch has been shown to be an effective retrieval method, not all collections are easily navigable using this modality alone. To overcome this, we implement a multimodal interface for querying 3D model databases within a virtual environment. We design a new, challenging database for sketch-based retrieval composed of 3D chairs in which each component (arms, legs, seat, back) is independently colored. We base the sketch interaction on the state of the art for 3D sketch retrieval and use a Wizard-of-Oz-style experiment to process the voice input, thereby avoiding the complexities of natural language processing, which frequently requires fine-tuning to be robust. We conduct two user studies and show that hybrid search strategies emerge from the combination of interactions, fostering the advantages provided by both modalities
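    A hedged illustration of the hybrid strategy described above: a sketch query produces an initial ranking, and a (Wizard-of-Oz processed) voice query filters it by part colour. The data structures, colour vocabulary, and function names are assumptions for illustration, not the paper's system.

```python
# Sketch ranking followed by a voice-derived part-colour filter (illustrative).
from dataclasses import dataclass

@dataclass
class Chair:
    name: str
    part_colors: dict  # e.g. {"arms": "red", "legs": "black", ...}

def sketch_rank(query_sketch, chairs):
    # Placeholder for the 3D sketch retrieval ranking; here, original order.
    return list(chairs)

def voice_filter(ranked, spoken_constraint):
    # The wizard maps speech like "the one with red arms" to (part, colour).
    part, color = spoken_constraint
    return [c for c in ranked if c.part_colors.get(part) == color]

chairs = [Chair("chair_01", {"arms": "red", "legs": "black", "seat": "blue", "back": "blue"}),
          Chair("chair_02", {"arms": "green", "legs": "black", "seat": "red", "back": "red"})]

ranked = sketch_rank(query_sketch=None, chairs=chairs)
print([c.name for c in voice_filter(ranked, ("arms", "red"))])
```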

    Brain-Computer Interface and Silent Speech Recognition on Decentralized Messaging Applications

    Online communication has gained increasing prevalence in people's daily lives, with its widespread adoption catalyzed by technological advances, especially in instant messaging platforms. Although strides have been made towards including disabled individuals and easing communication between peers, people with hand/arm impairments have little to no support in mainstream applications to communicate efficiently with others. Moreover, a problem with current solutions that fall back on speech-to-text techniques is the lack of privacy when these alternatives are used in public. Additionally, as centralized systems have come under scrutiny regarding privacy and security, the development of decentralized alternatives has increased through the use of blockchain technology and its variants. Within this inclusivity paradigm, this project presents an alternative form of human-computer interaction that supports the aforementioned users through a brain-computer interface allied to a silent speech recognition system, used for application navigation and text input respectively. A brain-computer interface allows a user to interact with the platform by thought alone, while the silent speech recognition system enables text input by reading activity from the articulatory muscles without the need to speak audibly. The combination of both techniques therefore creates a fully hands-free interaction with the platform, empowering users with hand/arm disabilities in daily communications. Furthermore, users of the application are part of a decentralized system designed for secure communication and exchange of data between peers, reinforcing the privacy concern that is a cornerstone of the platform
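    A speculative sketch of how the two hands-free input channels described above could feed one messaging client: brain-computer interface events drive navigation, while silent speech (e.g. EMG-based) recognition produces text. The event names, queue protocol, and handler logic are assumptions for illustration, not the project's actual interfaces.

```python
# Illustrative event loop combining BCI navigation commands with silent-speech text input.
import queue

events = queue.Queue()
# In a real system these would be pushed by the BCI and silent-speech decoders.
events.put(("bci", "select_contact"))
events.put(("silent_speech", "hello, are you free later?"))
events.put(("bci", "send_message"))

draft = []
while not events.empty():
    source, payload = events.get()
    if source == "bci" and payload == "select_contact":
        print("navigation: contact list opened")
    elif source == "silent_speech":
        draft.append(payload)               # decoded text appended to the draft
    elif source == "bci" and payload == "send_message":
        print("sending:", " ".join(draft))  # message handed to the (decentralized) transport
        draft.clear()
```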

    To Affinity and Beyond: Interactive Digital Humans as a Human Computer Interface

    The field of human-computer interaction is increasingly exploring the use of more natural, human-like user interfaces to build intelligent agents that aid in everyday life. This is coupled with a move towards people using ever more realistic avatars to represent themselves in their digital lives. Because the ability to produce emotionally engaging digital human representations is only now becoming technically possible, there is little research into how to approach such tasks, owing to both technical complexity and operational implementation cost. This is now changing as we reach a nexus point, with new approaches, faster graphics processing, and enabling technologies in machine learning and computer vision becoming available. I articulate what is required for such digital humans to be considered as lying successfully on the far side of the phenomenon known as the Uncanny Valley. My results show that a complex mix of perceived and contextual aspects affects how people make sense of digital humans, and they highlight previously undocumented effects of interactivity on affinity. Users are willing to accept digital humans as a new form of user interface, and they react to them emotionally in previously unanticipated ways. My research shows that it is possible to build an effective interactive digital human that crosses the Uncanny Valley. As a primary research question, I directly explore what is required to build a visually realistic digital human, and I examine whether such a realistic face provides sufficient benefit to justify the challenges involved in building it. I conducted a Delphi study to inform the research approaches and then produced a complex digital human character based on these insights. This interactive and realistic digital human avatar represents a major technical undertaking involving multiple teams around the world. Finally, I explore a framework for examining the ethical implications and signpost future research areas