17 research outputs found

    Multimodal silent speech interfaces for European Portuguese based on articulation

    Joint MAPi Doctorate in Informatics. The concept of silent speech, when applied to Human-Computer Interaction (HCI), describes a system which allows for speech communication in the absence of an acoustic signal. By analyzing data gathered during different parts of the human speech production process, Silent Speech Interfaces (SSI) allow users with speech impairments to communicate with a system. SSI can also be used in the presence of environmental noise, and in situations in which privacy, confidentiality, or non-disturbance are important. Nonetheless, despite recent advances, the performance and usability of silent speech systems still have much room for improvement; better performance would enable their application in relevant areas such as Ambient Assisted Living. It is therefore necessary to extend our understanding of the capabilities and limitations of silent speech modalities and to enhance their joint exploration. Thus, in this thesis, we have established several goals: (1) expand SSI language support to European Portuguese (EP); (2) overcome identified limitations of current SSI techniques in detecting EP nasality; (3) develop a multimodal HCI approach for SSI based on non-invasive modalities; and (4) explore more direct measures in the multimodal SSI for EP, acquired from more invasive/obtrusive modalities, to be used as ground truth for articulation, enhancing our comprehension of other modalities. To achieve these goals and to support our research in this area, we have created a multimodal SSI framework that fosters leveraging modalities and combining information, supporting research in multimodal SSI. The proposed framework goes beyond the data acquisition process itself, including methods for online and offline synchronization, multimodal data processing, feature extraction, feature selection, analysis, classification and prototyping. Examples of applicability are provided for each stage of the framework. These include articulatory studies for HCI, the development of a multimodal SSI based on less invasive modalities, and the use of ground truth information coming from more invasive/obtrusive modalities to overcome the limitations of other modalities. In the work presented here, we also apply existing SSI methods to EP for the first time, noting that nasal sounds may cause inferior performance in some modalities. In this context, we propose a non-invasive solution for the detection of nasality based on a single surface electromyography (EMG) sensor, suitable for inclusion in a multimodal SSI.
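    As an illustration of the nasality-detection idea described above, the following is a minimal sketch assuming a single-channel surface EMG signal; the filter band, frame sizes, features, and classifier are illustrative assumptions, not the thesis's actual method.

    # Minimal sketch (assumed pipeline, not the thesis's actual method):
    # frame-wise nasality detection from one surface EMG channel.
    import numpy as np
    from scipy.signal import butter, filtfilt
    from sklearn.svm import SVC

    FS = 1000  # assumed sampling rate (Hz)

    def bandpass(x, lo=20.0, hi=450.0, fs=FS, order=4):
        # Keep the typical surface-EMG band; drop drift and motion artifacts.
        b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        return filtfilt(b, a, x)

    def frame_features(x, frame=100, hop=50):
        # Per-frame RMS and zero-crossing rate, two common EMG features.
        feats = []
        for start in range(0, len(x) - frame, hop):
            w = x[start:start + frame]
            rms = np.sqrt(np.mean(w ** 2))
            zcr = np.mean(np.abs(np.diff(np.sign(w))) > 0)
            feats.append([rms, zcr])
        return np.array(feats)

    # emg: raw 1-D signal; y: one nasal/non-nasal label per frame
    # clf = SVC(kernel="rbf").fit(frame_features(bandpass(emg)), y)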

    Advancing Electromyographic Continuous Speech Recognition: Signal Preprocessing and Modeling

    Speech is the natural medium of human communication, but audible speech can be overheard by bystanders and excludes speech-disabled people. This work presents a speech recognizer based on surface electromyography, where electric potentials of the facial muscles are captured by surface electrodes, allowing speech to be processed non-acoustically. A system which was state-of-the-art at the beginning of this book is substantially improved in terms of accuracy, flexibility, and robustness.
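    A common preprocessing step in EMG-based speech recognition is stacking each feature frame with its neighbors so the model sees articulatory context; the sketch below illustrates this under assumed feature dimensions and context width (it is not the book's exact setup).

    # Hypothetical sketch of context frame stacking for EMG features.
    import numpy as np

    def stack_context(frames: np.ndarray, k: int = 5) -> np.ndarray:
        # frames: (T, D) array of per-frame features; returns (T, (2k+1)*D).
        # Edges are padded by repeating the first/last frame.
        padded = np.pad(frames, ((k, k), (0, 0)), mode="edge")
        return np.hstack([padded[i:i + len(frames)] for i in range(2 * k + 1)])

    # Example: 100 frames of 10-dimensional EMG features -> (100, 110)
    feats = stack_context(np.random.randn(100, 10), k=5)
    print(feats.shape)  # (100, 110)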

    Innovating control and emotional expressive modalities of user interfaces for people with locked-in syndrome

    Patients with Locked-In Syndrome (LIS) have lost the ability to control any body part besides their eyes. Current solutions mainly use eye-tracking cameras to track patients' gaze as system input. However, despite the fact that interface design greatly impacts user experience, only a few guidelines have been proposed so far to ensure an easy, quick, fluid and non-tiresome computer system for these patients. On the other hand, the emergence of dedicated computer software has greatly increased patients' capabilities, but there is still a great need for improvement, as existing systems still present low usability and limited capabilities. Most interfaces designed for LIS patients aim at providing internet browsing or communication abilities. State-of-the-art augmentative and alternative communication systems mainly focus on sentence communication without considering the need for emotional expression, which is inextricable from human communication. This thesis aims at exploring new system control and expressive modalities for people with LIS. Firstly, existing gaze-based web-browsing interfaces were investigated. Page analysis and high mental workload appeared as recurring issues with common systems. To address this issue, a novel user interface was designed and evaluated against a commercial system. The results suggested that it is easier to learn and to use, quicker, more satisfying, less frustrating, less tiring and less prone to error; mental workload was greatly diminished with this system. Other types of system control for LIS patients were then investigated. It was found that galvanic skin response may be used as system input and that stress-related biofeedback helped lower mental workload during stressful tasks. Improving communication, and in particular emotional communication, was one of the main goals of this research. A system including gaze-controlled emotional voice synthesis and a personal emotional avatar was developed for this purpose. Assessment of the proposed system highlighted the enhanced capability to hold dialogs more similar to normal ones, and to express and identify emotions. Enabling emotion communication in parallel to sentences was found to help with the conversation. Automatic emotion detection seemed to be the next step toward improving emotional communication. Several studies have established that physiological signals relate to emotions. The ability to use physiological signal sensors with LIS patients, and their non-invasiveness, made them an ideal candidate for this study. One of the main difficulties of emotion detection is the collection of high-intensity affect-related data. Studies in this field are currently mostly limited to laboratory investigations, using laboratory-induced emotions, and are rarely adapted to real-life applications. A virtual reality emotion elicitation technique based on appraisal theories was proposed here in order to study physiological signals of high-intensity emotions in a real-life-like environment. While this solution successfully elicited positive and negative emotions, it did not elicit the desired emotions for all subjects and was therefore not appropriate for the goals of this research. Collecting emotions in the wild appeared as the best methodology toward emotion detection for real-life applications. The state of the art in the field was therefore reviewed and assessed using a specifically designed method for evaluating datasets collected for emotion recognition in real-life applications.
    The proposed evaluation method provides guidelines for future researchers in the field. Based on the research findings, a mobile application was developed for physiological and emotional data collection in the wild. Grounded in appraisal theory, this application guides users to provide valuable emotion labelling and helps them differentiate moods from emotions. A sample dataset collected using this application was compared to one collected in a paper-based preliminary study; the mobile application was found to yield a more valuable dataset, with data consistent with the literature. This mobile application was used to create an open-source database of affect-related physiological signals. While the path toward emotion detection usable in real-life applications is still long, we hope that the tools provided to the research community will represent a step toward achieving this goal. Automatic emotion detection could be used not only by LIS patients to communicate but also by total-LIS patients, who have lost the ability to move their eyes: giving family and caregivers the ability to visualize, and therefore understand, the patients' emotional state could greatly improve their quality of life. This research provided LIS patients and the scientific community with tools to improve augmentative and alternative communication: technologies with better interfaces, emotion expression capabilities and real-life emotion detection. Emotion recognition methods for real-life applications could enhance not only health care but also robotics, domotics and many other fields of study. A complete, fully gaze-controlled system comprising all the developed solutions for LIS patients was made available open-source. This is expected to enhance their daily lives by improving their communication and by facilitating the development of novel assistive system capabilities.
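    Since the work above uses galvanic skin response as a system input, the following is a minimal sketch of the kind of skin-conductance features such a system might compute; the smoothing window and peak threshold are illustrative assumptions, not the thesis's implementation.

    # Hypothetical sketch: crude tonic/phasic split of a skin-conductance
    # signal plus simple affect-related summary features.
    import numpy as np
    from scipy.signal import find_peaks

    def gsr_features(sc: np.ndarray, fs: float = 4.0) -> dict:
        # sc: 1-D skin conductance (microsiemens); fs: sampling rate (Hz).
        win = max(1, int(10 * fs))                   # 10 s moving average
        tonic = np.convolve(sc, np.ones(win) / win, mode="same")
        phasic = sc - tonic
        peaks, _ = find_peaks(phasic, height=0.05)   # SCR-like events
        return {
            "mean_tonic": float(np.mean(tonic)),
            "scr_rate": len(peaks) / (len(sc) / fs),  # events per second
            "phasic_energy": float(np.mean(phasic ** 2)),
        }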

    Selection strategies in gaze interaction

    This thesis deals with selection strategies in gaze interaction, specifically in a context where gaze is the sole input modality for users with severe motor impairments. The goal has been to contribute to the subfield of assistive technology in which gaze interaction is necessary for the user to achieve autonomous communication and environmental control. From a theoretical point of view, research has been done on the physiology of the gaze and on eye-tracking technology, and a taxonomy of existing selection strategies has been developed. Empirically, two overall approaches have been taken. Firstly, end-user research has been conducted through interviews and observation, exploring the capabilities, requirements, and wants of the end-user. Secondly, several applications have been developed to explore the selection strategy of single stroke gaze gestures (SSGG) and aspects of complex gaze gestures. The main finding is that single stroke gaze gestures can successfully be used as a selection strategy. Among the findings on SSGG: horizontal single stroke gaze gestures are faster than vertical ones; completion time differs significantly with gesture length; single stroke gaze gestures can be completed without visual feedback; gaze tracking equipment has a significant effect on completion times and error rates; and single stroke gaze gestures are not significantly more prone to selection errors than dwell selection. The overall conclusion is that the future of gaze interaction should focus on developing multi-modal interactions for mono-modal input.
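    A minimal sketch of how a single stroke gaze gesture might be detected, assuming pixel-space gaze samples and an illustrative length threshold (not the thesis's implementation):

    # Hypothetical sketch: a selection fires when the gaze travels far
    # enough in roughly one direction within a candidate window.
    import numpy as np

    def detect_ssgg(xy: np.ndarray, min_len: float = 200.0) -> str | None:
        # xy: (T, 2) gaze samples in screen pixels for one candidate window.
        # Returns 'left'/'right'/'up'/'down', or None if no stroke is found.
        dx, dy = xy[-1] - xy[0]                  # net displacement
        if np.hypot(dx, dy) < min_len:
            return None                          # too short to be a stroke
        if abs(dx) >= abs(dy):                   # predominantly horizontal
            return "right" if dx > 0 else "left"
        return "down" if dy > 0 else "up"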

    Advancing Pattern Recognition Techniques for Brain-Computer Interfaces: Optimizing Discriminability, Compactness, and Robustness

    In this dissertation we formulate three central target criteria for the systematic advancement of the pattern recognition of modern Brain-Computer Interfaces (BCIs). Building on these, a pattern recognition framework for BCIs is developed that unites the three criteria through a new optimization algorithm. Furthermore, we demonstrate the successful application of our approach to two innovative BCI paradigms for which no established pattern recognition methodology has existed so far.
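    The dissertation's own optimization algorithm is not reproduced here; as generic context only, the sketch below shows a classical BCI pattern recognition baseline (CSP spatial filtering with log-variance features and LDA), with all parameters assumed.

    # Hypothetical baseline sketch, not the dissertation's method:
    # CSP spatial filters + log-variance features + LDA classifier.
    import numpy as np
    from scipy.linalg import eigh
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def csp_filters(X1, X2, n_pairs=3):
        # Common Spatial Patterns from two classes of (trials, channels, samples).
        c1 = np.mean([np.cov(t) for t in X1], axis=0)
        c2 = np.mean([np.cov(t) for t in X2], axis=0)
        _, evecs = eigh(c1, c1 + c2)   # generalized symmetric eigenproblem
        W = np.hstack([evecs[:, :n_pairs], evecs[:, -n_pairs:]])
        return W.T                     # (2*n_pairs, channels)

    def log_var_features(X, W):
        # Log-variance of spatially filtered trials, the usual CSP feature.
        Z = np.einsum("fc,tcs->tfs", W, X)
        v = Z.var(axis=2)
        return np.log(v / v.sum(axis=1, keepdims=True))

    # W = csp_filters(trials_a, trials_b)        # trials_*: (n, ch, time)
    # X = np.vstack([log_var_features(trials_a, W), log_var_features(trials_b, W)])
    # y = np.r_[np.zeros(len(trials_a)), np.ones(len(trials_b))]
    # clf = LinearDiscriminantAnalysis().fit(X, y)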

    Models and analysis of vocal emissions for biomedical applications: 5th International Workshop: December 13-15, 2007, Firenze, Italy

    The proceedings of the MAVEBA Workshop, held biennially, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images in support of clinical diagnosis and the classification of vocal pathologies. The Workshop is sponsored by the Ente Cassa Risparmio di Firenze, COST Action 2103, the Biomedical Signal Processing and Control journal (Elsevier), and the IEEE Biomedical Engineering Society. Special issues of international journals have been, and will be, published, collecting selected papers from the conference.

    Acoustical measurements on stages of nine U.S. concert halls
