4 research outputs found

    Remote speech technology for speech professionals: the CloudCAST initiative

    Clinical applications of speech technology face two challenges. The first is data sparsity: little data is available to underpin machine-learning techniques, and because disordered speech corpora are difficult to collect, the only way to address this problem is to pool what is produced by systems already in use. The second is personalisation: this field demands individual solutions, technology which adapts to its user rather than demanding that the user adapt to it. Here we introduce a project, CloudCAST, which addresses these two problems by making remote, adaptive technology available to professionals who work with speech: therapists, educators and clinicians. Index Terms: assistive technology, clinical applications of speech technology

    An innovative speech-based user interface for smarthomes and IoT solutions to help people with speech and motor disabilities

    Making better use of the increasing functional capabilities of home automation systems and Internet of Things (IoT) devices to support the needs of users with disabilities is the subject of a research project currently conducted by Area Ausili (Assistive Technology Area), a department of Polo Tecnologico Regionale Corte Roncati of the Local Health Trust of Bologna (Italy), in collaboration with the AIAS Ausilioteca Assistive Technology (AT) Team. The main aim of the project is to develop experimental low-cost systems for environmental control through simplified and accessible user interfaces. Many of the activities focus on automatic speech recognition and are developed in the framework of the CloudCAST project. In this paper we report on the first technical achievements of the project and discuss possible future developments and applications within and outside CloudCAST.
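    To illustrate the kind of simplified, accessible voice interface the abstract describes, here is a minimal sketch mapping recognised utterances to home-automation actions. The command grammar, device topics and publish mechanism are hypothetical placeholders, not the project's actual software.

    # A minimal sketch, assuming nothing about the project's real stack:
    # recognised utterances are matched against a small, configurable
    # command grammar and dispatched to devices. Topics and payloads are
    # illustrative; a deployment would publish to a real broker (e.g. MQTT).

    COMMANDS = {
        "turn on the light":  ("living_room/light", "ON"),
        "turn off the light": ("living_room/light", "OFF"),
        "open the blinds":    ("living_room/blinds", "OPEN"),
    }

    def dispatch(utterance, publish):
        """Map a recognised utterance to a device topic and payload."""
        utterance = utterance.strip().lower()
        if utterance in COMMANDS:
            topic, payload = COMMANDS[utterance]
            publish(topic, payload)   # hand off to the home-automation layer
            return True
        return False                  # unrecognised: prompt the user to repeat

    # Stub transport for the example; prints instead of controlling devices.
    dispatch("Turn on the light", lambda topic, payload: print(f"{topic} <- {payload}"))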

    Personalised Dialogue Management for Users with Speech Disorders

    Many electronic devices are beginning to include Voice User Interfaces (VUIs) as an alternative to conventional interfaces. VUIs are especially useful for users with restricted upper limb mobility, who cannot easily use keyboards and mice. These users, however, often have speech disorders (e.g. dysarthria), which make Automatic Speech Recognition (ASR) challenging and thus degrade the performance of the VUI. Partially Observable Markov Decision Process (POMDP) based Dialogue Management (DM) has been shown to improve interaction performance in challenging ASR environments, but most research in this area has focused on Spoken Dialogue Systems (SDSs) developed to provide information, where users interact with the system only a few times. In contrast, most VUIs are likely to be used by a single speaker over a long period of time, yet very little research has been carried out on adapting DM models to specific speakers. This thesis explores methods to adapt DM models (in particular dialogue state tracking models and policy models) to a specific user during a longitudinal interaction. The main differences between personalised VUIs and typical SDSs are identified and studied. Then, state-of-the-art DM models are modified for scenarios unique to long-term personalised VUIs, such as personalised models initialised with data from different speakers, or scenarios where the dialogue environment (e.g. the ASR) changes over time. In addition, several speaker- and environment-related features are shown to be useful for improving interaction performance. This study is done in the context of homeService, a VUI developed to help users with dysarthria control their home devices. The study shows that personalisation of the POMDP-DM framework can greatly improve the performance of these interfaces.
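    To make the belief-tracking idea concrete, here is a minimal, self-contained sketch of the Bayesian belief update at the heart of POMDP dialogue state tracking, b'(s') proportional to O(o|s',a) * sum over s of T(s'|s,a) * b(s), applied to a homeService-style scenario with two device goals. All names, probabilities and the confusion model are illustrative assumptions, not the thesis's actual models.

    # Hypothetical sketch of one POMDP belief-tracking step over hidden
    # user goals, with a speaker-specific ASR confusion model standing in
    # for the personalised observation models studied in the thesis.

    def belief_update(belief, action, observation, transition, observation_model):
        """One Bayesian belief update: predict under T, correct under O, normalise."""
        new_belief = {}
        for s_next in belief:
            # Predict: marginalise over previous states under the transition model.
            predicted = sum(transition(s, action, s_next) * belief[s] for s in belief)
            # Correct: weight by the likelihood of the (noisy) ASR observation.
            new_belief[s_next] = observation_model(observation, s_next, action) * predicted
        norm = sum(new_belief.values()) or 1.0
        return {s: p / norm for s, p in new_belief.items()}

    # Toy example: two device goals; the confusion rate is an illustrative
    # per-speaker parameter that a personalised system would adapt over time.
    goals = ["turn_on_light", "turn_on_tv"]
    belief = {g: 0.5 for g in goals}

    def transition(s, a, s_next):
        return 0.9 if s == s_next else 0.1   # goals rarely change mid-dialogue

    def observation_model(obs, s, a):
        p_correct = 0.7                      # assumed speaker-specific ASR accuracy
        return p_correct if obs == s else (1 - p_correct) / (len(goals) - 1)

    belief = belief_update(belief, "ask_device", "turn_on_light",
                           transition, observation_model)
    print(belief)  # belief mass shifts toward "turn_on_light" (0.7 vs 0.3)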

    Application of audio segmentation and automatic dialect recognition technologies for obtaining information from dialogues contained in audio

    The interest of the scientific community in the identification of audiovisual content has grown considerably in recent years, owing to the need to run automatic classification and monitoring processes on the ever-increasing content broadcast by media such as television, radio and the internet. This article proposes an architecture for extracting information from audio, with the purpose of applying it to the analysis of television content in the Ecuadorian context. Two services are defined: an audio segmentation service and a transcription service. The segmentation service identifies and extracts audio segments containing speech, music, or speech over a musical background, while the transcription service runs recognition on the speech segments to obtain their content as text. These services and the tools that compose them have been evaluated in order to measure their performance and, in the case of the tools used, to determine which best fits the architecture. The results of the evaluations carried out on the proposed architecture show that a speech recognition system built from existing open-source tools offers a higher level of precision than a general-purpose transcription service.
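    As an illustration of the proposed two-stage architecture, the sketch below wires a segmentation stage that labels audio regions to a transcription stage applied only to the speech segments. The window-based classifier and the recogniser are placeholder stubs standing in for the open-source tools the article evaluates; names, window size and labels are assumptions.

    # Illustrative pipeline: segment first, then transcribe only speech.
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Segment:
        start: float          # seconds
        end: float
        label: str            # "speech", "music", or "speech_over_music"
        text: str = ""        # filled in by the transcription service

    def segment_audio(audio_path: str,
                      classify_window: Callable[[str, float, float], str],
                      window: float = 1.0,
                      duration: float = 10.0) -> List[Segment]:
        """Slide a fixed window over the file and merge runs with the same label."""
        segments: List[Segment] = []
        t = 0.0
        while t < duration:
            end = min(t + window, duration)
            label = classify_window(audio_path, t, end)
            if segments and segments[-1].label == label:
                segments[-1].end = end                    # extend the current run
            else:
                segments.append(Segment(t, end, label))   # start a new run
            t += window
        return segments

    def transcribe_speech(segments: List[Segment],
                          recognise: Callable[[str, float, float], str],
                          audio_path: str) -> List[Segment]:
        """Run ASR only on segments labelled as containing speech."""
        for seg in segments:
            if seg.label in ("speech", "speech_over_music"):
                seg.text = recognise(audio_path, seg.start, seg.end)
        return segments

    # Placeholder backends; a real deployment would plug in the evaluated
    # open-source segmentation and ASR tools here.
    fake_classify = lambda path, a, b: "speech" if int(a) % 2 == 0 else "music"
    fake_recognise = lambda path, a, b: f"<transcript {a:.0f}-{b:.0f}s>"

    segs = transcribe_speech(segment_audio("broadcast.wav", fake_classify),
                             fake_recognise, "broadcast.wav")
    for s in segs:
        print(f"{s.start:>4.1f}-{s.end:>4.1f} {s.label:18s} {s.text}")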