4,734 research outputs found

    Comparison of In-Person and Online Recordings in the Clinical Teleassessment of Speech Production: A Pilot Study.

    Get PDF
    In certain circumstances, speech and language therapy is offered via telepractice as a practical alternative to in-person services. However, little is known about the minimum quality requirements for recordings in the teleassessment of motor speech disorders (MSD) using validated tools. The aim here is to examine the comparability of offline analyses based on speech samples acquired from three sources: (1) in-person recordings with high-quality material, serving as the baseline/gold standard; (2) in-person recordings with standard equipment; (3) online recordings from videoconferencing. Speech samples were recorded simultaneously from these three sources while fifteen neurotypical speakers performed a screening battery for MSD, and were analyzed by three speech and language therapists. Intersource and interrater agreements were estimated with intraclass correlation coefficients on seventeen perceptual and acoustic parameters. While the interrater agreement was excellent for most speech parameters, especially on high-quality in-person recordings, it decreased in online recordings. The intersource agreement was excellent for speech rate and mean fundamental frequency measures when comparing high-quality in-person recordings to the other conditions. The intersource agreement was poor for voice parameters, but also for perceptual measures of intelligibility and articulation. Clinicians who plan to teleassess MSD should adapt their recording setting to the parameters they want to interpret reliably.
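The intersource and interrater agreement figures above are intraclass correlation coefficients. As a minimal illustration, a two-way ICC(2,1) (random raters, absolute agreement, single measurement) can be computed in plain Python; the function and the toy ratings below are illustrative assumptions, not the study's data or code, and the paper does not state which ICC form was used:

```python
def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: one row per subject, one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                # between-subjects mean square
    msc = ss_cols / (k - 1)                # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Three raters in perfect agreement on three speakers give an ICC of 1.0.
perfect = [[3, 3, 3], [5, 5, 5], [4, 4, 4]]
```

Values above 0.9 are conventionally read as "excellent" agreement, which is the threshold language the abstract uses.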

    Applying Machine Learning Techniques to forecast the level of dementia from spontaneous speech conversations

    Get PDF
    This report summarizes the duties performed and the results obtained during the work placement. The project involved applying machine learning to a mobile application that analyzes human language to forecast speakers' level of dementia. The work can be split into four parts: collecting audio data, exploratory data analysis, machine learning analysis, and applying the machine learning model to the existing product. First, audio samples from a dementia group and a control group were collected as input for further analysis. Second, the data were explored to find insights and to filter noise before building machine learning models. Third, machine learning models were built to analyze the characteristics of the speech and forecast speakers' level of dementia. Finally, the machine learning results were integrated into the mobile application

    Computational Language Assessment in patients with speech, language, and communication impairments

    Full text link
    Speech, language, and communication symptoms enable the early detection, diagnosis, treatment planning, and monitoring of neurocognitive disease progression. Nevertheless, traditional manual neurologic assessment, the speech and language evaluation standard, is time-consuming and resource-intensive for clinicians. We argue that Computational Language Assessment (C.L.A.) is an improvement over conventional manual neurological assessment. Using machine learning, natural language processing, and signal processing, C.L.A. i. provides a neurocognitive evaluation of speech, language, and communication in elderly individuals and those at high risk for dementia; ii. facilitates the diagnosis, prognosis, and assessment of therapy efficacy in at-risk and language-impaired populations; and iii. allows easier extensibility to assess patients from a wide range of languages. Also, C.L.A. employs Artificial Intelligence models to inform theory on the relationship between language symptoms and their neural bases. It significantly advances our ability to optimize the prevention and treatment of elderly individuals with communication disorders, allowing them to age gracefully with social engagement. Comment: 36 pages, 2 figures, to be submitted

    Automatic Framework to Aid Therapists to Diagnose Children who Stutter

    Get PDF

    Models and Analysis of Vocal Emissions for Biomedical Applications

    Get PDF
    The MAVEBA Workshop proceedings, published biennially, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are: the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images, as a support to the clinical diagnosis and classification of vocal pathologies

    A usage-based approach to language processing and intervention in aphasia

    Get PDF
    Non-fluent aphasia (NFA) is characterized by grammatically impoverished language output. Yet there is evidence that a restricted set of multi-word utterances (e.g., "don't know") is retained. Analyses of connected speech often dismiss these as stereotypical; however, these high-frequency phrases are an interactional resource in both neurotypical and aphasic discourse. One approach that can account for these forms is usage-based grammar, where linguistic knowledge is thought of as an inventory of constructions, i.e., form-meaning pairings such as familiar collocations ("wait a minute") and semi-fixed phrases ("I want X"). This approach is used in language development and second language learning research, but its application to aphasiology is currently limited. This thesis applied a usage-based perspective to language processing and intervention in aphasia. Study 1 investigated the use of word combinations in conversations of nine participants with Broca's aphasia (PWA) and their conversation partners (CPs), combining analysis of form (frequency-based approach) and function (interactional linguistics approach). In study 2, an online word monitoring task was used to examine whether individuals with aphasia and neurotypical controls showed sensitivity to collocation strength (the degree of association between units of a word combination). Finally, the impact of a novel intervention involving loosening of slots in semi-fixed phrases was piloted with five participants with NFA. Study 1 revealed that PWA used more strongly collocated word combinations than CPs, and that familiar collocations are a resource adapted to the constraints of aphasia. Findings from study 2 indicated that words were recognised more rapidly when preceded by strongly collocated words in both neurotypical and aphasic listeners, although effects were stronger for controls. Study 3 resulted in improved connected speech for some participants. Future research is needed to refine outcome measures for connected speech interventions. This thesis suggests that usage-based grammar has the potential to explain grammatical behaviour in aphasia, and to inform interventions
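The "collocation strength" probed in study 2 is typically operationalised with an association measure such as pointwise mutual information (PMI); whether this thesis used PMI specifically is not stated in the abstract. A minimal sketch on an invented toy corpus (no smoothing, so unseen bigrams would raise a math domain error):

```python
import math
from collections import Counter

def pmi(tokens, w1, w2):
    """Pointwise mutual information of the bigram (w1, w2) in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    p_xy = bigrams[(w1, w2)] / sum(bigrams.values())
    p_x = unigrams[w1] / len(tokens)
    p_y = unigrams[w2] / len(tokens)
    return math.log2(p_xy / (p_x * p_y))

# Invented toy corpus: "don't know" recurs as a familiar collocation.
toy = ("don't know wait a minute don't know don't know "
       "I want tea wait a minute").split()
```

A strongly associated pair such as ("don't", "know") scores well above zero here, i.e. the words co-occur far more often than chance would predict.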

    Peer interaction and learning opportunities in cohesive and less cohesive L2 classrooms

    Get PDF
    The present study investigates peer-to-peer oral interaction in two task-based language teaching classrooms, one of which was a self-declared cohesive group, and the other a self-declared less cohesive group, both at B1 level. It studies how learners talk cohesion into being and considers how this talk leads to learning opportunities in these groups. The study was classroom-based and was carried out over the period of an academic year. Research was conducted in the classrooms and the tasks were part of regular class work. The research was framed within a sociocognitive perspective of second language learning, and data came from a number of sources, namely questionnaires, interviews, and audio-recorded talk of dyads, triads, and groups of four students completing a total of eight oral tasks. These audio recordings were transcribed and analysed qualitatively, using conversation analysis, for interactions which encouraged a positive social dimension and behaviours which led to learning opportunities. In addition, recordings were analysed quantitatively for learning opportunities and for the quantity and quality of language produced. Results show that learners in both classes exhibited multiple behaviours in interaction which could promote a positive social dimension, although behaviours which could discourage positive affect amongst group members were also found. Analysis of interactions also revealed the many ways in which learners in both the cohesive and less cohesive class created learning opportunities. Further qualitative analysis of these interactions showed that a number of factors, including how learners approach a task, the decisions they make at zones of interactional transition, and the affective relationship between participants, influence the number of learning opportunities created, as well as the quality and quantity of language produced.
    The main conclusion of the study is that it is not the cohesive nature of the group as a whole but the nature of the relationship between the individual members of the small group completing the task which influences the effectiveness of oral interaction for learning. This study contributes to our understanding of the way in which learners individualise the learning space and highlights the situated nature of language learning. It shows how individuals interact with each other and the task, and how talk in interaction changes moment-by-moment as learners react to the 'here and now' of the classroom environment

    Deep Neural Networks for Automatic Speech-To-Speech Translation of Open Educational Resources

    Full text link
    In recent years, deep learning has fundamentally changed the landscape of a number of areas in artificial intelligence, including computer vision, natural language processing, robotics, and game theory. In particular, the striking success of deep learning in a large variety of natural language processing (NLP) applications, including automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS), has resulted in major accuracy improvements, thus widening the applicability of these technologies in real-life settings. At this point, it is clear that ASR and MT technologies can be utilized to produce cost-effective, high-quality multilingual subtitles for video contents of different kinds. This is particularly true in the case of transcription and translation of video lectures and other kinds of educational materials, in which the audio recording conditions are usually favorable for the ASR task and the speech is grammatically well formed. However, although state-of-the-art neural approaches to TTS have been shown to drastically improve the naturalness and quality of synthetic speech over conventional concatenative and parametric systems, it is still unclear whether this technology is already mature enough to improve accessibility and engagement in online learning, particularly in the context of higher education.
    Furthermore, advanced topics in TTS such as cross-lingual voice cloning, incremental TTS, and zero-shot speaker adaptation remain an open challenge in the field. This thesis is about enhancing the performance and widening the applicability of modern neural TTS technologies in real-life settings, both in offline and streaming conditions, in the context of improving accessibility and engagement in online learning. Thus, particular emphasis is placed on speaker adaptation and cross-lingual voice cloning, as the input text corresponds to a translated utterance in this context. Pérez González De Martos, AM. (2022). Deep Neural Networks for Automatic Speech-To-Speech Translation of Open Educational Resources [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/184019. Premios Extraordinarios de tesis doctorales
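The cascade underlying the thesis, ASR followed by MT followed by TTS, amounts to a function composition. The three stages below are hypothetical stubs that only illustrate the data flow, not a real model API, and the canned strings are invented examples:

```python
# Hedged sketch of a speech-to-speech translation cascade.
# Each stage is a placeholder; a real system would invoke neural models here.

def asr(audio: bytes) -> str:
    """Speech recognition: audio -> source-language transcript (stub)."""
    return "hola a todos"  # canned output for illustration

def mt(text: str, tgt: str = "en") -> str:
    """Machine translation: source transcript -> target language (stub)."""
    return {"hola a todos": "hello everyone"}.get(text, text)

def tts(text: str, voice: str = "cloned") -> bytes:
    """Text-to-speech: target text -> synthetic audio (stub).

    In the thesis setting this stage would apply speaker adaptation or
    cross-lingual voice cloning so the output keeps the original voice.
    """
    return text.encode("utf-8")  # stands in for a waveform

def speech_to_speech(audio: bytes) -> bytes:
    """The full cascade: ASR -> MT -> TTS."""
    return tts(mt(asr(audio)))
```

The design point the abstract makes is that the TTS stage receives translated text, which is why speaker adaptation and cross-lingual cloning matter: the target-language voice must still sound like the original lecturer.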