
    Audio Transcription and Summarization System using Cloud Computing and Artificial Intelligence

    In the modern era, organizations increasingly rely on virtual meetings to address customer issues promptly and effectively. However, dealing with recorded customer calls can be arduous. This review introduces an innovative methodology for summarizing audio data from customer interactions, which can streamline virtual meetings. Leveraging a speech recognizer, such as AssemblyAI's API, the methodology converts audio data into text and then employs a graph-theoretic approach to generate concise summaries. The review also examines the growing prominence of cloud-based AI and ML services in the tech industry. It underscores the distinct competitive strategies and focuses of the major players, namely Amazon, Microsoft, and Google, in AI and ML platform development. The analysis explores these companies' internal applications and external ecosystems, dissecting their respective AI and ML development strategies. Finally, it predicts future directions for AI and ML platforms, including potential business models and emerging trends, while considering how Amazon, Microsoft, and Google align their platform development strategies with these prospects.
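    The graph-theoretic summarization step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: sentence similarity here is simple word overlap, ranking is plain PageRank-style power iteration, and the transcription step is represented by an already-transcribed string.

```python
import math
from itertools import combinations

def summarize(text, num_sentences=2, damping=0.85, iterations=50):
    """Extractive summarization: rank sentences by a PageRank-style score
    over a word-overlap similarity graph and keep the top-scoring ones."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    words = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    # Build a symmetric similarity matrix (length-normalized word overlap).
    sim = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        overlap = len(words[i] & words[j])
        if overlap:
            norm = math.log(len(words[i]) + 1) + math.log(len(words[j]) + 1)
            sim[i][j] = sim[j][i] = overlap / norm
    # Power iteration: each sentence's score flows to its neighbours.
    scores = [1.0] * n
    for _ in range(iterations):
        new = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                total = sum(sim[j])
                if j != i and total:
                    rank += sim[j][i] * scores[j] / total
            new.append((1 - damping) + damping * rank)
        scores = new
    top = sorted(range(n), key=lambda i: -scores[i])[:num_sentences]
    return ". ".join(sentences[i] for i in sorted(top)) + "."
```

    Sentences that share vocabulary with many other sentences accumulate a high score and are kept, which is the intuition behind graph-based extractive summarization.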

    Speeching: Mobile Crowdsourced Speech Assessment to Support Self-Monitoring and Management for People with Parkinson's

    We present Speeching, a mobile application that uses crowdsourcing to support the self-monitoring and management of speech and voice issues for people with Parkinson's (PwP). The application allows participants to audio record short voice tasks, which are then rated and assessed by crowd workers. Speeching then feeds these results back to provide users with examples of how they were perceived by listeners unconnected to them (and thus not used to their speech patterns). We conducted our study in two phases. First we assessed the feasibility of utilising the crowd to provide ratings of speech and voice that are comparable to those of experts. We then conducted a trial to evaluate how the provision of feedback, using Speeching, was valued by PwP. Our study highlights how applications like Speeching open up new opportunities for self-monitoring in digital health and wellbeing, and provide a means for those without regular access to clinical assessment services to practise, and get meaningful feedback on, their speech.
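    The feasibility question in the first phase, whether aggregated crowd ratings track expert ratings, comes down to an agreement statistic over paired scores. A minimal sketch with invented toy scores, using mean crowd rating per recording and Pearson correlation as the agreement measure (the paper's exact statistic is not specified here):

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two paired lists of ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Several crowd-worker ratings per recording vs. one expert rating (toy data).
crowd = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 2]]
expert = [4.5, 2.0, 5.0, 1.5]
crowd_means = [sum(r) / len(r) for r in crowd]
r = pearson(crowd_means, expert)  # close to 1.0 when crowd and expert agree
```

    A correlation near 1.0 would support the claim that averaged crowd judgements can stand in for scarce expert assessment.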

    Crowdsourcing for Hispanic Linguistics: Amazon’s Mechanical Turk as a source of Spanish data

    Within the field of Linguistics, Amazon’s Mechanical Turk, a crowdsourcing marketplace that specializes in computer-based Human Intelligence Tasks, has been praised as a cost-efficient source of data for English and other major languages. Spanish is a good candidate due to its presence within the US and beyond. Still, detailed information concerning the linguistic and demographic profile of Spanish-speaking ‘Turkers’ is missing, making it difficult for researchers to evaluate whether the Mechanical Turk provides the right environment for their tasks. This paper addresses this gap in our knowledge by developing the first detailed study of the presence of Spanish-speaking workers, focusing on factors relevant for research planning, namely (socio)linguistically relevant variables and information concerning work habits. The results show that this platform provides access to a fairly active participant pool of both L1 and L2 Spanish speakers as well as bilinguals. A brief introduction to how Amazon’s Mechanical Turk works, together with an overview of Hispanic Linguistics projects that have so far used the Mechanical Turk successfully, is included.

    Incorporating Weak Statistics for Low-Resource Language Modeling

    Automatic speech recognition (ASR) requires a strong language model to guide the acoustic model and favor likely utterances. While many tasks enjoy billions of language model training tokens, many domains which require ASR do not have readily available electronic corpora. The only source of useful language modeling data is expensive and time-consuming human transcription of in-domain audio. This dissertation seeks to quickly and inexpensively improve low-resource language modeling for use in automatic speech recognition. The dissertation first considers efficient use of non-professional human labor to best improve system performance, and demonstrates that it is better to collect more data, despite higher transcription error, than to redundantly transcribe data to improve quality. In the process of developing procedures to collect such data, this work also presents an efficient rating scheme to detect poor transcribers without gold standard data. As an alternative to this process, automatic transcripts are generated with an ASR system, and this work explores how to efficiently combine these low-quality transcripts with a small amount of high-quality transcripts. Standard n-gram language models are sensitive to the quality of the highest-order n-gram and are unable to exploit accurate weaker statistics. Instead, a log-linear language model is introduced, which elegantly incorporates a variety of background models through MAP adaptation. This work introduces marginal class constraints which effectively capture knowledge of transcriber error and improve performance over n-gram features. Finally, this work constrains the language modeling task to keyword search of words unseen in the training text. While overall system performance is good, these words suffer the most due to a low probability in the language model. Semi-supervised learning effectively extracts likely n-grams containing these new keywords from a large corpus of audio. By using a search metric that favors recall over precision, this method captures over 80% of the potential gain.
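    The core idea of combining a small high-quality transcript set with a large set of noisy automatic transcripts can be sketched, in its simplest form, as linear interpolation of the two probability estimates. This is a deliberately simplified stand-in for the log-linear, MAP-adapted model the dissertation actually proposes, using unigrams and an illustrative interpolation weight:

```python
from collections import Counter

def unigram_probs(tokens):
    """Maximum-likelihood unigram probabilities from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interpolate(p_clean, p_noisy, lam=0.7):
    """P(w) = lam * P_clean(w) + (1 - lam) * P_noisy(w)."""
    vocab = set(p_clean) | set(p_noisy)
    return {w: lam * p_clean.get(w, 0.0) + (1 - lam) * p_noisy.get(w, 0.0)
            for w in vocab}

# Tiny human-transcribed corpus vs. a larger, noisier ASR-generated one.
clean = "the meeting starts at noon".split()
noisy = "the meeting the meeting starts soon at noon noon".split()
model = interpolate(unigram_probs(clean), unigram_probs(noisy))
```

    The interpolated model still sums to one, and words confirmed by the clean data dominate words seen only in the noisy transcripts, which is the intuition behind weighting trusted data more heavily.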

    The design and evaluation of novel technologies for the self monitoring and management of Parkinson's symptoms

    PhD thesis. This thesis explores how digital technologies might better support people with Parkinson’s (PwP) to take control of their condition, by engaging in self monitoring and management practices. The specific focus of this thesis is on issues managed by Speech and Language Therapists (SLTs), namely drooling and speech and voice changes. Three case studies were used to explore the ways that different technologies might be configured to aid the self monitoring and management of these speech and drooling symptoms. The first case study describes an evaluation of PDCue, a wrist worn device to assist the self management of drooling through the use of a temporal cueing method to increase swallowing frequency. This study showed evidence that drooling can be behaviourally self managed through cueing, like other symptoms of Parkinson’s such as gait freezing, and proved a viable first step towards re-considering the use of additional medications as a first option for drooling treatment. However, whilst this study proved successful in understanding the ways in which a simple, temporal cueing technique might support drooling management, it opened up questions around the ways in which PwP might use technology to actively think about and understand their condition through self monitoring, and use this information to support self management practices further. In response, the second case study describes the design and evaluation of LApp, an application to support both the self monitoring and management of vocal loudness issues through the use of an in situ cueing approach. Google Glass was chosen as the platform to run the cueing method on, due to its technical capabilities as a multi-sensor, wearable platform, able to analyse a constant stream of audio and provide real time visual prompts to support the wearer in increasing their volume at times when it is needed in conversation. This study highlighted how participants saw value in LApp in supporting their loudness issues, but also noted a desire for participants to understand more about their speech and the SLT strategies that they were required to do in order to improve it. The third case study drew upon this desire for increased understanding by developing and evaluating Speeching, which employed crowdsourcing through a smartphone application to support the self monitoring of speech and voice changes through the provision of human feedback, and the subsequent effect that this feedback had on self management practices. This study yielded positive responses from participants, who valued the anonymous feedback from the crowd and the support that this provided them in configuring their home based speech practice. A final discussion chapter draws the three case studies together and discusses the lessons learned throughout the research. It discusses the overall research questions for the thesis in detail and describes the implications of the research for the wider HCI and medical communities. A framework is presented which aims to visualise the levels of agency that the studied technologies afforded and the levels of responsiveness required by participants to make sense of, and implement, the information being provided by the devices in order to facilitate a change to the self monitoring and management practices. Through the design and evaluation of the described technologies and a synthesis of the findings across the span of the research, this thesis explores the ways in which PwP, with a diverse range of symptoms and related physical, social and emotional issues, might value digital technologies and their potential to facilitate new forms of self monitoring and self management in their everyday lives.
    Funding: The National Institute for Health Research (NIHR); The Engineering and Physical Sciences Research Council (EPSRC); Gordon Chapman Memorial Fund.
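    The in situ cueing approach described for LApp, monitoring an audio stream and prompting the wearer when volume drops, reduces in essence to a threshold check on a running loudness estimate. A minimal sketch with synthetic samples; the window size and threshold here are illustrative assumptions, not values from the thesis:

```python
import math

def rms(window):
    """Root-mean-square amplitude of one window of audio samples."""
    return math.sqrt(sum(s * s for s in window) / len(window))

def cue_points(samples, window_size=4, threshold=0.3):
    """Return indices of windows whose loudness falls below the threshold,
    i.e. moments when a 'speak louder' prompt would fire."""
    cues = []
    for i in range(0, len(samples) - window_size + 1, window_size):
        if rms(samples[i:i + window_size]) < threshold:
            cues.append(i // window_size)
    return cues

quiet = [0.05, -0.04, 0.06, -0.05]
loud = [0.8, -0.7, 0.9, -0.6]
stream = loud + quiet + loud + quiet  # alternating loud and quiet speech
```

    In a wearable deployment the same check would run continuously over microphone input, with each flagged window triggering a visual prompt rather than being collected into a list.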