6 research outputs found
Audio Transcription and Summarization System using Cloud Computing and Artificial Intelligence
In the modern era, organizations increasingly rely on virtual meetings to address customer issues promptly and effectively. However, working through recorded customer calls can be arduous. This review abstract introduces a methodology for summarizing audio data from customer interactions, which can streamline virtual meetings. The methodology uses a speech recognizer, such as AssemblyAI's API, to convert audio data into text, and then employs a graph-theoretic approach to generate concise summaries.
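The graph-theoretic summarization step described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the abstract's implementation: it assumes a TextRank-style formulation in which sentences are graph nodes, edge weights are bag-of-words cosine similarity, and sentences are ranked by weighted degree centrality (a simpler stand-in for PageRank). The function names are hypothetical.

```python
import re
from math import sqrt

def sentence_similarity(a, b):
    """Cosine similarity between the bag-of-words vectors of two sentences."""
    wa, wb = a.lower().split(), b.lower().split()
    vocab = set(wa) | set(wb)
    va = [wa.count(w) for w in vocab]
    vb = [wb.count(w) for w in vocab]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = sqrt(sum(x * x for x in va)) * sqrt(sum(x * x for x in vb))
    return dot / norm if norm else 0.0

def summarize(text, k=2):
    """Rank sentences by total similarity to all others (weighted degree)
    and return the top k in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scores = []
    for i, s in enumerate(sentences):
        score = sum(sentence_similarity(s, t)
                    for j, t in enumerate(sentences) if i != j)
        scores.append((score, i))
    # Keep the k best-scoring sentences, then restore document order.
    top = sorted(sorted(scores, reverse=True)[:k], key=lambda x: x[1])
    return " ".join(sentences[i] for _, i in top)
```

In a full pipeline, `text` would be the transcript returned by the speech recognizer; here any string of sentences works.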
This review abstract delves into the growing prominence of cloud-based AI and ML services in the tech industry. It underscores the distinct competitive strategies and focuses of the major players, namely Amazon, Microsoft, and Google, in AI and ML platform development. The analysis explores these companies' internal applications and external ecosystems, dissecting their respective AI and ML development strategies. Finally, it predicts future directions for AI and ML platforms, including potential business models and emerging trends, while considering how Amazon, Microsoft, and Google align their platform development strategies with these future prospects.
Speeching: Mobile Crowdsourced Speech Assessment to Support Self-Monitoring and Management for People with Parkinson's
We present Speeching, a mobile application that uses crowdsourcing to support the self-monitoring and management of speech and voice issues for people with Parkinson's (PwP). The application allows participants to audio-record short voice tasks, which are then rated and assessed by crowd workers. Speeching then feeds these results back to provide users with examples of how they were perceived by listeners unconnected to them (and thus not accustomed to their speech patterns). We conducted our study in two phases. First, we assessed the feasibility of utilising the crowd to provide ratings of speech and voice that are comparable to those of experts. We then conducted a trial to evaluate how the provision of feedback, using Speeching, was valued by PwP. Our study highlights how applications like Speeching open up new opportunities for self-monitoring in digital health and wellbeing, and provide a means for those without regular access to clinical assessment services to practise, and receive meaningful feedback on, their speech.
Crowdsourcing for Hispanic Linguistics: Amazon's Mechanical Turk as a source of Spanish data
Within the field of Linguistics, Amazon's Mechanical Turk, a crowdsourcing marketplace specializing in computer-based Human Intelligence Tasks, has been praised as a cost-efficient source of data for English and other major languages. Spanish is a good candidate due to its presence within the US and beyond. Still, detailed information concerning the linguistic and demographic profile of Spanish-speaking "Turkers" is missing, making it difficult for researchers to evaluate whether the Mechanical Turk provides the right environment for their tasks. This paper addresses this gap in our knowledge by developing the first detailed study of the presence of Spanish-speaking workers, focusing on factors relevant for research planning, namely (socio)linguistically relevant variables and information concerning work habits. The results show that this platform provides access to a fairly active participant pool of both L1 and L2 Spanish speakers as well as bilinguals. A brief introduction to how Amazon's Mechanical Turk works, and an overview of Hispanic Linguistics projects that have so far used the Mechanical Turk successfully, is included.
Incorporating Weak Statistics for Low-Resource Language Modeling
Automatic speech recognition (ASR) requires a strong language model to guide the acoustic model and favor likely utterances. While many tasks enjoy billions of language model training tokens, many domains which require ASR do not have readily available electronic corpora. The only source of useful language modeling data is expensive and time-consuming human transcription of in-domain audio. This dissertation seeks to quickly and inexpensively improve low-resource language modeling for use in automatic speech recognition.
This dissertation first considers efficient use of non-professional human labor to best improve system performance, and demonstrates that it is better to collect more data, despite higher transcription error, than to redundantly transcribe data to improve quality. In the process of developing procedures to collect such data, this work also presents an efficient rating scheme to detect poor transcribers without gold-standard data.
As an alternative to this process, automatic transcripts are generated with an ASR system, and this work explores efficiently combining these low-quality transcripts with a small amount of high-quality transcripts. Standard n-gram language models are sensitive to the quality of the highest-order n-gram and are unable to exploit accurate weaker statistics. Instead, a log-linear language model is introduced, which elegantly incorporates a variety of background models through MAP adaptation. This work introduces marginal class constraints, which effectively capture knowledge of transcriber error and improve performance over n-gram features.
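The core idea of blending a small trusted corpus with larger but noisier statistics can be illustrated with a much simpler device than the dissertation's log-linear MAP-adapted model: plain linear interpolation of two maximum-likelihood bigram models. This is a hypothetical sketch for intuition only; the function names and the interpolation weight `lam` are assumptions, not the work's method.

```python
from collections import Counter

def bigram_probs(corpus):
    """Maximum-likelihood bigram probabilities P(w2 | w1) from a list of
    whitespace-tokenized sentences, with start/end markers."""
    bigrams, unigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens[:-1])          # history counts
        bigrams.update(zip(tokens, tokens[1:]))
    return {bg: c / unigrams[bg[0]] for bg, c in bigrams.items()}

def interpolated(p_clean, p_noisy, lam=0.7):
    """Weight a model built from scarce high-quality transcripts against a
    background model built from plentiful noisy ASR transcripts."""
    def prob(w1, w2):
        return (lam * p_clean.get((w1, w2), 0.0)
                + (1 - lam) * p_noisy.get((w1, w2), 0.0))
    return prob
```

With `lam` near 1 the trusted transcripts dominate; lowering it lets the weaker background statistics fill in n-grams the small corpus never saw, which is the trade-off the dissertation's model manages in a more principled way.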
Finally, this work constrains the language modeling task to keyword search for words unseen in the training text. While overall system performance is good, these words suffer the most due to their low probability in the language model. Semi-supervised learning effectively extracts likely n-grams containing these new keywords from a large corpus of audio. By using a search metric that favors recall over precision, this method captures over 80% of the potential gain.
The design and evaluation of novel technologies for the self monitoring and management of Parkinson's symptoms
PhD Thesis

This thesis explores how digital technologies might better support people with Parkinson's (PwP) to take control of their condition by engaging in self-monitoring and management practices. The specific focus of this thesis is on issues managed by Speech and Language Therapists (SLTs), namely drooling and speech and voice changes. Three case studies were used to explore the ways that different technologies might be configured to aid the self-monitoring and management of these speech and drooling symptoms.

The first case study describes an evaluation of PDCue, a wrist-worn device to assist the self-management of drooling through the use of a temporal cueing method to increase swallowing frequency. This study showed evidence that drooling can be behaviourally self-managed through cueing, like other symptoms of Parkinson's such as gait freezing, and proved a viable first step towards reconsidering the use of additional medications as a first option for drooling treatment. However, whilst this study proved successful in understanding the ways in which a simple, temporal cueing technique might support drooling management, it opened up questions around the ways in which PwP might use technology to actively think about and understand their condition through self-monitoring, and use this information to support self-management practices further.

In response, the second case study describes the design and evaluation of LApp, an application to support both the self-monitoring and management of vocal loudness issues through the use of an in-situ cueing approach. Google Glass was chosen as the platform for the cueing method because of its technical capabilities as a multi-sensor, wearable platform, able to analyse a constant stream of audio and provide real-time visual prompts that support the wearer in increasing their volume when needed in conversation. This study highlighted how participants saw value in LApp in supporting their loudness issues, but also revealed participants' desire to understand more about their speech and the SLT strategies they were required to practise in order to improve it.

The third case study drew upon this desire for increased understanding by developing and evaluating Speeching, which employed crowdsourcing through a smartphone application to support the self-monitoring of speech and voice changes through the provision of human feedback, and examined the subsequent effect that this feedback had on self-management practices. This study yielded positive responses from participants, who valued the anonymous feedback from the crowd and the support it provided them in configuring their home-based speech practice.

A final discussion chapter draws the three case studies together and discusses the lessons learned throughout the research. It addresses the overall research questions of the thesis in detail and describes the implications of the research for the wider HCI and medical communities. A framework is presented which aims to visualise the levels of agency that the studied technologies afforded, and the levels of responsiveness required of participants to make sense of, and act on, the information provided by the devices in order to facilitate a change to their self-monitoring and management practices. Through the design and evaluation of the described technologies and a synthesis of the findings across the span of the research, this thesis explores the ways in which PwP, with a diverse range of symptoms and related physical, social and emotional issues, might value digital technologies and their potential to facilitate new forms of self-monitoring and self-management in their everyday lives.

The National Institute for Health Research (NIHR)
The Engineering and Physical Sciences Research Council (EPSRC)
Gordon Chapman Memorial Fund