Search CORE

4,580 research outputs found

MobileASR: A resource-aware on-device learning framework for user voice personalization applications on mobile phones

Author: Prabhakar T. V.
Rao Pooja
Sasindran Zitha
Yelchuri Harsha
Publication venue
Publication date: 09/11/2023
Field of study

We describe a comprehensive methodology for developing user-voice personalized automatic speech recognition (ASR) models by effectively training models on mobile phones, allowing user data and models to be stored and used locally. To achieve this, we propose a resource-aware sub-model-based training approach that considers the RAM, and battery capabilities of mobile phones. By considering the evaluation metric and resource constraints of the mobile phones, we are able to perform efficient training and halt the process accordingly. To simulate real users, we use speakers with various accents. The entire on-device training and evaluation framework was then tested on various mobile phones across brands. We show that fine-tuning the models and selecting the right hyperparameter values is a trade-off between the lowest achievable performance metric, on-device training time, and memory consumption. Overall, our methodology offers a comprehensive solution for developing personalized ASR models while leveraging the capabilities of mobile phones, and balancing the need for accuracy with resource constraints.Comment: Accepted in AIMLSystems 202

arXiv.org e-Print Archive

Designing Human-Centered Collective Intelligence

Author: Addo Ivor
Publication venue: e-Publications@Marquette
Publication date: 01/07/2016
Field of study

Human-Centered Collective Intelligence (HCCI) is an emergent research area that seeks to bring together major research areas like machine learning, statistical modeling, information retrieval, market research, and software engineering to address challenges pertaining to deriving intelligent insights and solutions through the collaboration of several intelligent sensors, devices and data sources. An archetypal contextual CI scenario might be concerned with deriving affect-driven intelligence through multimodal emotion detection sources in a bid to determine the likability of one movie trailer over another. On the other hand, the key tenets to designing robust and evolutionary software and infrastructure architecture models to address cross-cutting quality concerns is of keen interest in the “Cloud” age of today. Some of the key quality concerns of interest in CI scenarios span the gamut of security and privacy, scalability, performance, fault-tolerance, and reliability. I present recent advances in CI system design with a focus on highlighting optimal solutions for the aforementioned cross-cutting concerns. I also describe a number of design challenges and a framework that I have determined to be critical to designing CI systems. With inspiration from machine learning, computational advertising, ubiquitous computing, and sociable robotics, this literature incorporates theories and concepts from various viewpoints to empower the collective intelligence engine, ZOEI, to discover affective state and emotional intent across multiple mediums. The discerned affective state is used in recommender systems among others to support content personalization. I dive into the design of optimal architectures that allow humans and intelligent systems to work collectively to solve complex problems. I present an evaluation of various studies that leverage the ZOEI framework to design collective intelligence

epublications@Marquette

CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

Author: Bardeli Rolf
Boujemaa Nozha
Compañó Ramón
Doch Christoph
Geurts Joost
Gouraud Henri
Joly Alexis
Karlgren Jussi
King Paul
Kompatsiaris Yiannis
Köhler Joachim
Le Moine Jean-Yves
Ortgies Robert
Point Jean-Charles
Rotenberg Boris
Rudström Åsa
Schreer Oliver
Sebe Nicu
Snoek Cees
Publication venue: Chorus Project Consortium
Publication date: 01/01/2008
Field of study

After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we have identified and analyzed gaps within European research effort during our second year. In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio- economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal challenges

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Modeling Spoken Information Queries for Virtual Assistants: Open Problems, Challenges and Opportunities

Author: Van Gysel Christophe
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 25/04/2023
Field of study

Virtual assistants are becoming increasingly important speech-driven Information Retrieval platforms that assist users with various tasks. We discuss open problems and challenges with respect to modeling spoken information queries for virtual assistants, and list opportunities where Information Retrieval methods and research can be applied to improve the quality of virtual assistant speech recognition. We discuss how query domain classification, knowledge graphs and user interaction data, and query personalization can be helpful to improve the accurate recognition of spoken information domain queries. Finally, we also provide a brief overview of current problems and challenges in speech recognition.Comment: SIGIR '23. The 46th International ACM SIGIR Conference on Research & Development in Information Retrieva

arXiv.org e-Print Archive

Targeted Subset Selection for Limited-data ASR Accent Adaptation

Author: Iyer Rishabh
Jyothi Preethi
Kothyari Mayank
Mekala Anmol Reddy
Ramakrishnan Ganesh
Publication venue
Publication date: 07/04/2022
Field of study

We study the task of adapting an existing ASR model to a non-native accent while being constrained by a transcription budget on the duration of utterances selected from a large unlabeled corpus. We propose a subset selection approach using the recently proposed submodular mutual information functions, in which we identify a diverse set of utterances that match the target accent. This is specified through a few target utterances and achieved by modelling the relationship between the target and the selected subsets using these functions. The model adapts to the accent through fine-tuning with utterances selected and transcribed from the unlabeled corpus. We also use an accent classifier to learn accent-aware feature representations. Our method is also able to exploit samples from other accents to perform out-of-domain selections for low-resource accents which are not available in these corpora. We show that the targeted subset selection approach improves significantly upon random sampling - by around 5% to 10% (absolute) in most cases, and is around 10x more label-efficient. We also compare with an oracle method where we specifically pick from the target accent and our method is comparable to the oracle in its selections and WER performance.Comment: Under review (INTERSPEECH 2022

arXiv.org e-Print Archive

Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition

Author: Andrusenko Andrei
Laptev Aleksandr
Matveev Yuri
Medennikov Ivan
Mitrofanov Anton
Podluzhny Ivan
Publication venue: 'MDPI AG'
Publication date: 12/03/2021
Field of study

With the rapid development of speech assistants, adapting server-intended automatic speech recognition (ASR) solutions to a direct device has become crucial. Researchers and industry prefer to use end-to-end ASR systems for on-device speech recognition tasks. This is because end-to-end systems can be made resource-efficient while maintaining a higher quality compared to hybrid systems. However, building end-to-end models requires a significant amount of speech data. Another challenging task associated with speech assistants is personalization, which mainly lies in handling out-of-vocabulary (OOV) words. In this work, we consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate, embodied in Babel Turkish and Babel Georgian tasks. To address the aforementioned problems, we propose a method of dynamic acoustic unit augmentation based on the BPE-dropout technique. It non-deterministically tokenizes utterances to extend the token's contexts and to regularize their distribution for the model's recognition of unseen words. It also reduces the need for optimal subword vocabulary size search. The technique provides a steady improvement in regular and personalized (OOV-oriented) speech recognition tasks (at least 6% relative WER and 25% relative F-score) at no additional computational cost. Owing to the use of BPE-dropout, our monolingual Turkish Conformer established a competitive result with 22.2% character error rate (CER) and 38.9% word error rate (WER), which is close to the best published multilingual system.Comment: 16 pages, 7 figure

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute