Modeling Spoken Information Queries for Virtual Assistants: Open Problems, Challenges and Opportunities
Virtual assistants are becoming increasingly important speech-driven
Information Retrieval platforms that assist users with various tasks.
We discuss open problems and challenges with respect to modeling spoken
information queries for virtual assistants, and list opportunities where
Information Retrieval methods and research can be applied to improve the
quality of virtual assistant speech recognition.
We discuss how query domain classification, knowledge graphs and user
interaction data, and query personalization can help improve the
recognition accuracy of spoken information domain queries. Finally, we
provide a brief overview of current problems and challenges in speech
recognition.
Comment: SIGIR '23. The 46th International ACM SIGIR Conference on Research &
Development in Information Retrieval
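One way personalization can feed into recognition is by reranking the recognizer's n-best list with a user-specific prior. The sketch below is illustrative only and is not taken from the paper: the `Hypothesis` type, the `rescore` function, the substring match against a list of the user's own entities (e.g. contact names), and the `weight`, `boost` and `floor` values are all hypothetical assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Hypothesis:
    text: str
    asr_log_score: float  # log-score from the recognizer (assumed to be available)


def rescore(nbest: Sequence[Hypothesis], user_entities: Sequence[str],
            weight: float = 1.0, boost: float = 0.9, floor: float = 0.1) -> List[Hypothesis]:
    """Rerank n-best hypotheses with a hypothetical per-user entity prior.

    A hypothesis that contains one of the user's entities gets log(boost)
    added to its score; every other hypothesis gets log(floor).
    """
    entities = [e.lower() for e in user_entities]

    def combined(h: Hypothesis) -> float:
        text = h.text.lower()
        prior = boost if any(e in text for e in entities) else floor
        return h.asr_log_score + weight * math.log(prior)

    # sorted() is stable, so ties keep the recognizer's original order.
    return sorted(nbest, key=combined, reverse=True)


nbest = [Hypothesis("call jon snow", -1.0), Hypothesis("call john snow", -1.2)]
print(rescore(nbest, ["John Snow"])[0].text)  # call john snow
```

Here the slightly lower-scoring acoustic hypothesis wins because it matches an entity in the user's data. In practice the prior would more likely come from a domain classifier or a knowledge-graph lookup than from a flat entity list.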
MobileASR: A resource-aware on-device learning framework for user voice personalization applications on mobile phones
We describe a comprehensive methodology for developing user-voice
personalized automatic speech recognition (ASR) models by effectively training
models on mobile phones, allowing user data and models to be stored and used
locally. To achieve this, we propose a resource-aware sub-model-based training
approach that considers the RAM and battery capabilities of mobile phones. By
considering the evaluation metric and resource constraints of the mobile
phones, we are able to perform efficient training and halt the process
accordingly. To simulate real users, we use speakers with various accents. The
entire on-device training and evaluation framework was then tested on various
mobile phones across brands. We show that fine-tuning the models and selecting
the right hyperparameter values involve a trade-off between the lowest achievable
performance metric, on-device training time, and memory consumption. Overall,
our methodology offers a comprehensive solution for developing personalized ASR
models while leveraging the capabilities of mobile phones, and balancing the
need for accuracy with resource constraints.
Comment: Accepted in AIMLSystems 202
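The abstract says training is halted based on the evaluation metric and the phone's resource constraints. The function below sketches one possible halting rule under stated assumptions. It is not the MobileASR implementation: the name `should_halt`, the battery and free-RAM thresholds, and the patience-based plateau test on word error rate (WER, lower is better) are all hypothetical choices.

```python
from typing import Optional, Sequence


def should_halt(battery_pct: float, free_ram_mb: float, wer_history: Sequence[float],
                min_battery_pct: float = 20.0, min_free_ram_mb: float = 512.0,
                patience: int = 2, min_delta: float = 0.0) -> Optional[str]:
    """Return a reason to stop on-device training, or None to continue.

    All thresholds are hypothetical defaults, not values from the paper.
    """
    # Resource constraints take priority over the training metric.
    if battery_pct < min_battery_pct:
        return "low battery"
    if free_ram_mb < min_free_ram_mb:
        return "low memory"
    # Metric plateau: the best WER in the last `patience` evaluations did not
    # improve on the best WER seen before them by at least `min_delta`.
    if len(wer_history) > patience:
        best_before = min(wer_history[:-patience])
        recent_best = min(wer_history[-patience:])
        if recent_best > best_before - min_delta:
            return "metric plateau"
    return None


print(should_halt(80, 2048, [0.30, 0.25, 0.26, 0.27]))  # metric plateau
print(should_halt(80, 2048, [0.30, 0.25, 0.22]))        # None
```

In a training loop, this check would run after each evaluation step, with battery and memory readings taken from the platform's own APIs (not shown here).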
Developing Deployable Spoken Language Translation Systems given Limited Resources
Approaches are presented that support the deployment of spoken language translation systems. Newly developed methods allow low-cost portability to new language pairs. The proposed translation model pruning techniques achieve high translation performance even in low-memory situations. Named entity and specialty vocabulary coverage, particularly on small and mobile devices, is targeted to the individual user through translation model personalization.
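The abstract does not specify which pruning techniques are used. As a hedged illustration of the general idea of shrinking a translation model for low-memory devices, the sketch below applies two common generic filters to a toy phrase table: a probability threshold and a per-source top-k cut. The table layout, the function name `prune_phrase_table`, and the default values are hypothetical assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: source phrase -> list of (target phrase, probability).
PhraseTable = Dict[str, List[Tuple[str, float]]]


def prune_phrase_table(table: PhraseTable, top_k: int = 2,
                       min_prob: float = 0.05) -> PhraseTable:
    """Keep at most `top_k` translations per source phrase with prob >= min_prob."""
    pruned: PhraseTable = {}
    for src, options in table.items():
        kept = sorted((o for o in options if o[1] >= min_prob),
                      key=lambda o: o[1], reverse=True)[:top_k]
        if kept:  # drop source phrases with no surviving translation
            pruned[src] = kept
    return pruned


table = {
    "haus": [("house", 0.7), ("home", 0.2), ("building", 0.06), ("hut", 0.04)],
    "xyz": [("a", 0.01)],
}
print(prune_phrase_table(table))  # {'haus': [('house', 0.7), ('home', 0.2)]}
```

Tightening `top_k` or `min_prob` trades translation coverage for memory. Where to set that trade-off on a given device is the kind of question the abstract's low-memory evaluation addresses.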
Designing Human-Centered Collective Intelligence
Human-Centered Collective Intelligence (HCCI) is an emergent research area that seeks to bring together major research areas like machine learning, statistical modeling, information retrieval, market research, and software engineering to address the challenges of deriving intelligent insights and solutions through the collaboration of multiple intelligent sensors, devices and data sources. An archetypal contextual collective intelligence (CI) scenario might involve deriving affect-driven intelligence from multimodal emotion detection sources to determine whether one movie trailer is more likable than another. At the same time, the key tenets of designing robust and evolvable software and infrastructure architecture models that address cross-cutting quality concerns are of keen interest in today's "Cloud" age. Key quality concerns in CI scenarios include security and privacy, scalability, performance, fault tolerance, and reliability. I present recent advances in CI system design, with a focus on optimal solutions for these cross-cutting concerns. I also describe a number of design challenges and a framework that I have determined to be critical to designing CI systems. Drawing on machine learning, computational advertising, ubiquitous computing, and sociable robotics, this work incorporates theories and concepts from various viewpoints to enable the collective intelligence engine, ZOEI, to discover affective state and emotional intent across multiple mediums. The discerned affective state is used in recommender systems, among other applications, to support content personalization. I examine the design of optimal architectures that allow humans and intelligent systems to work collectively to solve complex problems. I present an evaluation of various studies that leverage the ZOEI framework to design collective intelligence
- …