Grand Challenges in Music Information Research
This paper discusses grand challenges through which music information research will impact our daily lives and our society in the future. The fundamental questions are how to provide the best music for each person, how to predict music trends, how to enrich human-music relationships, how to evolve new music, and how to address environmental and energy issues by using music technologies. Our goal is to increase both the attractiveness and the social impact of music information research through such discussions and developments.
Lyrics-to-Audio Alignment and its Application
Automatic lyrics-to-audio alignment techniques have been drawing attention in recent years, and various studies have been conducted in this field. The objective of lyrics-to-audio alignment is to estimate the temporal relationship between lyrics and musical audio signals; it can be applied to applications such as karaoke-style lyrics display. In this contribution, we provide an overview of recent developments in this research topic, with a particular focus on the categorization of methods and on applications.
IteraTTA: An interface for exploring both text prompts and audio priors in generating music with text-to-audio models
Recent text-to-audio generation techniques have the potential to allow novice
users to freely generate music audio. Even if they do not have musical
knowledge, such as about chord progressions and instruments, users can try
various text prompts to generate audio. However, compared to the image domain,
gaining a clear understanding of the space of possible music audios is
difficult because users cannot listen to the variations of the generated audios
simultaneously. We therefore facilitate users in exploring not only text
prompts but also audio priors that constrain the text-to-audio music generation
process. This dual-sided exploration enables users to discern the impact of
different text prompts and audio priors on the generation results through
iterative comparison of them. Our developed interface, IteraTTA, is
specifically designed to aid users in refining text prompts and selecting
favorable audio priors from the generated audios. With this, users can
progressively reach their loosely-specified goals while understanding and
exploring the space of possible results. Our implementation and discussions
highlight design considerations that are specifically required for
text-to-audio models and how interaction techniques can contribute to their
effectiveness.
Comment: Accepted to the 24th International Society for Music Information Retrieval Conference (ISMIR 2023)
CatAlyst: Domain-Extensible Intervention for Preventing Task Procrastination Using Large Generative Models
CatAlyst uses generative models to help workers' progress by influencing
their task engagement instead of directly contributing to their task outputs.
It prompts distracted workers to resume their tasks by generating a
continuation of their work and presenting it as an intervention that is more
context-aware than conventional (predetermined) feedback. The prompt can
function by drawing their interest and lowering the hurdle for resumption even
when the generated continuation is insufficient to substitute for their work, whereas
recent human-AI collaboration research aiming at work substitution depends on
stable, high accuracy. This frees CatAlyst from domain-specific model-tuning and
makes it applicable to various tasks. Our studies involving writing and
slide-editing tasks demonstrated CatAlyst's effectiveness in helping workers
swiftly resume tasks with a lowered cognitive load. The results suggest a new
form of human-AI collaboration where large generative models publicly available
but imperfect for each individual domain can contribute to workers' digital
well-being.
Comment: Accepted by ACM CHI Conference on Human Factors in Computing Systems (CHI '23)