450 research outputs found

Grand Challenges in Music Information Research

Author: Goto Masataka
Publication venue: Dagstuhl Follow-Ups. Multimodal Music Processing
Publication date: 01/01/2012

This paper discusses some grand challenges in which music information research will impact our daily lives and our society in the future. Here, some fundamental questions are how to provide the best music for each person, how to predict music trends, how to enrich human-music relationships, how to evolve new music, and how to address environmental and energy issues by using music technologies. Our goal is to increase both the attractiveness and the social impact of music information research in the future through such discussions and developments.

DROPS Dagstuhl Research Online Publication Server

Lyrics-to-Audio Alignment and its Application

Authors: Fujihara Hiromasa, Goto Masataka
Publication venue: Dagstuhl Follow-Ups. Multimodal Music Processing
Publication date: 01/01/2012

Automatic lyrics-to-audio alignment techniques have been drawing attention in recent years, and various studies have been made in this field. The objective of lyrics-to-audio alignment is to estimate the temporal relationship between lyrics and musical audio signals; it can be applied to various applications such as karaoke-style lyrics display. In this contribution, we provide an overview of recent developments in this research topic, with a particular focus on the categorization of various methods and on applications.

CiteSeerX

DROPS Dagstuhl Research Online Publication Server

IteraTTA: An interface for exploring both text prompts and audio priors in generating music with text-to-audio models

Authors: Goto Masataka, Yakura Hiromu
Publication date: 24/07/2023

Recent text-to-audio generation techniques have the potential to allow novice users to freely generate music audio. Even if they do not have musical knowledge, such as about chord progressions and instruments, users can try various text prompts to generate audio. However, compared to the image domain, gaining a clear understanding of the space of possible music audios is difficult because users cannot listen to the variations of the generated audios simultaneously. We therefore facilitate users in exploring not only text prompts but also audio priors that constrain the text-to-audio music generation process. This dual-sided exploration enables users to discern the impact of different text prompts and audio priors on the generation results through iterative comparison of them. Our developed interface, IteraTTA, is specifically designed to aid users in refining text prompts and selecting favorable audio priors from the generated audios. With this, users can progressively reach their loosely-specified goals while understanding and exploring the space of possible results. Our implementation and discussions highlight design considerations that are specifically required for text-to-audio models and how interaction techniques can contribute to their effectiveness.

Comment: Accepted to the 24th International Society for Music Information Retrieval Conference (ISMIR 2023)

arXiv.org e-Print Archive

PodCastle: A Spoken Document Retrieval Service Improved by Anonymous User Contributions

Authors: Goto Masataka, Ogata Jun
Publication venue: Institute of Digital Enhancement of Cognitive Processing, Waseda University
Publication date: 01/01/2011

Waseda University Repository

CatAlyst: Domain-Extensible Intervention for Preventing Task Procrastination Using Large Generative Models

Authors: Arakawa Riku, Goto Masataka, Yakura Hiromu
Publication venue: Association for Computing Machinery (ACM)
Publication date: 09/03/2023

CatAlyst uses generative models to help workers' progress by influencing their task engagement instead of directly contributing to their task outputs. It prompts distracted workers to resume their tasks by generating a continuation of their work and presenting it as an intervention that is more context-aware than conventional (predetermined) feedback. The prompt can function by drawing their interest and lowering the hurdle for resumption even when the generated continuation is insufficient to substitute their work, while recent human-AI collaboration research aiming at work substitution depends on a stable high accuracy. This frees CatAlyst from domain-specific model-tuning and makes it applicable to various tasks. Our studies involving writing and slide-editing tasks demonstrated CatAlyst's effectiveness in helping workers swiftly resume tasks with a lowered cognitive load. The results suggest a new form of human-AI collaboration where large generative models publicly available but imperfect for each individual domain can contribute to workers' digital well-being.

Comment: Accepted by ACM CHI Conference on Human Factors in Computing Systems (CHI '23)

arXiv.org e-Print Archive

AutoRhythmGuitar: Computer-aided Composition for Rhythm Guitar in the Tab Space

Authors: Masataka Goto, Matt McVicar, Satoru Fukayama

(Abstract to follow)

CiteSeerX

ZENODO

University of Michigan Library Digital Collections

Modeling Structural Topic Transitions for Automatic Lyrics Generation

Authors: Goto Masataka, Inui Kentaro, Matsubayashi Yuichiroh, Watanabe Kento
Publication venue: Department of Linguistics, Faculty of Arts, Chulalongkorn University
Publication date: 01/01/2014

Waseda University Repository