40,986 research outputs found
JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions
We present the JVNV, a Japanese emotional speech corpus with verbal content
and nonverbal vocalizations whose scripts are generated by a large-scale
language model. Existing emotional speech corpora lack not only proper
emotional scripts but also nonverbal vocalizations (NVs) that are essential
expressions in spoken language to express emotions. We propose an automatic
script generation method to produce emotional scripts by providing seed words
with sentiment polarity and phrases of nonverbal vocalizations to ChatGPT using
prompt engineering. We select 514 scripts with balanced phoneme coverage from
the generated candidate scripts with the assistance of emotion confidence
scores and language fluency scores. We demonstrate the effectiveness of JVNV by
showing that JVNV has better phoneme coverage and emotion recognizability than
previous Japanese emotional speech corpora. We then benchmark JVNV on emotional
text-to-speech synthesis using discrete codes to represent NVs. We show that
there still exists a gap between the performance of synthesizing read-aloud
speech and emotional speech, and adding NVs in the speech makes the task even
harder, which brings new challenges for this task and makes JVNV a valuable
resource for relevant works in the future. To our best knowledge, JVNV is the
first speech corpus that generates scripts automatically using large language
models
AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis
Recently, sound recognition has been used to identify sounds, such as car and
river. However, sounds have nuances that may be better described by
adjective-noun pairs such as slow car, and verb-noun pairs such as flying
insects, which are under explored. Therefore, in this work we investigate the
relation between audio content and both adjective-noun pairs and verb-noun
pairs. Due to the lack of datasets with these kinds of annotations, we
collected and processed the AudioPairBank corpus consisting of a combined total
of 1,123 pairs and over 33,000 audio files. One contribution is the previously
unavailable documentation of the challenges and implications of collecting
audio recordings with these type of labels. A second contribution is to show
the degree of correlation between the audio content and the labels through
sound recognition experiments, which yielded results of 70% accuracy, hence
also providing a performance benchmark. The results and study in this paper
encourage further exploration of the nuances in audio and are meant to
complement similar research performed on images and text in multimedia
analysis.Comment: This paper is a revised version of "AudioSentibank: Large-scale
Semantic Ontology of Acoustic Concepts for Audio Content Analysis
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
- …