We examine the speech modeling potential of generative spoken language
modeling (GSLM), which uses learned symbols derived from data, rather than
phonemes, for speech analysis and synthesis. Because GSLM enables textless
spoken language processing, assessing its effectiveness is a critical step
toward new paradigms in spoken language processing. This paper reports how
effectively GSLM encodes and decodes speech at both the spoken-language and
speech levels. Through speech resynthesis experiments, we found that
resynthesis errors occur at levels ranging from phonology to syntax, and that
GSLM frequently resynthesizes speech that sounds natural but whose content has
been altered.
Comment: Accepted to INTERSPEECH 202