Adapting an ASR Foundation Model for Spoken Language Assessment
A crucial part of an accurate and reliable spoken language assessment system
is the underlying ASR model. Recently, large-scale pre-trained ASR foundation
models such as Whisper have been made available. As the output of these models
is designed to be human readable, punctuation is added, numbers are presented
in Arabic numeric form and abbreviations are included. Additionally, these
models have a tendency to skip disfluencies and hesitations in the output.
Though useful for readability, these attributes are not helpful for assessing
the ability of a candidate and providing feedback. Here a precise transcription
of what a candidate said is needed. In this paper, we give a detailed analysis
of Whisper outputs and propose two solutions: fine-tuning and soft prompt
tuning. Experiments are conducted on both public speech corpora and an English
learner dataset. Results show that we can effectively alter the decoding
behaviour of Whisper to generate the exact words spoken in the response. Comment: Proceedings of SLaT
Analyzing the Targets of Hate in Online Social Media
Social media systems allow Internet users a congenial platform to freely
express their thoughts and opinions. Although this property represents
incredible and unique communication opportunities, it also brings along
important challenges. Online hate speech is an archetypal example of such
challenges. Despite its magnitude and scale, there is a significant gap in
understanding the nature of hate speech on social media. In this paper, we
provide the first-of-its-kind systematic, large-scale measurement study of the
main targets of hate speech in online social media. To do that, we gather
traces from two social media systems: Whisper and Twitter. We then develop and
validate a methodology to identify hate speech on both these systems. Our
results identify online hate speech forms and offer a broader understanding of
the phenomenon, providing directions for prevention and detection approaches. Comment: Short paper, 4 pages, 4 tables
Color and texture associations in voice-induced synesthesia
Voice-induced synesthesia, a form of synesthesia in which synesthetic perceptions are induced by the sounds of people's voices, appears to be relatively rare and has not been systematically studied. In this study we investigated the synesthetic color and visual texture perceptions experienced in response to different types of "voice quality" (e.g., nasal, whisper, falsetto). Experiences of three different groups (self-reported voice synesthetes, phoneticians, and controls) were compared using both qualitative and quantitative analysis in a study conducted online. Whilst, in the qualitative analysis, synesthetes used more color and texture terms to describe voices than either phoneticians or controls, only weak differences, and many similarities, between groups were found in the quantitative analysis. Notable consistent results between groups were the matching of higher speech fundamental frequencies with lighter and redder colors, the matching of "whispery" voices with smoke-like textures, and the matching of "harsh" and "creaky" voices with textures resembling dry cracked soil. These data are discussed in the light of current thinking about definitions and categorizations of synesthesia, especially in cases where individuals apparently have a range of different synesthetic inducers.
Faculty concert: Penelope Bitzas, Shiela Kibbe, and Eric Ruske, January 22, 1999
This is the concert program of the Faculty Concert performance by Penelope Bitzas, Shiela Kibbe, and Eric Ruske on Friday, January 22, 1999 at 8:00 p.m., at the Tsai Performance Center, 685 Commonwealth Avenue, Boston, Massachusetts. Works performed were E ingrato lo veggio, from Adriano in Siria by Giovanni Pergolesi; Confusa, smarrita, spiegarti, from Cantone by Baldassare Galuppi; Songs and Dances of Death by Modeste Mussorgsky; Gondoliera, Nimmer denkst du mein, and Der Traum der ersten Liebe by Heinrich Esser; Tonadillas by Enrique Granados; Lass from the Low Countree by John Jacob Niles; West London by Charles Ives; Why Don't You? by Lee Hoiby; and The Frog and the Snake by Irving Fine. Digitization for Boston University Concert Programs was supported by the Boston University Humanities Library Endowed Fund.
[Review of] Silvester Brito. Red Cedar Warrior
Red Cedar Warrior, the collection of poems by S.J. Brito, is very obvious in its depiction of trepidations against Native Americans, in its mourning for the loss of culture and traditions, and its expression of anger. We easily see the obvious signs of Native Americanism in most of the poems included in his book. The warrior could not be anything other than Native American, astride a pony, feathered and painted. There are the drums, the ceremonial life, the peyote prayers, the shamans, and such references. We easily see the images and hear the voices that most let us know of the poet's intent to share with us a Native American viewpoint. And why not? After all, Brito is a proud descendant of Comanches and Tarascans.
Towards Generalizable SER: Soft Labeling and Data Augmentation for Modeling Temporal Emotion Shifts in Large-Scale Multilingual Speech
Recognizing emotions in spoken communication is crucial for advanced
human-machine interaction. Current emotion detection methodologies often
display biases when applied cross-corpus. To address this, our study
amalgamates 16 diverse datasets, resulting in 375 hours of data across
languages like English, Chinese, and Japanese. We propose a soft labeling
system to capture gradational emotional intensities. Using the Whisper encoder
and data augmentation methods inspired by contrastive learning, our method
emphasizes the temporal dynamics of emotions. Our validation on four
multilingual datasets demonstrates notable zero-shot generalization. We publish
our open source model weights and initial promising results after fine-tuning
on Hume-Prosody. Comment: Accepted as talk at NeurIPS ML for Audio workshop
When the Bloom is on the Cotton Dixie Lee
VERSE 1
Round my southern home the cotton fields are blooming,
Far away the river glistens 'neath the moon;
Through the twilight comes the breath of clover blossoms,
Far away the darkies sing a southern tune.
Side by side down by the flowing stream we wandered,
On your face the moonlight cast a golden glow;
And I kissed away the tears when you were crying
As I said "Goodbye" and whispered soft and low.

REFRAIN
When the bloom is on the cotton, Dixie Lee,
Life's sun will shine again for you and me;
I'll return to you once more, we'll be happy as of yore
When the bloom is on the cotton, Dixie Lee.

VERSE 2
In my dreams tonight I'm roaming with you Dixie,
While the cotton fields are all abloom once more;
I can see the soft moonlight upon the river,
And your sweet face as you stroll along the shore.
But we nevermore will wander down the pathway,
As we did the night you gave your heart to me,
For the tolling bells they tell the sad, and story
While the breezes whisper "Farewell" Dixie Lee.

REFRAIN