
    Perceptual Continuity and Naturalness of Expressive Strength in Singing Voices Based on Speech Morphing

    This paper experimentally shows the importance of perceptual continuity of expressive strength in vocal timbre for natural changes in vocal expression. To synthesize varied and continuous expressive strengths in vocal timbre, we investigated gradually changing expressions by applying the STRAIGHT speech morphing algorithm to singing voices. A singing voice without expression is used as the base of the morphing, and singing voices with three different expressions are used as targets. Through statistical analyses of perceptual evaluations, we confirmed that the proposed morphing algorithm provides perceptual continuity of vocal timbre. Our results showed (i) gradual strengths in absolute evaluations, and (ii) a perceptually linear strength obtained by correcting the intervals of the morph ratio with the inverse (reciprocal) function of an equation that approximates the perceived strength. Finally, judging from results showing that (iii) our gradual transformation method performs well for perceived naturalness, we concluded that applying continuity is highly effective for achieving perceptual naturalness.
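    The ratio-correction step described in the abstract can be sketched as follows. The power-law form of the strength function and its exponent are illustrative assumptions for the sketch, not the paper's actual fitted equation:

    ```python
    import numpy as np

    # Hypothetical fitted exponent: perceived strength ~ ratio**gamma.
    # This is an assumption for illustration, not the paper's equation.
    gamma = 0.6

    def corrected_ratios(n_steps, gamma):
        """Morph ratios whose *perceived* strengths are equally spaced.

        Target perceptual strengths are linear in [0, 1]; inverting the
        approximating function S(r) = r**gamma gives r = S**(1/gamma).
        """
        target_strength = np.linspace(0.0, 1.0, n_steps)
        return target_strength ** (1.0 / gamma)

    # Five morph ratios producing perceptually equal strength steps.
    ratios = corrected_ratios(5, gamma)
    ```

    With gamma < 1 the corrected ratios bunch up near zero, compensating for the fast initial growth of perceived strength.
    
    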

    Classification of Musical Instrument Sounds by Using MFCC and Timbral Audio Descriptors

    Identification of the musical instrument in a music piece has become an area of interest for researchers in recent years. A system for identifying a musical instrument from a monophonic audio recording basically performs three tasks: i) pre-processing of the input music signal; ii) feature extraction from the signal; iii) classification. There are many methods for extracting audio features from a recording, such as Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC), Linear Predictive Cepstral Coefficients (LPCC), and Perceptual Linear Predictive (PLP) coefficients. This paper presents an approach to identifying musical instruments in monophonic audio recordings by extracting MFCC features and timbre-related audio descriptors. Three classifiers, K-Nearest Neighbors (K-NN), Support Vector Machine (SVM), and Binary Tree (BT), are then used to identify the instrument from the feature vector generated in the feature extraction step. The analysis studies the results obtained for all possible combinations of feature extraction methods and classifiers, and percentage accuracies for each combination are calculated to determine which combinations give better identification results. The system achieves its highest accuracies of 90.00%, 77.00%, and 75.33% for five, ten, and fifteen musical instruments, respectively, when MFCC is used with the K-NN classifier; with the timbral audio descriptors, the highest accuracies of 88.00%, 84.00%, and 73.33% for five, ten, and fifteen instruments, respectively, are obtained with the BT classifier. DOI: 10.17762/ijritcc2321-8169.150713
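    As a minimal sketch of the classification stage, the following implements k-nearest-neighbour majority voting over pre-computed feature vectors. The random vectors stand in for real MFCC features (which in practice would come from a library such as librosa); all names and sizes here are illustrative:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-ins for per-clip MFCC feature vectors (13 coefficients each);
    # real features would be extracted from the audio recordings.
    n_train, n_dim = 30, 13
    X_train = rng.normal(size=(n_train, n_dim))
    y_train = rng.integers(0, 5, size=n_train)      # 5 instrument labels

    def knn_predict(X_train, y_train, x, k=3):
        """Classify one feature vector by majority vote among its k
        nearest training vectors (Euclidean distance)."""
        d = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(d)[:k]]
        values, counts = np.unique(nearest, return_counts=True)
        return values[np.argmax(counts)]

    # Querying a training vector with k=1 returns its own label.
    pred = knn_predict(X_train, y_train, X_train[0], k=1)
    ```

    A real evaluation would hold out test clips and report per-combination accuracy, as the paper does.
    
    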

    Auditory perceptual assessment of voices: Examining perceptual ratings as a function of voice experience

    Understanding voice usage is vital to our understanding of human interaction. What is known about the auditory perceptual evaluation of voices comes mainly from studies of voice professionals, who evaluate operatic/lyrical singing in specific contexts. This is surprising, as recordings of singing voices from different musical styles are an omnipresent phenomenon, evoking reactions in listeners with various levels of expertise. Understanding how untrained listeners perceive and describe voices will open up new research possibilities and enhance vocal communication between listeners. Here, three studies with a mixed-methods approach aimed to: (1) evaluate the ability of untrained listeners to describe voices, and (2) determine which auditory features were most salient in participants’ discrimination of voices. In an interview (N = 20) and a questionnaire study (N = 48), free voice descriptions by untrained listeners of 23 singing voices, primarily from popular music, were compared with terms used by voice professionals, revealing that participants were able to describe voices using vocal characteristics from essential categories indicating sound quality, pitch changes, articulation, and variability in expression. Nine items were derived and used in an online survey for the evaluation of six voices by trained and untrained listeners in a German (N = 216) and an English (N = 50) sample, revealing that neither language nor expertise affected the assessment of the singers. A discriminant analysis showed that roughness and tension were important features for voice discrimination. The measure of vocal expression created in the current study will be informative for studying voice perception and evaluation more generally.

    On the Hungarian sung vowels

    Singing at a very high pitch is associated with vocal tract adjustments in professional Western operatic singing. However, as of yet there is insufficient data available on the extent of the acoustic transformation that Hungarian vowels undergo during singing. The author’s purpose is to evaluate the acoustic and articulatory changes of Hungarian vowel qualities and to examine the effect of these changes on the intelligibility of the sounds, which has not yet been done for Hungarian. The paper contains a brief summary of previously described tendencies for other languages, and data for Hungarian from pilot studies carried out by the author with sung vowels from an adult soprano and a child.

    Speaker Identification for Swiss German with Spectral and Rhythm Features

    We present results of speech rhythm analysis for automatic speaker identification, expanding previous experiments that used similar methods for language identification. Features describing the rhythmic properties of salient changes in signal components are extracted and used in a speaker identification task to determine to what extent they are descriptive of speaker variability. We also test the performance of state-of-the-art but simple-to-extract frame-based features. The focus of the paper is evaluation on a single corpus (the Swiss German TEVOID corpus) using support vector machines. Results suggest that the general spectral features can provide very good performance on this dataset, whereas the rhythm features are not as successful in the task, indicating either a lack of suitability for this task or dataset specificity.
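    A crude illustration of a rhythm-style descriptor based on salient energy changes is sketched below. This is a stand-in under stated assumptions (frame sizes, threshold, and the test signal are all invented for the sketch), not the paper's actual feature set:

    ```python
    import numpy as np

    def rhythm_rate(signal, sr, frame=1024, hop=512, k=2.0):
        """Crude rhythm descriptor: rate of salient energy rises per second.

        Frames the signal, takes the positive first difference of log frame
        energy, and counts rises exceeding k standard deviations above the
        mean rise.
        """
        n = 1 + (len(signal) - frame) // hop
        energy = np.array([np.sum(signal[i*hop : i*hop + frame] ** 2) + 1e-12
                           for i in range(n)])
        rise = np.maximum(np.diff(np.log(energy)), 0.0)
        salient = rise > rise.mean() + k * rise.std()
        return salient.sum() / (len(signal) / sr)

    # Quiet 2-second tone with four loud 50 ms bursts as test input.
    sr = 8000
    t = np.arange(2 * sr) / sr
    x = 0.01 * np.sin(2 * np.pi * 220 * t)
    for b in (0.25, 0.75, 1.25, 1.75):
        idx = (t > b) & (t < b + 0.05)
        x[idx] += np.sin(2 * np.pi * 220 * t[idx])
    rate = rhythm_rate(x, sr)
    ```

    Such scalar descriptors would be concatenated into a feature vector and fed to an SVM alongside the frame-based spectral features.
    
    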

    Temporal Coding of Periodicity Pitch in the Auditory System: An Overview

    This paper outlines a taxonomy of neural pulse codes and reviews neurophysiological evidence for interspike interval-based representations of pitch and timbre in the auditory nerve and cochlear nucleus. Neural pulse codes can be divided into channel-based codes, temporal-pattern codes, and time-of-arrival codes. Timings of discharges in auditory nerve fibers reflect the time structure of acoustic waveforms, such that the interspike intervals that are produced precisely convey information concerning stimulus periodicities. Population-wide interspike interval distributions are constructed by summing together intervals from the observed responses of many single Type I auditory nerve fibers. Features in such distributions correspond closely to pitches heard by human listeners. The most common all-order interval present in the auditory nerve array almost invariably corresponds to the pitch frequency, whereas the relative fraction of pitch-related intervals amongst all others qualitatively corresponds to the strength of the pitch. Consequently, many diverse aspects of pitch perception can be explained in terms of such temporal representations. Similar stimulus-driven temporal discharge patterns are observed in major neuronal populations of the cochlear nucleus. Population-interval distributions constitute an alternative time-domain strategy for representing sensory information that complements spatially organized sensory maps. Similar autocorrelation-like representations are possible in other sensory systems in which neural discharges are time-locked to stimulus waveforms.
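    The population-interval idea has a signal-level analogue: the pitch of a periodic sound corresponds to the largest autocorrelation peak at a positive lag, just as the most common all-order interspike interval corresponds to the pitch period. A minimal numpy sketch of this analogue (illustrative only, not the paper's neural model):

    ```python
    import numpy as np

    sr = 16000
    f0 = 200.0
    t = np.arange(int(0.1 * sr)) / sr
    # Harmonic complex with a weak fundamental: the pitch at f0 is still
    # recoverable from the waveform's periodicity.
    x = (0.2 * np.sin(2 * np.pi * f0 * t)
         + np.sin(2 * np.pi * 2 * f0 * t)
         + np.sin(2 * np.pi * 3 * f0 * t))

    def autocorr_pitch(x, sr, fmin=60.0, fmax=500.0):
        """Estimate pitch as the lag of the largest autocorrelation peak
        within a plausible pitch range (the signal-level analogue of the
        most common all-order interval)."""
        ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags >= 0
        lo, hi = int(sr / fmax), int(sr / fmin)
        lag = lo + np.argmax(ac[lo:hi])
        return sr / lag

    pitch = autocorr_pitch(x, sr)
    ```

    The estimate recovers the 200 Hz fundamental even though the fundamental component is weak, mirroring the "missing fundamental" behaviour that interval codes explain.
    
    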

    Music Artist Classification with WaveNet Classifier for Raw Waveform Audio Data

    Models for music artist classification have usually operated in the frequency domain, with the input audio samples processed by a spectral transformation. The WaveNet architecture was originally designed for speech and music generation. In this paper, we propose an end-to-end architecture in the time domain for the artist classification task: a WaveNet classifier that models features directly from the raw audio waveform. The WaveNet takes the waveform as input, and several subsequent downsampling layers discriminate which artist the input belongs to. In addition, the proposed method is applied to singer identification. The best-performing model obtains an average F1 score of 0.854 on the Artist20 benchmark dataset, a significant improvement over related work. To show the effectiveness of the proposed method's feature learning, the bottleneck layer of the model is visualized.
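    The time-domain idea behind such a classifier can be illustrated with a minimal numpy sketch of WaveNet-style causal dilated convolutions followed by a pooling head. The weights, dilation schedule, and pooling are illustrative stand-ins, not the paper's architecture:

    ```python
    import numpy as np

    def causal_dilated_conv(x, w, dilation):
        """1-D causal convolution: output[t] depends only on
        x[t], x[t - dilation], ... (zero-padded at the start)."""
        pad = dilation * (len(w) - 1)
        xp = np.concatenate([np.zeros(pad), x])
        return np.array([
            sum(w[k] * xp[t + pad - k * dilation] for k in range(len(w)))
            for t in range(len(x))
        ])

    rng = np.random.default_rng(0)
    x = rng.normal(size=64)          # stand-in for a raw waveform chunk

    h = x
    receptive_field = 1
    for d in (1, 2, 4, 8):           # doubling dilations, as in WaveNet
        h = np.tanh(causal_dilated_conv(h, np.array([0.5, 0.5]), d))
        receptive_field += d         # kernel size 2 adds d samples per layer

    # A classifier head would then downsample h (e.g., global average
    # pooling) and map the pooled vector to per-artist logits.
    embedding = h.mean()
    ```

    With kernel size 2 and dilations 1, 2, 4, 8, each output sample sees 16 input samples; doubling dilations is what lets WaveNet-style stacks cover long waveform contexts with few layers.
    
    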