2,094 research outputs found

    Real-time audio interaction in serious games for music learning

    In this LBD, we present several apps for playing while learning music, or for learning music while playing. The core of all the games relies on the good performance of the real-time audio interaction algorithms developed by the ATIC group at Universidad de Málaga (Spain). Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. This work has been funded by the Ministerio de Economía y Competitividad of the Spanish Government under Project No. TIN2013-47276-C6-2-R and by the Junta de Andalucía under Project No. P11-TIC-7154.

    Modeling spectral changes in singing voice for pitch modification

    We present an advanced method to achieve natural-sounding modifications when pitch shifting singing voice, by modifying the spectral envelope of the audio excerpt. To this end, an all-pole spectral envelope model has been selected to describe the global variations of the spectral envelope with changes in pitch. We pitch shifted several sustained vowels with and without the envelope processing, and compared both versions by means of a survey open to volunteers on our website. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. This work has been funded by the Ministerio de Economía y Competitividad of the Spanish Government under Project No. TIN2013-47276-C6-2-R and by the Junta de Andalucía under Project No. P11-TIC-7154.

    Conducting a virtual ensemble with a kinect device

    This paper presents a gesture-based interaction technique for the implementation of an orchestra conductor and a virtual ensemble, using a 3D camera-based sensor to capture the user’s gestures. In particular, a human-computer interface has been developed to recognize conducting gestures using a Microsoft Kinect device. The system allows the conductor to control both the tempo of the piece played and the dynamics of each instrument set independently. To modify the tempo of the playback, an algorithm based on time-frequency processing is used. Finally, an experiment was conducted to assess users’ opinion of the system and to confirm experimentally whether its features effectively improved the user experience. This work has been funded by the Ministerio de Economía y Competitividad of the Spanish Government under Project No. TIN2010-21089-C03-02 and Project No. IPT-2011-0885-430000 and by the Junta de Andalucía under Project No. P11-TIC-7154. The work has been done at Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.

    Music Learning Tools for Android Devices

    In this paper, a musical learning application for mobile devices is presented. The main objective is to design and develop an application that offers exercises to practice and improve a selection of music skills, aimed at users interested in music learning and training. The selected music skills are rhythm, melodic dictation and singing. The application includes an audio signal analysis system implemented with the Goertzel algorithm, which is used in the singing exercises to check whether the user sings the right musical note. The application also includes a graphical interface to represent musical symbols. A set of tests was conducted to check the usefulness of the application as a musical learning tool. A group of users with different levels of musical knowledge tested the system and reported finding it effective, easy and accessible. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
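
The note check mentioned in the abstract above can be illustrated with a minimal Goertzel sketch. This is not the application's code: the function names, the 8000 Hz sample rate, the 800-sample frame and the three candidate notes are all assumptions chosen for illustration.

```python
import math

def goertzel_power(samples, sample_rate, target_hz):
    """Squared magnitude of `samples` at the DFT bin nearest to `target_hz`."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)   # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:                        # second-order Goertzel recurrence
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def closest_note(samples, sample_rate, candidates):
    """Name of the candidate note whose frequency carries the most energy."""
    return max(candidates,
               key=lambda name: goertzel_power(samples, sample_rate, candidates[name]))

# Hypothetical setup: a synthetic 440 Hz tone stands in for the sung note.
SAMPLE_RATE = 8000
NOTES = {"G4": 392.0, "A4": 440.0, "B4": 493.88}
tone = [math.sin(2 * math.pi * 440.0 * i / SAMPLE_RATE) for i in range(800)]
detected = closest_note(tone, SAMPLE_RATE, NOTES)
print(detected)
```

Goertzel evaluates one frequency bin at a time, so testing a handful of expected notes costs far less than a full FFT, which is one plausible reason for choosing it on a mobile device.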

    Low-cost step aerobics system with virtual aerobics trainer

    In this paper, a low-cost step-aerobics instructor simulation system is presented. The proposed system analyses a given song to identify its rhythmic pattern. Subsequently, this rhythmic pattern is used to issue a set of step-aerobics commands to the user, thus simulating a training session. The system uses a Wii Balance Board to track the exercises performed by users and runs on an Android smartphone. A set of tests was conducted to assess user experience and opinion on the system developed. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.

    Parametric model of spectral envelope to synthesize realistic intensity variations in singing voice

    2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 634-638. In this paper, we propose a method to synthesize the natural variations of the spectral envelope as intensity varies in singing voice. To this end, we propose a parametric model of the spectral envelope based on novel 4-pole resonators as formant filters. This model has been used to analyse 60 vowels sung at different intensities in order to define a set of functions describing the global variations of the parameters with intensity. These functions have been used to modify the intensity of 16 recorded vowels and 8 synthetic vowels generated with Vocaloid. The realism of the transformations performed with our approach has been evaluated by four amateur musicians, in comparison to Melodyne for real sounds and to Vocaloid for synthetic sounds. The proposed approach was found to achieve more realistic sounds than Melodyne and Vocaloid, especially for loud-to-weak transformations. This work has been funded by the Ministerio de Economía y Competitividad of the Spanish Government under Project No. TIN2010-21089-C03-02 and Project No. IPT-2011-0885-430000, by the Junta de Andalucía under Project No. P11-TIC-7154 and by the Ministerio de Educación, Cultura y Deporte through the ‘Programa Nacional de Movilidad de Recursos Humanos del Plan Nacional de I-D+i 2008-2011, prorrogado por Acuerdo de Consejo de Ministros de 7 de octubre de 2011’. The work has been done in the context of Campus de Excelencia Internacional Andalucía Tech, Universidad de Málaga.

    Android App for Recreating old Recording Sound Effects for Voice

    An Android app for recreating old-recording sound effects for voice is presented. The old-recording sound effects recreated are the Vinyl effect, the Cylinder effect and the Tape effect. Two nonlinear audio effects, the Tube effect and the Overdrive effect, are also implemented, since analog recording and reproduction are based on nonlinear signal processing. This work has been funded by the Ministerio de Economía y Competitividad of the Spanish Government under Project No. TIN2016-75866-C3-2-R. This work has been done at Universidad de Málaga, Campus de Excelencia Internacional Andalucía Tech.

    A database and digital signal processing framework for the perceptual analysis of voice quality

    Bermúdez de Alvear RM, Corral J, Tardón LJ, Barbancho AM, Fernández Contreras E, Rando Márquez S, Martínez-Arquero AG, Barbancho I. A database and digital signal processing framework for the perceptual analysis of voice quality. Pan European Voice Conference: PEVOC 11 Abstract Book. Aug. 31-Sept. 2, 2015. Introduction. Clinical assessment of dysphonia relies on perceptual as much as instrumental methods of analysis [1]. Perceptual auditory analysis is potentially subject to several internal and external sources of bias [2]. Furthermore, the acoustic analyses that have been used to objectively characterize pathological voices are likely to be affected by confounding variables such as the signal processing or the hardware and software specifications [3]. For these reasons, the poor correlation between perceptual ratings and acoustic measures remains a controversial matter [4]. The availability of annotated databases of voice samples is therefore of prime importance for clinical and research purposes. Databases for digital processing of the vocal signal are usually built from sustained vowels produced by English-speaking subjects [5]. However, phonemes vary from one language to another and, to the best of our knowledge, there are no annotated databases with Spanish sustained vowels from healthy or dysphonic voices. This work shows our first steps to fill this gap. With the aim of aiding clinicians and researchers in the perceptual assessment of voice quality, a two-fold objective was pursued. On the one hand, a database of healthy and disordered Spanish voices was developed; on the other, an automatic analysis scheme was built on the basis of signal processing algorithms and supervised machine learning techniques. Material and methods. A preliminary annotated database was created with 119 recordings of the sustained Spanish /a/; they were perceptually labeled by three experts experienced in vocal quality analysis. 
It is freely available under Links on the ATIC website (www.atic.uma.es). Voice signals were recorded using a headset condenser cardioid microphone (AKG C-544 L) positioned 5 cm from the speaker’s mouth commissure. Speakers were instructed to sustain the Spanish vowel /a/ for 4 seconds. The microphone was connected to an Edirol R-09HR digital recorder. Voice signals were digitized at 16 bits with a 44100 Hz sampling rate. Afterwards, the initial and final 0.5-second segments were cut and the 3-second mid portion was selected for acoustic analysis. Sennheiser HD219 headphones were used by the judges to perceptually evaluate the voice samples. To label these recordings, raters used the Grade-Roughness-Breathiness (GRB) perceptual scale, which is a modified version of Hirano’s original GRBAS scale, later modified by Dejonckere et al. [6]. In order to improve intra- and inter-rater agreement, two types of modifications were introduced in the rating procedure: the resolution of the 0-3 point scale was increased by adding subintervals to the standard 0-3 intervals, and judges were provided with a written protocol with explicit definitions of the subinterval boundaries. In this way, judges could compensate for the potential instability that might occur in their internal representations due to the influence of the perceptual context [7]. Raters’ perceptual evaluations were performed simultaneously by connecting the Sennheiser HD219 headphones to a Behringer HA4700 Powerplay Pro-XL multi-channel headphone preamp. The Yin algorithm [8] was selected as the initial front-end to identify voiced frames and extract their fundamental frequency. For the digital processing of voice signals, some conventional acoustic parameters [6] were selected. To complete the analysis, the Mel-Frequency Cepstral Coefficients (MFCC) were also calculated, because they are based on the auditory model and are thus closer to the auditory system response than conventional features. Results. 
In the perceptual evaluation, excellent intra-rater agreement and very good inter-rater agreement were achieved. During the supervised machine learning stage, some conventional features were found to attain unexpectedly low performance in the classification scheme selected. Mel-Frequency Cepstral Coefficients were promising for classifying samples with normal or quasi-normal voice quality. Discussion and conclusions. Although it is still small and unbalanced, the present annotated database of voice samples can provide a basis for the development of other databases and automatic classification tools. Other authors [9, 10, 11] also found that modeling the auditory non-linear response during signal processing can help develop objective measures that better correspond with perceptual data. However, the classification of highly disordered voices remains a challenge for this set of features, since these voices cannot be correctly classified by either the conventional variables or the measures based on the auditory model. Current results warrant further research to determine the usability of other types of voice samples and features for automatic classification schemes. Different digital processing steps could be used to improve the classifiers’ performance. Additionally, other types of classifiers could be taken into account in future studies. Acknowledgment. This work was funded by the Spanish Ministerio de Economía y Competitividad, Project No. TIN2013-47276-C6-2-R, and has been done in the Campus de Excelencia Internacional Andalucía Tech, Universidad de Málaga. References [1] Carding PN, Wilson JA, MacKenzie K, Deary IJ. Measuring voice outcomes: state of the science review. The Journal of Laryngology and Otology 2009;123,8:823-829. [2] Oates J. Auditory-perceptual evaluation of disordered voice quality: pros, cons and future directions. Folia Phoniatrica et Logopaedica 2009;61,1:49-56. [3] Maryn et al. Meta-analysis on acoustic voice quality measures. 
J Acoust Soc Am 2009;126,5:2619-2634. [4] Vaz Freitas et al. Correlation Between Acoustic and Audio-Perceptual Measures. J Voice 2015;29,3:390.e1. [5] “Multi-Dimensional Voice Program (MDVP) Model 5105. Software Instruction Manual”, Kay PENTAX, A Division of PENTAX Medical Company, 2 Bridgewater Lane, Lincoln Park, NJ 07035-1488 USA, November 2007. [6] Dejonckere PH, Bradley P, Clemente P, Cornut G, Crevier-Buchman L, Friedrich G, Van De Heyning P, Remacle M, Woisard V. A basic protocol for functional assessment of voice pathology, especially for investigating the efficacy of (phonosurgical) treatments and evaluating new assessment techniques. Guideline elaborated by the Comm. on Phoniatrics of the European Laryngological Society (ELS). Eur Arch Otorhinolaryngol 2001;258:77–82. [7] Kreiman et al. Voice Quality Perception. J Speech Hear Res 1993;36:21-4. [8] De Cheveigné A, Kawahara H. YIN, a fundamental frequency estimator for speech and music. J Acoust Soc Am 2002;111,4:1917. [9] Shrivastav et al. Measuring breathiness. J Acoust Soc Am 2003;114,4:2217-2224. [10] Saenz-Lechon et al. Automatic Assessment of voice quality according to the GRBAS scale. Eng Med Biol Soc Ann 2006;1:2478-2481. [11] Fredouille et al. Back-and-forth methodology for objective voice quality assessment: from/to expert knowledge to/from automatic classification of dysphonia. EURASIP J Appl Si Pr 2009. Campus de Excelencia Internacional Andalucía Tech, Universidad de Málaga. Ministerio de Economía y Competitividad, Project No. TIN2013-47276-C6-2-R.
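
The segment selection described in the Material and methods of the abstract above (a 4-second vowel sampled at 44100 Hz, with 0.5 seconds cut from each end) amounts to simple sample arithmetic. The sketch below is only an illustration under those stated figures; the function name `middle_portion` and the zero-filled placeholder recording are hypothetical.

```python
SAMPLE_RATE = 44100     # Hz, as stated in the abstract
TRIM_SECONDS = 0.5      # cut from both the start and the end

def middle_portion(samples, sample_rate=SAMPLE_RATE, trim_seconds=TRIM_SECONDS):
    """Drop `trim_seconds` from each end and return the remaining mid portion."""
    trim = int(round(trim_seconds * sample_rate))
    if len(samples) <= 2 * trim:
        raise ValueError("recording too short to trim both ends")
    return samples[trim:len(samples) - trim]

# Placeholder for a 4-second recording (silence); real data would be the vowel.
recording = [0.0] * (4 * SAMPLE_RATE)
segment = middle_portion(recording)
print(len(segment) / SAMPLE_RATE)   # duration of the kept portion in seconds
```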

    Estimation of the direction of strokes and arpeggios

    Whenever a chord is played on a musical instrument, the notes are usually not played at exactly the same time. In fact, on some instruments it is impossible to trigger multiple notes simultaneously. On others, the player can consciously select the order of the sequence of notes played to create a chord. In either case, the notes in the chord can be played very fast, and they can be played from the lowest to the highest pitch note (upstroke) or from the highest to the lowest pitch note (downstroke). In this paper, we describe a system to automatically estimate the direction of strokes and arpeggios from audio recordings. The proposed system is based on the analysis of the spectrogram to identify meaningful changes. In addition to estimating the up or down stroke direction, the proposed method provides information about the number of notes that constitute the chord, as well as the chord playing speed. The system has been tested with four different instruments: guitar, piano, autoharp and organ. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. This work has been funded by the Ministerio de Economía y Competitividad of the Spanish Government under Project No. TIN2013-47276-C6-2-R, by the Junta de Andalucía under Project No. P11-TIC-7154 and by the Ministerio de Educación, Cultura y Deporte through the Programa Nacional de Movilidad de Recursos Humanos del Plan Nacional de I-D+i 2008-2011.

    Tool for improving vocal tuning on Android (Herramienta para mejorar la afinación vocal en Android)

    In this paper, a tool to improve vocal tuning on Android devices is presented. This application aims to offer exercises to practice and improve singing skills. The designed tool includes two main functionalities: sound synthesis, to provide singing sound references, and fundamental frequency analysis, to analyze the sound and check whether the user sings the right musical note. The well-known Yin algorithm has been selected to perform the fundamental frequency analysis. Three different singing exercises are included: singing single notes, singing intervals and singing a note in order to complete a chord. The system also includes a graphical interface in which musical notation is used to write down the sung sound. The system has been evaluated to test its correct performance regarding both the analysis and the synthesis of musical sounds. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
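
The fundamental frequency check mentioned in the abstract above can be sketched with a simplified version of the Yin steps (difference function, cumulative mean normalized difference, absolute threshold). This is an illustrative reduction, not the tool's implementation: it omits Yin's parabolic interpolation, and the function names, the 8000 Hz sample rate, the 0.1 threshold and the 50-cent tolerance are assumptions.

```python
import math

def yin_f0(frame, sample_rate, f_min=80.0, f_max=1000.0, threshold=0.1):
    """Estimate the fundamental frequency of `frame` in Hz, or None if unvoiced."""
    tau_min = int(sample_rate / f_max)
    tau_max = int(sample_rate / f_min)
    window = len(frame) - tau_max
    if window <= 0:
        raise ValueError("frame too short for the requested f_min")
    # Step 1: difference function d(tau)
    diff = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(window))
    # Step 2: cumulative mean normalized difference d'(tau)
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0 else 1.0
    # Step 3: first lag below the threshold, then walk down to the local minimum
    for tau in range(tau_min, tau_max):
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau
    return None

def cents_off(f0, target_hz):
    """Signed deviation of `f0` from `target_hz` in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f0 / target_hz)

# Hypothetical check: a synthetic 200 Hz tone compared against G3 (196 Hz).
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 200.0 * i / SAMPLE_RATE) for i in range(400)]
f0 = yin_f0(frame, SAMPLE_RATE)
in_tune = f0 is not None and abs(cents_off(f0, 196.0)) < 50.0
print(f0, in_tune)
```

Comparing in cents rather than in Hz keeps the tolerance musically uniform across the vocal range, since a fixed Hz margin would be stricter for high notes than for low ones.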