209 research outputs found

    BAT: An open-source, web-based audio events annotation tool

    In this paper we present BAT (BMAT Annotation Tool), an open-source, web-based tool for the manual annotation of events in audio recordings, developed at BMAT (Barcelona Music and Audio Technologies). The main feature of the tool is that it provides an easy way to annotate the salience of simultaneous sound sources. Additionally, it allows users to define multiple ontologies to adapt to multiple tasks and offers the possibility to cross-annotate audio data. Moreover, it is easy to install and deploy on servers. We carry out an evaluation in which 3 annotators use BAT to annotate a small dataset composed of broadcast media recordings. The results of the experiments show that BAT offers fast annotation mechanisms and a method to assign salience that produces high agreement among annotators.
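    The salience mechanism described above can be pictured with a minimal record type. The schema below is a hypothetical illustration of annotating two simultaneous sources with different salience levels, not BAT's actual data format:

```python
from dataclasses import dataclass

@dataclass
class EventAnnotation:
    """One annotated audio event (hypothetical schema, not BAT's format)."""
    start: float   # event onset in seconds
    end: float     # event offset in seconds
    label: str     # class from a user-defined ontology, e.g. "music", "speech"
    salience: int  # 1 = foreground/dominant source, 2 = background

# Two simultaneous sources over the same time span, distinguished by salience.
annotations = [
    EventAnnotation(start=0.0, end=5.2, label="speech", salience=1),
    EventAnnotation(start=0.0, end=5.2, label="music", salience=2),
]

foreground = [a.label for a in annotations if a.salience == 1]
```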

    Deconstructing Speech: new tools for speech manipulation

    My research at the London College of Communication is concerned with archives of recorded speech, what new tools need to be devised for their manipulation, and how to go about this process of invention. Research into available forms of speech analysis is discussed below with regard to two specific areas: feature vectors from linear predictive coding (LPC) analysis, and hidden-Markov-model-based automatic speech recognition (ASR) systems. These are discussed in order to demonstrate that, whilst aspects of each may be useful in devising a system of speech-archive manipulation for artistic use, their drawbacks and deficiencies for use in art – consequent of the reasons for their invention – necessitate the creation of tools with artistic, rather than engineering, agendas in mind. It is through the initial process of devising conceptual tools for understanding speech as sound objects that I have been confronted with issues of the semiotics and semantics of the voice, of the relationship between sound and meaning in speech, and of the role of analysis in mediating existing methods of communication. This is discussed with reference to Jean-Jacques Nattiez’s Music and Discourse: Towards a Semiology of Music (Nattiez 1987). The ‘trace’ – a neutral level of semiotic analysis proposed by Nattiez – far from being hypothetical as suggested by Hatten (1992: 88–98) and others, is present by analogy in many forms of mediation in modern spoken communication and the reproduction of music, and it is precisely this neutrality with regard to meaning that tools for the manipulation of speech must possess, since the relationships between the sound of speech and its meaning are ‘intense’ (after Deleuze 1968).

    Time-Varying Quasi-Closed-Phase Analysis for Accurate Formant Tracking in Speech Signals

    In this paper, we propose a new method for the accurate estimation and tracking of formants in speech signals using time-varying quasi-closed-phase (TVQCP) analysis. Conventional formant tracking methods typically adopt a two-stage estimate-and-track strategy wherein an initial set of formant candidates is estimated using short-time analysis (e.g., 10--50 ms), followed by a tracking stage based on dynamic programming or a linear state-space model. One of the main disadvantages of these approaches is that the tracking stage, however good it may be, cannot improve upon the formant estimation accuracy of the first stage. The proposed TVQCP method provides a single-stage formant tracking that combines the estimation and tracking stages into one. TVQCP analysis combines three approaches to improve formant estimation and tracking: (1) it uses temporally weighted quasi-closed-phase analysis to derive closed-phase estimates of the vocal tract with reduced interference from the excitation source, (2) it increases residual sparsity by using L1 optimization, and (3) it uses time-varying linear prediction analysis over long time windows (e.g., 100--200 ms) to impose a continuity constraint on the vocal tract model and hence on the formant trajectories. Formant tracking experiments with a wide variety of synthetic and natural speech signals show that the proposed TVQCP method performs better than conventional and popular formant tracking tools, such as Wavesurfer and Praat (based on dynamic programming), the KARMA algorithm (based on Kalman filtering), and DeepFormants (based on deep neural networks trained in a supervised manner). Matlab scripts for the proposed method can be found at: https://github.com/njaygowda/ftrac

    A Literature Review of Uses and Attitudes Towards the Acceptance of Assistive Real-Time Technology in the Voice Studio

    The human voice is a tool that not only communicates ideas and feelings, but also expresses and elicits emotions. Since the age of Manuel García (1805-1906), the inventor of the laryngeal mirror, singers and voice teachers have discussed, argued and philosophized about the most efficient manner in which to train any voice to reach its optimum capability. Singing techniques and the resulting voice qualities have been strongly divided by cultural and personal preferences. Empiricists carefully guarded their teaching methods and shunned the idea that any other field might have anything worthwhile to contribute to voice instruction. Voice teachers have been slow to embrace the use of visual feedback generated by technological sources. A review of literature published since 1967 traces the trends of acceptance of assistive real-time technology in the voice studio.

    SPPAS: a tool for the phonetic segmentation of speech

    SPPAS is a tool to produce automatic annotations, including utterance, word, syllabic and phonemic segmentations, from a recorded speech sound and its transcription. SPPAS is distributed under the terms of the GNU General Public License. It was successfully applied during the Evalita 2011 campaign, on Italian map-task dialogues. It can also deal with French, English and Chinese, and there is an easy way to add other languages. The paper describes the development of resources and free tools, consisting of acoustic models, phonetic dictionaries, and libraries and programs to deal with these data. All of them are publicly available.
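    The layered segmentations that such a tool produces (utterance, word, syllable, phoneme) can be pictured as time-aligned tiers. The sketch below is a hypothetical illustration of that tier structure and of looking up the phonemes inside a word interval; it is not SPPAS's actual data model or API, and the labels and timings are invented:

```python
# Hypothetical time-aligned tiers: each entry is (start_s, end_s, label).
tiers = {
    "words":    [(0.00, 0.42, "hello"), (0.42, 0.90, "world")],
    "phonemes": [(0.00, 0.18, "h"), (0.18, 0.30, "@"), (0.30, 0.42, "l"),
                 (0.42, 0.60, "w"), (0.60, 0.75, "3:"), (0.75, 0.90, "ld")],
}

def phonemes_of(word_span, phoneme_tier):
    """Collect phoneme labels whose intervals fall inside a word interval."""
    start, end, _ = word_span
    return [p for (s, e, p) in phoneme_tier if s >= start and e <= end]

first_word_phones = phonemes_of(tiers["words"][0], tiers["phonemes"])
```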