194 research outputs found

    Diversity-Robust Acoustic Feature Signatures Based on Multiscale Fractal Dimension for Similarity Search of Environmental Sounds

    Full text link
    This paper proposes new acoustic feature signatures based on the multiscale fractal dimension (MFD) that are robust against the diversity of environmental sounds and intended for content-based similarity search. Diversity of sound sources and acoustic compositions is a typical feature of environmental sounds. Several acoustic features have been proposed for environmental sounds; among them are the widely used Mel-Frequency Cepstral Coefficients (MFCCs), which describe frequency-domain features. However, in addition to such frequency-domain features, environmental sounds have important time-domain features at various time scales. In our previous paper, we proposed an enhanced multiscale fractal dimension signature (EMFD) for environmental sounds. This paper extends EMFD with kernel density estimation, which improves performance on similarity search tasks. Furthermore, it proposes another MFD-based acoustic feature signature, the very-long-range multiscale fractal dimension signature (MFD-VL), which describes features of the time-varying envelope over long periods. MFD-VL is stable and robust against background noise and against the small fluctuations in sound-source parameters that arise in field recordings. We discuss the effectiveness of these signatures for similarity-based sound search by comparing them with acoustic features proposed in the DCASE 2018 challenges. Owing to their unique descriptiveness, we confirmed that the proposed signatures are effective when used in combination with other acoustic features.
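
    The exact EMFD and MFD-VL constructions are not reproduced in this abstract, so the following Python sketch only illustrates the underlying idea: estimate a fractal dimension (here via Higuchi's method) over analysis windows of several lengths and collect one value per time scale. The function names, window sizes, and the choice of Higuchi's estimator are assumptions made for this example, not the authors' definitions.

```python
import numpy as np

def higuchi_fd(x, k_max=8):
    """Estimate the fractal dimension of a 1-D signal with Higuchi's method."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    log_lk, log_k = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            idx = np.arange(m, n, k)                  # sub-sampled curve for offset m
            if len(idx) < 2:
                continue
            diff = np.abs(np.diff(x[idx])).sum()
            norm = (n - 1) / ((len(idx) - 1) * k)     # Higuchi's length normalisation
            lengths.append(diff * norm / k)
        log_lk.append(np.log(np.mean(lengths) + 1e-12))
        log_k.append(np.log(1.0 / k))
    slope, _ = np.polyfit(log_k, log_lk, 1)           # L(k) ~ k^(-D), so the slope is D
    return slope

def multiscale_fd_signature(signal, sr, win_seconds=(0.01, 0.05, 0.25, 1.0)):
    """One mean fractal-dimension value per analysis-window length (per time scale)."""
    signature = []
    for w in win_seconds:
        win = max(int(w * sr), 32)
        fds = [higuchi_fd(signal[i:i + win])
               for i in range(0, len(signal) - win + 1, win)]
        signature.append(np.mean(fds) if fds else 1.0)
    return np.array(signature)
```

    The short signature vector (one value per scale) is what would then be compared between recordings in a similarity search.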

    A Linear Hybrid Sound Generation of Musical Instruments using Temporal and Spectral Shape Features

    Get PDF
    The generation of a hybrid musical instrument sound using morphing has always been an area of great interest to the music world. The proposed method exploits the temporal and spectral shape features of the sound for this purpose. For effective morphing, temporal and spectral features are extracted because they capture the most perceptually salient dimensions of timbre, namely the attack time and the distribution of spectral energy. A wide variety of sound synthesis algorithms is currently available, and these methods have become more computationally efficient. Wavetable synthesis is widely adopted by digital sampling instruments (samplers). The overlap-add (OLA) method refers to a family of algorithms that produce a signal by properly assembling a number of signal segments. In granular synthesis, sound is treated as an overlapping sequence of elementary acoustic elements called grains. The simplest morph is an amplitude cross-fade in the time domain, which can be obtained through cross-synthesis. A hybrid sound is generated with each of these methods to determine which gives the most linear morph. The result is evaluated with an error measure: the difference between the features calculated from the morph and the features linearly interpolated from the source sounds. Producing a morph that is perceptually pleasant is the ultimate requirement of the work. DOI: 10.17762/ijritcc2321-8169.16045
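
    The abstract's error measure (features calculated from the morph versus features linearly interpolated from the source sounds) can be made concrete with a small sketch. The Python snippet below is illustrative only, not the paper's implementation: it assumes attack time and spectral centroid as the two features and a plain time-domain cross-fade as the morph, and the function names and frame size are invented for this example.

```python
import numpy as np

def attack_time(y, sr, frame=512):
    """Rough attack time: seconds from signal start to the peak of the amplitude envelope."""
    env = np.array([np.abs(y[i:i + frame]).max()
                    for i in range(0, len(y) - frame, frame)])
    return np.argmax(env) * frame / sr

def spectral_centroid(y, sr):
    """Centroid of the magnitude spectrum, in Hz: where the spectral energy sits."""
    mag = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), 1.0 / sr)
    return float((freqs * mag).sum() / (mag.sum() + 1e-12))

def crossfade_morph(a, b, alpha):
    """Simplest morph: time-domain amplitude cross-fade between two equal-length sounds."""
    n = min(len(a), len(b))
    return (1.0 - alpha) * a[:n] + alpha * b[:n]

def morph_linearity_error(a, b, sr, alpha, feature):
    """|feature(morph) - linear interpolation of the source features|."""
    measured = feature(crossfade_morph(a, b, alpha), sr)
    expected = (1.0 - alpha) * feature(a, sr) + alpha * feature(b, sr)
    return abs(measured - expected)
```

    A morphing method that is close to linear yields a small error for intermediate values of alpha; the same measure can be applied to each synthesis method under comparison.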

    A Survey of Evaluation in Music Genre Recognition

    Get PDF

    Fractal based speech recognition and synthesis

    Get PDF
    Transmitting a linguistic message is most often the primary purpose of speech communication, and it is the recognition of this message by machine that would be most useful. This research consists of two major parts. The first part presents a novel and promising approach for estimating the degree of recognition of speech phonemes and makes use of a new set of features based on fractals. The main methods of computing the fractal dimension of speech signals are reviewed, and a new speaker-independent speech recognition system developed at De Montfort University is described in detail. Finally, a least-squares method as well as a novel neural network algorithm is employed to derive the recognition performance on the speech data. The second part of this work studies the synthesis of speech at the word level, based mainly on the fractal dimension, to create natural-sounding speech. The work shows that by careful use of the fractal dimension, together with the phase of the speech signal to ensure consistent intonation contours, natural-sounding synthesis is achievable for word-level speech. To extend the flexibility of this framework, we focused on filtering and compressing the phase to maintain and produce natural-sounding speech. A ‘naturalness level’ is achieved as a result of the fractal characteristics used in the synthesis process. Finally, a novel fractal-based speech synthesis system developed at De Montfort University is discussed. Throughout this research, simulation experiments were performed on continuous speech data from the Texas Instruments/Massachusetts Institute of Technology (TIMIT) database, which is designed to provide the speech research community with a standardised corpus for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems.
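
    The least-squares recognition step mentioned above can be sketched generically: fit a linear map from feature vectors (for instance, fractal dimensions of frames or sub-bands) to one-hot phoneme labels, then classify by the largest score. The Python sketch below illustrates that generic idea under those assumptions; it is not the De Montfort system, and the data variables in the usage comments are hypothetical.

```python
import numpy as np

def fit_least_squares_classifier(X, labels, n_classes):
    """Fit a linear map W so that X @ W approximates one-hot phoneme labels."""
    Y = np.eye(n_classes)[labels]                 # one-hot targets, one row per example
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)     # ordinary least-squares solution
    return W

def predict(X, W):
    """Predicted class = index of the largest score for each row of features."""
    return np.argmax(X @ W, axis=1)

# Hypothetical usage: rows of X are per-token feature vectors, e.g. the fractal
# dimensions of several frames or frequency bands of each phoneme token.
# X_train, y_train, X_test, y_test = ...   (feature extraction not shown)
# W = fit_least_squares_classifier(X_train, y_train, n_classes=10)
# accuracy = (predict(X_test, W) == y_test).mean()
```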

    A Comprehensive Review on Audio based Musical Instrument Recognition: Human-Machine Interaction towards Industry 4.0

    Get PDF
    Over the last two decades, the application of machine technology has shifted from industrial to residential use. Further, advances in hardware and software have led machine technology to its utmost application: human-machine interaction, a form of multimodal communication. Multimodal communication refers to the integration of various modalities of information such as speech, image, music, gesture, and facial expression. Music is a non-verbal form of communication that humans often use to express themselves. Thus, Music Information Retrieval (MIR) has become a booming field of research and has gained considerable interest from the academic community, the music industry, and the vast population of multimedia users. The problem in MIR is accessing and retrieving a specific type of music, on demand, from extensive music data, and the most fundamental problem in MIR is music classification. The essential MIR tasks are artist identification, genre classification, mood classification, music annotation, and instrument recognition. Among these, instrument recognition is a vital sub-task of MIR for various reasons, including retrieval of music information, sound source separation, and automatic music transcription. In recent years, many researchers have reported different machine learning techniques for musical instrument recognition and shown several of them to perform well. This article provides a systematic, comprehensive review of the advanced machine learning techniques used for musical instrument recognition. We stress the different audio feature descriptors and the common choices of classifier used for musical instrument recognition. This review article also emphasizes recent developments in music classification techniques and discusses a few associated future research problems.
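
    As one concrete example of the feature-descriptor-plus-classifier pipelines such reviews survey, the sketch below pairs MFCC statistics with a support vector machine. It is a generic baseline rather than any specific system from the review; the file lists in the usage comments are hypothetical, and it assumes the librosa and scikit-learn libraries are available.

```python
import librosa
import numpy as np
from sklearn.svm import SVC

def mfcc_features(path, n_mfcc=13):
    """Summarise a recording by the mean and std of its MFCCs: one fixed-length vector."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical file lists and labels; any labelled instrument dataset would do.
# train_files, train_labels, test_files = ...
# X_train = np.stack([mfcc_features(f) for f in train_files])
# clf = SVC(kernel="rbf").fit(X_train, train_labels)
# predictions = clf.predict(np.stack([mfcc_features(f) for f in test_files]))
```

    Deep-learning approaches covered by such reviews replace both the hand-crafted descriptor and the classifier, but the train-on-features, predict-on-features structure stays the same.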

    Algorithmic Compositional Methods and their Role in Genesis: A Multi-Functional Real-Time Computer Music System

    Get PDF
    Algorithmic procedures have been applied in computer music systems to generate compositional products using conventional musical formalism, extensions of such formalism, and extra-musical disciplines such as mathematical models. This research investigates the applicability of such algorithmic methodologies to real-time musical composition, culminating in Genesis, a multi-functional real-time computer music system written for Mac OS X in the SuperCollider object-oriented programming language and contained on the accompanying DVD. Through an extensive graphical user interface, Genesis offers musicians the opportunity to apply the sonic features of real-time sound-objects to designated generative processes via different models of interaction, such as unsupervised musical composition by Genesis and networked control of external Genesis instances. As a result of the applied interactive, generative and analytical methods, Genesis forms a unique compositional process, with a compositional product that reflects the character of the interactions between the sonic features of real-time sound-objects and the selected algorithmic procedures. This thesis describes the technologies involved in algorithmic methodologies for compositional processes and the concepts that define their constructs, then details their selection and application in Genesis, with audio examples of the algorithmic compositional methods demonstrated on the accompanying DVD. To demonstrate the real-time compositional abilities of Genesis, free explorations with instrumentalists, along with studio recordings of the compositional processes available in Genesis, are presented as audiovisual examples on the accompanying DVD. The evaluation of Genesis's capability to form a real-time compositional process, maintaining real-time interaction between the sonic features of real-time sound-objects and the selected algorithmic compositional methods, focuses on existing evaluation techniques founded in HCI and on the qualitative issues such methods present. Regarding the compositional products generated by Genesis, the challenges in quantifying and qualifying its outputs are identified, demonstrating the intricacies of assessing generative compositional processes and their impact on the resulting product. The thesis concludes by considering further advances and applications of Genesis, and by inviting further dissemination of the system and promotion of research into evaluative methods for generative techniques, in the hope that this may provide additional insight into the relative success of products generated by real-time algorithmic compositional processes.
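
    Genesis itself is a SuperCollider system and its internal generative processes are not reproduced here. Purely as an illustration of the kind of algorithmic procedure the abstract refers to, the following Python sketch generates a short phrase from a hand-written first-order Markov chain; the transition table and pitch choices are invented for this example and do not reflect Genesis's methods.

```python
import random

# A toy first-order Markov chain over scale degrees above middle C (MIDI 60):
# one of the simplest algorithmic-composition procedures.
TRANSITIONS = {
    0: [0, 2, 4],   # from the tonic, move to the tonic, third or fifth
    2: [0, 2, 5],
    4: [2, 4, 7],
    5: [4, 5, 0],
    7: [5, 7, 0],
}

def generate_phrase(length=16, start=0, seed=None):
    """Walk the transition table to produce a list of MIDI pitch numbers."""
    rng = random.Random(seed)
    note, phrase = start, []
    for _ in range(length):
        phrase.append(60 + note)
        note = rng.choice(TRANSITIONS[note])
    return phrase

print(generate_phrase(seed=42))
```

    A real-time system like the one described would additionally let analysed sonic features of incoming sound steer such a generative process while it runs.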

    Biomedical Applications of the Discrete Wavelet Transform

    Get PDF