6 research outputs found

    Identifikasi Arti Tangisan Bayi Versi Dunstan Baby Language Menggunakan Jarak Terpendek Dari Jarak Mahalanobis (Infant Cries Identification of Dunstan Baby Language Version using the Shortest Distance of Mahalanobis)

    Newborn babies can express their basic needs through sounds. Dunstan Baby Language (DBL), introduced in 2006, is a system for understanding the meaning of the cries of infants aged 0-3 months. This research aimed to model the codebook method, with k-means clustering as feature matching and Mel Frequency Cepstrum Coefficients (MFCC) as feature extraction, to identify infant cries. Cries were identified according to the Dunstan Baby Language version using the shortest Mahalanobis distance. The treatments in this research combined frame lengths of 25 ms, 40 ms and 60 ms, frame overlaps of 0%, 40%, and 60%, and numbers of codewords (clusters) from 1 to 29. The best accuracy in recognizing five types of infant cries using the Mahalanobis distance reached 83% with frame length = 275, frame overlap = 0.25, and k = 17. The sound ‘heh’ was the most easily recognized, whereas the sound ‘owh’ was always misidentified and generally recognized as ‘neh’ and ‘eairh’. Keywords: Codebook, Dunstan baby language, Mahalanobis Distance, MFC
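    The matching step described in this abstract (one k-means codebook per cry type, with the label chosen by the shortest Mahalanobis distance) might be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: the 2-D vectors stand in for real MFCC frames, the codebooks and inverse covariance matrices are hand-written toy values instead of trained ones, and the function names are hypothetical.

```python
import math

def mahalanobis(x, c, inv_cov):
    """Mahalanobis distance between x and c for a given inverse covariance matrix."""
    d = [xi - ci for xi, ci in zip(x, c)]
    n = len(d)
    # Quadratic form d^T * inv_cov * d
    q = sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(q)

def codebook_distortion(frames, codebook, inv_cov):
    """Mean distance from each frame to its nearest codeword in one codebook."""
    nearest = [min(mahalanobis(f, c, inv_cov) for c in codebook) for f in frames]
    return sum(nearest) / len(nearest)

def classify(frames, codebooks, inv_covs):
    """Return the label whose codebook gives the smallest mean distortion."""
    return min(
        codebooks,
        key=lambda label: codebook_distortion(frames, codebooks[label], inv_covs[label]),
    )

# Toy stand-ins (assumptions): two cry labels, two codewords each, 2-D features.
codebooks = {
    "neh": [[0.0, 0.0], [1.0, 1.0]],
    "heh": [[5.0, 5.0], [6.0, 4.0]],
}
inv_covs = {
    "neh": [[1.0, 0.0], [0.0, 1.0]],
    "heh": [[0.5, 0.0], [0.0, 2.0]],
}
frames = [[5.2, 4.9], [5.9, 4.1]]  # hypothetical feature frames of one cry

print(classify(frames, codebooks, inv_covs))
```

    In a real pipeline the codewords would come from k-means over MFCC frames of the training cries, and the inverse covariance would be estimated from the same data; both are supplied by hand here to keep the sketch self-contained.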

    Machine Learning-based Classification of Birds through Birdsong

    Audio sound recognition and classification are used for many tasks and applications, including human voice recognition, music recognition and audio tagging. In this paper we apply Mel Frequency Cepstral Coefficients (MFCC) in combination with a range of machine learning models to identify (Australian) birds from publicly available audio files of their birdsong. We present approaches used for data processing and augmentation and compare the results of various state-of-the-art machine learning models. We achieve an overall accuracy of 91% for the top-5 birds from the 30 selected as the case study. Applying the models to more challenging and diverse audio files comprising 152 bird species, we achieve an accuracy of 58

    Hidden Markov Models

    Hidden Markov Models (HMMs), although known for decades, are widely used today and are still under active development. This book presents theoretical issues and a variety of HMM applications in speech recognition and synthesis, medicine, neurosciences, computational biology, bioinformatics, seismology, environment protection and engineering. I hope that readers will find this book useful and helpful for their own research

    A Parametric Sound Object Model for Sound Texture Synthesis

    This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed