    A Framework for Bioacoustic Vocalization Analysis Using Hidden Markov Models

Using Hidden Markov Models (HMMs) as a recognition framework for automatic classification of animal vocalizations has a number of benefits, including the ability to handle duration variability through nonlinear time alignment, the ability to incorporate complex language or recognition constraints, and easy extensibility to continuous recognition and detection domains. In this work, we apply HMMs to several different species and bioacoustic tasks using generalized spectral features that can be easily adjusted across species and HMM network topologies suited to each task. This experimental work includes a simple call type classification task using one HMM per vocalization for repertoire analysis of Asian elephants, a language-constrained song recognition task using syllable models as base units for ortolan bunting vocalizations, and a stress stimulus differentiation task in poultry vocalizations using a non-sequential model via a one-state HMM with Gaussian mixtures. Results show strong performance across all tasks and illustrate the flexibility of the HMM framework for a variety of species, vocalization types, and analysis tasks.
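As a concrete illustration of the one-HMM-per-call-type setup this abstract describes, the sketch below trains one Gaussian HMM per call type and labels a new vocalization by maximum log-likelihood. It assumes hmmlearn for the HMMs and librosa for MFCC-style spectral features; neither library, nor the state count, comes from the paper. The non-sequential poultry case corresponds to a one-state model with Gaussian-mixture emissions (e.g., hmm.GMMHMM with n_components=1).

```python
# Minimal sketch of one-HMM-per-call-type classification, assuming
# hmmlearn and librosa; call types and state counts are illustrative.
import numpy as np
import librosa
from hmmlearn import hmm

def mfcc_features(path, n_mfcc=13):
    """Frame-level spectral features for one recording: (frames, n_mfcc)."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_call_models(train_set, n_states=5):
    """Fit one GaussianHMM per call type.

    train_set maps call-type label -> list of (frames, n_mfcc) sequences.
    For a non-sequential model, hmm.GMMHMM(n_components=1, n_mix=8)
    would play the same role.
    """
    models = {}
    for label, seqs in train_set.items():
        X = np.vstack(seqs)                  # concatenated frames
        lengths = [len(s) for s in seqs]     # per-sequence frame counts
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=50)
        m.fit(X, lengths)
        models[label] = m
    return models

def classify(models, seq):
    """Assign the call type whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda label: models[label].score(seq))
```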

    Geometric deep learning: going beyond Euclidean data

Many scientific fields study data with an underlying structure that is a non-Euclidean space. Some examples include social networks in computational social sciences, sensor networks in communications, functional networks in brain imaging, regulatory networks in genetics, and meshed surfaces in computer graphics. In many applications, such geometric data are large and complex (in the case of social networks, on the scale of billions), and are natural targets for machine learning techniques. In particular, we would like to use deep neural networks, which have recently proven to be powerful tools for a broad range of problems in computer vision, natural language processing, and audio analysis. However, these tools have been most successful on data with an underlying Euclidean or grid-like structure, and in cases where the invariances of these structures are built into the networks used to model them. Geometric deep learning is an umbrella term for emerging techniques attempting to generalize (structured) deep neural models to non-Euclidean domains such as graphs and manifolds. The purpose of this paper is to provide an overview of different examples of geometric deep learning problems and present available solutions, key difficulties, applications, and future research directions in this nascent field.
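To make the non-Euclidean setting concrete, here is a minimal sketch of a single graph-convolution step, the kind of building block such methods generalize from grid-based CNNs to graphs. The symmetric normalization follows the widely used GCN formulation of Kipf and Welling; the graph and dimensions below are made up for illustration.

```python
# One graph-convolution step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).
import numpy as np

def gcn_layer(A, H, W):
    """A: (n, n) adjacency, H: (n, d_in) node features, W: (d_in, d_out)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)                     # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D^{-1/2}
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Tiny usage example: a 4-node path graph with random features.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = rng.normal(size=(4, 8))
W = rng.normal(size=(8, 4))
print(gcn_layer(A, H, W).shape)  # (4, 4)
```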

    Simulating dysarthric speech for training data augmentation in clinical speech applications

Training machine learning algorithms for speech applications requires large, labeled training data sets. This is problematic for clinical applications where obtaining such data is prohibitively expensive because of privacy concerns or lack of access. As a result, clinical speech applications are typically developed using small data sets with only tens of speakers. In this paper, we propose a method for simulating training data for clinical applications by transforming healthy speech to dysarthric speech using adversarial training. We evaluate the efficacy of our approach using both objective and subjective criteria. We present the transformed samples to five experienced speech-language pathologists (SLPs) and ask them to identify the samples as healthy or dysarthric. The results reveal that the SLPs identify the transformed speech as dysarthric 65% of the time. In a pilot classification experiment, we show that by using the simulated speech samples to balance an existing dataset, the classification accuracy improves by about 10% after data augmentation.
Comment: Will appear in Proc. of ICASSP 201
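A minimal, hypothetical sketch of the adversarial-training idea: a generator transforms healthy speech features toward dysarthric ones while a discriminator tries to tell transformed features from real dysarthric features. The architectures, feature dimension, and optimizer settings below are placeholders, not the authors' configuration.

```python
# Adversarial healthy-to-dysarthric feature transformation (sketch).
import torch
import torch.nn as nn

FEAT_DIM = 40  # assumed per-frame feature size (e.g., mel bands)

G = nn.Sequential(nn.Linear(FEAT_DIM, 128), nn.ReLU(), nn.Linear(128, FEAT_DIM))
D = nn.Sequential(nn.Linear(FEAT_DIM, 128), nn.ReLU(), nn.Linear(128, 1))

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

def train_step(healthy, dysarthric):
    """One adversarial update on (B, FEAT_DIM) batches of frame features."""
    # Discriminator: real dysarthric -> 1, transformed healthy -> 0.
    fake = G(healthy).detach()
    loss_d = (bce(D(dysarthric), torch.ones(len(dysarthric), 1)) +
              bce(D(fake), torch.zeros(len(fake), 1)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: push transformed features to be scored as real dysarthric.
    loss_g = bce(D(G(healthy)), torch.ones(len(healthy), 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

After training, G(healthy_features) would supply the simulated dysarthric samples used to rebalance a small clinical dataset before classifier training.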

    Classification of pig calls produced from birth to slaughter according to their emotional valence and context of production

Vocal expression of emotions has been observed across species and could provide a non-invasive and reliable means to assess animal emotions. We investigated whether pig vocal indicators of emotions revealed in previous studies are valid across call types and contexts, and could potentially be used to develop an automated emotion monitoring tool. We performed an analysis of an extensive and unique dataset of low-frequency (LF) and high-frequency (HF) calls emitted by pigs across numerous commercial contexts from birth to slaughter (7,414 calls from 411 pigs). Our results revealed that the valence attributed to the contexts of production (positive versus negative) affected all investigated parameters in both LF and HF calls. Similarly, the context category affected all parameters. We then tested two automated methods for call classification: a neural network achieved much higher classification accuracy than a permuted discriminant function analysis (pDFA), both for valence (neural network: 91.5%; pDFA weighted average across LF and HF calls (cross-classified): 61.7%, with a chance level of 50.5%) and for context (neural network: 81.5%; pDFA weighted average across LF and HF calls (cross-classified): 19.4%, with a chance level of 14.3%). These results suggest that an automated recognition system can be developed to monitor pig welfare on-farm.
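By way of illustration, a small neural-network classifier over per-call acoustic parameters, as in the valence task above, might be set up as follows. The synthetic features, labels, and network size are placeholders, not the study's own parameter set or architecture.

```python
# Hedged sketch: a small neural network predicting call valence
# from per-call acoustic parameters.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: one row per call (e.g., duration, mean F0, spectral centroid, ...);
# y: valence label attributed from the production context (0/1).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))       # placeholder acoustic parameters
y = rng.integers(0, 2, size=500)    # placeholder valence labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500))
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```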

    Joint 1D and 2D Neural Networks for Automatic Modulation Recognition

The digital communication and radar community has recently shown increased interest in using data-driven approaches for tasks such as modulation recognition, channel estimation, and distortion correction. In this research we seek to apply an object detector for parameter estimation to perform waveform separation in the time and frequency domains prior to classification. This enables the full automation of detecting and classifying simultaneously occurring waveforms. We leverage a 1D ResNet implemented by O'Shea et al. in [1] and the YOLO v3 object detector designed by Redmon et al. in [2]. We conducted an in-depth study of the performance of these architectures and integrated the models to perform joint detection and classification. To our knowledge, the present research is the first to study and successfully combine a 1D ResNet classifier and a YOLO v3 object detector to fully automate the process of AMR for parameter estimation, pulse extraction, and waveform classification in non-cooperative scenarios. The overall performance of the joint detector/classifier is 90% at a 10 dB signal-to-noise ratio for 24 digital and analog modulations.
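A structural sketch of the joint pipeline: an object detector proposes time-frequency boxes on a spectrogram, each box is mapped back to the raw IQ stream, and a 1D classifier labels the extracted segment. The two model calls below are trivial stand-ins for the paper's YOLO v3 detector and O'Shea-style 1D ResNet; everything else (box format, hop size, band isolation) is illustrative.

```python
# Joint detect-then-classify pipeline for AMR (structural sketch).
import numpy as np

def detect_bursts(spectrogram):
    """Stand-in detector: return (t0, t1, f0, f1, score) boxes in bins.
    Here it just proposes the whole frame as a single detection;
    the paper uses YOLO v3 on spectrogram images."""
    n_t, n_f = spectrogram.shape
    return [(0, n_t, 0, n_f, 1.0)]

def classify_modulation(iq_segment):
    """Stand-in classifier over complex baseband samples; the paper's
    1D ResNet outputs one of 24 digital and analog modulations."""
    return "unknown"

def joint_amr(iq, spectrogram, hop=64):
    """Detect simultaneously occurring waveforms, then classify each:
    map detector boxes from spectrogram time bins back to raw IQ
    samples, extract the segment, and hand it to the 1D classifier."""
    results = []
    for t0, t1, f0, f1, score in detect_bursts(spectrogram):
        seg = iq[t0 * hop : t1 * hop]
        # A filtering/mix-down step using (f0, f1) would isolate the
        # detected band before classification; omitted for brevity.
        results.append({"label": classify_modulation(seg),
                        "time_bins": (t0, t1),
                        "score": score})
    return results
```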