1,263 research outputs found

    A Framework for Bioacoustic Vocalization Analysis Using Hidden Markov Models

    Using Hidden Markov Models (HMMs) as a recognition framework for automatic classification of animal vocalizations has a number of benefits, including the ability to handle duration variability through nonlinear time alignment, the ability to incorporate complex language or recognition constraints, and easy extensibility to continuous recognition and detection domains. In this work, we apply HMMs to several different species and bioacoustic tasks, using generalized spectral features that can be easily adjusted across species and HMM network topologies suited to each task. This experimental work includes a simple call-type classification task using one HMM per vocalization for repertoire analysis of Asian elephants, a language-constrained song recognition task using syllable models as base units for ortolan bunting vocalizations, and a stress-stimulus differentiation task in poultry vocalizations using a non-sequential model via a one-state HMM with Gaussian mixtures. Results show strong performance across all tasks and illustrate the flexibility of the HMM framework for a variety of species, vocalization types, and analysis tasks.
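    As a rough illustration of the "one HMM per vocalization type" setup this abstract describes, the sketch below trains one Gaussian-mixture HMM per call type and classifies an unknown call by maximum log-likelihood. It assumes the hmmlearn Python library and precomputed per-frame spectral feature matrices; the function names, state counts, and mixture counts are illustrative choices, not the authors' configuration.

    ```python
    # Minimal sketch: one Gaussian-mixture HMM per call type, classification
    # by maximum log-likelihood. Feature matrices are assumed to have shape
    # [n_frames, n_features] (e.g. cepstral coefficients per analysis frame).
    import numpy as np
    from hmmlearn.hmm import GMMHMM

    def train_call_type_models(train_data, n_states=5, n_mix=4):
        """Fit one HMM per call type.

        train_data: dict mapping call-type label -> list of feature matrices.
        """
        models = {}
        for label, examples in train_data.items():
            X = np.vstack(examples)                 # stack all training frames
            lengths = [len(ex) for ex in examples]  # per-example frame counts
            m = GMMHMM(n_components=n_states, n_mix=n_mix, covariance_type="diag")
            m.fit(X, lengths)
            models[label] = m
        return models

    def classify(models, features):
        # Score the unknown call against every model; pick the most likely label.
        return max(models, key=lambda label: models[label].score(features))
    ```

    The non-sequential poultry model mentioned in the abstract corresponds to n_states=1, which collapses the HMM into a plain Gaussian mixture over frames.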

    Discrimination of Individual Tigers (Panthera tigris) from Long Distance Roars

    This paper investigates the extent of tiger (Panthera tigris) vocal individuality through both qualitative and quantitative approaches, using long distance roars from six individual tigers at Omaha's Henry Doorly Zoo in Omaha, NE. The framework for comparison across individuals includes statistical and discriminant function analysis across whole-vocalization measures and statistical pattern classification using a hidden Markov model (HMM) with frame-based spectral features composed of Greenwood frequency cepstral coefficients. Individual discrimination accuracy is evaluated as a function of spectral model complexity, represented by the number of mixtures in the underlying Gaussian mixture model (GMM), and temporal model complexity, represented by the number of sequential states in the HMM. Results indicate that the temporal pattern of the vocalization is the most significant factor in accurate discrimination. Overall baseline discrimination accuracy for this data set is about 70% using high-level features without complex spectral or temporal models. Accuracy increases to about 80% when more complex spectral models (multiple-mixture GMMs) are incorporated, and to a final accuracy of 90% when more detailed temporal models (10-state HMMs) are used. Classification accuracy is stable across a relatively wide range of configurations in terms of spectral and temporal model resolution.
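    To make the spectral-versus-temporal complexity comparison concrete, here is a hedged sketch of the kind of accuracy grid the abstract describes, sweeping the number of GMM mixtures against the number of HMM states. It again assumes hmmlearn and per-call cepstral feature matrices; the data containers and parameter ranges are hypothetical.

    ```python
    # Hypothetical sweep over spectral complexity (GMM mixtures) and temporal
    # complexity (HMM states). `train_by_tiger` maps each individual to a list
    # of per-call feature matrices; `test_set` is a list of (features, label).
    import numpy as np
    from hmmlearn.hmm import GMMHMM

    def stack(examples):
        # Concatenate per-call frames and record lengths: the multi-sequence
        # input format hmmlearn expects for training.
        return np.vstack(examples), [len(e) for e in examples]

    def accuracy_grid(train_by_tiger, test_set,
                      state_counts=(1, 5, 10), mix_counts=(1, 2, 4)):
        results = {}
        for n_states in state_counts:
            for n_mix in mix_counts:
                models = {}
                for tiger, examples in train_by_tiger.items():
                    X, lengths = stack(examples)
                    models[tiger] = GMMHMM(n_components=n_states, n_mix=n_mix,
                                           covariance_type="diag").fit(X, lengths)
                correct = sum(
                    max(models, key=lambda t: models[t].score(feats)) == truth
                    for feats, truth in test_set
                )
                results[(n_states, n_mix)] = correct / len(test_set)
        return results
    ```

    In the reported numbers, moving along the state axis (1 to 10 states) mattered more than moving along the mixture axis, consistent with temporal structure carrying most of the individual signature.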

    Vocal Classification of Vocalizations of a Pair of Asian Small-Clawed Otters to Determine Stress

    Asian small-clawed otters (Aonyx cinerea) are a small, protected but threatened species living in freshwater. They are gregarious, live in monogamous pairs for their lifetimes, and communicate via scent and acoustic vocalizations. This study utilized a hidden Markov model (HMM) to classify stress versus non-stress calls from a sibling pair under professional care. Vocalizations were expertly annotated by keepers into seven contextual categories. Four of these (aggression, separation anxiety, pain, and prefeeding) were identified as stressful contexts, and three (feeding, training, and play) were identified as non-stressful contexts. The vocalizations were segmented, manually categorized into broad call types, and analyzed to determine signal-to-noise ratios. From this information, vocalizations from the most common contextual categories were used to implement HMM-based automatic classification experiments, which included individual identification, stress versus non-stress classification, and individual context classification. Results indicate that both individual identity and stress versus non-stress were distinguishable, with accuracies above 90%, but that individual contexts within the stress category were not easily separable.
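    The signal-to-noise screening step mentioned above can be approximated as in the sketch below, assuming mono audio and keeper-annotated start/end times; the 0.5 s background context window is an arbitrary choice of ours, not taken from the paper.

    ```python
    # Rough per-call SNR estimate: compare energy inside the annotated call
    # against energy in the adjacent background audio. A stricter estimate
    # would subtract the noise power from the call power before the ratio.
    import numpy as np

    def estimate_snr_db(audio, sr, start_s, end_s, context_s=0.5):
        i0, i1 = int(start_s * sr), int(end_s * sr)
        call = audio[i0:i1].astype(float)
        noise = np.concatenate([
            audio[max(0, i0 - int(context_s * sr)):i0],  # just before the call
            audio[i1:i1 + int(context_s * sr)],          # just after the call
        ]).astype(float)
        p_call = np.mean(call ** 2)
        p_noise = np.mean(noise ** 2) + 1e-12  # guard against silence
        return 10.0 * np.log10(p_call / p_noise)
    ```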

    SPEAKER AND GENDER IDENTIFICATION USING BIOACOUSTIC DATA SETS

    Acoustic analysis of animal vocalizations has been widely used to identify the presence of individual species, classify vocalizations, identify individuals, and determine gender. In this work, automatic identification of the speaker and gender of mice from ultrasonic vocalizations, and speaker identification of meerkats from their Close calls, is investigated. Feature extraction was implemented using Greenwood Function Cepstral Coefficients (GFCCs), designed specifically for extracting features from animal vocalizations. Mice ultrasonic vocalizations were analyzed using Gaussian Mixture Models (GMMs), which yielded an accuracy of 78.3% for speaker identification and 93.2% for gender identification. Meerkat speaker identification with Close calls was implemented using Gaussian Mixture Models (GMMs) and Hidden Markov Models (HMMs), with accuracies of 90.8% and 94.4%, respectively. These results indicate the presence of gender and identity information in vocalizations and support the possibility of robust gender and individual identification using bioacoustic data sets.
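    Since GFCCs recur across several entries in this list, here is a hedged sketch of the extraction idea: MFCC-style processing with the mel scale replaced by a Greenwood warping fitted to the species' audible range (f_min, f_max). The constant k = 0.88 is a commonly used convention; all other defaults are illustrative assumptions, not the paper's settings.

    ```python
    # Sketch of a Greenwood-warped filterbank plus cepstral step: solve the
    # Greenwood map f(x) = A*(10**(a*x) - k) so f(0) = f_min and f(1) = f_max,
    # space triangular filters uniformly in x, then log-compress and DCT.
    import numpy as np
    from scipy.fftpack import dct

    def greenwood_filterbank(sr, n_fft, n_filters, f_min, f_max, k=0.88):
        A = f_min / (1.0 - k)
        a = np.log10(f_max / A + k)
        x = np.linspace(0.0, 1.0, n_filters + 2)
        hz = A * (10.0 ** (a * x) - k)               # filter edges in Hz
        bins = np.floor((n_fft + 1) * hz / sr).astype(int)
        fb = np.zeros((n_filters, n_fft // 2 + 1))
        for i in range(n_filters):
            l, c, r = bins[i], bins[i + 1], bins[i + 2]
            fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising slope
            fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling slope
        return fb

    def gfcc(power_spectrum_frames, fb, n_ceps=12):
        # power_spectrum_frames: [n_frames, n_fft//2 + 1]
        energies = power_spectrum_frames @ fb.T
        return dct(np.log(energies + 1e-10), norm="ortho")[:, :n_ceps]
    ```

    Swapping (f_min, f_max) per species is what lets the same front end be reused for mice, meerkats, elephants, or otters.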

    Stress and Emotion Classification Using Jitter and Shimmer Features

    In this paper, we evaluate the use of appended jitter and shimmer speech features for the classification of human speaking styles and of animal vocalization arousal levels. Jitter and shimmer features are extracted from the fundamental frequency contour and added to baseline spectral features, specifically Mel-frequency cepstral coefficients (MFCCs) for human speech and Greenwood function cepstral coefficients (GFCCs) for animal vocalizations. Hidden Markov models (HMMs) with Gaussian mixture model (GMM) state distributions are used for classification. The appended jitter and shimmer features result in an increase in classification accuracy for several illustrative datasets, including the SUSAS dataset for human speaking styles as well as vocalizations labeled by arousal level for African elephants and Rhesus monkeys.
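    For reference, the two perturbation measures are simple to state. The sketch below computes local jitter from consecutive pitch periods and local shimmer from consecutive cycle peak amplitudes; the input arrays are assumed to come from an external pitch tracker, and this is one common definition of the "local" variants, not necessarily the exact formulation used in the paper.

    ```python
    # Local jitter and shimmer: mean absolute difference between consecutive
    # cycles, normalized by the mean, for periods and amplitudes respectively.
    import numpy as np

    def local_jitter(periods):
        # periods: pitch period durations (seconds) from a pitch tracker.
        periods = np.asarray(periods, dtype=float)
        return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

    def local_shimmer(amplitudes):
        # amplitudes: per-cycle peak amplitudes aligned with the periods.
        amplitudes = np.asarray(amplitudes, dtype=float)
        return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

    # These scalars (or short-time versions of them) would be appended to the
    # per-frame MFCC/GFCC vectors before HMM/GMM training.
    ```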

    COMMUNICATING IN SOCIAL NETWORKS: EFFECTS OF REVERBERATION ON ACOUSTIC INFORMATION TRANSFER IN THREE SPECIES OF BIRDS

    In socially and acoustically complex environments, the auditory system processes sounds that are distorted, attenuated, and additionally masked by biotic and abiotic noise. As a result, spectral and temporal alterations of the sounds may affect the transfer of information between signalers and receivers in networks of conspecifics, increasing detection thresholds and interfering with the discrimination and recognition of sound sources. To date, much concern has been directed toward anthropogenic noise sources and whether they affect animals' natural territorial and reproductive behavior and ultimately threaten the survival of a species. Much less is known, however, about the potentially synergistic effects of environmentally induced sound degradation and masking from noise and competing sound signals, and about what these interactions imply for vocally mediated exchanges in animals. This dissertation describes a series of comparative psychophysical experiments under controlled laboratory conditions investigating the impact of reverberation on the perception of a range of artificial sounds and natural vocalizations in the budgerigar, canary, and zebra finch. Results suggest that even small reverberation effects could be used to gauge different acoustic environments and to locate a sound source, but that reverberation limits the vocally mediated transfer of important information in social settings, especially when paired with noise. Discrimination of similar vocalizations from different individuals is significantly impaired when both reverberation and abiotic noise levels are high, whereas this ability is hardly affected by either factor alone. Similarly, high levels of reverberation combined with biotic noise from signaling conspecifics limit the auditory system's ability to parse a complex acoustic scene by segregating signals from multiple individuals. Such interaction effects, arising from habitat characteristics and species differences in auditory sensitivity, can therefore predict whether a given acoustic environment limits communication range or interferes with the detection, discrimination, and recognition of biologically important sounds.

    Rhesus Monkeys See Who They Hear: Spontaneous Cross-Modal Memory for Familiar Conspecifics

    Rhesus monkeys gather much of their knowledge of the social world through visual input and may preferentially represent this knowledge in the visual modality. Recognition of familiar faces is clearly advantageous, and the flexibility and utility of primate social memory would be greatly enhanced if visual memories could be accessed cross-modally, by either visual or auditory stimulation. Such cross-modal access to visual memory would facilitate flexible retrieval of the knowledge necessary for adaptive social behavior. We tested whether rhesus monkeys have cross-modal access to visual memory for familiar conspecifics using a delayed matching-to-sample procedure. Monkeys learned visual matching of video clips of familiar individuals to photographs of those individuals, and generalized performance to novel videos. In cross-modal probe trials, coo-calls were played during the memory interval. The calls were either from the monkey just seen in the sample video clip or from a different familiar monkey. Even though the monkeys were trained exclusively in visual matching, the calls influenced their choices, increasing the proportion of errors toward the picture of the monkey whose voice was heard on incongruent trials. This result demonstrates spontaneous cross-modal recognition. It also shows that viewing videos of familiar monkeys activates naturally formed memories of real monkeys, validating the use of video stimuli in studies of social cognition in monkeys.