    Automatic Classification and Speaker Identification of African Elephant (Loxodonta africana) Vocalizations

    A hidden Markov model (HMM) system is presented for automatically classifying African elephant vocalizations. The development of the system is motivated by successful models from human speech analysis and recognition. Classification features include frequency-shifted Mel-frequency cepstral coefficients (MFCCs) and log energy, spectrally motivated features which are commonly used in human speech processing. Experiments, including vocalization type classification and speaker identification, are performed on vocalizations collected from captive elephants in a naturalistic environment. The system classified vocalizations with accuracies of 94.3% for type classification and 82.5% for speaker identification. Classification accuracy, statistical significance tests on the model parameters, and qualitative analysis support the effectiveness and robustness of this approach for vocalization analysis in nonhuman species.

    Bond graph based sensitivity and uncertainty analysis modelling for micro-scale multiphysics robust engineering design

    Components within micro-scale engineering systems are often at the limits of commercial miniaturization, and this can cause unexpected behavior and variation in performance. As such, modelling and analysis of system robustness plays an important role in product development. Here schematic bond graphs are used as a front end in a sensitivity-analysis-based strategy for modelling robustness in multiphysics micro-scale engineering systems. As an example, the analysis is applied to a behind-the-ear (BTE) hearing aid. By using bond graphs to model power flow through components within different physical domains of the hearing aid, a set of differential equations describing the system dynamics is collated. Based on these equations, sensitivity analysis calculations are used to approximately model the nature and the sources of output uncertainty during system operation. These calculations represent a robustness evaluation of the current hearing aid design and offer a means of identifying potential for improved designs of multiphysics systems by way of key parameter identification.
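
    The sensitivity-based uncertainty estimate described in this abstract can be illustrated with a minimal sketch of first-order uncertainty propagation. This is not the paper's bond graph model: the mass-spring resonance formula, the parameter names `k` and `m`, and all numerical values are hypothetical stand-ins, and the parameters are assumed to vary independently.

```python
import math

def sensitivities(model, params, rel_step=1e-6):
    """Central finite-difference estimate of d(model)/d(param) for each parameter."""
    grads = {}
    for name, value in params.items():
        h = rel_step * (abs(value) or 1.0)
        up = dict(params, **{name: value + h})
        down = dict(params, **{name: value - h})
        grads[name] = (model(up) - model(down)) / (2.0 * h)
    return grads

def output_std(model, params, stds):
    """First-order estimate: var(y) ~= sum_i (dy/dp_i)^2 * sigma_i^2."""
    grads = sensitivities(model, params)
    contributions = {n: (grads[n] * stds[n]) ** 2 for n in params}
    return math.sqrt(sum(contributions.values())), contributions

# Hypothetical stand-in model: resonant frequency f = sqrt(k / m) / (2 * pi).
def resonant_frequency(p):
    return math.sqrt(p["k"] / p["m"]) / (2.0 * math.pi)

nominal = {"k": 200.0, "m": 1e-4}   # assumed nominal stiffness and mass
stds = {"k": 10.0, "m": 2e-6}       # assumed standard deviations

sigma_f, contrib = output_std(resonant_frequency, nominal, stds)
dominant = max(contrib, key=contrib.get)  # parameter contributing most variance
```

    Ranking the entries of `contrib` is one simple way to carry out the "key parameter identification" the abstract mentions: the parameter with the largest variance contribution is the first candidate for tighter tolerances.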

    Evolutionary-based sparse regression for the experimental identification of Duffing oscillator

    In this paper, an evolutionary-based sparse regression algorithm is proposed and applied to experimental data collected from a Duffing oscillator setup and to numerical simulation data. Our purpose is to identify the Coulomb friction terms as part of the ordinary differential equation of the system. Correct identification of this nonlinear system using sparse identification depends heavily on selecting the correct form of nonlinearity to include in the function library. Consequently, in this work, the evolutionary-based sparse identification replaces the need for user knowledge when constructing the library. Constructing the library with a data-driven evolutionary approach is an effective way to extend the space of nonlinear functions, allowing the sparse regression to be applied over an extensive space of functions. The results show that the method provides an effective algorithm for unveiling the physical nature of the Duffing oscillator. In addition, the robustness of the identification algorithm is investigated for various levels of noise in simulation. The proposed method has possible applications to other nonlinear dynamic systems in mechatronics, robotics, and electronics.
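
    To make the sparse-identification step concrete, here is a minimal sketch of plain sequentially thresholded least squares over a fixed, hand-written function library. It is not the evolutionary library construction proposed in the paper. The oscillator model, its coefficients, and the noise-free sample grid are all assumptions chosen for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def least_squares(columns, y):
    """Least squares via the normal equations (adequate for a tiny library)."""
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(a * b for a, b in zip(ci, y)) for ci in columns]
    return solve_linear(gram, rhs)

def stlsq(library, y, threshold):
    """Refit, drop terms with small coefficients, repeat until nothing changes."""
    active = list(library)
    coeffs = {}
    for _ in range(len(library) + 1):
        fit = least_squares([library[n] for n in active], y)
        coeffs = dict(zip(active, fit))
        kept = [n for n in active if abs(coeffs[n]) >= threshold]
        if kept == active:
            break
        active = kept
        if not active:
            coeffs = {}
            break
    return {n: (coeffs[n] if n in active else 0.0) for n in library}

def sign(u):
    return (u > 0) - (u < 0)

# Assumed model: x'' = -1.0*x - 0.2*v - 0.5*x^3 - 0.3*sign(v), with v = x'.
grid = [-2.0 + 0.5 * i for i in range(9)]
samples = [(x, v) for x in grid for v in grid]
accel = [-1.0 * x - 0.2 * v - 0.5 * x ** 3 - 0.3 * sign(v) for x, v in samples]

# Candidate terms; x^2 and x*v are deliberate distractors.
library = {
    "x": [x for x, v in samples],
    "v": [v for x, v in samples],
    "x^2": [x * x for x, v in samples],
    "x^3": [x ** 3 for x, v in samples],
    "x*v": [x * v for x, v in samples],
    "sign(v)": [float(sign(v)) for x, v in samples],
}
model = stlsq(library, accel, threshold=0.05)
```

    In this sketch the Coulomb friction term is recovered only because `sign(v)` was placed in the library by hand, which is exactly the dependence on user knowledge that the abstract says the evolutionary approach is meant to remove.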

    Robust and Efficient Recovery of Rigid Motion from Subspace Constraints Solved using Recursive Identification of Nonlinear Implicit Systems

    The problem of estimating rigid motion from projections may be characterized using a nonlinear dynamical system, composed of the rigid motion transformation and the perspective map. The time derivative of the output of such a system, which is also called the "motion field", is bilinear in the motion parameters, and may be used to specify a subspace constraint on either the direction of translation or the inverse depth of the observed points. Estimating motion may then be formulated as an optimization task constrained on such a subspace. Heeger and Jepson [5], who first introduced this constraint, solve the optimization task using an extensive search over the possible directions of translation. We reformulate the optimization problem in a systems theoretic framework as the identification of a dynamic system in exterior differential form with parameters on a differentiable manifold, and use techniques which pertain to nonlinear estimation and identification theory to perform the optimization task in a principled manner. The general technique for addressing such identification problems [14] has been used successfully in addressing other problems in computational vision [13, 12]. The application of the general method [14] results in a recursive and pseudo-optimal solution of the motion problem, which has robustness properties far superior to other existing techniques we have implemented. By releasing the constraint that the visible points lie in front of the observer, we may explain some psychophysical effects on the nonrigid percept of rigidly moving shapes. Experiments on real and synthetic image sequences show very promising results in terms of robustness, accuracy and computational efficiency.
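
    The bilinear structure of the motion field can be sketched under assumptions the abstract does not state: a unit focal length perspective map and the sign convention that scene points move as dX/dt = -T - omega x X. The function names and numerical values below are hypothetical, and the code illustrates only the motion field itself, not the recursive identification scheme.

```python
def motion_field(x, y, inv_depth, T, omega):
    """Image velocity (u, v) of the point (x, y) under the assumed convention."""
    tx, ty, tz = T
    wx, wy, wz = omega
    u = inv_depth * (-tx + x * tz) + x * y * wx - (1.0 + x * x) * wy + y * wz
    v = inv_depth * (-ty + y * tz) + (1.0 + y * y) * wx - x * y * wy - x * wz
    return u, v

def project_after(X, T, omega, dt):
    """Project a 3D point after one small Euler step of dX/dt = -T - omega x X."""
    px, py, pz = X
    wx, wy, wz = omega
    cross = (wy * pz - wz * py, wz * px - wx * pz, wx * py - wy * px)
    moved = [p + dt * (-t - c) for p, t, c in zip(X, T, cross)]
    return moved[0] / moved[2], moved[1] / moved[2]

# Hypothetical point and motion parameters.
X = (0.4, -0.2, 2.0)
T = (0.1, 0.0, 0.3)
omega = (0.01, -0.02, 0.03)
x, y, rho = X[0] / X[2], X[1] / X[2], 1.0 / X[2]

flow = motion_field(x, y, rho, T, omega)

# Numerical check: finite-difference velocity of the projected point.
dt = 1e-7
x1, y1 = project_after(X, T, omega, dt)
numeric = ((x1 - x) / dt, (y1 - y) / dt)

# Translation enters only through the product inv_depth * T, so doubling T
# while halving the inverse depth leaves the field unchanged.
T2 = tuple(2.0 * t for t in T)
rescaled = motion_field(x, y, rho / 2.0, T2, omega)
```

    Because translation and inverse depth appear only as a product, the image velocities fix the translation only up to scale. That is why the subspace constraint in the abstract is stated in terms of the direction of translation.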

    Look, Listen and Learn - A Multimodal LSTM for Speaker Identification

    Speaker identification refers to the task of localizing the face of a person who has the same identity as the ongoing voice in a video. This task requires not only collective perception over both visual and auditory signals but also robustness to severe quality degradations and unconstrained content variations. In this paper, we describe a novel multimodal Long Short-Term Memory (LSTM) architecture which seamlessly unifies both visual and auditory modalities from the beginning of each sequence input. The key idea is to extend the conventional LSTM by not only sharing weights across time steps, but also sharing weights across modalities. We show that modeling the temporal dependency across face and voice can significantly improve the robustness to content quality degradations and variations. We also found that our multimodal LSTM is robust to distractors, namely the non-speaking identities. We applied our multimodal LSTM to The Big Bang Theory dataset and showed that our system outperforms the state-of-the-art systems in speaker identification with lower false alarm rate and higher recognition accuracy. Comment: The 30th AAAI Conference on Artificial Intelligence (AAAI-16)