
    Automatic voice recognition using traditional and artificial neural network approaches

    The main objective of this research is to develop an algorithm for isolated-word recognition. The research focuses on digital signal analysis rather than linguistic analysis of speech. Feature extraction is carried out with a Linear Predictive Coding (LPC) algorithm of order 10. Continuous-word and speaker-independent recognition will be considered in a future study once this isolated-word research is complete. Two approaches are explored for measuring the similarity between the reference and training sets. The first uses traditional pattern recognition techniques: a dynamic time warping (DTW) algorithm aligns the two sets, and the likelihood of a match is estimated from the Euclidean distance between them. The second uses a three-layer backpropagation artificial neural network as the pattern classifier, trained with the generalized least mean square (LMS) adaptation rule. The first approach has been completed: a vocabulary of 50 words was selected and tested, and the accuracy of the algorithm was found to be around 85 percent. The second approach is currently in progress.
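
    The first approach (LPC features of order 10, then DTW template matching with a Euclidean local distance) can be sketched as below. This is a minimal illustration, not the authors' implementation: the frame length of 240 samples, hop of 80 samples (30 ms and 10 ms at an assumed 8 kHz sampling rate), the Hamming window, and the function names `lpc`, `lpc_features`, `dtw_distance`, and `recognize` are all hypothetical choices. Words are ranked by accumulated DTW distance (lower is better); no conversion to a probability is attempted.

```python
import numpy as np


def lpc(frame: np.ndarray, order: int = 10) -> np.ndarray:
    """Return LPC coefficients a[1..order] (autocorrelation method, Levinson-Durbin).

    Convention: x[n] is predicted as sum_{k=1}^{order} a[k] * x[n - k].
    """
    n = len(frame)
    # Autocorrelation r[0..order] of the (already windowed) frame.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:  # silent or perfectly predicted frame: stop early
            break
        # Reflection coefficient for step i.
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[i] = k
        a[1:i] = prev[1:i] - k * prev[i - 1:0:-1]
        err *= 1.0 - k * k  # prediction error shrinks at each order
    return a[1:]


def lpc_features(signal, frame_len=240, hop=80, order=10):
    """Frame the signal, apply a Hamming window, and compute LPC per frame."""
    window = np.hamming(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([lpc(signal[s:s + frame_len] * window, order) for s in starts])


def dtw_distance(x, y):
    """Accumulated Euclidean cost of the best monotonic alignment of x and y."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])  # local Euclidean distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


def recognize(features, templates):
    """Return the vocabulary word whose reference template is closest under DTW."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))
```

    In use, each of the 50 vocabulary words would contribute one reference template (`templates[word] = lpc_features(recording)`), and an unknown utterance is assigned to `recognize(lpc_features(utterance), templates)`. DTW absorbs differences in speaking rate, which is why a time-stretched copy of a template still scores a distance of zero.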

    A Framework for Bioacoustic Vocalization Analysis Using Hidden Markov Models

    Using Hidden Markov Models (HMMs) as a recognition framework for automatic classification of animal vocalizations has a number of benefits, including the ability to handle duration variability through nonlinear time alignment, the ability to incorporate complex language or recognition constraints, and easy extension to continuous recognition and detection domains. In this work, we apply HMMs to several different species and bioacoustic tasks, using generalized spectral features that can be easily adjusted across species, together with HMM network topologies suited to each task. The experiments include a simple call-type classification task using one HMM per vocalization type for repertoire analysis of Asian elephants, a language-constrained song recognition task using syllable models as base units for ortolan bunting vocalizations, and a stress-stimulus differentiation task in poultry vocalizations using a non-sequential model, namely a one-state HMM with Gaussian mixtures. Results show strong performance across all tasks and illustrate the flexibility of the HMM framework for a variety of species, vocalization types, and analysis tasks.
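
    The "one HMM per vocalization" classification scheme scores an observed feature sequence under every call-type model and picks the most likely one. The sketch below shows that idea with the forward algorithm in the log domain. It assumes diagonal-covariance Gaussian emissions with a single Gaussian per state (the mixtures used in the paper are omitted for brevity), a hypothetical left-to-right topology with self-loop probability `stay=0.6`, and hypothetical function names; it is not the authors' code. A one-state model with a self-loop reduces to summing per-frame log densities, which is the non-sequential case mentioned above.

```python
import numpy as np


def log_gauss_diag(x, means, variances):
    """Log density of frame x (D,) under each state's diagonal Gaussian -> (S,)."""
    return -0.5 * np.sum(np.log(2 * np.pi * variances) + (x - means) ** 2 / variances, axis=1)


def logsumexp(a, axis=0):
    """Numerically stable log(sum(exp(a))) that tolerates all -inf slices."""
    m = np.max(a, axis=axis, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)
    with np.errstate(divide="ignore"):
        return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def forward_log_likelihood(obs, log_start, log_trans, means, variances):
    """log p(obs | model) via the forward algorithm, summed over all end states."""
    alpha = log_start + log_gauss_diag(obs[0], means, variances)
    for x in obs[1:]:
        # alpha_t(j) = log sum_i exp(alpha_{t-1}(i) + log A[i, j]) + log b_j(x_t)
        alpha = logsumexp(alpha[:, None] + log_trans, axis=0) + log_gauss_diag(x, means, variances)
    return float(logsumexp(alpha, axis=0))


def left_to_right(n_states, stay=0.6):
    """Log start/transition probabilities for a simple left-to-right topology."""
    trans = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        trans[i, i], trans[i, i + 1] = stay, 1.0 - stay
    trans[-1, -1] = 1.0
    start = np.zeros(n_states)
    start[0] = 1.0
    with np.errstate(divide="ignore"):  # log(0) -> -inf marks forbidden moves
        return np.log(start), np.log(trans)


def classify(obs, models):
    """Pick the call type whose HMM assigns the highest log-likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))
```

    Each entry of `models` is a tuple `(log_start, log_trans, means, variances)`; in practice the means and variances would be trained (for example with Baum-Welch) on labeled calls rather than set by hand.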

    Novel modeling of task versus rest brain state predictability using a dynamic time warping spectrum: comparisons and contrasts with other standard measures of brain dynamics

    Dynamic time warping (DTW) is a powerful, domain-general sequence alignment method for computing a similarity measure. Dynamic programming-based techniques such as DTW are now the backbone and driver of most bioinformatics methods and discoveries. In neuroscience they have seen far less use, though this has begun to change. We wanted to explore new ways of applying DTW, not simply as a measure for clustering or comparing the similarity of features, but in a conceptually different way. We used DTW to provide a more interpretable spectral description of the data than standard approaches such as the Fourier and related transforms. The DTW approach and the standard discrete Fourier transform (DFT) are assessed against benchmark measures of neural dynamics. These include EEG microstates, EEG avalanches, the sum squared error (SSE) of a multilayer perceptron (MLP) prediction of the EEG time series, and the simultaneously acquired fMRI BOLD signal. We explored the relationships between these variables of interest in an EEG-fMRI dataset acquired during a standard cognitive task, which allowed us to examine how DTW performs across different task settings. We found that, despite strong correlations between the DTW and DFT spectra, DTW was a better predictor of almost every measure of brain dynamics. Using these DTW measures, we show that predictability is almost always higher in task states than in rest states, which is consistent with other theoretical and empirical findings and provides additional evidence for the utility of the DTW approach.
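
    The abstract does not say how the DTW spectrum itself is constructed, so only the standard DFT baseline it is compared against is sketched here. The sketch computes a one-sided, Hann-windowed power spectrum and the power in a frequency band; the 250 Hz sampling rate in the usage note, the Hann window, and the function names are assumptions for illustration, not details from the study.

```python
import numpy as np


def power_spectrum(x, fs):
    """One-sided power spectrum of a real signal sampled at fs Hz (Hann-windowed)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # remove the DC offset
    spectrum = np.fft.rfft(x * np.hanning(len(x)))
    power = np.abs(spectrum) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, power


def band_power(freqs, power, low, high):
    """Total power in the half-open band [low, high) Hz."""
    mask = (freqs >= low) & (freqs < high)
    return float(power[mask].sum())
```

    For example, a 2 s EEG epoch sampled at 250 Hz yields 0.5 Hz frequency resolution, and `band_power(freqs, power, 8, 13)` gives its alpha-band power. Spectral features of this kind are what the DTW-based description is benchmarked against.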

    A Neural Multi-sequence Alignment TeCHnique (NeuMATCH)

    The alignment of heterogeneous sequential data (video to text) is an important and challenging problem. Standard techniques for this task, including Dynamic Time Warping (DTW) and Conditional Random Fields (CRFs), suffer from inherent drawbacks. Chief among them, the Markov assumption implies that, given the immediate past, future alignment decisions are independent of further history. In addition, the separation between similarity computation and alignment decisions prevents end-to-end training. In this paper, we propose an end-to-end neural architecture in which alignment actions are implemented as moving data between stacks of Long Short-Term Memory (LSTM) blocks. This flexible architecture supports a wide variety of alignment tasks, including one-to-one, one-to-many, skipping unmatched elements, and (with extensions) non-monotonic alignment. Extensive experiments on semi-synthetic and real datasets show that our algorithm outperforms state-of-the-art baselines.
    Comment: Accepted at CVPR 2018 (Spotlight). arXiv file includes the paper and the supplemental material
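
    The idea of expressing alignment as actions on stacks can be illustrated without any neural network: given a sequence of actions, popping and matching elements from a clip stack and a sentence stack yields the alignment. In the paper, LSTM-based models choose each action; here the action sequence is supplied by hand. The action names below (`POP_CLIP`, `POP_SENT`, `MATCH`, `MATCH_RETAIN_CLIP`, `MATCH_RETAIN_SENT`) are hypothetical labels chosen to show how one-to-one, one-to-many, and skipping alignments arise, not a claim about the paper's exact action set. Only monotonic alignment is covered.

```python
from collections import deque


def apply_alignment_actions(clips, sentences, actions):
    """Execute a sequence of stack actions and return the (clip, sentence) pairs.

    The first element of each deque plays the role of the top of the stack.
    """
    clip_stack, sent_stack = deque(clips), deque(sentences)
    pairs = []
    for action in actions:
        if action == "POP_CLIP":             # skip an unmatched clip
            clip_stack.popleft()
        elif action == "POP_SENT":           # skip an unmatched sentence
            sent_stack.popleft()
        elif action == "MATCH":              # one-to-one: align and pop both
            pairs.append((clip_stack.popleft(), sent_stack.popleft()))
        elif action == "MATCH_RETAIN_CLIP":  # one clip to many sentences
            pairs.append((clip_stack[0], sent_stack.popleft()))
        elif action == "MATCH_RETAIN_SENT":  # many clips to one sentence
            pairs.append((clip_stack.popleft(), sent_stack[0]))
        else:
            raise ValueError(f"unknown action: {action}")
    return pairs
```

    For instance, clips `["c1", "c2", "c3", "c4"]`, sentences `["s1", "s2", "s3"]`, and actions `["MATCH", "POP_CLIP", "MATCH_RETAIN_CLIP", "MATCH"]` give `[("c1", "s1"), ("c3", "s2"), ("c3", "s3")]`: clip `c2` is skipped and `c3` is aligned to two sentences. Because each decision can in principle depend on the full stack contents rather than only the previous step, this formulation sidesteps the Markov assumption criticized above.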

    A real-time human-robot interaction system based on gestures for assistive scenarios

    Natural and intuitive human interaction with robotic systems is key to developing robots that assist people easily and effectively. In this paper, a Human Robot Interaction (HRI) system able to recognize gestures commonly used in human non-verbal communication is introduced, and an in-depth study of its usability is performed. The system handles dynamic gestures, such as waving or nodding, which are recognized with a Dynamic Time Warping approach based on gesture-specific features computed from depth maps. A static gesture, pointing at an object, is also recognized. The pointed location is then estimated in order to detect candidate objects the user may be referring to. When the pointed object is unclear to the robot, a disambiguation procedure is carried out through either a verbal or a gestural dialogue. This capability would allow the robot to pick up an object on behalf of a user who might have difficulty doing so. The overall system, composed of NAO and Wifibot robots, a Kinect™ v2 sensor, and two laptops, is first evaluated in a structured lab setup. Then, a broad set of user tests was completed, which allows assessing the system's performance in terms of recognition rates, ease of use, and response times.
    Postprint (author's final draft)
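
    The abstract does not specify how the pointed location is estimated. One common geometric approach, sketched here purely as an assumption, casts a ray from the elbow through the hand (3-D joint positions in metres, as a depth sensor's skeleton tracker would provide) and intersects it with a known support plane such as the floor or a table. Objects within a hypothetical `radius` of the hit point become candidates; more than one candidate is the situation where a disambiguation dialogue would be needed. The function names and the 0.15 m radius are illustrative.

```python
import numpy as np


def pointed_location(elbow, hand, plane_point, plane_normal):
    """Intersect the elbow-to-hand ray with a plane; None if there is no hit."""
    elbow, hand = np.asarray(elbow, float), np.asarray(hand, float)
    plane_point = np.asarray(plane_point, float)
    plane_normal = np.asarray(plane_normal, float)
    direction = hand - elbow
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-9:  # ray parallel to the plane
        return None
    t = np.dot(plane_normal, plane_point - hand) / denom
    if t < 0:  # the plane lies behind the pointing direction
        return None
    return hand + t * direction


def candidate_objects(target, objects, radius=0.15):
    """Names of objects within `radius` metres of the target, nearest first."""
    dists = {name: float(np.linalg.norm(np.asarray(pos, float) - target))
             for name, pos in objects.items()}
    return sorted((name for name, d in dists.items() if d <= radius), key=dists.get)
```

    With the elbow at (0, 1.2, 0), the hand at (0.3, 1.0, 0), and the floor plane y = 0, the ray hits the floor at roughly (1.8, 0, 0), so a cup placed near that point is returned as the only candidate.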