9,659 research outputs found

    Time-Domain Isolated Phoneme Classification Using Reconstructed Phase Spaces

    Get PDF
    This paper introduces a novel time-domain approach to modeling and classifying speech phoneme waveforms. The approach is based on statistical models of reconstructed phase spaces, which offer significant theoretical benefits as representations that are known to be topologically equivalent to the state dynamics of the underlying production system. The lag and dimension parameters of the reconstruction process for speech are examined in detail, comparing common estimation heuristics for these parameters with corresponding maximum likelihood recognition accuracy over the TIMIT data set. Overall accuracies are compared with a Mel-frequency cepstral baseline system across five different phonetic classes within TIMIT, and a composite classifier using both cepstral and phase space features is developed. Results indicate that although the accuracy of the phase space approach by itself is still currently below that of baseline cepstral methods, a combined approach is capable of increasing speaker independent phoneme accuracy

    Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection

    Get PDF
    Background: Voice disorders affect patients profoundly, and acoustic tools can potentially measure voice function objectively. Disordered sustained vowels exhibit wide-ranging phenomena, from nearly periodic to highly complex, aperiodic vibrations, and increased "breathiness". Modelling and surrogate data studies have shown significant nonlinear and non-Gaussian random properties in these sounds. Nonetheless, existing tools are limited to analysing voices displaying near periodicity, and do not account for this inherent biophysical nonlinearity and non-Gaussian randomness, often using linear signal processing methods insensitive to these properties. They do not directly measure the two main biophysical symptoms of disorder: complex nonlinear aperiodicity, and turbulent, aeroacoustic, non-Gaussian randomness. Often these tools cannot be applied to more severe disordered voices, limiting their clinical usefulness.

Methods: This paper introduces two new tools to speech analysis: recurrence and fractal scaling, which overcome the range limitations of existing tools by addressing directly these two symptoms of disorder, together reproducing a "hoarseness" diagram. A simple bootstrapped classifier then uses these two features to distinguish normal from disordered voices.

Results: On a large database of subjects with a wide variety of voice disorders, these new techniques can distinguish normal from disordered cases, using quadratic discriminant analysis, to overall correct classification performance of 91.8% plus or minus 2.0%. The true positive classification performance is 95.4% plus or minus 3.2%, and the true negative performance is 91.5% plus or minus 2.3% (95% confidence). This is shown to outperform all combinations of the most popular classical tools.

Conclusions: Given the very large number of arbitrary parameters and computational complexity of existing techniques, these new techniques are far simpler and yet achieve clinically useful classification performance using only a basic classification technique. They do so by exploiting the inherent nonlinearity and turbulent randomness in disordered voice signals. They are widely applicable to the whole range of disordered voice phenomena by design. These new measures could therefore be used for a variety of practical clinical purposes.
&#xa

    Data-driven discovery of coordinates and governing equations

    Full text link
    The discovery of governing equations from scientific data has the potential to transform data-rich fields that lack well-characterized quantitative descriptions. Advances in sparse regression are currently enabling the tractable identification of both the structure and parameters of a nonlinear dynamical system from data. The resulting models have the fewest terms necessary to describe the dynamics, balancing model complexity with descriptive ability, and thus promoting interpretability and generalizability. This provides an algorithmic approach to Occam's razor for model discovery. However, this approach fundamentally relies on an effective coordinate system in which the dynamics have a simple representation. In this work, we design a custom autoencoder to discover a coordinate transformation into a reduced space where the dynamics may be sparsely represented. Thus, we simultaneously learn the governing equations and the associated coordinate system. We demonstrate this approach on several example high-dimensional dynamical systems with low-dimensional behavior. The resulting modeling framework combines the strengths of deep neural networks for flexible representation and sparse identification of nonlinear dynamics (SINDy) for parsimonious models. It is the first method of its kind to place the discovery of coordinates and models on an equal footing.Comment: 25 pages, 6 figures; added acknowledgment

    Optoelectronic Reservoir Computing

    Get PDF
    Reservoir computing is a recently introduced, highly efficient bio-inspired approach for processing time dependent data. The basic scheme of reservoir computing consists of a non linear recurrent dynamical system coupled to a single input layer and a single output layer. Within these constraints many implementations are possible. Here we report an opto-electronic implementation of reservoir computing based on a recently proposed architecture consisting of a single non linear node and a delay line. Our implementation is sufficiently fast for real time information processing. We illustrate its performance on tasks of practical importance such as nonlinear channel equalization and speech recognition, and obtain results comparable to state of the art digital implementations.Comment: Contains main paper and two Supplementary Material

    Statistical Models of Reconstructed Phase Spaces for Signal Classification

    Get PDF
    This paper introduces a novel approach to the analysis and classification of time series signals using statistical models of reconstructed phase spaces. With sufficient dimension, such reconstructed phase spaces are, with probability one, guaranteed to be topologically equivalent to the state dynamics of the generating system, and, therefore, may contain information that is absent in analysis and classification methods rooted in linear assumptions. Parametric and nonparametric distributions are introduced as statistical representations over the multidimensional reconstructed phase space, with classification accomplished through methods such as Bayes maximum likelihood and artificial neural networks (ANNs). The technique is demonstrated on heart arrhythmia classification and speech recognition. This new approach is shown to be a viable and effective alternative to traditional signal classification approaches, particularly for signals with strong nonlinear characteristics

    Persistent Homology of Attractors For Action Recognition

    Full text link
    In this paper, we propose a novel framework for dynamical analysis of human actions from 3D motion capture data using topological data analysis. We model human actions using the topological features of the attractor of the dynamical system. We reconstruct the phase-space of time series corresponding to actions using time-delay embedding, and compute the persistent homology of the phase-space reconstruction. In order to better represent the topological properties of the phase-space, we incorporate the temporal adjacency information when computing the homology groups. The persistence of these homology groups encoded using persistence diagrams are used as features for the actions. Our experiments with action recognition using these features demonstrate that the proposed approach outperforms other baseline methods.Comment: 5 pages, Under review in International Conference on Image Processin
    corecore