1,864 research outputs found

    Deep Learning for Audio Signal Processing

    Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e., audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified. Comment: 15 pages, 2 pdf figures

    Modeling of complex nonlinear dynamic systems using temporal convolution neural networks

    An increasingly important class of nonlinear systems is that of nonaffine hybrid systems, in particular those in which the underlying dynamics depend explicitly on a switching signal. When the inherent complexity is tractable and the phenomena governing the system dynamics are known, an implicit model can be derived to describe the system's behaviour over time. Conversely, when these assumptions are not met, the system dynamics can still be approximated by regression-based techniques, provided a dataset comprising inputs and outputs collected from the system is available. One approach to data-driven modelling relies on computational intelligence frameworks, in which artificial neural networks stand out as a prominent class of universal-approximation black-box models. This work aims to explore the capabilities of 1D convolutional neural networks, in which the inputs are represented by regressors and structural configuration parameters, for modelling nonlinear hybrid dynamic systems. Moreover, in order to evaluate the intrinsic ability to transparently approximate hybrid dynamics, this deep neural network architecture is compared to a shallow multilayer perceptron framework, in which each structural configuration is independently approximated.
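    As a rough illustration of what "inputs represented by regressors and structural configuration parameters" could look like, the sketch below stacks lagged inputs, lagged outputs, and a switching signal into channels and applies a single 1D convolution kernel over the lag window. The function names, the three-channel layout, and the use of plain NumPy are assumptions made for this example; the abstract does not describe the authors' actual architecture or data layout.

```python
import numpy as np

def build_regressors(u, y, mode, n_lags):
    """Stack lagged inputs, lagged outputs, and the switching signal.

    Hypothetical layout: X has shape (samples, 3, n_lags) with channels
    [input, output, mode]; the target is the next output y[k].
    """
    u, y, mode = (np.asarray(a, dtype=float) for a in (u, y, mode))
    rows, targets = [], []
    for k in range(n_lags, len(y)):
        window = slice(k - n_lags, k)          # the n_lags samples before step k
        rows.append(np.stack([u[window], y[window], mode[window]]))
        targets.append(y[k])
    return np.array(rows), np.array(targets)

def conv1d_valid(x, kernel):
    """Single-filter 1D convolution (cross-correlation form, no padding).

    x: (channels, length), kernel: (channels, width).
    Returns an array of length (length - width + 1).
    """
    _, length = x.shape
    _, width = kernel.shape
    return np.array([
        np.sum(x[:, i:i + width] * kernel)     # sum over channels and kernel taps
        for i in range(length - width + 1)
    ])

if __name__ == "__main__":
    u = np.arange(6)
    y = 2 * np.arange(6)
    mode = np.zeros(6)                         # one structural configuration only
    X, t = build_regressors(u, y, mode, n_lags=3)
    print(X.shape, t)
    print(conv1d_valid(X[0], np.ones((3, 2))))
```

    In a trained network the kernel weights would be learned and several filters and layers would be stacked; here a fixed kernel only shows how the regressor window is traversed.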

    A Review of Structural Health Monitoring Techniques as Applied to Composite Structures.

    Structural Health Monitoring (SHM) is the process of collecting, interpreting, and analysing data from structures in order to determine their health status and remaining life span. Composite materials have been used extensively in recent years in several industries with the aim of reducing the total weight of structures while improving their mechanical properties. However, composite materials are prone to develop damage when subjected to low- to medium-velocity impacts (i.e. 1–10 m/s and 11–30 m/s, respectively). Hence, the use of SHM techniques to detect incipient damage in composite materials is of high importance. Despite the availability of several SHM methods for damage identification in composite structures, no single technique has proven suitable for all circumstances. Therefore, this paper offers updated guidelines for users of composites on some of the recent advances in SHM applied to composite structures. Most of the studies reported in the literature seem to have concentrated on flat composite plates reinforced with synthetic fibre. With regard to SHM, there are relatively few studies on other structural configurations, such as single- or double-curved structures and hybridised composites reinforced with natural and synthetic fibres.

    Joint Phoneme Segmentation Inference and Classification using CRFs

    State-of-the-art phoneme sequence recognition systems are based on the hybrid hidden Markov model/artificial neural network (HMM/ANN) framework. In this framework, the local classifier, the ANN, is typically trained using the Viterbi expectation-maximization algorithm, which involves two separate steps: phoneme sequence segmentation and training of the ANN. In this paper, we propose a CRF-based phoneme sequence recognition approach that simultaneously infers the phoneme segmentation and classifies the phoneme sequence. More specifically, the phoneme sequence recognition system consists of a local classifier ANN followed by a conditional random field (CRF) whose parameters are trained jointly, using a cost function that discriminates the true phoneme sequence against all competing sequences. In order to efficiently train such a system, we introduce a novel CRF-based segmentation using an acyclic graph. We study the viability of the proposed approach on the TIMIT phoneme recognition task. Our studies show that the proposed approach is capable of achieving performance similar to standard hybrid HMM/ANN and ANN/CRF systems where the ANN is trained with manual segmentation.
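    To make the idea of inferring segmentation and labels together more concrete, here is a minimal sketch of standard Viterbi decoding over a linear chain of frame-level scores, followed by collapsing the best path into segments. This is the textbook algorithm, not the acyclic-graph segmentation proposed in the paper; the function names and the toy score matrices are assumptions made for illustration.

```python
import numpy as np

def viterbi(emissions, transitions):
    """Best label path under additive (log-domain) scores.

    emissions: (n_frames, n_labels) local classifier scores per frame.
    transitions: (n_labels, n_labels) score for moving from label i to j.
    """
    emissions = np.asarray(emissions, dtype=float)
    transitions = np.asarray(transitions, dtype=float)
    n_frames, n_labels = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((n_frames, n_labels), dtype=int)
    for t in range(1, n_frames):
        # cand[i, j]: best score of a path ending in label i, then moving to j
        cand = score[:, None] + transitions
        backptr[t] = np.argmax(cand, axis=0)
        score = cand[backptr[t], np.arange(n_labels)] + emissions[t]
    # Trace the best path backwards from the best final label
    path = [int(np.argmax(score))]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

def segments(path):
    """Collapse frame labels into (label, start, end_exclusive) segments."""
    out, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            out.append((path[start], start, t))
            start = t
    return out

if __name__ == "__main__":
    emissions = [[2.0, 0.0], [1.5, 1.4], [0.0, 2.0], [0.0, 2.0]]
    transitions = [[0.0, -1.0], [-1.0, 0.0]]   # penalise label changes
    best = viterbi(emissions, transitions)
    print(best, segments(best))
```

    The segment boundaries fall out of the decoded path, so labels and segmentation come from one search rather than two separate steps.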