75 research outputs found

Deep Learning for Audio Signal Processing

Authors: Chang Shuo-yiin; Li Bo; Purwins Hendrik; Sainath Tara; Schlüter Jan; Virtanen Tuomas
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/05/2019

Given the recent surge in developments in deep learning, this article provides a review of state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side by side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e., audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.
Comment: 15 pages, 2 pdf figures

arXiv.org e-Print Archive

VBN

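Penerapan Metode Mel Frequency Cepstral Coefficients pada Sistem Pengenalan Suara Berbasis Desktop (Application of the Mel Frequency Cepstral Coefficients Method in a Desktop-Based Voice Recognition System)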

Authors: Ajinurseto Galih; Bakrim La Ode; Islamuddin Nur
Publication venue: Fakultas Teknik, Universitas Pasundan
Publication date: 29/06/2023

Biometric technology is becoming a technology trend in many areas of life. It uses parts of the human body, which are unique to each individual, as the measuring instrument of a system. The voice is one such unique characteristic and is well suited as a measure in systems that adopt biometric technology. A voice recognition system is one application of biometric technology that focuses on the human voice. Voice recognition systems require a feature extraction method, one of which is Mel Frequency Cepstral Coefficients (MFCC). MFCC is a voice feature extraction method that adopts the principles of human hearing, with the aim of producing results as close as possible to human auditory perception. The method proceeds through the stages of pre-emphasis, frame blocking, windowing, fast Fourier transform, mel frequency warping, and cepstrum. In testing under ideal conditions, the system's success rate reached 90% and its failure rate was 10%, with a top-5 error rate of 0%; under non-ideal conditions, the success rate was 76.6667% and the failure rate was 23.333%, with a top-5 error rate of 0%.

Pasundan University Journal
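
The stages named in the abstract above (pre-emphasis, frame blocking, windowing, fast Fourier transform, mel frequency warping, cepstrum) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the 0.97 pre-emphasis coefficient, the Hamming window, 256-sample frames with a 128-sample hop, 20 mel filters, 12 coefficients, and every function name are assumptions chosen for the example.

```python
import cmath
import math

def pre_emphasis(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; emphasizes high frequencies."""
    return signal[:1] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]

def frame_blocking(signal, frame_len, hop):
    """Split into overlapping frames; an incomplete tail frame is dropped."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]

def hamming(frame):
    """Taper the frame edges to reduce spectral leakage."""
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * f / sample_rate) for f in edges_hz]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):   # rising slope
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # falling slope
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank

def dct2(values, n_coeffs):
    """Unnormalized DCT-II; turns log filterbank energies into cepstral coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]

def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=12):
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for frame in frame_blocking(pre_emphasis(signal), frame_len, hop):
        spectrum = fft(hamming(frame))[:frame_len // 2 + 1]
        power = [abs(c) ** 2 / frame_len for c in spectrum]
        energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
        log_energies = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
        features.append(dct2(log_energies, n_coeffs))
    return features

# Usage: a synthetic 440 Hz tone stands in for recorded speech.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1024)]
coeffs = mfcc(tone, sample_rate)
print(len(coeffs), len(coeffs[0]))
```

Each row of `coeffs` is one frame's feature vector. A recognizer would then compare such vectors against enrolled speakers; the abstract does not say how that matching is done, so it is not sketched here.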

Deep dense and convolutional autoencoders for machine acoustic anomaly detection

Authors: Coelho Gabriel; Cortez Paulo; Ferreira André; Matos Luis; Nunes Eduardo C.; Pereira Pedro; Pilastri André; Ribeiro Alexandrine
Publication venue: Springer Science and Business Media LLC
Publication date: 01/06/2021

Recently, there have been advances in using unsupervised learning methods for Acoustic Anomaly Detection (AAD). In this paper, we propose improved versions of two deep AutoEncoders (AE), namely Dense and Convolutional AEs, for unsupervised AAD on six types of working machines. A large set of computational experiments was conducted, showing that the two proposed deep autoencoders, when combined with mel-spectrogram sound preprocessing, are quite competitive and outperform a recently proposed AE baseline. Overall, a high-quality class discrimination level was achieved, ranging from 72% to 92%.
Funding: European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020), Project no. 039334; Funding Reference: POCI-01-0247-FEDER-039334

Universidade do Minho: RepositoriUM

Semi-supervised source localization with deep generative modeling

Authors: Bianco Michael J.; Gannot Sharon; Gerstoft Peter
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 11/02/2021

We propose a semi-supervised localization approach based on deep generative modeling with variational autoencoders (VAEs). Localization in reverberant environments remains a challenge, which machine learning (ML) has shown promise in addressing. Even with large data volumes, the number of labels available for supervised learning in reverberant environments is usually small. We address this issue by performing semi-supervised learning (SSL) with convolutional VAEs. The VAE is trained to generate the phase of relative transfer functions (RTFs), in parallel with a direction-of-arrival (DOA) classifier, on both labeled and unlabeled RTF samples. The VAE-SSL approach is compared with SRP-PHAT and fully supervised CNNs. We find that VAE-SSL can outperform both SRP-PHAT and CNN in label-limited scenarios.
Comment: Published in proceedings of IEEE International Workshop on Machine Learning for Signal Processing. arXiv admin note: substantial text overlap with arXiv:2101.1063

arXiv.org e-Print Archive

Crossref

On Neural Architectures for Deep Learning-based Source Separation of Co-Channel OFDM Signals

Authors: Lancho Alejandro; Lee Gary C. F.; Polyanskiy Yury; Weiss Amir; Wornell Gregory W.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 15/03/2023

We study the single-channel source separation problem involving orthogonal frequency-division multiplexing (OFDM) signals, which are ubiquitous in many modern-day digital communication systems. Related efforts have been pursued in monaural source separation, where state-of-the-art neural architectures have been adopted to train an end-to-end separator for audio signals (as 1-dimensional time series). In this work, through a prototype problem based on the OFDM source model, we assess, and question, the efficacy of using audio-oriented neural architectures in separating signals based on features pertinent to communication waveforms. Perhaps surprisingly, we demonstrate that in some configurations where perfect separation is theoretically attainable, these audio-oriented neural architectures perform poorly in separating co-channel OFDM waveforms. Yet, we propose critical domain-informed modifications to the network parameterization, based on insights from OFDM structures, that can confer about 30 dB improvement in performance.

arXiv.org e-Print Archive
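
For readers unfamiliar with the signal model in the abstract above, the following is a toy sketch of how a single-channel mixture of two co-channel OFDM signals arises. It is not the paper's setup: the 8-subcarrier symbol size, the QPSK payloads, the cyclic-prefix length, the perfectly time-aligned transmitters, and all function names are assumptions made for illustration, and a naive DFT stands in for a production FFT.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for a toy symbol size."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(v * cmath.exp(sign * 2j * math.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out

def ofdm_modulate(symbols, cp_len):
    """Place one symbol on each subcarrier, go to the time domain, prepend a cyclic prefix."""
    time_samples = dft(symbols, inverse=True)
    return time_samples[len(time_samples) - cp_len:] + time_samples

def ofdm_demodulate(samples, cp_len):
    """Drop the cyclic prefix and return to the subcarrier domain."""
    return dft(samples[cp_len:])

# Hypothetical QPSK payloads for two transmitters sharing the same band.
source_a = [1+1j, 1-1j, -1+1j, -1-1j, 1+1j, -1-1j, 1-1j, -1+1j]
source_b = [-1-1j, 1+1j, 1+1j, -1+1j, 1-1j, 1-1j, -1-1j, 1+1j]

tx_a = ofdm_modulate(source_a, cp_len=2)
tx_b = ofdm_modulate(source_b, cp_len=2)
mixture = [a + b for a, b in zip(tx_a, tx_b)]  # what a single receiver observes

recovered_a = ofdm_demodulate(tx_a, cp_len=2)       # clean signal: symbols come back
mixed_symbols = ofdm_demodulate(mixture, cp_len=2)  # mixture: per-subcarrier sums
print(mixed_symbols[0])
```

Because modulation and demodulation are linear, demodulating the mixture yields only the sum of the two symbol streams on each subcarrier. In this time-aligned toy case the first subcarrier cancels to roughly zero, which hints at why separation has to exploit additional signal structure.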