Search CORE

406 research outputs found

Learning Hierarchical Speech Representations Using Deep Convolutional Neural Networks

Author: Hau Darren
Publication venue
Publication date: 01/08/2014
Field of study

The University of Manchester - Institutional Repository

Towards Cognizant Hearing Aids: Modeling of Content, Affect and Attention

Author: Karadogan Seliz
Publication venue: Technical University of Denmark
Publication date: 01/01/2012
Field of study

Online Research Database In Technology

Diffusion-Based Audio Inpainting

Author: Moliner Eloi
Välimäki Vesa
Publication venue
Publication date: 24/05/2023
Field of study

Audio inpainting aims to reconstruct missing segments in corrupted recordings. Previous methods produce plausible reconstructions when the gap length is shorter than about 100\;ms, but the quality decreases for longer gaps. This paper explores recent advancements in deep learning and, particularly, diffusion models, for the task of audio inpainting. The proposed method uses an unconditionally trained generative model, which can be conditioned in a zero-shot fashion for audio inpainting, offering high flexibility to regenerate gaps of arbitrary length. An improved deep neural network architecture based on the constant-Q transform, which allows the model to exploit pitch-equivariant symmetries in audio, is also presented. The performance of the proposed algorithm is evaluated through objective and subjective metrics for the task of reconstructing short to mid-sized gaps. The results of a formal listening test show that the proposed method delivers a comparable performance against state-of-the-art for short gaps, while retaining a good audio quality and outperforming the baselines for the longest gap lengths tested, 150\;ms and 200\;ms. This work helps improve the restoration of sound recordings having fairly long local disturbances or dropouts, which must be reconstructed.Comment: Submitted for publication to the Journal of Audio Engineering Society on January 30th, 202

arXiv.org e-Print Archive

Learning Sensory Representations with Minimal Supervision

Author: Saeed Aaqib
Publication venue: Technische Universiteit Eindhoven
Publication date: 24/06/2021
Field of study

Pure OAI Repository

Recommended from our members

Time-Frequency Analysis as Probabilistic Inference

Author: Sahani Maneesh
Turner Richard E
Publication venue: IEEE Transactions on Signal Processing
Publication date: 01/12/2014
Field of study

This is the final published version. It was originally published by IEEE at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6918491.This paper proposes a new view of time-frequency analysis framed in terms of probabilistic inference. Natural signals are assumed to be formed by the superposition of distinct time-frequency components, with the analytic goal being to infer these components by application of Bayes' rule. The framework serves to unify various existing models for natural time-series; it relates to both the Wiener and Kalman filters, and with suitable assumptions yields inferential interpretations of the short-time Fourier transform, spectrogram, filter bank, and wavelet representations. Value is gained by placing time-frequency analysis on the same probabilistic basis as is often employed in applications such as denoising, source separation, or recognition. Uncertainty in the time-frequency representation can be propagated correctly to application-specific stages, improving the handing of noise and missing data. Probabilistic learning allows modules to be co-adapted; thus, the time-frequency representation can be adapted to both the demands of the application and the time-varying statistics of the signal at hand. Similarly, the application module can be adapted to fine properties of the signal propagated by the initial time-frequency processing. We demonstrate these benefits by combining probabilistic time-frequency representations with non-negative matrix factorization, finding benefits in audio denoising and inpainting tasks, albeit with higher computational cost than incurred by the standard approach.Funding was provided by EPSRC (grant numbers EP/G050821/1 and EP/L000776/1) and Google (R.E.T.) and by the Gatsby Charitable Foundation (M.S.)

Apollo (Cambridge)

Feature enhancement of reverberant speech by distribution matching and non-negative matrix factorization

Author: A Sehr
B Kingsbury
CH Knapp
G Hinton
H Xu
JF Gemmeke
KJ Palomäki
S Thomas
SM Pizer
T Virtanen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

This paper describes a novel two-stage dereverberation feature enhancement method for noise-robust automatic speech recognition. In the first stage, an estimate of the dereverberated speech is generated by matching the distribution of the observed reverberant speech to that of clean speech, in a decorrelated transformation domain that has a long temporal context in order to address the effects of reverberation. The second stage uses this dereverberated signal as an initial estimate within a non-negative matrix factorization framework, which jointly estimates a sparse representation of the clean speech signal and an estimate of the convolutional distortion. The proposed feature enhancement method, when used in conjunction with automatic speech recognizer back-end processing, is shown to improve the recognition performance compared to three other state-of-the-art techniques

Crossref

Springer - Publisher Connector

Aaltodoc Publication Archive

White Rose Research Online

Survey of deep representation learning for speech emotion recognition

Author: Jurdak Raja
Khalifa Sara
Latif Siddique
Qadir Junaid
Rana Rajib
Schuller Björn W.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021
Field of study

Traditionally, speech emotion recognition (SER) research has relied on manually handcrafted acoustic features using feature engineering. However, the design of handcrafted features for complex SER tasks requires significant manual eort, which impedes generalisability and slows the pace of innovation. This has motivated the adoption of representation learning techniques that can automatically learn an intermediate representation of the input signal without any manual feature engineering. Representation learning has led to improved SER performance and enabled rapid innovation. Its effectiveness has further increased with advances in deep learning (DL), which has facilitated \textit{deep representation learning} where hierarchical representations are automatically learned in a data-driven manner. This paper presents the first comprehensive survey on the important topic of deep representation learning for SER. We highlight various techniques, related challenges and identify important future areas of research. Our survey bridges the gap in the literature since existing surveys either focus on SER with hand-engineered features or representation learning in the general setting without focusing on SER

OPUS Augsburg

Queensland University of Technology ePrints Archive

University of Southern Queensland ePrints

Designing the next generation intelligent transportation sensor system using big data driven machine learning techniques

Author: Huang Tongge
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2020
Field of study

Accurate traffic data collection is essential for supporting advanced traffic management system operations. This study investigated a large-scale data-driven sequential traffic sensor health monitoring (TSHM) module that can be used to monitor sensor health conditions over large traffic networks. Our proposed module consists of three sequential steps for detecting different types of abnormal sensor issues. The first step detects sensors with abnormally high missing data rates, while the second step uses clustering anomaly detection to detect sensors reporting abnormal records. The final step introduces a novel Bayesian changepoint modeling technique to detect sensors reporting abnormal traffic data fluctuations by assuming a constant vehicle length distribution based on average effective vehicle length (AEVL). Our proposed method is then compared with two benchmark algorithms to show its efficacy. Results obtained by applying our method to the statewide traffic sensor data of Iowa show it can successfully detect different classes of sensor issues. This demonstrates that sequential TSHM modules can help transportation agencies determine traffic sensors’ exact problems, thereby enabling them to take the required corrective steps. The second research objective will focus on the traffic data imputation after we discard the anomaly/missing data collected from failure traffic sensors. Sufficient high-quality traffic data are a crucial component of various Intelligent Transportation System (ITS) applications and research related to congestion prediction, speed prediction, incident detection, and other traffic operation tasks. Nonetheless, missing traffic data are a common issue in sensor data which is inevitable due to several reasons, such as malfunctioning, poor maintenance or calibration, and intermittent communications. Such missing data issues often make data analysis and decision-making complicated and challenging. In this study, we have developed a generative adversarial network (GAN) based traffic sensor data imputation framework (TSDIGAN) to efficiently reconstruct the missing data by generating realistic synthetic data. In recent years, GANs have shown impressive success in image data generation. However, generating traffic data by taking advantage of GAN based modeling is a challenging task, since traffic data have strong time dependency. To address this problem, we propose a novel time-dependent encoding method called the Gramian Angular Summation Field (GASF) that converts the problem of traffic time-series data generation into that of image generation. We have evaluated and tested our proposed model using the benchmark dataset provided by Caltrans Performance Management Systems (PeMS). This study shows that the proposed model can significantly improve the traffic data imputation accuracy in terms of Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) compared to state-of-the-art models on the benchmark dataset. Further, the model achieves reasonably high accuracy in imputation tasks even under a very high missing data rate (\u3e50%), which shows the robustness and efficiency of the proposed model. Besides the loop and radar sensors, traffic cameras have shown great ability to provide insightful traffic information using the image and video processing techniques. Therefore, the third and final part of this work aimed to introduce an end to end real-time cloud-enabled traffic video analysis (IVA) framework to support the development of the future smart city. As Artificial intelligence (AI) growing rapidly, Computer vision (CV) techniques are expected to significantly improve the development of intelligent transportation systems (ITS), which are anticipated to be a key component of future Smart City (SC) frameworks. Powered by computer vision techniques, the converting of existing traffic cameras into connected ``smart sensors called intelligent video analysis (IVA) systems has shown the great capability of producing insightful data to support ITS applications. However, developing such IVA systems for large-scale, real-time application deserves further study, as the current research efforts are focused more on model effectiveness instead of model efficiency. Therefore, we have introduced a real-time, large-scale, cloud-enabled traffic video analysis framework using NVIDIA DeepStream, which is a streaming analysis toolkit for AI-based video and image analysis. In this study, we have evaluated the technical and economic feasibility of our proposed framework to help traffic agency to build IVA systems more efficiently. Our study shows that the daily operating cost for our proposed framework on Google Cloud Platform (GCP) is less than $0.14 per camera, and that, compared with manual inspections, our framework achieves an average vehicle-counting accuracy of 83.7% on sunny days

Digital Repository @ Iowa State University (ISU)