11,017 research outputs found
Audio Processing
An audio processing arrangement (200) comprises a plurality of audio sources (101, 102) generating input audio signals, a processing circuit (110) for deriving processed audio signals from the input audio signals, a combining circuit (120) for deriving a combined audio signal from the processed audio signals, and a control circuit (130) for controlling the processing circuit in order to maximize a power measure of the combined audio signal and for limiting a function of gains of the processed audio signals to a predetermined value. In accordance with the present invention, the audio processing arrangement (200) comprises a pre-processing circuit (140) for deriving pre-processed audio signals from the input audio signals to minimize a cross-correlation of interferences comprised in the input audio signals. The pre-processed signals are provided to the processing circuit (110) instead of the input audio signals.
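The control objective above, maximizing output power under a gain constraint, can be sketched numerically. Assuming (the abstract does not specify this) that the constrained gain function is the unit Euclidean norm of the combining gains, the optimal gains are the principal eigenvector of the input covariance matrix. The two-channel signal model below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: two sources observing a common signal plus noise.
n = 10_000
source = rng.standard_normal(n)
x = np.stack([source + 0.1 * rng.standard_normal(n),
              source + 0.1 * rng.standard_normal(n)])  # shape (2, n)

# Sample covariance R of the input signals.
R = x @ x.T / n

# Maximizing the output power w^T R w subject to ||w|| = 1 is solved
# by the eigenvector of R with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order
w = eigvecs[:, -1]                     # principal eigenvector

combined = w @ x                       # combined audio signal
out_power = combined @ combined / n    # equals the largest eigenvalue
```

By construction, `out_power` is at least the power of any single input channel, which is the sense in which the combiner "maximizes a power measure" under the constraint.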
Secure audio processing
Automatic speech recognizers (ASR) are now nearly ubiquitous, finding application in smart assistants, smartphones, smart speakers, and other devices. An attack on an ASR that triggers such a device into carrying out false instructions can lead to severe consequences. Typically, speech recognition is performed using machine learning models, e.g., neural networks, whose intermediate outputs are not always fully concealed. Exposing such intermediate outputs makes the crafting of malicious input audio easier. This disclosure describes techniques that thwart attacks on speech recognition systems by moving model inference processing to a secure computing enclave. The memory and signals of the secure enclave are inaccessible to the user and to untrusted processes, and are therefore resistant to attacks.
Utilizing Domain Knowledge in End-to-End Audio Processing
End-to-end neural network based approaches to audio modelling are generally
outperformed by models trained on high-level data representations. In this
paper we present preliminary work that shows the feasibility of training the
first layers of a deep convolutional neural network (CNN) model to learn the
commonly-used log-scaled mel-spectrogram transformation. Secondly, we
demonstrate that upon initializing the first layers of an end-to-end CNN
classifier with the learned transformation, convergence and performance on the
ESC-50 environmental sound classification dataset are similar to a CNN-based
model trained on the highly pre-processed log-scaled mel-spectrogram features.Comment: Accepted at the ML4Audio workshop at the NIPS 201
Gabor frames and deep scattering networks in audio processing
This paper introduces Gabor scattering, a feature extractor based on Gabor
frames and Mallat's scattering transform. By using a simple signal model for
audio signals specific properties of Gabor scattering are studied. It is shown
that for each layer, specific invariances to certain signal characteristics
occur. Furthermore, deformation stability of the coefficient vector generated
by the feature extractor is derived by using a decoupling technique which
exploits the contractivity of general scattering networks. Deformations are
introduced as changes in spectral shape and frequency modulation. The
theoretical results are illustrated by numerical examples and experiments.
Numerical evidence is given by evaluation on a synthetic and a "real" data set,
that the invariances encoded by the Gabor scattering transform lead to higher
performance in comparison with just using Gabor transform, especially when few
training samples are available.
Comment: 26 pages, 8 figures, 4 tables. Repository for reproducibility:
https://gitlab.com/hararticles/gs-gt . Keywords: machine learning; scattering
transform; Gabor transform; deep learning; time-frequency analysis; CNN.
Accepted and published after peer revision
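The layered structure the abstract describes, a modulus of Gabor-frame coefficients followed by further transforms and averaging, can be sketched compactly. The sketch below is a simplified two-layer scattering with a sampled Gaussian window; window lengths and hops are illustrative assumptions, not the paper's construction:

```python
import numpy as np

def gabor_transform(x, win_len=128, hop=64):
    """Modulus of an STFT with a Gaussian (Gabor) window."""
    t = np.arange(win_len) - win_len / 2
    window = np.exp(-0.5 * (t / (win_len / 6)) ** 2)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len]
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames * window, axis=1))  # (frames, freqs)

def gabor_scattering(x):
    """Two-layer scattering sketch: modulus, average, modulus again."""
    s1 = gabor_transform(x)        # first-layer modulus coefficients
    layer1 = s1.mean(axis=0)       # time averaging gives layer-1 invariance
    # Layer 2: a second Gabor transform along time of each frequency channel,
    # capturing modulation structure lost by the averaging above.
    layer2 = np.stack([
        gabor_transform(s1[:, k], win_len=16, hop=8).mean(axis=0)
        for k in range(s1.shape[1])
    ])
    return layer1, layer2

# Example: a pure tone of 4096 samples.
x = np.sin(2 * np.pi * 0.05 * np.arange(4096))
layer1, layer2 = gabor_scattering(x)
```

The averaging in each layer is what produces the per-layer invariances the paper analyzes, while the second modulus layer retains frequency-modulation information.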
Audio processing for automatic TV sports program highlights detection
In today’s fast-paced world, the time available to watch long sports programmes is decreasing, while the number of sports channels is rapidly increasing. Many viewers desire the facility to watch just the highlights of sports events.
This paper presents a simple, but effective, method for generating sports video highlights summaries. Our method detects semantically important events in sports programmes by using the Scale Factors in the MPEG audio bitstream to generate an audio amplitude profile of the programme. The Scale Factors for the subbands corresponding to the voice bandwidth give a strong indication of the level of commentator and/or spectator excitement. When periods of sustained high audio amplitude have been detected and ranked, the corresponding video shots may be concatenated to produce a summary of the programme highlights. Our method uses only the Scale Factor information that is directly accessible from the MPEG bitstream, without any decoding, leading to highly efficient computation. It is also rather more generic than many existing techniques, being particularly suitable for the more popular sports televised in Ireland such as soccer, Gaelic football, hurling, rugby, horse racing and motor racing.
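The detect-and-rank step the abstract describes can be sketched once an amplitude profile is in hand. In the sketch below, `profile` stands in for the per-frame amplitude estimate derived from the voice-bandwidth Scale Factors (the bitstream parsing itself is omitted), and the threshold, minimum run length, and summary size are hypothetical parameters:

```python
import numpy as np

def rank_highlights(profile, threshold=None, min_len=5, top_k=3):
    """Find and rank periods of sustained high amplitude in a profile.

    Returns up to top_k (start, end) frame ranges, ordered by total
    amplitude, suitable for selecting the corresponding video shots.
    """
    if threshold is None:
        # Illustrative adaptive threshold: one standard deviation above mean.
        threshold = profile.mean() + profile.std()
    above = profile > threshold

    # Collect runs of consecutive above-threshold frames.
    runs, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(above)))

    # Keep only sustained runs and rank them by total amplitude.
    runs = [(s, e) for s, e in runs if e - s >= min_len]
    runs.sort(key=lambda r: profile[r[0]:r[1]].sum(), reverse=True)
    return runs[:top_k]

# Example: a flat profile with two excitement bursts of different strength.
prof = np.full(100, 0.1)
prof[20:40] = 1.0
prof[70:76] = 0.8
highlights = rank_highlights(prof)  # strongest burst ranked first
```

Because this operates on a scalar per-frame profile rather than decoded audio, it preserves the efficiency argument made in the abstract: only Scale Factor values need to be read from the bitstream.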
Meeting Recorder: Audio Processing
An overview of some of the audio processing techniques and tools that have been developed at the Laboratory for Recognition and Organization of Speech and Audio, Department of Electrical Engineering, Columbia University.