Search CORE

246,777 research outputs found

A Review of Audio Features and Statistical Models Exploited for Voice Pattern Design

Authors: Duong Hien-Thanh; Duong Ngoc Q. K.
Publication date: 24/02/2015

Audio fingerprinting, also known as audio hashing, is well established as a powerful technique for audio identification and synchronization. It involves two major steps: fingerprint (voice pattern) design and matching search. The first step concerns the derivation of a robust and compact audio signature, while the second usually requires knowledge of databases and quick-search algorithms. Although the technique has a wide range of real-world applications, to the best of the authors' knowledge the most recent comprehensive survey of existing algorithms appeared more than eight years ago. In this paper we therefore present a more up-to-date review and, to emphasize the audio signal processing aspect, focus our state-of-the-art survey on the fingerprint design step, for which various audio features and their tractable statistical models are discussed.

Comment: http://www.iaria.org/conferences2015/PATTERNS15.html ; Seventh International Conferences on Pervasive Patterns and Applications (PATTERNS 2015), Mar 2015, Nice, France
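The two steps named in this abstract (fingerprint design, then matching search) can be illustrated with a toy sketch. It is not a method taken from the surveyed paper: the energy-difference bit feature, the brute-force Hamming search, and the names `frame_energies`, `fingerprint`, `hamming` and `best_match` are all assumptions chosen for brevity.

```python
def frame_energies(signal, frame_len=64):
    """Split the signal into non-overlapping frames and return each frame's energy."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len])
        for i in range(n_frames)
    ]


def fingerprint(signal, frame_len=64):
    """Step 1 (design): one bit per frame transition, 1 if the energy rises."""
    energies = frame_energies(signal, frame_len)
    return [1 if nxt > cur else 0 for cur, nxt in zip(energies, energies[1:])]


def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def best_match(query_fp, database):
    """Step 2 (matching): slide the query over every reference fingerprint.

    Returns (name, frame_offset, distance) for the smallest Hamming distance,
    or None if the query is longer than every reference.
    """
    best = None
    for name, ref_fp in database.items():
        for offset in range(len(ref_fp) - len(query_fp) + 1):
            dist = hamming(query_fp, ref_fp[offset:offset + len(query_fp)])
            if best is None or dist < best[2]:
                best = (name, offset, dist)
    return best
```

A practical system would replace the exhaustive scan with an index (for example a hash table keyed on sub-fingerprints), which is the "quick-search" concern the abstract assigns to the second step.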

arXiv.org e-Print Archive

CMOS Hyperbolic Sine ELIN filters for low/audio frequency biomedical applications

Author: Kardoulaki Evdokia
Publication venue: Bioengineering, Imperial College London
Publication date: 01/04/2012

Hyperbolic-Sine (Sinh) filters form a subclass of Externally-Linear-Internally-Non-Linear (ELIN) systems. They can handle large signals in a low-power environment using half the capacitor area required by the more popular ELIN Log-domain filters. Their inherent class-AB nature stems from the odd property of the sinh function at the heart of their companding operation. Despite this early realisation, the Sinh filtering paradigm has not yet attracted the interest it deserves, probably because of its mathematical and circuit-level complexity. This thesis presents an overview of the CMOS weak-inversion Sinh filtering paradigm and explains how biomedical systems in the low- to audio-frequency range could benefit from it. Its scope is twofold: to consolidate the theory behind the synthesis and design of high-order Sinh continuous-time filters and, more importantly, to confirm their micro-power consumption and 100+ dB of dynamic range (DR) through measured results presented for the first time. Novel high-order Sinh topologies are designed by means of a newly introduced systematic mathematical framework. They employ a recently proposed CMOS Sinh integrator comprising only p-type devices in its translinear loops. The performance of the high-order topologies is evaluated both on its own and in comparison with their Log-domain counterparts. A 5th-order Sinh Chebyshev low-pass filter is compared head-to-head with a corresponding, and also novel, Log-domain class-AB topology, confirming that Sinh filters offer an equally high DR (100+ dB) with half the capacitor area, at the expense of higher complexity and power consumption. The theoretical findings are validated by measured results from an 8th-order notch filter for 50/60 Hz noise fabricated in a 0.35 μm CMOS technology. Measured results confirm a DR of 102 dB, a moderate SNR of ~60 dB and 74 μW power consumption from a 2 V power supply.
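To put the measured figures quoted above in linear terms, the short calculation below converts the decibel values to amplitude ratios and derives the average supply current. It assumes the usual 20·log10 amplitude convention for DR and SNR, which the abstract does not state explicitly. The function names are illustrative only.

```python
def db_to_amplitude_ratio(db):
    """Convert decibels to a linear amplitude ratio (assumes the 20*log10 convention)."""
    return 10 ** (db / 20)


def supply_current(power_w, supply_v):
    """Average supply current in amperes, from P = V * I."""
    return power_w / supply_v


dr_ratio = db_to_amplitude_ratio(102)    # dynamic range quoted as 102 dB
snr_ratio = db_to_amplitude_ratio(60)    # SNR quoted as roughly 60 dB
current_a = supply_current(74e-6, 2.0)   # 74 uW drawn from a 2 V supply

print(f"DR ratio  ~ {dr_ratio:.3g}")
print(f"SNR ratio ~ {snr_ratio:.3g}")
print(f"Supply current ~ {current_a * 1e6:.0f} uA")
```

Under that assumption, a 102 dB dynamic range corresponds to a largest-to-smallest signal ratio of roughly 1.26 × 10^5, and the reported power corresponds to an average supply current of about 37 μA.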

Spiral - Imperial College Digital Repository

Audio-visual Rhetoric: Visualizing the Pattern Language of Film

Authors: Buchmüller Sandra; Englert Roman; Joost Gesche
Publication date: 13/07/2009

Audio-visual Rhetoric is a knowledge domain for designers in theory and practice that is valid for all communicative actions through media that aim for persuasion. Within this domain, we introduce a framework for media analysis. We developed an Audio-Visual Pattern (AVP) language for film that is visualized within a notation system. This system shows auditory and visual parameters in order to reveal film’s rhetorical structure. We discuss related theories from pattern language and rhetoric and apply the AVP method to analyze 10 commercials.

Keywords: Pattern Language, Film Analysis, Rhetoric, Emotion, Persuasion, Design Research

Sheffield Hallam University Research Archive

Audio style transfer

Authors: Duong Ngoc; Grinstein Eric; Ozerov Alexey; Pérez Patrick
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 07/11/2018

'Style transfer' among images has recently emerged as a very active research topic, fuelled by the power of convolutional neural networks (CNNs), and has fast become a very popular technology in social media. This paper investigates the analogous problem in the audio domain: how can the style of a reference audio signal be transferred to a target audio content? We propose a flexible framework for the task, which uses a sound texture model to extract statistics characterizing the reference audio style, followed by an optimization-based audio texture synthesis to modify the target content. In contrast to mainstream optimization-based visual transfer methods, the proposed process is initialized with the target content instead of random noise, and the optimized loss concerns only texture, not structure. These differences proved key for audio style transfer in our experiments. To extract features of interest, we investigate different architectures, whether pre-trained on other tasks, as done in image style transfer, or engineered based on the human auditory system. Experimental results on different types of audio signal confirm the potential of the proposed approach.

Comment: ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2018, Calgary, Canada. IEE
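The optimization scheme described in this abstract (style statistics taken from the reference, a texture-only loss, and initialization from the target content rather than noise) can be sketched in a few lines. Several simplifications are assumed here and do not come from the paper. The style statistic is a Gram matrix of channel correlations, borrowed from image style transfer. The optimization acts directly on a small feature matrix rather than on a waveform passed through a network. The backtracking step rule and all function names are illustrative.

```python
def gram(features):
    """Time-averaged channel correlations: G[i][j] = mean over t of F[i][t] * F[j][t]."""
    n_frames = len(features[0])
    return [[sum(a * b for a, b in zip(row_i, row_j)) / n_frames
             for row_j in features] for row_i in features]


def texture_loss(features, target_gram):
    """Squared distance between the Gram matrix of `features` and the target."""
    g = gram(features)
    n_ch = len(g)
    return sum((g[i][j] - target_gram[i][j]) ** 2
               for i in range(n_ch) for j in range(n_ch))


def texture_gradient(features, target_gram):
    """Gradient of texture_loss with respect to features: (4 / T) * (G - G_ref) @ F."""
    n_ch, n_frames = len(features), len(features[0])
    g = gram(features)
    diff = [[g[i][j] - target_gram[i][j] for j in range(n_ch)] for i in range(n_ch)]
    return [[4.0 / n_frames * sum(diff[i][k] * features[k][t] for k in range(n_ch))
             for t in range(n_frames)] for i in range(n_ch)]


def transfer_style(content, style, steps=100, step_size=0.1):
    """Gradient descent on the texture loss, started from the content features."""
    target = gram(style)                      # style statistics from the reference
    current = [row[:] for row in content]     # initialise from content, not noise
    loss = texture_loss(current, target)
    for _ in range(steps):
        grad = texture_gradient(current, target)
        trial = [[x - step_size * g for x, g in zip(row, g_row)]
                 for row, g_row in zip(current, grad)]
        trial_loss = texture_loss(trial, target)
        if trial_loss < loss:                 # accept only steps that reduce the loss
            current, loss = trial, trial_loss
        else:                                 # otherwise shrink the step and retry
            step_size *= 0.5
    return current, loss
```

Because the loss contains no content term, the starting point is the only thing tying the result to the target content, which is consistent with the abstract's remark that the initialization choice proved key.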

arXiv.org e-Print Archive

Crossref