Search CORE


Frame Theory for Signal Processing in Psychoacoustics

Author: A. Bregman
A. Janssen
A. Ron
A.V. Oppenheim
B. Laback
B.C.J. Moore
B.R. Glasberg
C. Heil
C. Wiesmeyr
C.J. Plack
D. Soderquist
D. Wang
D.D. Greenwood
D.T. Stoeva
E. Hernández
E. Ravelli
E. Zwicker
E.A. Lopez-Poveda
G. Chardon
G. Kidd Jr
G. Matz
H. Bölcskei
H. Fastl
I. Daubechies
J. Kovačević
J. Leng
J.J. Benedetto
J.J. O’Donovan
J.S. Garofolo
K. Gröchenig
L. Chai
L.N. Trefethen
M. Bownik
M. Bézat
M. Elad
M. Unoki
M. Vetterli
N. Holighaus
N. Perraudin
N.K. Bari
O. Christensen
P. Balazs
P. Casazza
P. Søndergaard
P. Vaidyanathan
P.G. Casazza
R.D. Patterson
R.J. Duffin
R.M. Young
S. Strahl
T. Irino
T. Painter
T. Werther
T.S. Gunawan
W. Jesteadt
X. Valero
X. Zhao
Z. Cvetković
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/11/2016

This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics. On the one hand, the basic concepts of frame theory are presented, and some proofs are provided to explain those concepts in some detail. The goal is to reveal to hearing scientists how this mathematical theory could be relevant for their research. In particular, we focus on frame theory in a filter bank approach, which is probably the most relevant viewpoint for audio signal processing. On the other hand, basic psychoacoustic concepts are presented to stimulate mathematicians to apply their knowledge in this field.
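As a minimal illustration of the filter-bank view of frames, the sketch below computes the optimal frame bounds of an undecimated filter bank. It assumes circular convolution on signals of length n and no subsampling, which is the simplest case; decimated or nonuniform banks are not covered. Under those assumptions the frame operator is diagonalized by the DFT, so the bounds are the minimum and maximum of the summed squared frequency responses, sum over m of |H_m(k)|^2, across frequency bins k. The function names are hypothetical.

```python
import cmath
from typing import List, Sequence, Tuple


def dft(x: Sequence[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (adequate for a small demo)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def filter_bank_frame_bounds(
    filters: Sequence[Sequence[float]], n: int
) -> Tuple[float, float]:
    """Optimal frame bounds (A, B) of an undecimated filter bank on C^n.

    Assumes circular convolution and no subsampling. The frame operator is then
    diagonalised by the DFT with eigenvalues sum_m |H_m(k)|^2, so the optimal
    bounds are the minimum and maximum of that sum over frequency bins k.
    """
    if any(len(h) > n for h in filters):
        raise ValueError("every filter must be no longer than the signal length n")
    # Zero-pad each impulse response to length n before taking its DFT.
    responses = [dft(list(h) + [0.0] * (n - len(h))) for h in filters]
    power = [sum(abs(r[k]) ** 2 for r in responses) for k in range(n)]
    return min(power), max(power)


if __name__ == "__main__":
    # Two-channel Haar-like pair: |H0|^2 + |H1|^2 = 1 at every frequency,
    # so the undecimated bank is a tight (here Parseval) frame with A = B = 1.
    haar = [[0.5, 0.5], [0.5, -0.5]]
    print(filter_bank_frame_bounds(haar, 8))
    # A lone two-tap moving sum has a spectral zero at the Nyquist bin,
    # so A = 0 and this one-filter bank is not a frame for even n.
    print(filter_bank_frame_bounds([[1.0, 1.0]], 8))
```

A lower bound A > 0 is what guarantees that a signal can be stably reconstructed from its filter bank coefficients; the ratio B/A indicates how well conditioned that reconstruction is.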

arXiv.org e-Print Archive

Crossref

Robust equalization of multichannel acoustic systems

Author: Zhang Wancheng
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/08/2010

In most real-world acoustical scenarios, speech signals captured by distant microphones from a source are reverberated due to multipath propagation, and the reverberation may impair speech intelligibility. Speech dereverberation can be achieved by equalizing the channels from the source to the microphones. Equalization systems can be computed using estimates of multichannel acoustic impulse responses. However, the estimates obtained from system identification always include errors; the fact that an equalization system is able to equalize the estimated multichannel acoustic system does not mean that it is able to equalize the true system. The objective of this thesis is to propose and investigate robust equalization methods for multichannel acoustic systems in the presence of system identification errors. Equalization systems can be computed using the multiple-input/output inverse theorem or the multichannel least-squares method. However, equalization systems obtained from these methods are very sensitive to system identification errors. A study of the multichannel least-squares method with respect to two classes of characteristic channel zeros is conducted. Accordingly, a relaxed multichannel least-squares method is proposed. Channel shortening in connection with the multiple-input/output inverse theorem and the relaxed multichannel least-squares method is discussed. Two algorithms that take the system identification errors into account are developed. First, an optimally-stopped weighted conjugate gradient algorithm is proposed: a conjugate gradient iterative method is employed to compute the equalization system, and the iteration process is stopped optimally with respect to system identification errors. Second, a system-identification-error-robust equalization method that exploits error models is presented, which incorporates system identification error models into the weighted multichannel least-squares formulation.
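The multichannel least-squares equalizer described in this abstract can be sketched as follows, assuming M channels of equal length, a delayed unit impulse as the target response, and the conjugate gradient method applied to the least-squares problem (CGLS). Capping the iteration count with `max_iter` stands in for early stopping only; it is not the thesis's optimal stopping rule, weighting, relaxed formulation, or error model. All function names are hypothetical.

```python
from typing import List, Sequence


def convolve(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def conv_matrix(h: Sequence[float], n_taps: int) -> List[List[float]]:
    """Toeplitz matrix C such that C @ g == convolve(h, g) when len(g) == n_taps."""
    rows = len(h) + n_taps - 1
    return [[h[i - j] if 0 <= i - j < len(h) else 0.0 for j in range(n_taps)]
            for i in range(rows)]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def _matvec(A: List[List[float]], x: Sequence[float]) -> List[float]:
    return [_dot(row, x) for row in A]


def _rmatvec(A: List[List[float]], y: Sequence[float]) -> List[float]:
    """Compute A^T y without forming the transpose."""
    out = [0.0] * len(A[0])
    for row, yi in zip(A, y):
        for j, a in enumerate(row):
            out[j] += a * yi
    return out


def mcls_equalizer(channels: Sequence[Sequence[float]], n_taps: int, delay: int = 0,
                   max_iter: int = 50, tol: float = 1e-20) -> List[List[float]]:
    """Find g_1..g_M minimising ||sum_m h_m * g_m - d||^2 with CGLS.

    d is a unit impulse delayed by `delay` samples; each g_m has n_taps taps.
    """
    blocks = [conv_matrix(h, n_taps) for h in channels]
    n_rows = len(blocks[0])
    if any(len(b) != n_rows for b in blocks):
        raise ValueError("all channels must have the same length")
    if not 0 <= delay < n_rows:
        raise ValueError("delay must index a sample of the equalized response")
    # A = [H_1 H_2 ... H_M]: per-channel convolution matrices side by side.
    A = [sum((blk[i] for blk in blocks), []) for i in range(n_rows)]
    d = [0.0] * n_rows
    d[delay] = 1.0                      # target: unit impulse delayed by `delay`
    x = [0.0] * len(A[0])               # stacked equalizer taps [g_1; ...; g_M]
    r = d[:]                            # residual d - A x for x = 0
    s = _rmatvec(A, r)                  # normal-equation residual A^T r
    p = s[:]
    gamma = _dot(s, s)
    for _ in range(max_iter):           # early stopping = capping the iterations
        if gamma <= tol:
            break
        q = _matvec(A, p)
        qq = _dot(q, q)
        if qq == 0.0:
            break
        alpha = gamma / qq
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = _rmatvec(A, r)
        gamma_new = _dot(s, s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return [x[m * n_taps:(m + 1) * n_taps] for m in range(len(channels))]


if __name__ == "__main__":
    # Two length-3 channels without common zeros: two taps per channel suffice
    # for exact inversion, so the equalized response should be near an impulse.
    h1, h2 = [1.0, 0.5, 0.2], [1.0, -0.3, 0.1]
    g1, g2 = mcls_equalizer([h1, h2], n_taps=2)
    eq = [a + b for a, b in zip(convolve(h1, g1), convolve(h2, g2))]
    print([round(v, 6) for v in eq])
```

When the channel estimates contain errors, running fewer iterations acts as a form of regularization, which is the intuition behind stopping the conjugate gradient iteration early rather than solving the least-squares problem exactly.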

Spiral - Imperial College Digital Repository

Adaptive transfer functions: improved multiresolution visualization of medical models

Author: Brunet Crosa Pere
Díaz García Jesús
Navazo Álvaro Isabel
Pérez Frederic
Vázquez Alcocer Pere Pau
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

The final publication is available at Springer via http://dx.doi.org/10.1007/s00371-016-1253-9

Medical datasets are continuously increasing in size. Although larger models may be available for certain research purposes, in common clinical practice the models usually have up to 512x512x2000 voxels. These resolutions exceed the capabilities of conventional GPUs, such as those usually found in medical doctors’ desktop PCs. Commercial solutions typically reduce the data by iteratively downsampling the dataset until it fits the available target specifications. The resulting data loss reduces visualization quality, and this is not usually compensated by other measures that might alleviate its effects. In this paper, we propose adaptive transfer functions, an algorithm that improves the transfer function in downsampled multiresolution models so that rendering quality is greatly improved. The technique is simple and lightweight, and it is suitable not only for visualizing huge models that would not fit in a GPU, but also for rendering not-so-large models on mobile GPUs, which are less capable than their desktop counterparts. Moreover, it can also be used to increase rendering frame rates by using lower levels of the multiresolution hierarchy while still maintaining high-quality results in a focus-and-context approach. We also present an evaluation of these results based on perceptual metrics.

Peer Reviewed. Postprint (author's final draft).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks

Author: Chinen Troy
Covell Michele
Hwang Sung Jin
Johnston Nick
Minnen David
Shor Joel
Singh Saurabh
Toderici George
Vincent Damien
Publication date: 29/03/2017

We propose a method for lossy image compression based on recurrent, convolutional neural networks that outperforms BPG (4:2:0), WebP, JPEG2000, and JPEG as measured by MS-SSIM. We introduce three improvements over previous research that lead to this state-of-the-art result. First, we show that training with a pixel-wise loss weighted by SSIM increases reconstruction quality according to several metrics. Second, we modify the recurrent architecture to improve spatial diffusion, which allows the network to capture and propagate image information through its hidden state more effectively. Finally, in addition to lossless entropy coding, we use a spatially adaptive bit allocation algorithm to use the limited number of bits more efficiently when encoding visually complex image regions. We evaluate our method on the Kodak and Tecnick image sets and compare it against standard codecs as well as recently published methods based on deep neural networks.
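The abstract does not give the exact form of the SSIM weighting, so the following is only a hypothetical sketch of a pixel-wise L1 loss weighted per block by the SSIM between the original and the reconstruction: the absolute errors of each 8x8 block are scaled by 2 - SSIM, so poorly reconstructed blocks count more. The block size, the weighting formula, and the function names are assumptions, not the paper's method. SSIM here is the standard single-window formula with constants C1 = (0.01·L)^2 and C2 = (0.03·L)^2, without a Gaussian window.

```python
from typing import List, Sequence

Image = List[List[float]]


def block_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 1.0) -> float:
    """Single-window SSIM between two flattened, equally sized pixel blocks."""
    c1 = (0.01 * data_range) ** 2           # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2           # stabilises the contrast/structure term
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2))


def ssim_weighted_l1(orig: Image, recon: Image, block: int = 8,
                     data_range: float = 1.0) -> float:
    """Mean absolute error in which every pixel of a block is weighted by 2 - SSIM.

    Hypothetical weighting: for non-negative pixels SSIM lies in [-1, 1], so the
    weights lie in [1, 3]; blocks reconstructed with the wrong structure count more.
    """
    h, w = len(orig), len(orig[0])
    total = 0.0
    for by in range(0, h, block):
        for bx in range(0, w, block):
            coords = [(i, j) for i in range(by, min(by + block, h))
                      for j in range(bx, min(bx + block, w))]
            xs = [orig[i][j] for i, j in coords]
            ys = [recon[i][j] for i, j in coords]
            weight = 2.0 - block_ssim(xs, ys, data_range)
            total += weight * sum(abs(a - b) for a, b in zip(xs, ys))
    return total / (h * w)


if __name__ == "__main__":
    img = [[(i * 8 + j) / 63.0 for j in range(8)] for i in range(8)]
    noisy = [row[:] for row in img]
    noisy[3][4] += 0.1                      # perturb a single pixel
    print(ssim_weighted_l1(img, img), ssim_weighted_l1(img, noisy))
```

Because every weight is at least 1, this loss is never smaller than the plain mean absolute error; it only adds emphasis where the block structure is damaged.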

arXiv.org e-Print Archive

Crossref

Analysis of Speaker Clustering Strategies for HMM-Based Speech Synthesis

Author: Dall Rasmus
King Simon
Veaux Christophe
Yamagishi Junichi
Publication date: 01/01/2012

This paper describes a method for speaker clustering, with the application of building average voice models for speaker-adaptive HMM-based speech synthesis that are a good basis for adapting to specific target speakers. Our main hypothesis is that using perceptually similar speakers to build the average voice model will be better than using unselected speakers, even if the amount of data available from perceptually similar speakers is smaller. We measure the perceived similarities among a group of 30 female speakers in a listening test and then apply multiple linear regression to automatically predict these listener judgements of speaker similarity and thus to identify similar speakers automatically. We then compare a variety of average voice models trained on either speakers who were perceptually judged to be similar to the target speaker, speakers selected by the multiple linear regression, or a large global set of unselected speakers. We find that the average voice model trained on perceptually similar speakers provides better performance than the global model, even though the latter is trained on more data, confirming our main hypothesis. However, the average voice model using speakers selected automatically by the multiple linear regression does not reach the same level of performance. Index Terms: Statistical parametric speech synthesis, hidden Markov models, speaker adaptation
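The regression step (predicting listener similarity ratings from measurable speaker features, then picking the candidates predicted to be most similar to the target) can be sketched with ordinary least squares. The abstract does not say which features were used, so the two-feature vectors, the toy ratings, the speaker ids, and the function names below are all hypothetical.

```python
from typing import Dict, List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                   # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_mlr(features: Sequence[Sequence[float]], ratings: Sequence[float]) -> List[float]:
    """Multiple linear regression with intercept via the normal equations.

    Returns [w0, w1, ..., wp] minimising sum (rating - w0 - w . f)^2.
    """
    rows = [[1.0] + list(f) for f in features]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, ratings)) for i in range(p)]
    return solve(xtx, xty)


def predict(weights: Sequence[float], f: Sequence[float]) -> float:
    return weights[0] + sum(w * x for w, x in zip(weights[1:], f))


def most_similar(weights: Sequence[float],
                 candidates: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the ids of the k candidates with the highest predicted similarity."""
    ranked = sorted(candidates, key=lambda s: predict(weights, candidates[s]),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy data: hypothetical [feature distance 1, feature distance 2] per speaker
    # pair and a mean similarity rating on a 1-5 scale (not the paper's data).
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    y = [5.0, 4.0, 3.0, 2.0, 1.0]
    w = fit_mlr(X, y)
    cands = {"s1": [0.1, 0.2], "s2": [1.5, 0.9], "s3": [0.4, 0.0]}
    print(w, most_similar(w, cands, 2))
```

Solving the normal equations directly is fine for a handful of features; with many correlated features a QR- or SVD-based least-squares solver would be numerically safer.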

CiteSeerX

Edinburgh Research Explorer

A Similarity Measure for Material Appearance

Author: Garces Elena
Gutierrez Diego
Lagunas Manuel
Malpica Sandra
Masia Belen
Serrano Ana
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019

We present a model to measure the similarity in appearance between different materials, which correlates with human similarity judgments. We first create a database of 9,000 rendered images depicting objects with varying materials, shape and illumination. We then gather data on perceived similarity from crowdsourced experiments; our analysis of over 114,840 answers suggests that a shared perception of appearance similarity does indeed exist. We feed this data to a deep learning architecture with a novel loss function, which learns a feature space for materials that correlates with such perceived appearance similarity. Our evaluation shows that our model outperforms existing metrics. Last, we demonstrate several applications enabled by our metric, including appearance-based search for material suggestions, database visualization, clustering and summarization, and gamut mapping. Comment: 12 pages, 17 figures

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Universidad de Zaragoza

Universidad Zaragoza: Open Journal Systems