141 research outputs found

    Adaptive Beamforming for Medical Ultrasound Imaging

    Get PDF

    Efficient Algorithms for Immersive Audio Rendering Enhancement

    Get PDF
    Il rendering audio immersivo è il processo di creazione di un’esperienza sonora coinvolgente e realistica nello spazio 3D. Nei sistemi audio immersivi, le funzioni di trasferimento relative alla testa (head-related transfer functions, HRTFs) vengono utilizzate per la sintesi binaurale in cuffia poiché esprimono il modo in cui gli esseri umani localizzano una sorgente sonora. Possono essere introdotti algoritmi di interpolazione delle HRTF per ridurre il numero di punti di misura e per creare un movimento del suono affidabile. La riproduzione binaurale può essere eseguita anche dagli altoparlanti. Tuttavia, il coinvolgimento di due o più gli altoparlanti causa il problema del crosstalk. In questo caso, algoritmi di cancellazione del crosstalk (CTC) sono necessari per eliminare i segnali di interferenza indesiderati. In questa tesi, partendo da un'analisi comparativa di metodi di misura delle HRTF, viene proposto un sistema di rendering binaurale basato sull'interpolazione delle HRTF per applicazioni in tempo reale. Il metodo proposto mostra buone prestazioni rispetto a una tecnica di riferimento. L'algoritmo di interpolazione è anche applicato al rendering audio immersivo tramite altoparlanti, aggiungendo un algoritmo di cancellazione del crosstalk fisso, che considera l'ascoltatore in una posizione fissa. Inoltre, un sistema di cancellazione crosstalk adattivo, che include il tracciamento della testa dell'ascoltatore, è analizzato e implementato in tempo reale. Il CTC adattivo implementa una struttura in sottobande e risultati sperimentali dimostrano che un maggiore numero di bande migliora le prestazioni in termini di errore totale e tasso di convergenza. Il sistema di riproduzione e le caratteristiche dell'ambiente di ascolto possono influenzare le prestazioni a causa della loro risposta in frequenza non ideale. L'equalizzazione viene utilizzata per livellare le varie parti dello spettro di frequenze che compongono un segnale audio al fine di ottenere le caratteristiche sonore desiderate. L'equalizzazione può essere manuale, come nel caso dell'equalizzazione grafica, dove il guadagno di ogni banda di frequenza può essere modificato dall'utente, o automatica, la curva di equalizzazione è calcolata automaticamente dopo la misurazione della risposta impulsiva della stanza. L'equalizzazione della risposta ambientale può essere applicata anche ai sistemi multicanale, che utilizzano due o più altoparlanti e la zona di equalizzazione può essere ampliata misurando le risposte impulsive in diversi punti della zona di ascolto. In questa tesi, GEQ efficienti e un sistema adattativo di equalizzazione d'ambiente. In particolare, sono proposti e approfonditi tre equalizzatori grafici a basso costo computazionale e a fase lineare e quasi lineare. Gli esperimenti confermano l'efficacia degli equalizzatori proposti in termini di accuratezza, complessità computazionale e latenza. Successivamente, una struttura adattativa in sottobande è introdotta per lo sviluppo di un sistema di equalizzazione d'ambiente multicanale. I risultati sperimentali verificano l'efficienza dell'approccio in sottobande rispetto al caso a banda singola. Infine, viene presentata una rete crossover a fase lineare per sistemi multicanale, mostrando ottimi risultati in termini di risposta in ampiezza, bande di transizione, risposta polare e risposta in fase. I sistemi di controllo attivo del rumore (ANC) possono essere progettati per ridurre gli effetti dell'inquinamento acustico e possono essere utilizzati contemporaneamente a un sistema audio immersivo. L'ANC funziona creando un'onda sonora in opposizione di fase rispetto all'onda sonora in arrivo. Il livello sonoro complessivo viene così ridotto grazie all'interferenza distruttiva. Infine, questa tesi presenta un sistema ANC utilizzato per la riduzione del rumore. L’approccio proposto implementa una stima online del percorso secondario e si basa su filtri adattativi in sottobande applicati alla stima del percorso primario che mirano a migliorare le prestazioni dell’intero sistema. La struttura proposta garantisce un tasso di convergenza migliore rispetto all'algoritmo di riferimento.Immersive audio rendering is the process of creating an engaging and realistic sound experience in 3D space. In immersive audio systems, the head-related transfer functions (HRTFs) are used for binaural synthesis over headphones since they express how humans localize a sound source. HRTF interpolation algorithms can be introduced for reducing the number of measurement points and creating a reliable sound movement. Binaural reproduction can be also performed by loudspeakers. However, the involvement of two or more loudspeakers causes the problem of crosstalk. In this case, crosstalk cancellation (CTC) algorithms are needed to delete unwanted interference signals. In this thesis, starting from a comparative analysis of HRTF measurement techniques, a binaural rendering system based on HRTF interpolation is proposed and evaluated for real-time applications. The proposed method shows good performance in comparison with a reference technique. The interpolation algorithm is also applied for immersive audio rendering over loudspeakers, by adding a fixed crosstalk cancellation algorithm, which assumes that the listener is in a fixed position. In addition, an adaptive crosstalk cancellation system, which includes the tracking of the listener's head, is analyzed and a real-time implementation is presented. The adaptive CTC implements a subband structure and experimental results prove that a higher number of bands improves the performance in terms of total error and convergence rate. The reproduction system and the characteristics of the listening room may affect the performance due to their non-ideal frequency response. Audio equalization is used to adjust the balance of different audio frequencies in order to achieve desired sound characteristics. The equalization can be manual, such as in the case of graphic equalization, where the gain of each frequency band can be modified by the user, or automatic, where the equalization curve is automatically calculated after the room impulse response measurement. The room response equalization can be also applied to multichannel systems, which employ two or more loudspeakers, and the equalization zone can be enlarged by measuring the impulse responses in different points of the listening zone. In this thesis, efficient graphic equalizers (GEQs), and an adaptive room response equalization system are presented. In particular, three low-complexity linear- and quasi-linear-phase graphic equalizers are proposed and deeply examined. Experiments confirm the effectiveness of the proposed GEQs in terms of accuracy, computational complexity, and latency. Successively, a subband adaptive structure is introduced for the development of a multichannel and multiple positions room response equalizer. Experimental results verify the effectiveness of the subband approach in comparison with the single-band case. Finally, a linear-phase crossover network is presented for multichannel systems, showing great results in terms of magnitude flatness, cutoff rates, polar diagram, and phase response. Active noise control (ANC) systems can be designed to reduce the effects of noise pollution and can be used simultaneously with an immersive audio system. The ANC works by creating a sound wave that has an opposite phase with respect to the sound wave of the unwanted noise. The additional sound wave creates destructive interference, which reduces the overall sound level. Finally, this thesis presents an ANC system used for noise reduction. The proposed approach implements an online secondary path estimation and is based on cross-update adaptive filters applied to the primary path estimation that aim at improving the performance of the whole system. The proposed structure allows for a better convergence rate in comparison with a reference algorithm

    Scalable video compression with optimized visual performance and random accessibility

    Full text link
    This thesis is concerned with maximizing the coding efficiency, random accessibility and visual performance of scalable compressed video. The unifying theme behind this work is the use of finely embedded localized coding structures, which govern the extent to which these goals may be jointly achieved. The first part focuses on scalable volumetric image compression. We investigate 3D transform and coding techniques which exploit inter-slice statistical redundancies without compromising slice accessibility. Our study shows that the motion-compensated temporal discrete wavelet transform (MC-TDWT) practically achieves an upper bound to the compression efficiency of slice transforms. From a video coding perspective, we find that most of the coding gain is attributed to offsetting the learning penalty in adaptive arithmetic coding through 3D code-block extension, rather than inter-frame context modelling. The second aspect of this thesis examines random accessibility. Accessibility refers to the ease with which a region of interest is accessed (subband samples needed for reconstruction are retrieved) from a compressed video bitstream, subject to spatiotemporal code-block constraints. We investigate the fundamental implications of motion compensation for random access efficiency and the compression performance of scalable interactive video. We demonstrate that inclusion of motion compensation operators within the lifting steps of a temporal subband transform incurs a random access penalty which depends on the characteristics of the motion field. The final aspect of this thesis aims to minimize the perceptual impact of visible distortion in scalable reconstructed video. We present a visual optimization strategy based on distortion scaling which raises the distortion-length slope of perceptually significant samples. This alters the codestream embedding order during post-compression rate-distortion optimization, thus allowing visually sensitive sites to be encoded with higher fidelity at a given bit-rate. For visual sensitivity analysis, we propose a contrast perception model that incorporates an adaptive masking slope. This versatile feature provides a context which models perceptual significance. It enables scene structures that otherwise suffer significant degradation to be preserved at lower bit-rates. The novelty in our approach derives from a set of "perceptual mappings" which account for quantization noise shaping effects induced by motion-compensated temporal synthesis. The proposed technique reduces wavelet compression artefacts and improves the perceptual quality of video

    Novel Approaches to ECG-Based Modeling and Characterization of Atrial Fibrillation

    Get PDF
    This thesis deals with signal processing algorithms for analysis of the electrocardiogram (ECG) during atrial fibrillation (AF). Such analysis can be used for diagnosing patients, and for monitoring and predicting their response to various treatment. The thesis comprises an introduction and five papers describing methods for ECG-based modeling and characterization of AF. Paper I--IV deal with methods for characterization of the atrial activity, whereas Paper V deals with modeling of the ventricular response, both problems with the assumption that AF is present. In Paper I, a number of measures characterizing the atrial activity in the ECG, obtained using time-frequency analysis as well as nonlinear methods, are evaluated for their ability to predict spontaneous termination of AF. The AF frequency, i.e, the repetition rate of the atrial fibrillatory waves of the ECG, proved to be a significant factor for discrimination between terminating and non-terminating AF. Noise is a common problem in ECG signals, particularly in long-term ambulatory recordings. Hence, robust algorithms for analysis and characterization are required. In Paper II, a robust method for tracking the AF frequency in noisy signals is presented. The method is based on a hidden Markov model (HMM), which takes the harmonic pattern of the atrial activity into account. Using the HMM-based method, the average RMS error of the frequency estimates at high noise levels was significantly lower compared to existing methods. In Paper III, the HMM-based method is employed for analysis of 24-h ambulatory ECG signals in order to explore circadian variation in AF frequency. Circadian variations reflect autonomic modulation; attenuation or absence of such variations may help to diagnose patients. Methods based on curve fitting, autocorrelation, and joint variation, respectively, are employed to quantify circadian variations, showing that it is present in most patients with long-standing persistent AF, although the short-term variation is considerable. In Paper IV, 24-h ambulatory ECG recordings with paroxysmal and persistent AF are analyzed using an entropy-based method for characterization of the atrial activity. Short segments are classified based on these measures, showing that it is feasible to distinguish between patient with paroxysmal and persistent AF from 10-s ECGs; the average classification rate was above 95%. The ventricular response during AF is mainly determined by the AV nodal blocking of atrial impulses. In Paper V, a new model-based approach for analysis of the ventricular response during AF is proposed. The model integrates physiological properties of the AV node and the atrial fibrillatory rate; the model parameters can be estimated from ECG signals. Results show that ventricular response is sufficiently represented by the estimated model in a majority of the recordings; in 85.7% of the analyzed 30-min segments the model fit was considered accurate, and that changes of AV nodal properties caused by autonomic modulation could be tracked through the estimated model parameters. In summary, the work within this thesis contributes with new methods for non-invasive analysis of AF, which can be used to tailor and evaluate different strategies for AF treatment

    A human visual system based image coder

    Get PDF
    Over the years, society has changed considerably due to technological changes, and digital images have become part and parcel of our everyday lives. Irrespective of applications (i.e., digital camera) and services (information sharing, e.g., Youtube, archive / storage), there is the need for high image quality with high compression ratios. Hence, considerable efforts have been invested in the area of image compression. The traditional image compression systems take into account of statistical redundancies inherent in the image data. However, the development and adaptation of vision models, which take into account the properties of the human visual system (HVS), into picture coders have since shown promising results. The objective of the thesis is to propose the implementation of a vision model in two different manners in the JPEG2000 coding system: (a) a Perceptual Colour Distortion Measure (PCDM) for colour images in the encoding stage, and (b) a Perceptual Post Filtering (PPF) algorithm for colour images in the decoding stage. Both implementations are embedded into the JPEG2000 coder. The vision model here exploits the contrast sensitivity, the inter-orientation masking and intra-band masking visual properties of the HVS. Extensive calibration work has been undertaken to fine-tune the 42 model parameters of the PCDM and Just-Noticeable-Difference thresholds of the PPF for colour images. Evaluation with subjective assessments of PCDM based coder has shown perceived quality improvement over the JPEG2000 benchmark with the MSE (mean square error) and CVIS criteria. For the PPF adapted JPEG2000 decoder, performance evaluation has also shown promising results against the JPEG2000 benchmarks. Based on subjective evaluation, when both PCDM and PPF are used in the JPEG2000 coding system, the overall perceived image quality is superior to the stand-alone JPEG2000 with the PCDM

    The plenacoustic function and its applications

    Get PDF
    This thesis is a study of the spatial evolution of the sound field. We first present an analysis of the sound field along different geometries. In the case of the sound field studied along a line in a room, we describe a two-dimensional function characterizing the sound field along space and time. Calculating the Fourier transform of this function leads to a spectrum having a butterfly shape. The spectrum is shown to be almost bandlimited along the spatial frequency dimension, which allows the interpolation of the sound field at any position along the line when a sufficient number of microphones is present. Using this Fourier representation of the sound field, we develop a spatial sampling theorem trading off quality of reconstruction with spatial sampling frequency. The study is generalized for planes of microphones and microphones located in three dimensions. The presented theory is compared to simulations and real measurements of room impulse responses. We describe a similar theory for circular arrays of microphones or loudspeakers. Application of this theory is presented for the study of the angular sampling of head-related transfer functions (HRTFs). As a result, we show that to reconstruct HRTFs at any possible angle in the horizontal plane, an angular spacing of 5 degrees is necessary for HRTFs sampled at 44.1 kHz. Because recording that many HRTFs is not easy, we develop interpolation techniques to achieve acceptable results for databases containing two or four times fewer HRTFs. The technique is based on the decomposition of the HRTFs in their carrier and complex envelopes. With the Fourier representation of the sound field, it is then shown how one can correctly obtain all room impulse responses measured along a trajectory when using a moving loudspeaker or microphone. The presented method permits the reconstruction of the room impulse responses at any position along the trajectory, provided that the speed satisfies a given relation. The maximal speed is shown to be dependent on the maximal frequency emitted and the radius of the circle. This method takes into account the Doppler effect present when one element is moving in the scenario. It is then shown that the measurement of HRTFs in the horizontal plane can be achieved in less than one second. In the last part, we model spatio-temporal channel impulse responses between a fixed source and a moving receiver. The trajectory followed by the moving element is modeled as a continuous autoregressive process. The presented model is simple and versatile. It allows the generation of random trajectories with a controlled smoothness. Application of this study can be found in the modeling of acoustic channels for acoustic echo cancellation or of time-varying multipath electromagnetic channels used in mobile wireless communications

    Acquisition and modeling of material appearance

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.Includes bibliographical references (p. 131-143).In computer graphics, the realistic rendering of synthetic scenes requires a precise description of surface geometry, lighting, and material appearance. While 3D geometry scanning and modeling have advanced significantly in recent years, measurement and modeling of accurate material appearance have remained critical challenges. Analytical models are the main tools to describe material appearance in most current applications. They provide compact and smooth approximations to real materials but lack the expressiveness to represent complex materials. Data-driven approaches based on exhaustive measurements are fully general but the measurement process is difficult and the storage requirement is very high. In this thesis, we propose the use of hybrid representations that are more compact and easier to acquire than exhaustive measurement, while preserving much generality of a data-driven approach. To represent complex bidirectional reflectance distribution functions (BRDFs), we present a new method to estimate a general microfacet distribution from measured data. We show that this representation is able to reproduce complex materials that are impossible to model with purely analytical models.(cont.) We also propose a new method that significantly reduces measurement cost and time of the bidirectional texture function (BTF) through a statistical characterization of texture appearance. Our reconstruction method combines naturally aligned images and alignment-insensitive statistics to produce visually plausible results. We demonstrate our acquisition system which is able to capture intricate materials like fabrics in less than ten minutes with commodity equipments. In addition, we present a method to facilitate effective user design in the space of material appearance. We introduce a metric in the space of reflectance which corresponds roughly to perceptual measures. The main idea of our approach is to evaluate reflectance differences in terms of their induced rendered images, instead of the reflectance function itself defined in the angular domains. With rendered images, we show that even a simple computational metric can provide good perceptual spacing and enable intuitive navigation of the reflectance space.by Wai Kit Addy Ngan.Ph.D

    Analysis and detection of human emotion and stress from speech signals

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Optimization and improvements in spatial sound reproduction systems through perceptual considerations

    Full text link
    [ES] La reproducción de las propiedades espaciales del sonido es una cuestión cada vez más importante en muchas aplicaciones inmersivas emergentes. Ya sea en la reproducción de contenido audiovisual en entornos domésticos o en cines, en sistemas de videoconferencia inmersiva o en sistemas de realidad virtual o aumentada, el sonido espacial es crucial para una sensación de inmersión realista. La audición, más allá de la física del sonido, es un fenómeno perceptual influenciado por procesos cognitivos. El objetivo de esta tesis es contribuir con nuevos métodos y conocimiento a la optimización y simplificación de los sistemas de sonido espacial, desde un enfoque perceptual de la experiencia auditiva. Este trabajo trata en una primera parte algunos aspectos particulares relacionados con la reproducción espacial binaural del sonido, como son la escucha con auriculares y la personalización de la Función de Transferencia Relacionada con la Cabeza (Head Related Transfer Function - HRTF). Se ha realizado un estudio sobre la influencia de los auriculares en la percepción de la impresión espacial y la calidad, con especial atención a los efectos de la ecualización y la consiguiente distorsión no lineal. Con respecto a la individualización de la HRTF se presenta una implementación completa de un sistema de medida de HRTF y se introduce un nuevo método para la medida de HRTF en salas no anecoicas. Además, se han realizado dos experimentos diferentes y complementarios que han dado como resultado dos herramientas que pueden ser utilizadas en procesos de individualización de la HRTF, un modelo paramétrico del módulo de la HRTF y un ajuste por escalado de la Diferencia de Tiempo Interaural (Interaural Time Difference - ITD). En una segunda parte sobre reproducción con altavoces, se han evaluado distintas técnicas como la Síntesis de Campo de Ondas (Wave-Field Synthesis - WFS) o la panoramización por amplitud. Con experimentos perceptuales se han estudiado la capacidad de estos sistemas para producir sensación de distancia y la agudeza espacial con la que podemos percibir las fuentes sonoras si se dividen espectralmente y se reproducen en diferentes posiciones. Las aportaciones de esta investigación pretenden hacer más accesibles estas tecnologías al público en general, dada la demanda de experiencias y dispositivos audiovisuales que proporcionen mayor inmersión.[CA] La reproducció de les propietats espacials del so és una qüestió cada vegada més important en moltes aplicacions immersives emergents. Ja siga en la reproducció de contingut audiovisual en entorns domèstics o en cines, en sistemes de videoconferència immersius o en sistemes de realitat virtual o augmentada, el so espacial és crucial per a una sensació d'immersió realista. L'audició, més enllà de la física del so, és un fenomen perceptual influenciat per processos cognitius. L'objectiu d'aquesta tesi és contribuir a l'optimització i simplificació dels sistemes de so espacial amb nous mètodes i coneixement, des d'un criteri perceptual de l'experiència auditiva. Aquest treball tracta, en una primera part, alguns aspectes particulars relacionats amb la reproducció espacial binaural del so, com són l'audició amb auriculars i la personalització de la Funció de Transferència Relacionada amb el Cap (Head Related Transfer Function - HRTF). S'ha realitzat un estudi relacionat amb la influència dels auriculars en la percepció de la impressió espacial i la qualitat, dedicant especial atenció als efectes de l'equalització i la consegüent distorsió no lineal. Respecte a la individualització de la HRTF, es presenta una implementació completa d'un sistema de mesura de HRTF i s'inclou un nou mètode per a la mesura de HRTF en sales no anecoiques. A mès, s'han realitzat dos experiments diferents i complementaris que han donat com a resultat dues eines que poden ser utilitzades en processos d'individualització de la HRTF, un model paramètric del mòdul de la HRTF i un ajustament per escala de la Diferencià del Temps Interaural (Interaural Time Difference - ITD). En una segona part relacionada amb la reproducció amb altaveus, s'han avaluat distintes tècniques com la Síntesi de Camp d'Ones (Wave-Field Synthesis - WFS) o la panoramització per amplitud. Amb experiments perceptuals, s'ha estudiat la capacitat d'aquests sistemes per a produir una sensació de distància i l'agudesa espacial amb que podem percebre les fonts sonores, si es divideixen espectralment i es reprodueixen en diferents posicions. Les aportacions d'aquesta investigació volen fer més accessibles aquestes tecnologies al públic en general, degut a la demanda d'experiències i dispositius audiovisuals que proporcionen major immersió.[EN] The reproduction of the spatial properties of sound is an increasingly important concern in many emerging immersive applications. Whether it is the reproduction of audiovisual content in home environments or in cinemas, immersive video conferencing systems or virtual or augmented reality systems, spatial sound is crucial for a realistic sense of immersion. Hearing, beyond the physics of sound, is a perceptual phenomenon influenced by cognitive processes. The objective of this thesis is to contribute with new methods and knowledge to the optimization and simplification of spatial sound systems, from a perceptual approach to the hearing experience. This dissertation deals in a first part with some particular aspects related to the binaural spatial reproduction of sound, such as listening with headphones and the customization of the Head Related Transfer Function (HRTF). A study has been carried out on the influence of headphones on the perception of spatial impression and quality, with particular attention to the effects of equalization and subsequent non-linear distortion. With regard to the individualization of the HRTF a complete implementation of a HRTF measurement system is presented, and a new method for the measurement of HRTF in non-anechoic conditions is introduced. In addition, two different and complementary experiments have been carried out resulting in two tools that can be used in HRTF individualization processes, a parametric model of the HRTF magnitude and an Interaural Time Difference (ITD) scaling adjustment. In a second part concerning loudspeaker reproduction, different techniques such as Wave-Field Synthesis (WFS) or amplitude panning have been evaluated. With perceptual experiments it has been studied the capacity of these systems to produce a sensation of distance, and the spatial acuity with which we can perceive the sound sources if they are spectrally split and reproduced in different positions. The contributions of this research are intended to make these technologies more accessible to the general public, given the demand for audiovisual experiences and devices with increasing immersion.Gutiérrez Parera, P. (2020). Optimization and improvements in spatial sound reproduction systems through perceptual considerations [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/142696TESI
    corecore