
    Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings

    We tackle the multi-party speech recovery problem by modeling the acoustics of reverberant chambers. Our approach exploits structured sparsity models to perform room modeling and speech recovery. We propose a scheme for characterizing the room acoustics from the unknown competing speech sources, relying on localization of the early images of the speakers by sparse approximation of the spatial spectra of the virtual sources in a free-space model. The images are then clustered by exploiting the low-rank structure of the spectro-temporal components belonging to each source. This enables us to identify the early support of the room impulse response function and its unique map to the room geometry. To further resolve the ambiguity of the reflection ratios, we propose a novel formulation of the reverberation model and estimate the absorption coefficients through convex optimization, exploiting a joint sparsity model formulated upon the spatio-spectral sparsity of concurrent speech representations. The acoustic parameters are then incorporated to separate the individual speech signals, through either structured sparse recovery or inverse filtering of the acoustic channels. Experiments conducted on real data recordings demonstrate the effectiveness of the proposed approach for multi-party speech recovery and recognition. Comment: 31 pages
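    The early-image localization step can be pictured as a sparse approximation over a grid of candidate virtual-source positions. Below is a minimal sketch under a free-space propagation model, not the paper's exact pipeline; the array geometry, grid, analysis frequency, and the ISTA solver with its regularization weight are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
c = 343.0                                 # speed of sound [m/s]
f = 1000.0                                # analysis frequency [Hz]
k = 2 * np.pi * f / c                     # wavenumber

mics = rng.uniform(0, 3, (8, 3))          # 8 microphones in a 3 m cube
grid = rng.uniform(0, 6, (500, 3))        # candidate image-source positions

# Free-space Green's function dictionary: one column per candidate position.
d = np.maximum(np.linalg.norm(grid[None, :, :] - mics[:, None, :], axis=2), 0.1)
A = np.exp(-1j * k * d) / (4 * np.pi * d)

# Synthetic observation: two active image sources plus a little noise.
true_idx = [37, 250]
x = A[:, true_idx].sum(axis=1) + 1e-4 * rng.standard_normal(8)

def ista(A, x, lam=1e-4, n_iter=500):
    """Complex-valued lasso via iterative soft thresholding (tune lam)."""
    z = np.zeros(A.shape[1], dtype=complex)
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    for _ in range(n_iter):
        r = z - A.conj().T @ (A @ z - x) / L
        mag = np.maximum(np.abs(r) - lam / L, 0.0)
        z = mag * np.exp(1j * np.angle(r))  # shrink magnitudes, keep phases
    return z

z = ista(A, x)
print(np.argsort(np.abs(z))[-2:])          # ideally recovers {37, 250}
```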

    UBathy: a new approach for bathymetric inversion from video imagery

    A new approach to infer the bathymetry from coastal video monitoring systems is presented. The methodology applies principal component analysis to the Hilbert transform of video images to obtain the components of the wave propagation field and their corresponding frequencies and wavenumbers. Incident and reflected constituents and subharmonic components are also recovered. Local water depth is then estimated through the wave dispersion relationship. The method is first applied to monochromatic and polychromatic synthetic wave trains propagated using linear wave theory over an alongshore-uniform bathymetry, in order to analyze the influence of different parameters on the results. To assess the ability of the approach to infer the bathymetry under more realistic conditions and to explore the influence of other parameters, nonlinear wave propagation is also performed using a fully nonlinear Boussinesq-type model over a complex bathymetry. In the synthetic cases, the relative root mean square error in the recovered bathymetry (for water depths 0.75 m ≤ h ≤ 8.0 m) ranges from ~1% to ~3% for infinitesimal-amplitude wave cases (monochromatic or polychromatic), up to ~15% in the most complex case (nonlinear polychromatic waves). Finally, the new methodology is satisfactorily validated against video from a real field site.
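    The depth-inversion step rests on the linear dispersion relationship w^2 = g*k*tanh(k*h): once a wave component's angular frequency and wavenumber have been extracted from the video, the relation can be inverted in closed form for the local depth. A minimal sketch with illustrative values, not the UBathy implementation:

```python
import numpy as np

g = 9.81                               # gravitational acceleration [m/s^2]

def depth_from_dispersion(w, k):
    """Invert w^2 = g*k*tanh(k*h) for h; requires w^2 < g*k."""
    r = w ** 2 / (g * k)
    if r >= 1.0:
        raise ValueError("w^2/(g*k) must be < 1 for a finite depth")
    return np.arctanh(r) / k

w = 2 * np.pi / 8.0                    # component with an 8 s period
k = 2 * np.pi / 50.0                   # and a 50 m wavelength
print(depth_from_dispersion(w, k))     # ~4.4 m of water depth
```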

    Graph neural networks for electroencephalogram analysis

    The aim of this work is to provide a model able to identify Alzheimer's disease and Mild Cognitive Impairment (MCI) in electroencephalogram (EEG) recordings. Although EEGs are among the most common tests used for neurological disorders, the diagnosis of these diseases is currently based on the patient's behaviour, because experts' accuracy in visual recognition of EEGs is estimated to be around 50%. To address this task, this thesis proposes a Graph Neural Network (GNN) model that classifies subjects using only the recorded signals. To develop the final model, we first proposed several procedures to build graphs from the EEG signals, exploring different ways of representing inter-channel connectivity as well as methods for extracting relevant features. No GNN models have yet been proposed for Alzheimer's or MCI detection, so we used architectures employed in similar tasks and modified them for our specific domain. Finally, a set of coherent combinations of graph construction and GNN model is evaluated under the same set of metrics, and for the best-performing combinations a study of the impact of several hyperparameters is carried out. To handle all the possible experiments, we developed a software framework to easily build the different types of graphs, create the models, and evaluate their performance. The best combination of graph building and model design, based on graph attention convolutional layers, achieves 92.31% accuracy in the binary classification of healthy subjects versus Alzheimer's patients, and 87.59% accuracy when recordings of MCI patients are also evaluated; both are comparable to state-of-the-art results. Although this work is done within a novel field and many possibilities remain to be explored, we conclude that GNNs show super-human capabilities for Alzheimer's and MCI detection using EEGs.
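    One of the graph constructions explored, connectivity derived from inter-channel statistics, can be sketched in a few lines. The array shapes, correlation threshold, and power feature below are illustrative assumptions rather than the thesis's exact choices; the resulting adjacency and node features would then feed a GNN such as one built from graph attention layers.

```python
import numpy as np

rng = np.random.default_rng(0)
eeg = rng.standard_normal((19, 2560))     # 19 channels, 10 s at 256 Hz

corr = np.corrcoef(eeg)                   # inter-channel connectivity
adj = (np.abs(corr) > 0.3).astype(float)  # sparsify with a threshold
np.fill_diagonal(adj, 0.0)                # no self-loops

# Per-node features: here simply mean signal power per channel; band powers
# or other descriptors are natural alternatives.
node_feats = (eeg ** 2).mean(axis=1, keepdims=True)
```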

    Exploring remote photoplethysmography signals for deepfake detection in facial videos

    With the advent of deep learning-based facial forgeries, also called "deepfakes", accurately detecting forged videos has become a quickly growing area of research. For this endeavor, remote photoplethysmography (rPPG), the process of extracting biological signals such as the blood volume pulse and heart rate from facial videos, offers an interesting avenue for detecting fake videos that appear utterly authentic to the human eye. This thesis presents an end-to-end system for deepfake video classification using remote photoplethysmography. Minuscule facial pixel colour changes are used to extract the rPPG signal, from which various features are extracted and used to train an XGBoost classifier. The classifier is then tested using several colour-to-blood-volume-pulse methods (OMIT, POS, LGI and CHROM) and three feature extraction window lengths of two, four and eight seconds. The classifier was found effective at detecting deepfake videos with an accuracy of 85%, with minimal performance differences between the window lengths. The GREEN channel signal was found to be important for this classification.
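    As an illustration of the colour-to-blood-volume-pulse step, below is a minimal sketch of POS (the plane-orthogonal-to-skin method of Wang et al.), one of the four methods tested. The input format, frame rate, and window length are illustrative assumptions; OMIT, LGI and CHROM differ mainly in the projection applied to the colour traces.

```python
import numpy as np

def pos_rppg(rgb, fps=30, win_sec=1.6):
    """rgb: (N, 3) per-frame mean skin colour. Returns an (N,) BVP estimate."""
    n = rgb.shape[0]
    l = int(win_sec * fps)
    P = np.array([[0.0, 1.0, -1.0],
                  [-2.0, 1.0, 1.0]])
    h = np.zeros(n)
    for t in range(n - l + 1):
        c = rgb[t:t + l].T                      # (3, l) window
        cn = c / c.mean(axis=1, keepdims=True)  # temporal normalisation
        s = P @ cn                              # project onto the POS plane
        p = s[0] + (s[0].std() / s[1].std()) * s[1]
        h[t:t + l] += p - p.mean()              # overlap-add
    return h

# Features computed from h (spectral peak location, SNR, ...) would then
# train the XGBoost classifier described above.
```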

    Idealized computational models for auditory receptive fields

    This paper presents a theory by which idealized models of auditory receptive fields can be derived in a principled, axiomatic manner from a set of structural properties, which enable invariance of receptive field responses under natural sound transformations and ensure internal consistency between spectro-temporal receptive fields at different temporal and spectral scales. For defining a time-frequency transformation of a purely temporal sound signal, it is shown that the framework allows for a new way of deriving the Gabor and Gammatone filters, as well as a novel family of generalized Gammatone filters with additional degrees of freedom to obtain different trade-offs between the spectral selectivity and the temporal delay of time-causal temporal window functions. When applied to the definition of a second layer of receptive fields from a spectrogram, the framework leads to two canonical families of spectro-temporal receptive fields, in terms of spectro-temporal derivatives of either spectro-temporal Gaussian kernels for non-causal time or the combination of a time-causal generalized Gammatone filter over the temporal domain and a Gaussian filter over the log-spectral domain. For each filter family, the spectro-temporal receptive fields can be either separable over the time-frequency domain or adapted to local glissando transformations that represent variations in logarithmic frequencies over time. Within each domain of either non-causal or time-causal time, these receptive field families are derived by uniqueness from the assumptions. It is demonstrated how the presented framework allows for the computation of basic auditory features for audio processing, and that it leads to predictions about auditory receptive fields with good qualitative similarity to biological receptive fields measured in the inferior colliculus (ICC) and primary auditory cortex (A1) of mammals. Comment: 55 pages, 22 figures, 3 tables
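    For reference, the classical Gammatone family that the framework rederives (and generalizes with extra degrees of freedom) has the impulse response g(t) = t^(n-1) exp(-2*pi*b*t) cos(2*pi*f*t). A minimal sketch with illustrative parameters:

```python
import numpy as np

fs = 16000                       # sampling rate [Hz]
t = np.arange(0, 0.05, 1 / fs)   # 50 ms support
n = 4                            # filter order
f = 1000.0                       # centre frequency [Hz]
b = 125.0                        # bandwidth parameter [Hz]

g = t ** (n - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * f * t)
g /= np.abs(g).max()             # normalise before use as a filter
```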

    Digital acoustics: processing wave fields in space and time using DSP tools

    Systems with hundreds of microphones for acoustic field acquisition, or hundreds of loudspeakers for rendering, have been proposed and built. Analyzing, designing, and applying such systems requires a framework that allows us to leverage the vast set of tools available in digital signal processing in order to achieve intuitive and efficient algorithms. We thus propose a discrete space-time framework, grounded in classical acoustics, which addresses the discrete nature of spatial and temporal sampling. In particular, a short-space/time Fourier transform is introduced, which is the natural extension of the localized or short-time Fourier transform. Processing in this intuitive domain allows us to easily devise algorithms for beamforming, source separation, and multi-channel compression, among other useful tasks. The essential spatial band-limitedness of the Fourier spectrum is also used to solve the spatial equalization task required for sound field rendering in a region of interest. Examples of applications are shown.
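    The short-space/time Fourier transform can be pictured as a 2-D windowed FFT over (microphone position, time) for a uniform linear array; plane waves then appear as lines k = w/c in each local wavenumber-frequency spectrum. A minimal sketch, with illustrative geometry and window sizes:

```python
import numpy as np

fs = 8000                         # temporal sampling rate [Hz]
dx = 0.05                         # microphone spacing [m]
p = np.random.randn(64, 4096)     # (space, time) pressure field, placeholder

ws, wt = 32, 256                  # spatial / temporal window lengths
win = np.outer(np.hanning(ws), np.hanning(wt))

# One local spectrum around spatial index i0 and time index t0:
i0, t0 = 16, 1024
block = p[i0:i0 + ws, t0:t0 + wt] * win
spectrum = np.fft.fftshift(np.fft.fft2(block))        # (wavenumber, frequency)

freqs = np.fft.fftshift(np.fft.fftfreq(wt, 1 / fs))   # [Hz]
wavenums = np.fft.fftshift(np.fft.fftfreq(ws, dx))    # [cycles/m]
```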

    Analysis, modeling and wide-area spatiotemporal control of low-frequency sound reproduction

    This research aims to develop a low-frequency response control methodology capable of delivering a consistent spectral and temporal response over a wide listening area. Low-frequency room acoustics are naturally plagued by room modes, the result of standing waves at frequencies whose half-wavelengths fit an integer number of times into one or more room dimensions. The standing wave pattern is different for each modal frequency, causing a complicated sound field that exhibits a highly position-dependent frequency response. Systems with multiple degrees of freedom (independently controllable sound-radiating sources) are investigated to provide adequate low-frequency response control. The proposed solution, termed a chameleon subwoofer array (CSA), adopts the most advantageous aspects of existing room-mode correction methodologies while emphasizing efficiency and practicality. Multiple degrees of freedom are ideally achieved by employing what is designated a hybrid subwoofer, which provides four orthogonal degrees of freedom within a modest-sized enclosure. The CSA software algorithm integrates both objective and subjective measures to address listener preferences, including the possibility of individual real-time control. CSAs and existing techniques are evaluated within a novel acoustical modeling system (an FDTD simulation toolbox) developed to meet the requirements of this research. Extensive virtual development of CSAs has led to experimentation using a prototype hybrid subwoofer. The resulting performance is in line with the simulations, whereby variance across a wide listening area is reduced by over 50% with only four degrees of freedom. A supplemental novel correction algorithm addresses correction issues in select narrow frequency bands. These frequencies are filtered from the signal and replaced using virtual bass, a psychoacoustic effect giving the impression of low-frequency content, so that all aural information is maintained. Virtual bass is synthesized using an original hybrid approach combining two mainstream synthesis procedures while suppressing each method's inherent weaknesses. This algorithm is demonstrated to improve CSA output efficiency while maintaining acceptable subjective performance.
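    The room modes at issue follow the standard rectangular-room formula f = (c/2) * sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2). A minimal sketch listing the modal frequencies in the subwoofer band for an illustrative room (not a geometry from the thesis):

```python
import numpy as np
from itertools import product

c = 343.0                         # speed of sound [m/s]
Lx, Ly, Lz = 6.0, 4.5, 2.8        # illustrative room dimensions [m]

modes = []
for nx, ny, nz in product(range(4), repeat=3):
    if nx == ny == nz == 0:
        continue                  # skip the trivial (0,0,0) case
    f = (c / 2) * np.sqrt((nx / Lx) ** 2 + (ny / Ly) ** 2 + (nz / Lz) ** 2)
    if f <= 120.0:                # keep modes in the subwoofer band
        modes.append((f, (nx, ny, nz)))

for f, n in sorted(modes):
    print(f"{f:6.1f} Hz  mode {n}")   # first axial mode: ~28.6 Hz
```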

    Analysis of motion in scale space

    This work covers several new aspects of motion estimation by the optic flow method in scale spaces. The usual techniques for motion estimation are limited to coarse-to-fine strategies, which can succeed only if there is enough information at every scale. In this work we investigate motion estimation in scale space more fundamentally. The choice of wavelet for the scale-space decomposition of image sequences is discussed in the first part of this work. We make use of the continuous wavelet transform with rotationally symmetric wavelets. Bandpass-decomposed sequences allow the structure tensor to be replaced by the phase-invariant energy operator. The structure tensor is computationally more expensive because of its spatial or spatio-temporal averaging, whereas the energy operator in general needs no further averaging. The numerical accuracy of motion estimation with the energy operator is compared to the results of the usual structure-tensor-based techniques; the comparison tests are performed on synthetic and real-life sequences. Another practical contribution is the accuracy measurement for motion estimation by adaptively smoothed tensor fields. The adaptive smoothing relies on nonlinear anisotropic diffusion with discontinuity and curvature preservation, and we achieved an accuracy gain under properly chosen parameters for the diffusion filter. A theoretical contribution, from a mathematical point of view, is a new discontinuity- and curvature-preserving regularization for motion estimation; the convergence of solutions is shown for the isotropic case of the nonlocal partial differential equation. For large displacements between two consecutive frames, the optic flow method is systematically corrupted because the sampling theorem is violated. We developed a new method for motion analysis by scale decomposition which circumvents this systematic corruption without using the coarse-to-fine strategy. The underlying assumption is that, within a certain neighborhood, the grey value undergoes the same displacement; if this holds, the same optic flow should be measured at all scales. If inconsistencies arise at a pixel across the scale space, they can be detected and the scales containing these inconsistencies are excluded.
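    The phase-invariant energy operator referred to above is, in its common discrete (Teager-Kaiser) form, Psi[x](n) = x(n)^2 - x(n-1)*x(n+1); for a pure tone A*cos(w*n + phi) it returns the constant A^2*sin(w)^2 regardless of phase, with no averaging needed. A minimal one-dimensional sketch:

```python
import numpy as np

def energy_operator(x):
    """Discrete Teager-Kaiser operator; output is 2 samples shorter than x."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

t = np.arange(200)
x = 2.0 * np.cos(0.3 * t + 1.0)          # amplitude 2, frequency 0.3 rad
print(energy_operator(x).mean())         # ~4 * sin(0.3)^2 ≈ 0.35
```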

    Model-based Sparse Component Analysis for Reverberant Speech Localization

    In this paper, the problem of multiple speaker localization via speech separation based on model-based sparse recovery is studied. We compare and contrast computational sparse optimization methods incorporating harmonicity and block structures, as well as autoregressive dependencies, underlying the spectrographic representation of speech signals. The results demonstrate the effectiveness of the block-sparse Bayesian learning framework incorporating autoregressive correlations in achieving highly accurate localization. Furthermore, significant improvement is obtained using an ad-hoc microphone set-up for data acquisition compared to a compact microphone array.
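    As a stand-in for the block-structured models compared here (the paper's block-sparse Bayesian learning variant additionally learns autoregressive correlations), block sparsity can be illustrated with a proximal-gradient solver using group soft-thresholding, so that all atoms belonging to one candidate location activate jointly. Everything below is an illustrative sketch, not the paper's method:

```python
import numpy as np

def block_ista(A, y, groups, lam=0.1, n_iter=300):
    """Group-lasso recovery: A dictionary, y observation, groups index blocks."""
    z = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2
    for _ in range(n_iter):
        r = z - A.T @ (A @ z - y) / L
        for g in groups:                       # group soft-threshold
            nrm = np.linalg.norm(r[g])
            r[g] *= max(0.0, 1.0 - (lam / L) / max(nrm, 1e-12))
        z = r
    return z

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 40))
groups = [np.arange(i, i + 4) for i in range(0, 40, 4)]  # 10 blocks of 4
z_true = np.zeros(40)
z_true[groups[2]] = rng.standard_normal(4)               # one active block
z_hat = block_ista(A, A @ z_true, groups)

active = [i for i, g in enumerate(groups) if np.linalg.norm(z_hat[g]) > 1e-6]
print(active)                                            # ideally [2]
```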

    Impact of Random Deployment on Operation and Data Quality of Sensor Networks

    Several applications have been proposed for wireless sensor networks, including habitat monitoring, structural health monitoring, pipeline monitoring, and precision agriculture. Among the desirable features of wireless sensor networks is ease of deployment. Since the nodes are capable of self-organization, they can be placed easily in areas that are otherwise inaccessible or impractical for other types of sensing systems. In fact, some have proposed deploying wireless sensor networks by dropping nodes from a plane, delivering them in an artillery shell, or launching them via a catapult from onboard a ship. There are also reports of actual aerial deployments, for example one carried out using an unmanned aerial vehicle (UAV) at a Marine Corps combat centre in California, where the nodes were able to establish a time-synchronized, multi-hop communication network for tracking vehicles that passed along a dirt road. While this has practical relevance for some civil applications (such as rescue operations), a more realistic deployment involves careful planning and placement of sensors. Even then, nodes may not be placed optimally to ensure that the network is fully connected and that high-quality data pertaining to the monitored phenomena can be extracted. This work addresses the problem of random deployment through two complementary approaches. The first addresses the problem from a communication perspective: it begins by establishing a comprehensive mathematical model to quantify the energy cost of the various concerns of a fully operational wireless sensor network, and, based on this analytic model, an energy-efficient topology control protocol is developed. The protocol sets an eligibility metric to establish and maintain a multi-hop communication path and to ensure that all nodes exhaust their energy in a uniform manner. The second approach addresses the problem of imperfect sensing from a signal processing perspective: it investigates the impact of deployment errors (calibration, placement, and orientation errors) on the quality of the sensed data and attempts to identify robust and error-agnostic features. If random placement is unavoidable and dense deployment cannot be supported, robust and error-agnostic features enable one to recognize interesting events from erroneous or imperfect data.
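    The flavour of such an analytic energy model can be conveyed with the widely used first-order radio model, an assumption here rather than the thesis's exact formulation: transmitting k bits over distance d costs E_tx = E_elec*k + eps_amp*k*d^2 and receiving costs E_rx = E_elec*k, which is what makes well-chosen multi-hop paths cheaper than long direct links.

```python
E_ELEC = 50e-9      # electronics energy per bit [J/bit], illustrative
EPS_AMP = 100e-12   # amplifier energy [J/bit/m^2], illustrative

def tx_energy(k_bits, d_m):
    """First-order radio model: transmit cost grows with d^2 path loss."""
    return E_ELEC * k_bits + EPS_AMP * k_bits * d_m ** 2

def rx_energy(k_bits):
    return E_ELEC * k_bits

# Relaying a 4000-bit packet via a midpoint halves each hop's distance but
# pays one extra reception; with d^2 loss this still wins for long links.
direct = tx_energy(4000, 100.0)                          # 4.2e-3 J
relayed = 2 * tx_energy(4000, 50.0) + rx_energy(4000)    # 2.6e-3 J
print(direct, relayed)
```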