    Object-based Modeling of Audio for Coding and Source Separation

    This thesis studies several data decomposition algorithms for obtaining an object-based representation of an audio signal. The estimation of the representation parameters is coupled with audio-specific criteria, such as the spectral redundancy, sparsity, perceptual relevance and spatial position of sounds. The objective is to obtain an audio signal representation composed of meaningful entities, called audio objects, that reflect the properties of real-world sound objects and events. The object-based model is estimated from magnitude spectrogram redundancy using non-negative matrix factorization, with extensions to multichannel and complex-valued data. The benefits of working with object-based audio representations over conventional time-frequency bin-wise processing are studied. The two main applications of the object-based audio representations proposed in this thesis are spatial audio coding and sound source separation from multichannel microphone array recordings. In the proposed spatial audio coding algorithm, the audio objects are estimated from the multichannel magnitude spectrogram and are used to recover the content of each original channel from a single downmixed signal by time-frequency filtering. The perceptual relevance of modeling the audio signal is considered in the estimation of the parameters of the object-based model, and the sparsity of the model is exploited when encoding its parameters. Additionally, a quantization of the model parameters is proposed that reflects the perceptual relevance of each quantized element. The proposed object-based spatial audio coding algorithm is evaluated through listening tests, comparing its overall perceptual quality with that of conventional time-frequency block-wise methods at the same bitrates. 
The proposed approach is found to produce comparable coding efficiency while providing additional functionality through the object-based coding domain representation, such as blind separation of the mixture of sound sources in the encoded channels. For sound source separation from multichannel audio recorded by a microphone array, a method combining an object-based magnitude model with spatial covariance matrix estimation is considered. A direction-of-arrival-based model for the spatial covariance matrices of the sound sources is proposed. Unlike conventional approaches, the estimation of the parameters of the proposed spatial covariance matrix model ensures a spatially coherent solution for the spatial parameterization of the sound sources. The separation quality is measured with objective criteria, and the proposed method is shown to improve on state-of-the-art sound source separation methods for recordings made with a small microphone array.
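Both applications build on non-negative matrix factorization (NMF) of a magnitude spectrogram. The sketch below covers only the basic single-channel case, assuming the standard Lee-Seung multiplicative updates for the squared Euclidean cost. The function name `nmf`, the random initialization and the iteration count are illustrative choices; the multichannel and complex-valued extensions studied in the thesis are not attempted.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Approximate a non-negative matrix V (frequency x time) as W @ H.

    W (frequency x rank) holds spectral templates and H (rank x time) their
    time-varying gains. Lee-Seung multiplicative updates for the squared
    Euclidean cost keep both factors non-negative at every step.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # eps guards against division by zero in the update ratios
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy "spectrogram": two non-negative spectral patterns with random gains.
rng = np.random.default_rng(1)
V = rng.random((20, 2)) @ rng.random((2, 30))
W, H = nmf(V, rank=2, n_iter=300)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative reconstruction error
```

Each column of `W` paired with the matching row of `H` forms one rank-one component of the model.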

    Blind Source Separation for the Processing of Contact-Less Biosignals

    (Spatio-temporal) Blind Source Separation (BSS) offers considerable potential for processing distorted multichannel biosignal measurements from novel contact-less recording techniques, separating distortions from the cardiac signal of interest. This potential can only be exploited in practice (1) if the applied BSS model matches the complexity of the measurement, i.e., the signal mixture, and (2) if the permutation indeterminacy among the BSS output components is resolved, i.e., the component of interest can be selected in practice. The present work first designs a framework to assess the efficacy of BSS algorithms in the context of the camera-based photoplethysmogram (cbPPG) and characterizes multiple BSS algorithms accordingly.
Algorithm selection recommendations for specific mixture characteristics are derived. Second, the present work develops and evaluates concepts to resolve the permutation indeterminacy of BSS outputs from contact-less electrocardiogram (ECG) recordings. The novel approach based on sparse coding is shown to outperform the existing concepts based on higher-order moments and frequency-domain features.
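Frequency-domain features are named above as one of the existing concepts for picking the component of interest out of the BSS outputs. A minimal sketch of such a criterion follows. It is not the proposed sparse-coding approach, and the function name `select_component` and the default band of 0.7-3.0 Hz (roughly 42-180 beats per minute) are assumptions made for illustration.

```python
import numpy as np


def select_component(components, fs, band=(0.7, 3.0)):
    """Pick the BSS output whose power is most concentrated in `band`.

    components: array (n_components, n_samples) of BSS outputs.
    fs: sampling rate in Hz.
    band: assumed cardiac frequency band in Hz (hypothetical default).
    Returns (index of the selected component, in-band power ratio per component).
    """
    n = components.shape[1]
    centered = components - components.mean(axis=1, keepdims=True)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    power = np.abs(np.fft.rfft(centered, axis=1)) ** 2
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    ratio = power[:, in_band].sum(axis=1) / (power.sum(axis=1) + 1e-12)
    return int(np.argmax(ratio)), ratio


# Usage: a 1.2 Hz "pulse" component and an 8 Hz distortion, sampled at 30 Hz.
fs = 30.0
t = np.arange(600) / fs
comps = np.vstack([np.sin(2 * np.pi * 8.0 * t), np.sin(2 * np.pi * 1.2 * t)])
idx, ratio = select_component(comps, fs)
print(idx, ratio)
```

A single spectral ratio like this is easy to compute but can be fooled by distortions that share the cardiac band, which is one motivation for richer selection criteria.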

    Enhanced independent vector analysis for audio separation in a room environment

    Independent vector analysis (IVA) is studied as a frequency-domain blind source separation method that can, in theory, avoid the permutation problem by retaining the dependency between different frequency bins of the same source vector while removing the dependency between different source vectors. This thesis focuses on improving the performance of IVA when it is used to solve the audio separation problem in a room environment. A specific stability problem of IVA, the block permutation problem, is identified and analyzed. A robust IVA method is then proposed to solve this problem by exploiting the phase continuity of the unmixing matrix. In addition, an auxiliary-function-based IVA algorithm with an overlapped chain-type source prior is proposed to mitigate the problem. Next, an informed IVA scheme is proposed that uses geometric information about the sources obtained from video to provide an intelligent initialization for optimal convergence. The proposed informed IVA algorithm also converges in fewer iterations and achieves better separation performance. A pitch-based evaluation method is defined to judge the separation performance objectively when information describing the mixing matrix and the sources is missing. To improve the separation performance of IVA, an appropriate multivariate source prior is needed to better preserve the dependency structure within the source vectors. A particular multivariate generalized Gaussian distribution is adopted as the source prior. The nonlinear score function derived from this prior contains fourth-order relationships between different frequency bins, which provide a more informative and stronger dependency structure than the original IVA algorithm and thereby improve the separation performance. Copula theory is a central tool for modeling nonlinear dependency structures.
The t copula is proposed to describe the dependency structure within frequency-domain speech signals because of its tail dependency property: if one variable takes an extreme value, the other variables are also expected to take extreme values. A multivariate Student's t distribution, constructed from a t copula with univariate Student's t marginal distributions, is proposed as the source prior, and the IVA algorithm with this source prior is derived. The proposed algorithms are tested with real speech signals in different reverberant room environments, using both modelled room impulse responses and real room recordings. State-of-the-art criteria are used to evaluate the separation performance, and the experimental results confirm the advantage of the proposed algorithms.
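As a point of reference for the "original IVA algorithm" mentioned above, the sketch shows the score function of the spherically symmetric multivariate Laplacian prior commonly used in the original IVA formulation, plus one natural-gradient update of the per-bin unmixing matrices. The array layout, the function names and the learning rate are assumptions. The generalized Gaussian and Student's t priors proposed in the thesis would replace `iva_score` and are not implemented here.

```python
import numpy as np


def iva_score(Y, eps=1e-12):
    """Score function for the spherical multivariate Laplacian source prior.

    Y: complex array (n_freq, n_src, n_frames) of separated STFT coefficients.
    Each source is normalized by its norm taken jointly over ALL frequency
    bins, which couples the bins of one source and discourages per-bin
    permutations.
    """
    norm = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0, keepdims=True))  # (1, n_src, n_frames)
    return Y / (norm + eps)


def iva_natural_gradient_step(W, X, lr=0.1):
    """One natural-gradient update W <- W + lr * (I - E[phi(y) y^H]) W per bin.

    W: (n_freq, n_src, n_ch) unmixing matrices; X: (n_freq, n_ch, n_frames) mixtures.
    """
    Y = W @ X                                   # (n_freq, n_src, n_frames)
    phi = iva_score(Y)
    n_frames = X.shape[-1]
    eye = np.eye(W.shape[1])
    # Sample estimate of E[phi(y) y^H] in every frequency bin: (n_freq, n_src, n_src)
    corr = phi @ np.conj(np.transpose(Y, (0, 2, 1))) / n_frames
    return W + lr * (eye - corr) @ W
```

In a full separation loop this step would be repeated until `W` stops changing, followed by a scaling correction per frequency bin.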

    Functional Magnetic Resonance Imaging of Human Brain during Rest and Viewing Movies

    Neuroscientific research on human brain function has traditionally relied on highly controlled experiments with relatively simple stimuli. Recently, effort has been directed toward extending this research to more naturalistic contexts.
Brain function has been measured, for example, during movie viewing and in a "resting state" in the absence of a specific task. In this thesis, independent component analysis (ICA) is used to study human brain function in naturalistic settings. The brain networks observed at rest are compared across three conditions: rest before watching a movie (The Match Factory Girl, Aki Kaurismäki, 1990), the movie itself, and rest after the movie. The stability of the source estimates obtained with ICA was evaluated using bootstrapping. The temporal structure of the independent components (ICs) was compared with stimulus features annotated from the movie. Similarity of the networks' activation time courses across subjects was used to select the components that were compared with specific stimulus features. These features were also correlated directly with the preprocessed data to validate the ICA results. ICA successfully separated meaningful functional networks within the brain. The extent of the networks changed very little between conditions. However, the natural viewing condition allowed the ICs to be separated into smaller functional units than was achievable during rest, with both data-driven and model-based methods. The independent components exhibiting significant temporal similarity between subjects were highly concentrated in the sensory and associative areas of the temporal, occipital and parietal lobes. The activity of some ICs was found to follow distinct features of the movie.
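Components were selected by how similar their activation time courses are across subjects. One simple way to quantify that, assumed here rather than taken from the thesis, is the mean pairwise Pearson correlation of a component's time course over all subject pairs. The function names and the selection threshold of 0.3 are hypothetical.

```python
import numpy as np


def mean_intersubject_correlation(timecourses):
    """Mean Pearson correlation over all subject pairs.

    timecourses: array (n_subjects, n_timepoints) for one component.
    """
    r = np.corrcoef(timecourses)                 # subject-by-subject correlations
    upper = np.triu_indices(r.shape[0], k=1)     # each unordered pair counted once
    return float(r[upper].mean())


def select_shared_components(data, threshold=0.3):
    """Return indices of components whose time courses are shared across subjects.

    data: array (n_components, n_subjects, n_timepoints); threshold is hypothetical.
    """
    scores = np.array([mean_intersubject_correlation(tc) for tc in data])
    return np.flatnonzero(scores > threshold), scores
```

Components passing the threshold would then be the candidates for comparison with the annotated stimulus features.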

    Speech-brain synchronization: a possible cause for developmental dyslexia

    152 p. Dyslexia is a neurological learning disability characterized by difficulty in an individual's ability to read despite adequate intelligence and normal opportunities. The majority of dyslexic readers present phonological difficulties. The phonological difficulty most often associated with dyslexia is a deficit in phonological awareness, that is, the ability to hear and manipulate the sound structure of language. Some appealing theories of dyslexia attribute a causal role to atypical auditory oscillatory neural activity, suggesting that it generates some of the phonological problems in dyslexia. These theories propose that the auditory cortical oscillations of dyslexic individuals entrain less accurately to the spectral properties of auditory stimuli at distinct frequency bands (delta, theta and gamma) that are important for speech processing. Nevertheless, hypotheses diverge on which specific bands would be disrupted in dyslexia and on what the consequences of such disruptions would be for speech processing. The goal of the present PhD thesis was to characterize the neural oscillatory basis of phonological difficulties in developmental dyslexia. We evaluated whether phonological deficits in developmental dyslexia are associated with impaired auditory entrainment to a specific frequency band. To that end, we measured auditory neural synchronization to linguistic and non-linguistic auditory signals at frequencies corresponding to key phonological units of speech (prosodic, syllabic and phonemic information). We found that dyslexic readers showed atypical neural entrainment in the delta, theta and gamma frequency bands. Importantly, we showed that atypical entrainment to theta and gamma modulations in dyslexia could compromise perceptual computations during speech processing, while reduced delta entrainment could affect both perceptual and attentional operations during speech processing. 
In addition, we characterized the links between the anatomy of the auditory cortex and its oscillatory responses, taking into account previous studies that have observed structural alterations in dyslexia. We observed that cortical pruning in auditory regions was linked to stronger sensitivity to gamma oscillations in skilled readers, but to stronger theta-band sensitivity in dyslexic readers. We therefore concluded that the left auditory regions might be specialized for processing phonological information at different time scales (phoneme vs. syllable) in skilled and dyslexic readers. Lastly, by assessing both children and adults on similar tasks, we provided the first evaluation of developmental modulations of typical and atypical auditory sampling (and their structural underpinnings). We found that atypical neural entrainment in the delta, theta and gamma bands is present in dyslexia throughout the lifespan and is not modulated by reading experience.
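The abstract does not spell out how neural synchronization was quantified. A common measure of entrainment, assumed here purely for illustration, is the phase-locking value (PLV) between a neural signal and the stimulus. The sketch derives instantaneous phase from an FFT-based analytic signal (the same construction `scipy.signal.hilbert` uses) and assumes both inputs were already band-pass filtered to the band of interest (delta, theta or gamma); the function names are hypothetical.

```python
import numpy as np


def analytic_phase(x):
    """Instantaneous phase from the FFT-based analytic signal of a real 1-D array."""
    n = x.size
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0          # double positive frequencies
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.angle(np.fft.ifft(np.fft.fft(x) * h))


def phase_locking_value(neural, stimulus):
    """|mean(exp(i * phase difference))|: 1 = constant phase lag, near 0 = no locking."""
    dphi = analytic_phase(neural) - analytic_phase(stimulus)
    return float(np.abs(np.mean(np.exp(1j * dphi))))


# Usage: a 4 Hz (theta-range) stimulus and a response lagging it by a fixed phase.
fs = 250
t = np.arange(1000) / fs
stim = np.sin(2 * np.pi * 4.0 * t)
print(phase_locking_value(np.sin(2 * np.pi * 4.0 * t - 0.8), stim))
```

A fixed lag still yields a PLV near 1, because the measure tracks the consistency of the phase difference rather than its size.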

    THE ROLE OF GAMMA OSCILLATIONS AND CORTICAL INHIBITION IN THE DEVELOPMENT OF WORKING MEMORY IN ADOLESCENCE

    Adolescence is a dynamic period of social, cognitive, and biological change. In particular, working memory, the ability to actively encode and maintain information over a short period of time, develops early in childhood and gradually increases in capacity and stability during adolescence. The precise neurophysiological mechanism by which working memory capacity increases during adolescence is unclear. The objective of this investigation was to evaluate the role of cortical gamma-band (> 30 Hz) oscillations, which are associated with working memory in adults, in the development of working memory capacity in adolescents, and to identify the extent to which the temporal profile of gamma-aminobutyric acid (GABA)-mediated cortical inhibition underlies these changes. I hypothesized that cortical gamma-band rhythms would become faster during adolescence in a manner that supports improved working memory capacity, and that the kinetics of cortical inhibition would also become faster to support these faster rhythms. To this end, I recruited two cohorts of typically developing children (10-12 years) and adolescents (15-17 years) for a combined electroencephalography (EEG) and transcranial magnetic stimulation (TMS) study. First, I investigated the endogenous rhythmic activity generated by children and adolescents when performing a serially presented working memory task of varying set size. I found evidence of maturation in the generation of gamma-band rhythms, which differed in power between groups, but found no change in the central frequency of gamma-band activity. Next, I used TMS to exogenously evoke oscillatory activity in the left prefrontal cortex to identify the cortical natural (i.e., resonant) frequency. Using this measure, I found that adolescents exhibit higher median natural frequencies (Md_CHILD = 16 Hz; Md_ADO = 24 Hz, Z = 2.35, p = 0.009), but that sex may play a mediating role in when this change emerges. 
While this measure positively correlated with working memory capacity (r_s = 0.47, p = 0.007), this effect disappeared when controlling for age and sex (r_s = 0.29, p = 0.128). Finally, I investigated the role of inhibitory timing as a potential mechanism for improved cognition and increased natural frequency using classic paired-pulse TMS techniques. Six inter-pulse intervals (IPIs) in the range of short- and long-interval intracortical inhibition (SICI, LICI) were tested to assess the temporal characteristics of GABA type-A and type-B receptor-mediated inhibition (GABA_AR and GABA_BR, respectively). For SICI, I found alpha-band (9-14 Hz) facilitation in children and suppression in adolescents. For LICI, adolescents demonstrated greater suppression of gamma-band power than children, and suppression of beta-band (15-30 Hz) power equal to that of children. I found no evidence for a change in the timing of SICI- or LICI-induced modulations, although LICI suppression of gamma- and beta-band power correlated with working memory capacity. The overall hypothesis that the prefrontal cortex can produce faster rhythms during adolescent development was supported, but the hypothesized relationships between those rhythms, working memory capacity, and the timing of GABA-mediated inhibition were not. Rather, I observed several developmental differences in oscillatory power suggesting that excitation-inhibition balance underlies the developmental increases in working memory capacity and gamma-band synchrony.
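The abstract reports a "natural frequency" of the TMS-evoked response without detailing the estimator. One simple reading, assumed here, takes the frequency with the largest spectral power in the trial-averaged post-stimulus EEG segment. The function name, the Hann taper and the 8-50 Hz search range are illustrative assumptions, not the study's pipeline.

```python
import numpy as np


def natural_frequency(evoked, fs, fmin=8.0, fmax=50.0):
    """Frequency (Hz) of peak spectral power in a TMS-evoked response.

    evoked: 1-D trial-averaged post-stimulus EEG segment.
    fs: sampling rate in Hz; fmin/fmax: hypothetical search bounds in Hz.
    """
    x = evoked - evoked.mean()
    x = x * np.hanning(x.size)                  # taper to reduce spectral leakage
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2
    mask = (freqs >= fmin) & (freqs <= fmax)
    return float(freqs[mask][np.argmax(power[mask])])


# Usage: a damped 24 Hz oscillation, 1 s at 1 kHz (frequency resolution 1 Hz).
fs = 1000
t = np.arange(1000) / fs
print(natural_frequency(np.exp(-t / 0.3) * np.sin(2 * np.pi * 24 * t), fs))
```

Group medians such as those reported above would then be taken over per-participant estimates.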

    Recent Advances in Signal Processing

    Signal processing is a critical part of most new technological inventions and challenges across a wide range of scientific and engineering applications. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian, and have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily at students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be grouped into five areas depending on the application at hand: image processing, speech processing, communication systems, time-series analysis, and educational packages, in that order. The chapters are completely independent and self-contained, so the interested reader can choose any chapter and skip to another without losing continuity.

    Recent Applications in Graph Theory

    Graph theory, a rigorously investigated field of combinatorial mathematics, has been adopted by a wide variety of disciplines addressing a plethora of real-world applications. Advances in graph algorithms and software implementations have made graph theory accessible to a larger community. The ever-increasing interest in machine learning and model deployment for network data calls for a coherent selection of topics that offers a fresh, up-to-date summary of the theory and of fruitful applications worth probing further. This volume is a small yet unique contribution to graph theory applications and modeling with graphs. The subjects discussed include information hiding using graphs, dynamic graph-based systems to model and control cyber-physical systems, graph reconstruction, average distance neighborhood graphs, and pure and mixed-integer linear programming formulations to cluster networks.

    Characterization and processing of atrial fibrillation episodes by convolutive blind source separation algorithms and nonlinear analysis of spectral features

    Supraventricular arrhythmias, in particular atrial fibrillation (AF), are the cardiac diseases most commonly encountered in routine clinical practice. The prevalence of AF is below 1% in the population under 60 years of age, but it increases significantly from age 70 onward, approaching 10% in people over 80. A sustained AF episode, besides being linked to a higher mortality rate, increases the likelihood of thromboembolism, myocardial infarction and stroke. Paroxysmal AF episodes, those that terminate spontaneously, are in turn the precursors of sustained AF, which has raised strong interest in the scientific community in understanding the mechanisms that perpetuate AF episodes or lead to their spontaneous termination. Analysis of the surface ECG is the most widespread noninvasive technique in the medical diagnosis of cardiac pathologies. To use the ECG as a tool for studying AF, the atrial activity (AA) must be separated from the other cardioelectric signals. To this end, Blind Source Separation (BSS) techniques can perform a multi-lead statistical analysis aimed at recovering a set of independent cardioelectric sources, among which is the AA. When tackling a BSS problem, the source mixing model must be as close to reality as possible in order to develop mathematical algorithms that solve it. One viable model assumes linear mixtures, and within the linear mixing model the mixtures can further be restricted to be instantaneous. This instantaneous linear mixing model is the one used in Independent Component Analysis (ICA).
Vayá Salort, C. (2010). Characterization and processing of atrial fibrillation episodes by convolutive blind source separation algorithms and nonlinear analysis of spectral features [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8416
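The abstract stops at the instantaneous linear mixing model used by ICA, while the thesis title refers to convolutive BSS. In standard notation, assumed here rather than taken from the thesis, let x(t) be the vector of ECG lead signals, s(t) the vector of cardioelectric sources (atrial activity among them), A the mixing matrix, W the separating matrix and L the length of the mixing filters. The two models then read:

```latex
% Instantaneous linear mixture (ICA): each lead is a fixed weighted sum of the sources
\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t), \qquad
\hat{\mathbf{s}}(t) = \mathbf{W}\,\mathbf{x}(t), \qquad
\mathbf{W}\mathbf{A} = \mathbf{P}\,\mathbf{D}

% Convolutive linear mixture: each source reaches each lead through a filter of length L
\mathbf{x}(t) = \sum_{\tau=0}^{L-1} \mathbf{A}(\tau)\,\mathbf{s}(t-\tau)
```

Here P is a permutation matrix and D a diagonal scaling matrix, reflecting that ICA recovers the sources only up to order and scale. Setting L = 1 reduces the convolutive model to the instantaneous one.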