
    Design of hardware architectures for HMM–based signal processing systems with applications to advanced human-machine interfaces

    In this thesis a new approach is described for the development of human-machine interfaces, in particular pattern recognition systems based on Hidden Markov Models (HMMs). The research started from the development of techniques for building speech recognition systems for spontaneous speech, with the HMM chosen as the main algorithmic tool. After this early work, the goal was extended to the development of a hardware architecture providing a reconfigurable tool usable not only for speech recognition but for any HMM-based pattern recognition task. The work therefore focuses on dedicated hardware architectures, although new results were also obtained at the application level on the classification of electroencephalographic signals with HMMs. First, a system-level architecture was developed that is applicable to any HMM-based pattern recognition system and was conceived to operate as a stand-alone system. Once the architecture was defined, a flexible, fully reconfigurable hardware HMM processor was described in VHDL and successfully simulated. A parallel array of these processors forms the core processing block of the developed architecture. Two FPGA-based rapid prototyping platforms were then selected as targets for implementation tests, and several configurations of parallel HMM processor arrays were mapped onto them. The solutions offering the best trade-off between performance and hardware resource usage were selected for further analysis. A software HMM-based pattern recognition system was chosen as the reference for verifying the functionality of the implemented subsystems, and a set of tests was designed to check that the hardware met the initial specifications. Comparing the implemented versions against the reference software on the test results confirmed that the developed architectures behave as required. Finally, the parallel HMM processor array was applied to two real-world tasks: a speech recognizer and a classifier for interfaces based on electroencephalographic signals. In both cases the architecture proved functionally suitable and powerful enough to handle the task without problems. Hardware processing for speech recognition opens new perspectives in the design of such systems thanks to the dramatic increase in achievable execution-time performance, while the application to EEG processing introduces an entirely new approach to classifying these signals and shows how interfaces based on the classification of spontaneous thought could be developed in the future. The work started with this thesis can evolve in many directions.
    One possibility is the full implementation of the proposed architecture as a stand-alone reconfigurable system for accelerating any kind of HMM-based pattern recognition task. The potential performance of such a system would make highly complex real-time classifiers possible, and hence truly multimodal interfaces, with a wide range of applications, from space systems to aids for people with disabilities.
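    The abstract gives no code, but the core computation such an HMM processor array accelerates is the forward recursion used to score an observation sequence against each class model. A minimal log-domain sketch in NumPy (names and the discrete-emission assumption are illustrative, not the thesis's design):

```python
import numpy as np

def forward_log_likelihood(log_A, log_B, log_pi, obs):
    """Log-domain forward recursion for one HMM.

    log_A:  (N, N) log transition matrix, log_A[i, j] = log P(state j | state i)
    log_B:  (N, M) log emission matrix over M discrete symbols
    log_pi: (N,)   log initial state distribution
    obs:    sequence of observation symbol indices
    Returns log P(obs | model).
    """
    alpha = log_pi + log_B[:, obs[0]]  # initialization
    for o in obs[1:]:
        # log-sum-exp over predecessor states, then add the emission term
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)

# Classification evaluates every class model and picks the most likely one;
# this per-model scoring is what a parallel array of HMM processors can
# compute concurrently in hardware.
# scores = [forward_log_likelihood(*model, obs) for model in class_models]
# predicted_class = int(np.argmax(scores))
```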

    Any-way and Sparse Analyses for Multimodal Fusion and Imaging Genomics

    This dissertation aims to develop new algorithms, built upon the independent component analysis (ICA) framework, that leverage sparsity and mutual information across data modalities to improve the performance of current ICA-based multimodal fusion approaches. These algorithms are further applied to both simulated data and real neuroimaging and genomic data to examine their performance. The identified neuroimaging and genomic patterns can help better delineate the pathology of mental disorders or brain development. To alleviate the signal-background separation difficulties in infomax-decomposed sources for genomic data, we propose a sparse infomax that enhances a robust sparsity measure, the Hoyer index. The Hoyer index is scale-invariant and well suited to ICA frameworks since the scale of decomposed sources is arbitrary. Simulation results demonstrate that sparse infomax increases component detection accuracy when the source signal-to-background ratio (SBR) is low, particularly for single nucleotide polymorphism (SNP) data. The proposed sparse infomax is further extended to two data modalities as a sparse parallel ICA for applications to imaging genomics, in order to investigate associations between brain imaging and genomics. Simulation results show that sparse parallel ICA outperforms parallel ICA, with improved accuracy for structural magnetic resonance imaging (sMRI)-SNP association detection and component spatial map recovery, as well as enhanced sparsity for sMRI and SNP components in noisy cases. Applying the proposed sparse parallel ICA to fuse the whole-brain sMRI and whole-genome SNP data of 24,985 participants in the UK Biobank, we identify three stable and replicable sMRI-SNP pairs. The identified sMRI components highlight frontal, parietal, and temporal regions and associate with multiple cognitive measures (with different association strengths in different age groups for the temporal component). Top SNPs in the identified SNP factor are enriched in inflammatory disease and inflammatory response pathways; they also regulate gene expression, isoform percentage, transcription expression, or methylation level in the frontal region, and these regulation effects are significantly enriched. Applying the proposed sparse parallel ICA to imaging genomics in attention-deficit/hyperactivity disorder (ADHD), we identify and replicate one SNP component related to gray matter volume (GMV) alterations in the superior and middle frontal gyri underlying working memory deficits in adults and adolescents with ADHD. The association is more significant in ADHD families than in controls, and stronger in adults and older adolescents than in younger ones. The identified SNP component highlights SNPs in long non-coding RNAs (lncRNAs) on chromosome 5 and in several protein-coding genes involved in ADHD, such as MEF2C, CADM2, and CADPS2. Top SNPs are enriched in human brain neuron cells and regulate gene expression, isoform percentage, transcription expression, or methylation level in the frontal region. Moreover, to increase flexibility and robustness in mining multimodal data, we propose aNy-way ICA, which optimizes the entire correlation structure of linked components across any number of modalities via Gaussian independent vector analysis while simultaneously optimizing independence via separate (parallel) ICAs. Simulation results demonstrate that aNy-way ICA recovers sources and loadings, as well as the true covariance patterns, with improved accuracy compared to existing multimodal fusion approaches, especially under noisy conditions. Applying the proposed aNy-way ICA to integrate structural MRI, fractal n-back, and emotion identification task functional MRIs collected in the Philadelphia Neurodevelopmental Cohort (PNC), we identify and replicate one linked GMV-threat-2-back component, and the threat and 2-back components are related to intelligence quotient (IQ) score in both discovery and replication samples. Lastly, we extend the proposed aNy-way ICA with a reference constraint to enable prior-guided multimodal fusion. Simulation results show that aNy-way ICA with reference recovers the designed linkages between reference and modalities, cross-modality correlations, and the loading and component matrices with improved accuracy compared to multi-site canonical correlation analysis with reference (MCCAR)+joint ICA under noisy conditions. Applying aNy-way ICA with reference to supervise structural MRI, fractal n-back, and emotion identification task functional MRI fusion in PNC with IQ as the reference, we identify and replicate one IQ-related GMV-threat-2-back component, and this component is significantly correlated across modalities in both discovery and replication samples.
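    The Hoyer index the sparse infomax builds on has a standard closed form (Hoyer, 2004); a minimal sketch follows, with the integration into the infomax objective indicated only as an assumption in the trailing comment:

```python
import numpy as np

def hoyer_index(x, eps=1e-12):
    """Hoyer sparsity index of a vector (Hoyer, 2004).

    Ranges from 0 (all entries equal in magnitude) to 1 (a single
    nonzero entry), and is invariant to rescaling of x -- the property
    that makes it suitable for ICA sources, whose scale is arbitrary.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    l1 = np.abs(x).sum()
    l2 = np.sqrt((x ** 2).sum()) + eps  # eps guards the all-zero case
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

# A sparse infomax could add a weighted term such as
#   objective += lam * hoyer_index(decomposed_source)
# and maximize it jointly with the infomax likelihood (a sketch of the
# idea, not the dissertation's exact formulation).
```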

    Decoding non-invasive brain activity with novel deep-learning approaches

    This thesis delves into the world of non-invasive electrophysiological brain signals like electroencephalography (EEG) and magnetoencephalography (MEG), focusing on modelling and decoding such data. The research aims to investigate what happens in the brain when we perceive visual stimuli or engage in covert speech (inner speech), and to enhance the decoding performance for such stimuli. The findings have significant implications for the development of brain-computer interfaces (BCIs), leading to assistive communication technologies for paralysed individuals. The thesis is divided into two main sections, methodological and experimental work. A central concern in both is the large variability present in electrophysiological recordings, whether within-subject or between-subject variability, and to a certain extent between-dataset variability. In the methodological section, we explore the potential of deep learning for brain decoding. The research acknowledges the urgent need for more sophisticated models and larger datasets to improve the decoding and modelling of EEG and MEG signals. We present advancements in decoding visual stimuli using linear models at the individual-subject level. We then explore how deep learning techniques can be employed for group decoding, introducing new methods to deal with between-subject variability. Finally, we also explore novel forecasting models of MEG data based on convolutional and Transformer-based architectures. In particular, Transformer-based models demonstrate superior capabilities in generating signals that closely match real brain data, thereby enhancing the accuracy and reliability of modelling the brain’s electrophysiology. In the experimental section, we present a unique dataset containing high-trial-count inner speech EEG, MEG, and preliminary optically pumped magnetometer (OPM) data. We highlight the limitations of current BCI systems used for communication, which are either invasive or extremely slow. While inner speech decoding from non-invasive brain signals holds great promise, it has been a challenging goal in the field with limited decoding approaches, indicating a significant gap that needs to be addressed. Our aim is to investigate different types of inner speech and push decoding performance by collecting a high number of trials and sessions from a few participants. However, the decoding results are found to be mostly negative, underscoring the difficulty of decoding inner speech. In conclusion, this thesis provides valuable insight into the challenges and potential solutions in the field of electrophysiology, particularly in the decoding of visual stimuli and inner speech. The findings could pave the way for future research and advancements in the field, ultimately improving communication capabilities for paralysed individuals.
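    One common pattern for absorbing between-subject variability in group decoding is to give each subject its own learned input layer feeding a shared network; a PyTorch sketch of that idea follows (illustrative names and sizes, not the thesis's exact architecture):

```python
import torch
import torch.nn as nn

class GroupDecoder(nn.Module):
    """Shared decoder with per-subject input layers (illustrative only)."""

    def __init__(self, n_subjects, n_channels, n_latent, n_classes):
        super().__init__()
        # one spatial projection per subject maps that subject's sensor
        # layout into a latent space shared across the group
        self.subject_layers = nn.ModuleList(
            [nn.Linear(n_channels, n_latent) for _ in range(n_subjects)]
        )
        self.shared = nn.Sequential(
            nn.Conv1d(n_latent, 64, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x, subject_id):
        # x: (batch, channels, time); assumes the whole batch comes from
        # the same subject
        x = self.subject_layers[subject_id](x.transpose(1, 2)).transpose(1, 2)
        return self.shared(x)
```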

    Relating EEG to continuous speech using deep neural networks: a review

    Objective. When a person listens to continuous speech, a corresponding response is elicited in the brain and can be recorded using electroencephalography (EEG). Linear models are presently used to relate the EEG recording to the corresponding speech signal, and their ability to find a mapping between these two signals is used as a measure of neural tracking of speech. Such models are limited in that they assume linearity in the EEG-speech relationship, omitting the nonlinear dynamics of the brain. As an alternative, deep learning models have recently been used to relate EEG to continuous speech, especially in auditory attention decoding (AAD) and single-speech-source paradigms. Approach. This paper reviews and comments on deep-learning-based studies that relate EEG to continuous speech in AAD and single-speech-source paradigms. We point out recurrent methodological pitfalls and the need for a standard benchmark of model analysis. Main results. We gathered 29 studies. The main methodological issues we found are biased cross-validations, data leakage leading to over-fitted models, and data sizes disproportionately small compared to the models' complexity. In addition, we address the requirements for a standard benchmark of model analysis, such as public datasets, common evaluation metrics, and good practices for the match-mismatch task. Significance. We are the first to present a review summarizing the main deep-learning-based studies that relate EEG to speech while addressing methodological pitfalls and important considerations for this newly expanding field. Our study is particularly relevant given the growing application of deep learning in EEG-speech decoding.
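    A minimal sketch of the linear baseline the review describes, assuming a standard ridge-regression "backward model" that reconstructs the speech envelope from time-lagged EEG (all parameter values illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge

def lagged(eeg, max_lag):
    """Stack time-lagged copies of each EEG channel as regression features."""
    T, C = eeg.shape
    X = np.zeros((T, C * max_lag))
    for lag in range(max_lag):
        X[lag:, lag * C:(lag + 1) * C] = eeg[: T - lag]
    return X

def neural_tracking(eeg_train, env_train, eeg_test, env_test, max_lag=32):
    # eeg: (time, channels); env: (time,); both at the same sampling rate.
    # Train on one portion of the recording and correlate the prediction
    # with the true envelope on held-out data; that correlation is the
    # usual measure of neural tracking. Splitting so that train and test
    # segments overlap in time is one of the leakage pitfalls the review
    # warns about.
    model = Ridge(alpha=1.0).fit(lagged(eeg_train, max_lag), env_train)
    pred = model.predict(lagged(eeg_test, max_lag))
    return np.corrcoef(pred, env_test)[0, 1]
```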

    Neonatal Seizure Detection Using Deep Convolutional Neural Networks

    Identifying a core set of features is one of the most important steps in the development of an automated seizure detector. In most published studies describing features and seizure classifiers, the features were hand-engineered, which may not be optimal. The main goal of the present paper is to use deep convolutional neural networks (CNNs) and a random forest to automatically optimize feature selection and classification. The input of the proposed classifier is raw multi-channel EEG and the output is the class label: seizure/non-seizure. Training this network optimizes the required features while fitting a nonlinear classifier on them. After training the network with EEG recordings of 26 neonates, the five end layers performing the classification were replaced with a random forest classifier to improve performance. This resulted in a false alarm rate of 0.9 per hour and a seizure detection rate of 77% on a test set of EEG recordings of 22 neonates that also included dubious seizures. The newly proposed CNN classifier outperformed three data-driven feature-based approaches and performed similarly to a previously developed heuristic method.
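    A sketch of the CNN-plus-random-forest scheme, with illustrative layer sizes (the paper's exact architecture and the five replaced layers are not reproduced here):

```python
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

class SeizureCNN(nn.Module):
    """Small 1-D CNN over raw multi-channel EEG (illustrative sizes)."""

    def __init__(self, n_channels):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=15, stride=2), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=15, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 2)  # seizure / non-seizure logits

    def forward(self, x):
        # x: (batch, channels, time) of raw EEG
        return self.head(self.features(x))

# Hybrid scheme in the spirit of the paper: train the CNN end to end,
# then discard the classification head and fit a random forest on the
# learned feature vectors.
# feats = cnn.features(eeg_batch).detach().numpy()
# rf = RandomForestClassifier(n_estimators=100).fit(feats, labels)
```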