Developing robust movement decoders for local field potentials
Brain-computer interfaces (BCIs) are devices that translate acquired neural signals into command and control signals. Applications of BCIs include neural rehabilitation and neural prostheses (e.g., thought-controlled wheelchairs and spellers) that aid patients with disabilities, as well as augmented human-computer interaction. A successful practical BCI requires three things: a faithful acquisition modality to record high-quality neural signals, a signal processing system to construct appropriate features from these signals, and an algorithm to translate these features into appropriate outputs. Intracortical recordings such as local field potentials (LFPs) provide reliable signals with a high signal-to-noise ratio (SNR) over long periods and are well suited to BCI applications. However, the non-stationarity of neural signals makes robust decoding of subject behavior challenging. Most BCI research focuses either on developing decoders that are re-calibrated daily and require exhaustive training sessions, or on reporting cross-validation results. Such results ignore how signal characteristics vary across sessions and give an optimistic estimate of BCI performance. In particular, traditional BCI algorithms fail to perform at the same level on chronologically later recordings. Neural signal characteristics vary because of changes in subject behavior and learning, and electrode characteristics vary because of tissue interactions. Although training a day-specific BCI overcomes this variability, frequent re-training causes user frustration and exhaustion. This dissertation presents contributions toward solving these challenges. Specifically, we developed decoders trained on a single recording session and applied them to subsequently recorded sessions. This strategy evaluates BCIs in a practical scenario and could alleviate user frustration without compromising performance. 
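The evaluation protocol above trains once and then tests on every later session instead of cross-validating within a session. A minimal sketch of it follows. It is an illustration, not the dissertation's code: `evaluate_chronologically` and the nearest-centroid decoder are hypothetical stand-ins, and the toy data only shift the baseline in a later session to mimic non-stationarity.

```python
import numpy as np


def fit_centroids(X, y):
    """Hypothetical stand-in decoder: store one mean feature vector per class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_centroids(model, X):
    """Assign each sample to the class whose centroid is closest (Euclidean)."""
    classes, centroids = model
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


def evaluate_chronologically(sessions, fit, predict):
    """Train once on the first session, then report accuracy on each later session.

    `sessions` is a list of (features, labels) pairs in recording order.
    """
    X0, y0 = sessions[0]
    model = fit(X0, y0)
    return [float(np.mean(predict(model, X) == y)) for X, y in sessions[1:]]


# Toy data: the third session keeps the class structure but shifts the baseline.
X = np.array([[0.0], [0.1], [1.0], [1.1]])
y = np.array([0, 0, 1, 1])
sessions = [(X, y), (X.copy(), y), (X + 1.0, y)]
print(evaluate_chronologically(sessions, fit_centroids, predict_centroids))  # [1.0, 0.5]
```

A cross-validated score computed on the first session alone would be perfect here and would hide the drop on the shifted session.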
The first part of the dissertation investigates how to extract features that remain robust to changes in the neural signal over several days of recordings. It presents a qualitative feature extraction technique based on ranking the instantaneous power of multichannel data. These qualitative features are robust to outliers and to changes in the baseline of neural recordings while still capturing discriminative information, and they form the foundation for the robust decoders that follow. Next, the dissertation presents a novel algorithm based on the hypothesis that multiple neural spatial patterns describe the variation in behavior. This algorithm outperforms traditional methods when decoding chronological recordings: adapting the decoder over multiple recording sessions (over 6 weeks) yielded > 90% accuracy in decoding eight movement directions, whereas the accuracy of traditional algorithms such as Common Spatial Patterns (CSP) deteriorates to 16% over the same period. Over time, adaptation reinforces some spatial patterns while diminishing others. Characterizing these spatial patterns reduces model complexity without user input while retaining the same accuracy. Lastly, the dissertation provides an algorithm that overcomes variation in recording quality. Chronic electrode implantation changes the SNR of neural signals, so some signals, and their corresponding features, that are available during training become unavailable during testing, and vice versa. The proposed algorithm uses prior knowledge of how spatial patterns evolve to estimate the unknown neural features. It overcomes SNR variations and decodes eight movement directions with up to 93% accuracy over 6 weeks. Because model training requires only one session, this strategy reduces user frustration. 
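One reading of "ranking the instantaneous power of multichannel data" is sketched below. Each channel is squared, and at every time sample the powers are replaced by their rank across channels. This interpretation is an assumption, since the abstract does not specify windowing, frequency bands, or tie handling, so it is not the dissertation's exact feature. Ranks depend only on ordering, so any change that preserves the ordering of channel powers yields identical features; a common gain change is one example. An extreme power value can only push a channel to the top rank instead of dominating the feature scale.

```python
import numpy as np


def rank_power_features(x):
    """Rank channels by instantaneous power at every sample.

    x: array of shape (channels, samples), e.g. band-passed LFP.
    Returns an integer array of the same shape where 0 marks the
    lowest-power channel and channels-1 the highest at each sample.
    """
    power = x ** 2  # instantaneous power of each channel
    # argsort of argsort converts values to ranks along the channel axis;
    # a stable sort keeps tie-breaking deterministic
    order = np.argsort(power, axis=0, kind="stable")
    return np.argsort(order, axis=0, kind="stable")


x = np.array([[1.0, -3.0],
              [2.0, 1.0],
              [0.5, 2.0]])
print(rank_power_features(x))
# [[1 2]
#  [2 0]
#  [0 1]]
# A common gain change leaves the ranks unchanged:
print(np.array_equal(rank_power_features(3.0 * x), rank_power_features(x)))  # True
```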
In a practical closed-loop BCI, the user learns to produce stable spatial patterns, which improves the performance of the proposed algorithms.
Electrical and Computer Engineering
End-to-End Tracking and Semantic Segmentation Using Recurrent Neural Networks
In this work we present a novel end-to-end framework for tracking and
classifying a robot's surroundings in complex, dynamic and only partially
observable real-world environments. The approach deploys a recurrent neural
network to filter an input stream of raw laser measurements in order to
directly infer object locations, along with their identity in both visible and
occluded areas. To achieve this, we first train the network using unsupervised
Deep Tracking, a recently proposed theoretical framework for end-to-end space
occupancy prediction. We show that by learning to track on a large amount of
unsupervised data, the network creates a rich internal representation of its
environment which we in turn exploit through the principle of inductive
transfer of knowledge to perform the task of its semantic classification. As a
result, we show that only a small amount of labelled data suffices to steer the
network towards mastering this additional task. Furthermore, we propose a novel
recurrent neural network architecture specifically tailored to tracking and
semantic classification in real-world robotics applications. We demonstrate the
tracking and classification performance of the method on real-world data
collected at a busy road junction. Our evaluation shows that the proposed
end-to-end framework compares favourably to a state-of-the-art, model-free
tracking solution and that it outperforms a conventional one-shot training
scheme for semantic classification.
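The unsupervised stage can be illustrated with a loss commonly used for occupancy prediction from partially observed sensor data. The network outputs an occupancy probability for every grid cell, but the loss is evaluated only on cells the laser actually observes, so occluded cells receive no direct supervision. The function below is a hypothetical sketch of such a masked loss under that assumption. It is not taken from the paper, and the grid layout and names are illustrative.

```python
import numpy as np


def masked_occupancy_bce(pred_prob, observed, visible, eps=1e-7):
    """Mean binary cross-entropy over visible cells only (hypothetical helper).

    pred_prob: predicted occupancy probabilities, any shape
    observed:  1 where the sensor saw an obstacle, 0 where it saw free space
    visible:   1 where the cell was within the sensor's line of sight, else 0
    """
    p = np.clip(pred_prob, eps, 1.0 - eps)  # avoid log(0)
    bce = -(observed * np.log(p) + (1.0 - observed) * np.log(1.0 - p))
    n_visible = visible.sum()
    if n_visible == 0:
        return 0.0  # nothing observed at this time step
    return float((bce * visible).sum() / n_visible)


pred = np.array([0.9, 0.1, 0.5])
obs = np.array([1.0, 0.0, 0.0])
vis = np.array([1.0, 1.0, 0.0])  # the third cell is occluded
loss = masked_occupancy_bce(pred, obs, vis)
# Changing the prediction for the occluded cell does not change the loss.
pred_changed = np.array([0.9, 0.1, 0.99])
print(loss == masked_occupancy_bce(pred_changed, obs, vis))  # True
```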
Contributions to statistical analysis methods for neural spiking activity
With the technical advances in neuroscience experiments in the past few decades, we have seen a massive expansion in our ability to record neural activity. These advances enable neuroscientists to analyze more complex neural coding and communication properties, and at the same time, raise new challenges for analyzing neural spiking data, which keeps growing in scale, dimension, and complexity.
This thesis proposes several new statistical methods that advance statistical analysis approaches for neural spiking data, including sequential Monte Carlo (SMC) methods for efficient estimation of neural dynamics from membrane potential threshold crossings, state-space models using multimodal observation processes, and goodness-of-fit analysis methods for neural marked point process models.
In a first project, we derive a set of iterative formulas that enable us to simulate trajectories from stochastic, dynamic neural spiking models that are consistent with a set of observed spike times. We develop an SMC method to simultaneously estimate the model parameters and the unobserved dynamic variables from spike train data, and we investigate the performance of this approach on a leaky integrate-and-fire model.
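For orientation, the leaky integrate-and-fire model mentioned above can be simulated with an Euler-Maruyama scheme. The membrane potential relaxes toward rest plus an input drive and receives Gaussian noise. It emits a spike and resets whenever it crosses threshold. This forward simulator is only a sketch with assumed parameter values (time constant, threshold, reset, noise scale). It does not implement the thesis's iterative formulas or its SMC estimator, which condition such trajectories on observed spike times.

```python
import numpy as np


def simulate_lif(drive_mv, t_max=1.0, dt=1e-4, tau=0.02, v_rest=-70.0,
                 v_th=-50.0, v_reset=-70.0, sigma=0.0, seed=0):
    """Euler-Maruyama simulation of a noisy leaky integrate-and-fire neuron.

    drive_mv: constant input expressed as a voltage offset (R * I), in mV
    sigma:    noise amplitude in mV / sqrt(s); 0 gives a deterministic neuron
    Returns the spike times in seconds. All defaults are assumed values.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_max / dt))
    v = v_rest
    spikes = []
    for k in range(n_steps):
        noise = sigma * np.sqrt(dt) * rng.standard_normal()
        v += (v_rest - v + drive_mv) / tau * dt + noise  # leak toward rest + drive
        if v >= v_th:                                    # threshold crossing
            spikes.append((k + 1) * dt)
            v = v_reset
    return np.array(spikes)


print(len(simulate_lif(10.0)))        # 0: steady state -60 mV stays below threshold
isi = np.diff(simulate_lif(30.0))
print(round(float(isi.mean()), 4))    # close to tau * ln(30 / 10), about 0.022 s
```

With `sigma=0` the neuron is deterministic. A 10 mV drive never reaches threshold, while a 30 mV drive fires regularly. Setting `sigma > 0` produces the stochastic trajectories whose hidden voltage an SMC method would have to infer from spike times alone.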
In another project, we define a semi-latent state-space model to estimate information related to the phenomenon of hippocampal replay. Replay is a recently discovered phenomenon in which patterns of hippocampal spiking activity that typically occur during exploration of an environment are reactivated when the animal is at rest. This reactivation is accompanied by high-frequency oscillations in hippocampal local field potentials. However, methods to define replay mathematically remain undeveloped. In this project, we construct a novel state-space model that enables us to identify whether replay is occurring and, if so, to estimate the movement trajectories consistent with the observed neural activity and to categorize the content of each event. The state-space model integrates information from the spiking activity of the hippocampal population, the rhythms in the local field potential, and the rat's movement behavior.
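To make the state-space idea concrete, the sketch below decodes a latent 1-D position from population spike counts with a forward (filtering) recursion. A random-walk transition model predicts the next position, and a Poisson likelihood built from assumed place fields updates that prediction. This is a generic Bayesian decoder under stated assumptions: Gaussian place fields, a random-walk prior, and independent Poisson cells. The thesis's semi-latent model also uses the local field potential rhythms and behavior and classifies replay content. None of that is shown here.

```python
import numpy as np


def random_walk_transition(n_bins, sd_bins):
    """Column j holds P(next bin | current bin j) for a Gaussian random walk."""
    idx = np.arange(n_bins)
    k = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sd_bins) ** 2)
    return k / k.sum(axis=0, keepdims=True)


def poisson_log_likelihood(counts, rates, dt):
    """Log P(counts | position) for every position bin, up to a constant.

    counts: (n_cells,) spike counts in one time bin
    rates:  (n_cells, n_bins) place-field firing rates in Hz (must be > 0)
    """
    lam = rates * dt
    return counts @ np.log(lam) - lam.sum(axis=0)  # log(count!) is constant across bins


def forward_filter(spike_counts, rates, dt, transition):
    """Return the filtered posterior over position bins for every time step."""
    n_bins = rates.shape[1]
    post = np.full(n_bins, 1.0 / n_bins)  # uniform initial belief
    history = []
    for counts in spike_counts:
        prior = transition @ post  # one-step prediction
        log_post = np.log(prior + 1e-300) + poisson_log_likelihood(counts, rates, dt)
        log_post -= log_post.max()  # numerical stability before exponentiating
        post = np.exp(log_post)
        post /= post.sum()
        history.append(post)
    return np.array(history)


# Three hypothetical place cells centred on bins 2, 10 and 17 of a 20-bin track.
bins = np.arange(20)
centres = np.array([2, 10, 17])
rates = 1.0 + 50.0 * np.exp(-0.5 * ((bins[None, :] - centres[:, None]) / 2.0) ** 2)
counts = np.tile([0, 5, 0], (5, 1))  # only the middle cell fires
posterior = forward_filter(counts, rates, dt=0.02, transition=random_walk_transition(20, 1.0))
print(int(np.argmax(posterior[-1])))  # 10
```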
Finally, we develop a new, general time-rescaling theorem for marked point processes and use it to build a general goodness-of-fit framework for neural population spiking models. We investigate this approach through simulation and a real-data application.
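The classical (unmarked) time-rescaling theorem underlies this kind of goodness-of-fit analysis. If a model's conditional intensity λ(t) is correct, the rescaled intervals z_k = ∫_{t_{k-1}}^{t_k} λ(t) dt between successive spikes are independent Exp(1) variables. Then u_k = 1 - exp(-z_k) should be uniform on (0, 1), and a Kolmogorov-Smirnov (KS) distance measures the misfit. The sketch below implements only the single-neuron, unmarked case. It assumes the intensity is given on a time grid that starts at the beginning of the recording. The thesis's contribution generalizes this to marked point processes, which the sketch does not attempt.

```python
import numpy as np


def time_rescaling_ks(spike_times, intensity, t_grid):
    """Rescale inter-spike intervals by a model intensity and return the KS distance.

    spike_times: sorted spike times within [t_grid[0], t_grid[-1]]
    intensity:   model conditional intensity (spikes/s) evaluated on t_grid
    """
    # Cumulative integral of the intensity (trapezoid rule) on the grid.
    steps = 0.5 * (intensity[1:] + intensity[:-1]) * np.diff(t_grid)
    cumulative = np.concatenate([[0.0], np.cumsum(steps)])
    big_lambda = np.interp(spike_times, t_grid, cumulative)
    z = np.diff(np.concatenate([[0.0], big_lambda]))  # ~ Exp(1) if the model is correct
    u = np.sort(1.0 - np.exp(-z))                      # ~ Uniform(0, 1) if correct
    n = len(u)
    ks = max(np.max(np.arange(1, n + 1) / n - u), np.max(u - np.arange(n) / n))
    return z, float(ks)


rng = np.random.default_rng(0)
spikes = np.cumsum(rng.exponential(1.0 / 20.0, size=3000))
spikes = spikes[spikes < 100.0]              # homogeneous Poisson spikes at 20 Hz
t_grid = np.linspace(0.0, 100.0, 100_001)
_, ks_good = time_rescaling_ks(spikes, np.full_like(t_grid, 20.0), t_grid)
_, ks_bad = time_rescaling_ks(spikes, np.full_like(t_grid, 40.0), t_grid)
print(ks_good < 0.05, ks_bad > 0.15)  # True True
```

The correctly specified 20 Hz model gives a small KS distance. The misspecified 40 Hz model doubles every rescaled interval and is clearly rejected.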
Developing implant technologies and evaluating brain-machine interfaces using information theory
Brain-machine interfaces (BMIs) hold promise for restoring motor functions in severely paralyzed individuals. Invasive BMIs are capable of recording signals from individual neurons and typically provide the highest signal-to-noise ratio. Despite many efforts in the scientific community, BMI technology is still not reliable enough for widespread clinical application. The most prominent challenges include biocompatibility, stability, longevity, and lack of good models for informed signal processing and BMI comparison.
To address the problem of low signal quality of chronic probes, the first part of the thesis modified one such design, the Neurotrophic Electrode, by increasing its channel capacity to form a Neurotrophic Array (NA). Specifically, single wires were replaced with stereotrodes, and the total number of recording wires was increased. This new array design was tested in a rhesus macaque performing a delayed saccade task. The NA recorded little single-unit spiking activity, and its local field potentials (LFPs) correlated better with presented visual stimuli and saccade locations than did the extracted spikes.
The second part of the thesis compares the NA to the Utah Array (UA), the only other micro-array approved for chronic implantation in a human brain. The UA recorded significantly more spiking units, which had larger amplitudes than NA spikes. This was likely due to differences in the array geometry and construction. LFPs on the NA electrodes were more correlated with each other than those on the UA. These correlations negatively impacted the NA's information capacity when considering more than one recording site.
The final part of this dissertation applies information theory to develop objective measures of BMI performance. Currently, decoder information transfer rate (ITR) is the most popular BMI information performance metric. However, it is limited by the selected decoding algorithm and does not represent the full task information embedded in the recorded neural signal. A review of existing methods to estimate ITR is presented, and these methods are interpreted within a BMI context. A novel Gaussian mixture Monte Carlo method is developed to produce good ITR estimates from few trials and high-dimensional data, as is typical for BMI applications.
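For reference, the decoder-based ITR most often reported in the BCI literature is the Wolpaw formula, B = log2 N + P log2 P + (1 - P) log2((1 - P)/(N - 1)) bits per trial. It assumes N equally likely targets, decoding accuracy P, and errors spread evenly over the other N - 1 targets. The second function below estimates the mutual information between a discrete task variable and continuous neural features by Monte Carlo sampling, assuming each class-conditional feature distribution is Gaussian. Both are illustrative sketches under those assumptions. The second is a generic estimator in the same spirit as the thesis's Gaussian mixture Monte Carlo method, not a reproduction of it.

```python
import numpy as np


def wolpaw_bits_per_trial(n_classes, accuracy):
    """Wolpaw ITR in bits per trial for N targets decoded with accuracy P."""
    n, p = n_classes, accuracy
    bits = np.log2(n)
    if p > 0:
        bits += p * np.log2(p)
    if p < 1:
        bits += (1 - p) * np.log2((1 - p) / (n - 1))
    return float(bits)


def gaussian_logpdf(x, mean, cov):
    """Log density of N(mean, cov) at each row of x."""
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)
    sol = np.linalg.solve(chol, (x - mean).T)  # whitened residuals, shape (d, n)
    maha = np.sum(sol ** 2, axis=0)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (maha + log_det + d * np.log(2.0 * np.pi))


def mc_mutual_information_bits(means, covs, priors, n_samples=20_000, seed=0):
    """Monte Carlo estimate of I(S; X) in bits for Gaussian class-conditional features.

    Draws (s, x) pairs from the model and averages log p(s | x) - log p(s).
    """
    rng = np.random.default_rng(seed)
    k = len(priors)
    labels = rng.choice(k, size=n_samples, p=priors)
    x = np.empty((n_samples, means.shape[1]))
    for c in range(k):
        idx = labels == c
        x[idx] = rng.multivariate_normal(means[c], covs[c], size=int(idx.sum()))
    log_joint = np.stack(
        [np.log(priors[c]) + gaussian_logpdf(x, means[c], covs[c]) for c in range(k)], axis=1
    )
    m = log_joint.max(axis=1, keepdims=True)  # log-sum-exp for log p(x)
    log_px = (m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))).ravel()
    log_post = log_joint[np.arange(n_samples), labels] - log_px
    return float(np.mean(log_post - np.log(priors[labels])) / np.log(2.0))


print(wolpaw_bits_per_trial(8, 1.0))  # 3.0
eye = np.eye(2)
far = mc_mutual_information_bits(np.array([[0.0, 0.0], [100.0, 0.0]]), [eye, eye], np.array([0.5, 0.5]))
same = mc_mutual_information_bits(np.array([[0.0, 0.0], [0.0, 0.0]]), [eye, eye], np.array([0.5, 0.5]))
print(round(far, 3), abs(same) < 1e-9)  # 1.0 True
```

Well-separated classes carry the full log2(2) = 1 bit, and identical class distributions carry none. Unlike the Wolpaw number, neither value depends on a particular decoder.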
Machine Learning for Physiological Time Series: Representing and Controlling Blood Glucose for Diabetes Management
Type 1 diabetes is a chronic health condition, affecting over one million patients in the US, in which blood glucose (sugar) levels are not well regulated by the body. Researchers have sought to use physiological data (e.g., blood glucose measurements) collected from wearable devices to manage this disease, either by forecasting future blood glucose levels for predictive alarms or by automating insulin delivery for blood glucose management. However, the application of machine learning (ML) to these data is hampered by latent context, limited supervision, and complex temporal dependencies. To address these challenges, we develop and evaluate novel ML approaches in the context of i) representing physiological time series, particularly for forecasting blood glucose values, and ii) deciding when and how much insulin to deliver. When learning representations, we leverage the structure of the physiological sequence as an implicit information stream. In particular, we a) incorporate latent context when predicting adverse events by jointly modeling patterns in the data and the context in which those patterns occurred, b) propose novel types of self-supervision to handle limited data, and c) propose deep models that predict the functions underlying trajectories in order to encode temporal dependencies. For decision making, we use reinforcement learning (RL) for blood glucose management. Using an FDA-approved simulator of the glucoregulatory system, we achieve strong performance with deep RL, both with and without human intervention. However, the success of RL typically depends on realistic simulators or experimental real-world deployment, neither of which is currently practical for problems in health. Thus, we propose techniques for leveraging imperfect simulators and observational data. Beyond diabetes, representing and managing physiological signals is an important problem. 
By adapting techniques to better leverage the structure inherent in the data, we can help overcome these challenges.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/163134/1/ifox_1.pd
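As a point of comparison for the blood glucose forecasting task, consider the simplest forecaster, not one of the deep models proposed in the thesis. It is a linear autoregressive model fit by least squares to recent continuous glucose monitor (CGM) readings and iterated forward. An alarm is raised when any forecast falls below a hypoglycemia threshold. The 5-minute sampling interval, the model order, and the 70 mg/dL threshold used below are illustrative assumptions.

```python
import numpy as np


def fit_ar(series, order):
    """Least-squares AR(order) coefficients plus an intercept (last entry)."""
    n = len(series)
    lagged = np.column_stack([series[i:n - order + i] for i in range(order)])
    design = np.column_stack([lagged, np.ones(n - order)])
    coef, *_ = np.linalg.lstsq(design, series[order:], rcond=None)
    return coef


def forecast(history, coef, steps):
    """Iterate the AR model forward, feeding each prediction back in."""
    order = len(coef) - 1
    window = list(history[-order:])
    preds = []
    for _ in range(steps):
        nxt = float(np.dot(coef[:-1], window[-order:]) + coef[-1])
        preds.append(nxt)
        window.append(nxt)
    return np.array(preds)


def hypo_alarm(preds, threshold_mg_dl=70.0):
    """True if any forecast value falls below the (assumed) hypoglycemia threshold."""
    return bool(np.any(preds < threshold_mg_dl))


# Hypothetical CGM trace sampled every 5 minutes, falling 5 mg/dL per sample.
cgm = 150.0 - 5.0 * np.arange(10)          # 150, 145, ..., 105
coef = fit_ar(cgm, order=1)
preds = forecast(cgm, coef, steps=8)        # next 40 minutes
print(np.round(preds, 6))
print(hypo_alarm(preds))                    # True
```

Such a baseline extrapolates recent trends but has no notion of the latent context (meals, exercise, insulin) that the thesis argues must be modeled.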
DNN-based Acoustic Modeling for Robust Speech Recognition
Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, College of Engineering, February 2019. Advisor: Nam Soo Kim.
In this thesis, we propose three acoustic modeling techniques for robust automatic speech recognition (ASR). First, we propose a DNN-based acoustic modeling technique that uses auxiliary feature vectors to make the best use of the inherent noise robustness of DNNs. With this technique, the DNN can learn more smoothly the complicated relationship among the noisy speech, the clean speech, the noise estimate, and the phonetic target. The proposed method outperformed noise-aware training (NAT), the conventional auxiliary-feature-based model adaptation technique, on the Aurora-5 DB.
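In the conventional NAT baseline referred to above, each input frame is augmented with an estimate of the noise so the network can condition on the acoustic environment. Below is a minimal sketch. It assumes the noise is estimated per utterance as the mean of the first few frames, a common heuristic that presumes the utterance starts with non-speech. The function name and frame count are illustrative, and the thesis's two-stage variant (a lower and an upper DNN trained jointly) is not shown.

```python
import numpy as np


def noise_aware_input(features, n_noise_frames=10):
    """Append a per-utterance noise estimate to every frame (NAT-style input).

    features: (frames, dims) array, e.g. log mel filterbank energies.
    Assumes the first `n_noise_frames` frames contain background noise only.
    """
    noise_estimate = features[:n_noise_frames].mean(axis=0)
    tiled = np.tile(noise_estimate, (features.shape[0], 1))
    return np.hstack([features, tiled])


rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 40))           # 100 frames of 40-dimensional features
augmented = noise_aware_input(feats)
print(augmented.shape)                       # (100, 80)
```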
The second method is a multi-channel feature enhancement technique. In the general multi-channel speech recognition scenario, a single enhanced speech signal is extracted from the multiple inputs using beamforming, a conventional signal-processing-based technique. Speech recognition is then performed by feeding that signal into the acoustic model. We propose a multi-channel feature enhancement DNN algorithm that combines the delay-and-sum (DS) beamformer, one of the most basic conventional beamforming techniques, with a DNN. Through joint training that uses intermediate-stage feature vectors, the proposed DNN effectively represents the relationship between the distorted multi-channel input speech signals and the clean speech signal. Experiments on the Multichannel Wall Street Journal Audio Visual (MC-WSJ-AV) corpus showed that the proposed method outperformed conventional multi-channel feature enhancement techniques.
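The delay-and-sum beamformer that the second method builds on is simple enough to sketch directly. Each microphone signal is shifted to compensate for its arrival delay, and the aligned signals are averaged. This reinforces the target speech and partially averages out uncorrelated noise. The sketch assumes the per-channel delays are already known and are whole samples. Practical systems estimate them (e.g., from cross-correlation) and apply fractional delays, often in the frequency domain.

```python
import numpy as np


def delay_and_sum(signals, delays):
    """Time-domain delay-and-sum beamformer with integer sample delays.

    signals: (channels, samples) array of microphone signals
    delays:  non-negative integer lag of each channel relative to the
             earliest-arriving channel, in samples
    """
    n_channels, n_samples = signals.shape
    output = np.zeros(n_samples)
    for c in range(n_channels):
        d = int(delays[c])
        # advance channel c by d samples so it lines up with the reference
        output[: n_samples - d] += signals[c, d:]
    return output / n_channels


rng = np.random.default_rng(0)
speech = rng.normal(size=1000)
mics = np.zeros((2, 1000))
mics[0] = speech
mics[1, 3:] = speech[:-3]                    # second mic hears the source 3 samples later
enhanced = delay_and_sum(mics, delays=[0, 3])
print(np.allclose(enhanced[:-3], speech[:-3]))  # True
```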
Finally, we propose an uncertainty-aware training (UAT) technique. Most existing DNN-based techniques, including those introduced above, aim to optimize point estimates of their targets (e.g., clean features and acoustic model parameters), which raises problems with the uncertainty and reliability of the estimates. To overcome this issue, UAT employs a modified structure of the variational autoencoder (VAE), a neural network model that learns and performs stochastic variational inference (VIF). UAT models robust latent variables that mediate the mapping between the noisy observed features and the phonetic target, using the distributional information of the clean-feature estimates. The latent variables of UAT are trained according to a maximum-likelihood criterion derived from an uncertainty decoding (UD) framework tailored to deep-learning-based acoustic models. The proposed technique outperforms conventional DNN-based techniques on the Aurora-4 and CHiME-4 databases.
Abstract
Contents
List of Figures
List of Tables
1 Introduction
2 Background
2.1 Deep Neural Networks
2.2 Experimental Database
2.2.1 Aurora-4 DB
2.2.2 Aurora-5 DB
2.2.3 MC-WSJ-AV DB
2.2.4 CHiME-4 DB
3 Two-stage Noise-aware Training for Environment-robust Speech Recognition
3.1 Introduction
3.2 Noise-aware Training
3.3 Two-stage NAT
3.3.1 Lower DNN
3.3.2 Upper DNN
3.3.3 Joint Training
3.4 Experiments
3.4.1 GMM-HMM System
3.4.2 Training and Structures of DNN-based Techniques
3.4.3 Performance Evaluation
3.5 Summary
4 DNN-based Feature Enhancement for Robust Multichannel Speech Recognition
4.1 Introduction
4.2 Observation Model in Multi-Channel Reverberant Noisy Environment
4.3 Proposed Approach
4.3.1 Lower DNN
4.3.2 Upper DNN and Joint Training
4.4 Experiments
4.4.1 Recognition System and Feature Extraction
4.4.2 Training and Structures of DNN-based Techniques
4.4.3 Dropout
4.4.4 Performance Evaluation
4.5 Summary
5 Uncertainty-aware Training for DNN-HMM System using Variational Inference
5.1 Introduction
5.2 Uncertainty Decoding for Noise Robustness
5.3 Variational Autoencoder
5.4 VIF-based uncertainty-aware Training
5.4.1 Clean Uncertainty Network
5.4.2 Environment Uncertainty Network
5.4.3 Prediction Network and Joint Training
5.5 Experiments
5.5.1 Experimental Setup: Feature Extraction and ASR System
5.5.2 Network Structures
5.5.3 Effects of CUN on the Noise Robustness
5.5.4 Uncertainty Representation in Different SNR Condition
5.5.5 Result of Speech Recognition
5.5.6 Result of Speech Recognition with LSTM-HMM
5.6 Summary
6 Conclusions
Bibliography
Abstract (in Korean)
Doctor
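UAT builds on the variational autoencoder listed in Chapter 5. The two ingredients that make a VAE trainable by gradient descent are sketched below in NumPy. The first is the reparameterization trick, which writes a sample from N(mu, sigma^2) as mu + sigma * eps with eps ~ N(0, 1). The second is the closed-form KL divergence between that diagonal Gaussian and a standard normal prior. This is the textbook VAE, not the thesis's modified structure with clean-uncertainty and environment-uncertainty networks.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Draw z ~ N(mu, exp(log_var)) as a deterministic function of (mu, log_var) plus noise."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return float(0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var))


rng = np.random.default_rng(0)
mu = np.full(100_000, 1.5)
log_var = np.full(100_000, np.log(0.25))     # sigma = 0.5
z = reparameterize(mu, log_var, rng)
print(round(float(z.mean()), 2), round(float(z.std()), 2))
print(kl_to_standard_normal(np.zeros(3), np.zeros(3)))          # 0.0
print(kl_to_standard_normal(np.array([1.0]), np.array([0.0])))  # 0.5
```

Because the randomness enters only through `eps`, gradients of a loss with respect to `mu` and `log_var` can flow through the sampled `z`. That property is what lets the encoder be trained jointly with the rest of the network.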
- …