1,177 research outputs found

    End-to-End Tracking and Semantic Segmentation Using Recurrent Neural Networks

    In this work we present a novel end-to-end framework for tracking and classifying a robot's surroundings in complex, dynamic and only partially observable real-world environments. The approach deploys a recurrent neural network to filter an input stream of raw laser measurements in order to directly infer object locations, along with their identity in both visible and occluded areas. To achieve this we first train the network using unsupervised Deep Tracking, a recently proposed theoretical framework for end-to-end space occupancy prediction. We show that by learning to track on a large amount of unsupervised data, the network creates a rich internal representation of its environment, which we in turn exploit through the principle of inductive transfer of knowledge to perform the task of its semantic classification. As a result, we show that only a small amount of labelled data suffices to steer the network towards mastering this additional task. Furthermore, we propose a novel recurrent neural network architecture specifically tailored to tracking and semantic classification in real-world robotics applications. We demonstrate the tracking and classification performance of the method on real-world data collected at a busy road junction. Our evaluation shows that the proposed end-to-end framework compares favourably to a state-of-the-art, model-free tracking solution and that it outperforms a conventional one-shot training scheme for semantic classification.

    Contributions to statistical analysis methods for neural spiking activity

    With the technical advances in neuroscience experiments in the past few decades, we have seen a massive expansion in our ability to record neural activity. These advances enable neuroscientists to analyze more complex neural coding and communication properties, and at the same time, raise new challenges for analyzing neural spiking data, which keeps growing in scale, dimension, and complexity. This thesis proposes several new statistical methods that advance statistical analysis approaches for neural spiking data, including sequential Monte Carlo (SMC) methods for efficient estimation of neural dynamics from membrane potential threshold crossings, state-space models using multimodal observation processes, and goodness-of-fit analysis methods for neural marked point process models. In a first project, we derive a set of iterative formulas that enable us to simulate trajectories from stochastic, dynamic neural spiking models that are consistent with a set of spike time observations. We develop an SMC method to simultaneously estimate the parameters of the model and the unobserved dynamic variables from spike train data. We investigate the performance of this approach on a leaky integrate-and-fire model. In another project, we define a semi-latent state-space model to estimate information related to the phenomenon of hippocampal replay. Replay is a recently discovered phenomenon where patterns of hippocampal spiking activity that typically occur during exploration of an environment are reactivated when an animal is at rest. This reactivation is accompanied by high frequency oscillations in hippocampal local field potentials. However, methods to define replay mathematically remain undeveloped. In this project, we construct a novel state-space model that enables us to identify whether replay is occurring, and if so to estimate the movement trajectories consistent with the observed neural activity, and to categorize the content of each event. The state-space model integrates information from the spiking activity of the hippocampal population, the rhythms in the local field potential, and the rat's movement behavior. Finally, we develop a new, general time-rescaling theorem for marked point processes, and use this to develop a general goodness-of-fit framework for neural population spiking models. We investigate this approach through simulation and a real data application.
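The goodness-of-fit idea mentioned in this abstract can be illustrated with the classical time-rescaling theorem for ordinary (unmarked) point processes, which is a simpler setting than the marked-process generalization the thesis develops. The sketch below is only an illustration under stated assumptions: the intensity function, its closed-form integral, and all function names are hypothetical and are not taken from the thesis. If the model intensity is correct, the rescaled inter-spike intervals should behave like draws from a unit-rate exponential distribution.

```python
import math
import random

def intensity(t):
    """Hypothetical conditional intensity (spikes per second); always positive."""
    return 10.0 + 5.0 * math.sin(t)

def cumulative_intensity(t):
    """Closed-form integral of intensity(s) over s in [0, t]."""
    return 10.0 * t + 5.0 * (1.0 - math.cos(t))

def simulate_spikes(t_end, rng, rate_max=15.0):
    """Draw spike times on [0, t_end] by thinning a Poisson process of rate rate_max."""
    spikes, t = [], 0.0
    while True:
        t += rng.expovariate(rate_max)
        if t > t_end:
            return spikes
        if rng.random() < intensity(t) / rate_max:
            spikes.append(t)

def rescaled_intervals(spikes):
    """Map spike times through the cumulative intensity and take successive differences."""
    rescaled = [cumulative_intensity(s) for s in spikes]
    return [b - a for a, b in zip([0.0] + rescaled[:-1], rescaled)]

def ks_statistic_exp1(samples):
    """Kolmogorov-Smirnov distance between the samples and the Exp(1) distribution."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-x)
        d = max(d, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return d

rng = random.Random(0)
spikes = simulate_spikes(100.0, rng)
z = rescaled_intervals(spikes)
mean_z = sum(z) / len(z)      # should be close to 1 when the model is correct
ks = ks_statistic_exp1(z)     # should be small when the model is correct
```

A large Kolmogorov-Smirnov distance would indicate that the assumed intensity does not describe the spike train well.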

    Developing implant technologies and evaluating brain-machine interfaces using information theory

    Brain-machine interfaces (BMIs) hold promise for restoring motor functions in severely paralyzed individuals. Invasive BMIs are capable of recording signals from individual neurons and typically provide the highest signal-to-noise ratio. Despite many efforts in the scientific community, BMI technology is still not reliable enough for widespread clinical application. The most prominent challenges include biocompatibility, stability, longevity, and lack of good models for informed signal processing and BMI comparison. To address the problem of low signal quality of chronic probes, in the first part of the thesis one such design, the Neurotrophic Electrode, was modified by increasing its channel capacity to form a Neurotrophic Array (NA). Specifically, single wires were replaced with stereotrodes and the total number of recording wires was increased. This new array design was tested in a rhesus macaque performing a delayed saccade task. The NA recorded little single unit spiking activity, and its local field potentials (LFPs) correlated with presented visual stimuli and saccade locations better than did extracted spikes. The second part of the thesis compares the NA to the Utah Array (UA), the only other micro-array approved for chronic implantation in a human brain. The UA recorded significantly more spiking units, which had larger amplitudes than NA spikes. This was likely due to differences in the array geometry and construction. LFPs on the NA electrodes were more correlated with each other than those on the UA. These correlations negatively impacted the NA's information capacity when considering more than one recording site. The final part of this dissertation applies information theory to develop objective measures of BMI performance. Currently, decoder information transfer rate (ITR) is the most popular BMI information performance metric. However, it is limited by the selected decoding algorithm and does not represent the full task information embedded in the recorded neural signal. A review of existing methods to estimate ITR is presented, and these methods are interpreted within a BMI context. A novel Gaussian mixture Monte Carlo method is developed to produce good ITR estimates with a low number of trials and high number of dimensions, as is typical for BMI applications.
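To make the decoder ITR metric concrete, the sketch below computes one widely used definition, the Wolpaw formula, which depends only on the number of targets and the decoder's accuracy. This is an assumption chosen for illustration: the abstract does not say which ITR definitions the thesis reviews, and this is not the Gaussian mixture Monte Carlo estimator it develops. The function names are hypothetical.

```python
import math

def wolpaw_bits_per_trial(n_targets, accuracy):
    """Wolpaw ITR in bits per trial for n_targets equiprobable targets.

    Assumes errors are spread uniformly over the wrong targets.
    """
    if n_targets < 2:
        raise ValueError("n_targets must be at least 2")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    p = accuracy
    bits = math.log2(n_targets)
    if p > 0.0:
        bits += p * math.log2(p)
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n_targets - 1))
    return bits

def itr_bits_per_minute(n_targets, accuracy, trial_seconds):
    """Scale the per-trial information by the number of trials per minute."""
    return wolpaw_bits_per_trial(n_targets, accuracy) * 60.0 / trial_seconds

perfect = wolpaw_bits_per_trial(4, 1.0)     # 2.0 bits: four targets, no errors
chance = wolpaw_bits_per_trial(4, 0.25)     # 0.0 bits: decoder at chance level
rate = itr_bits_per_minute(4, 1.0, 2.0)     # 60.0 bits per minute
```

Because the formula takes only the decoder's accuracy as input, it reflects the limitation the abstract points out: the value depends on the chosen decoding algorithm rather than on all the task information present in the neural signal.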

    Machine Learning for Physiological Time Series: Representing and Controlling Blood Glucose for Diabetes Management

    Type 1 diabetes is a chronic health condition affecting over one million patients in the US, where blood glucose (sugar) levels are not well regulated by the body. Researchers have sought to use physiological data (e.g., blood glucose measurements) collected from wearable devices to manage this disease, either by forecasting future blood glucose levels for predictive alarms, or by automating insulin delivery for blood glucose management. However, the application of machine learning (ML) to these data is hampered by latent context, limited supervision and complex temporal dependencies. To address these challenges, we develop and evaluate novel ML approaches in the context of i) representing physiological time series, particularly for forecasting blood glucose values and ii) decision making for when and how much insulin to deliver. When learning representations, we leverage the structure of the physiological sequence as an implicit information stream. In particular, we a) incorporate latent context when predicting adverse events by jointly modeling patterns in the data and the context those patterns occurred under, b) propose novel types of self-supervision to handle limited data and c) propose deep models that predict functions underlying trajectories to encode temporal dependencies. In the context of decision making, we use reinforcement learning (RL) for blood glucose management. Through the use of an FDA-approved simulator of the glucoregulatory system, we achieve strong performance using deep RL with and without human intervention. However, the success of RL typically depends on realistic simulators or experimental real-world deployment, neither of which is currently practical for problems in health. Thus, we propose techniques for leveraging imperfect simulators and observational data. Beyond diabetes, representing and managing physiological signals is an important problem. By adapting techniques to better leverage the structure inherent in the data we can help overcome these challenges.
    PHD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/163134/1/ifox_1.pd

    DNN-Based Acoustic Modeling for Robust Speech Recognition

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2019. 2. Nam Soo Kim.
    In this thesis, we propose three DNN-based acoustic modeling techniques for robust automatic speech recognition (ASR). First, we propose an acoustic modeling technique that makes the best use of the inherent noise robustness of DNNs. With this technique, the DNN can more readily learn the complicated relationship among noisy speech, clean speech, noise estimates, and phonetic targets. The proposed method outperformed noise-aware training (NAT), the conventional auxiliary-feature-based model adaptation technique, on the Aurora-5 DB. The second method is a multi-channel feature enhancement technique. In the general multi-channel speech recognition scenario, an enhanced single-source speech signal is extracted from the multiple inputs using beamforming, a conventional signal-processing technique, and speech recognition is performed by feeding that signal into the acoustic model. We propose a multi-channel feature enhancement DNN algorithm that combines the delay-and-sum (DS) beamformer, one of the conventional beamforming techniques, with a DNN. Experiments on the multichannel Wall Street Journal audio visual (MC-WSJ-AV) corpus show that the proposed method outperforms conventional multi-channel feature enhancement techniques. Finally, we propose an uncertainty-aware training (UAT) technique. Most existing DNN-based techniques, including those introduced above, optimize point estimates of their targets (e.g., clean features and acoustic model parameters). This undermines the reliability of the estimates. To overcome this issue, UAT employs a modified variational autoencoder (VAE), a neural network model that learns and performs stochastic variational inference (VIF). UAT models robust latent variables that mediate the mapping between the noisy observed features and the phonetic targets, using distributional information about the clean feature estimates. The proposed technique outperforms conventional DNN-based techniques on the Aurora-4 and CHiME-4 databases.
    Contents: Abstract; Contents; List of Figures; List of Tables
    1 Introduction
    2 Background: 2.1 Deep Neural Networks; 2.2 Experimental Database (2.2.1 Aurora-4 DB; 2.2.2 Aurora-5 DB; 2.2.3 MC-WSJ-AV DB; 2.2.4 CHiME-4 DB)
    3 Two-stage Noise-aware Training for Environment-robust Speech Recognition: 3.1 Introduction; 3.2 Noise-aware Training; 3.3 Two-stage NAT (3.3.1 Lower DNN; 3.3.2 Upper DNN; 3.3.3 Joint Training); 3.4 Experiments (3.4.1 GMM-HMM System; 3.4.2 Training and Structures of DNN-based Techniques; 3.4.3 Performance Evaluation); 3.5 Summary
    4 DNN-based Feature Enhancement for Robust Multichannel Speech Recognition: 4.1 Introduction; 4.2 Observation Model in Multi-Channel Reverberant Noisy Environment; 4.3 Proposed Approach (4.3.1 Lower DNN; 4.3.2 Upper DNN and Joint Training); 4.4 Experiments (4.4.1 Recognition System and Feature Extraction; 4.4.2 Training and Structures of DNN-based Techniques; 4.4.3 Dropout; 4.4.4 Performance Evaluation); 4.5 Summary
    5 Uncertainty-aware Training for DNN-HMM System using Variational Inference: 5.1 Introduction; 5.2 Uncertainty Decoding for Noise Robustness; 5.3 Variational Autoencoder; 5.4 VIF-based uncertainty-aware Training (5.4.1 Clean Uncertainty Network; 5.4.2 Environment Uncertainty Network; 5.4.3 Prediction Network and Joint Training); 5.5 Experiments (5.5.1 Experimental Setup: Feature Extraction and ASR System; 5.5.2 Network Structures; 5.5.3 Effects of CUN on the Noise Robustness; 5.5.4 Uncertainty Representation in Different SNR Condition; 5.5.5 Result of Speech Recognition; 5.5.6 Result of Speech Recognition with LSTM-HMM); 5.6 Summary
    6 Conclusions
    Bibliography
    Abstract (in Korean)
    Docto
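The delay-and-sum (DS) beamformer named in this abstract can be sketched in a few lines. The example below is a minimal illustration that assumes the per-channel delays are already known and are whole numbers of samples. It covers only the classical signal-processing step, not the DNN that the thesis combines with it, and the function name and test signal are hypothetical.

```python
import math

def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay, then average the channels.

    channels: list of equal-rate sample lists, one per microphone.
    delays:   non-negative arrival delay of the source at each microphone, in samples.
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    # Only keep output samples for which every aligned channel has data.
    usable = min(len(x) - d for x, d in zip(channels, delays))
    return [
        sum(x[n + d] for x, d in zip(channels, delays)) / len(channels)
        for n in range(usable)
    ]

# Hypothetical three-microphone recording: the same tone arrives 0, 2 and 5 samples late.
source = [math.sin(2.0 * math.pi * 0.05 * n) for n in range(200)]
delays = [0, 2, 5]
channels = [[0.0] * d + source for d in delays]
output = delay_and_sum(channels, delays)
```

With the delays compensated, the copies of the source add coherently, while noise that is uncorrelated across microphones would be attenuated by the averaging.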