Robust Learning Techniques for Deep Neural Networks
Deep Neural Networks (DNNs) yield state-of-the-art performance in an increasing array of applications. Despite their pervasive impact, significant concerns remain regarding their (lack of) stability and robustness. In this thesis, we explore several complementary approaches for guiding DNNs to learn robust and stable features, drawing on domain expertise, domain-specific measures, and neuro-inspired modifications. We present novel augmentation techniques, cost functions, and data rejection methods that supplement conventional DNN training for reliable feature extraction.

We first study robustness in the presence of strong confounding factors for radio-frequency (RF) fingerprinting, in which the aim is to distinguish devices using subtle hardware imperfections that vary from device to device. Features such as carrier frequency offset and the wireless channel, however, can misguide DNNs: unless proactively discouraged from doing so, DNNs learn these strong confounding features rather than the nonlinear device-specific characteristics that we seek. We investigate and evaluate augmentation- and estimation-based strategies to promote generalization across realizations of these confounding factors using WiFi data.

In our second study, we present robustness measures in the context of self-supervised contrastive learning. We investigate how to pretrain speaker recognition models by leveraging dialogues between customers and smart-speaker devices. The supervisory information in such dialogues is inherently noisy, however, as multiple speakers may speak to a device in the course of the same dialogue. To address this issue, we propose an effective rejection mechanism that selectively learns from dialogues based on their acoustic homogeneity, and we present a novel cost function designed specifically for corrupted datasets in the contrastive learning setting.

Lastly, we introduce a promising neuro-inspired architectural DNN design and cost function for learning robust and interpretable features. We develop a software framework in which end-to-end costs can be supplemented with costs that depend on layer-wise activations, permitting more fine-grained control of features. We apply this framework to incorporate Hebbian/anti-Hebbian (HaH) learning in a discriminative setting, demonstrating promising gains in robustness on the CIFAR-10 image classification task.
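To make the first study concrete, here is a minimal, illustrative sketch of confounder-randomizing augmentation for RF fingerprinting: random carrier frequency offsets and random multipath channels are applied to baseband IQ bursts during training so that a classifier cannot rely on these factors. The offset range, channel model, and function name are assumptions for illustration, not the thesis's exact settings.

```python
import numpy as np

def augment_iq(iq, fs, max_cfo_hz=50e3, max_taps=3, rng=np.random):
    """Randomize confounders (carrier frequency offset and multipath channel)
    on a complex baseband IQ burst so a classifier cannot rely on them.

    iq : 1-D complex np.ndarray of baseband samples
    fs : sample rate in Hz
    (illustrative sketch; ranges and channel model are assumptions)
    """
    n = np.arange(iq.size)
    # Apply a random carrier frequency offset (CFO).
    cfo = rng.uniform(-max_cfo_hz, max_cfo_hz)
    iq = iq * np.exp(2j * np.pi * cfo * n / fs)
    # Convolve with a short random multipath channel (roughly unit-energy taps).
    num_taps = rng.randint(1, max_taps + 1)
    h = (rng.randn(num_taps) + 1j * rng.randn(num_taps)) / np.sqrt(2 * num_taps)
    return np.convolve(iq, h, mode="same")
```

Presenting each device's bursts under many such randomized realizations encourages the network to key on device-specific nonlinearities rather than on the confounding factors themselves.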
Neuro-Inspired Deep Neural Networks with Sparse, Strong Activations
While end-to-end training of Deep Neural Networks (DNNs) yields state-of-the-art performance in an increasing array of applications, it does not provide insight into, or control over, the features being extracted. We report here on a promising neuro-inspired approach to DNNs with sparser and stronger activations. We use standard stochastic gradient training, supplementing the end-to-end discriminative cost function with layer-wise costs promoting Hebbian ("fire together, wire together") updates for highly active neurons, and anti-Hebbian updates for the remaining neurons. Instead of batch norm, we use divisive normalization of activations (suppressing weak outputs using strong outputs), along with implicit normalization of neuronal weights. Experiments with standard image classification tasks on CIFAR-10 demonstrate that, relative to baseline end-to-end trained architectures, our proposed architecture (a) leads to sparser activations (with only a slight compromise on accuracy), (b) exhibits more robustness to noise (without being trained on noisy data), and (c) exhibits more robustness to adversarial perturbations (without adversarial training).

Comment: 5 pages, 5 figures
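As a rough illustration of the two ingredients described above, the sketch below shows one way a layer-wise Hebbian/anti-Hebbian cost and channel-wise divisive normalization could be written in PyTorch. The fraction of neurons treated as "highly active", the normalization axis, and the weighting of the layer-wise terms are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def divisive_norm(x, eps=1e-6):
    """Divisive normalization across channels: each activation is divided by
    the L2 norm of all activations at the same spatial location, so strong
    outputs suppress weak ones (used here in place of batch norm)."""
    denom = x.pow(2).sum(dim=1, keepdim=True).sqrt() + eps
    return x / denom

def hah_layer_cost(act, frac_active=0.1):
    """Layer-wise Hebbian/anti-Hebbian cost (sketch): reward the most active
    neurons (Hebbian) and penalize the rest (anti-Hebbian), promoting sparse,
    strong activations. `act` is a (batch, features) or conv activation tensor."""
    flat = act.flatten(start_dim=1)
    k = max(1, int(frac_active * flat.size(1)))
    top, _ = flat.topk(k, dim=1)
    hebbian = top.mean()  # strengthen highly active units
    anti = ((flat.sum(dim=1) - top.sum(dim=1)) / (flat.size(1) - k)).mean()  # suppress the rest
    return anti - hebbian

# Combined objective (sketch): end-to-end cross-entropy plus layer-wise terms.
# loss = F.cross_entropy(logits, labels) + lam * sum(hah_layer_cost(a) for a in layer_acts)
```

During training, the layer-wise terms would simply be added to the usual discriminative loss with a small weight, leaving standard stochastic gradient training otherwise unchanged.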
Self-supervised Speaker Recognition Training Using Human-Machine Dialogues
Speaker recognition, recognizing speaker identities based on voice alone, enables important downstream applications, such as personalization and authentication. Learning speaker representations in the context of supervised learning depends heavily on both clean and sufficient labeled data, which is always difficult to acquire. Noisy unlabeled data, on the other hand, also provides valuable information that can be exploited using self-supervised training methods. In this work, we investigate how to pretrain speaker recognition models by leveraging dialogues between customers and smart-speaker devices. However, the supervisory information in such dialogues is inherently noisy, as multiple speakers may speak to a device in the course of the same dialogue. To address this issue, we propose an effective rejection mechanism that selectively learns from dialogues based on their acoustic homogeneity. Both reconstruction-based and contrastive-learning-based self-supervised methods are compared. Experiments demonstrate that the proposed method provides significant performance improvements, superior to earlier work. Dialogue pretraining, when combined with the rejection mechanism, yields a 27.10% equal error rate (EER) reduction in speaker recognition, compared to a model without self-supervised pretraining.

Comment: 5 pages, 2 figures
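For intuition about the rejection mechanism, here is a minimal sketch of one plausible realization: acoustic homogeneity is scored as the mean pairwise cosine similarity of utterance embeddings within a dialogue, and only sufficiently homogeneous dialogues are kept as pseudo same-speaker material for pretraining. The embedding function, similarity measure, and threshold are assumptions for illustration and not necessarily the paper's exact criterion.

```python
import numpy as np

def dialogue_homogeneity(embeddings):
    """Mean pairwise cosine similarity between utterance embeddings from one
    dialogue; high values suggest a single speaker throughout (sketch)."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = e @ e.T
    mask = ~np.eye(len(e), dtype=bool)  # ignore self-similarity on the diagonal
    return sim[mask].mean()

def select_dialogues(dialogues, embed_fn, threshold=0.7):
    """Keep only dialogues whose utterances are acoustically homogeneous enough
    to serve as pseudo same-speaker positives for contrastive pretraining.
    `embed_fn` is a hypothetical per-utterance embedding extractor."""
    kept = []
    for utterances in dialogues:
        emb = np.stack([embed_fn(u) for u in utterances])
        if len(emb) > 1 and dialogue_homogeneity(emb) >= threshold:
            kept.append(utterances)
    return kept
```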